16/04/2019

Hacking "Temporal Locality"

Introduction

The reason for this blog post is to analyse certain types of attacks that relate to cache manipulation and that recently resurfaced in various BlackHat and Defcon presentations. More specifically, we are interested in the following types of attacks:

  • Web Cache Poisoning Attacks 
  • Web Cache Deception Attacks
About the cache

Many people do not understand exactly what a Web cache is, so I am going to invest a fair amount of time analysing and explaining what a cache is from a hacker/security professional perspective, when conducting a pentest or simply hacking a site.

The cache

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster [1]. Interesting, very interesting: the data stored in a cache might also be the result of an earlier computation or a copy of data stored elsewhere [1]. So data might be replicated to other locations within the system that serves the content. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
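
The hit/miss behaviour described above can be illustrated with a minimal sketch. The class name SimpleCache and the slow_origin function are hypothetical stand-ins invented for this example; a real web cache also handles expiry, eviction and cache keys.

```python
class SimpleCache:
    """Toy in-memory cache that counts hits and misses (illustration only)."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # slow path, e.g. the origin web server
        self._store = {}                 # cached copies, indexed by request key
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            # Cache hit: serve the stored copy without contacting the origin.
            self.hits += 1
            return self._store[key]
        # Cache miss: fetch from the slower data store and keep a copy.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = value
        return value


def slow_origin(path):
    # Hypothetical stand-in for an expensive computation or origin request.
    return f"content of {path}"


cache = SimpleCache(slow_origin)
cache.get("/logo.png")  # first request: miss, fetched from the origin
cache.get("/logo.png")  # second request: hit, served from the cache
print(cache.hits, cache.misses)
```

The more often get() finds the key in the store, the fewer calls reach the slow origin, which is the whole point of a cache.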

Some companies host their own cache using software like Varnish, and others opt to rely on a Content Delivery Network (CDN) like Cloudflare, with caches scattered across geographical locations. Also, some popular web applications and frameworks like Drupal have a built-in cache. [3]

In the diagram above we have a simplified scenario, where the user has two different paths:

  • 1 blue - 2 blue - 3 yellow and 4 yellow 
  • 1 blue - 2 blue - 3 orange 
The path to be followed (aka. the user's interaction flow with the target web system) depends on the cache device's internal decision process. Simplistically speaking, this decision process is the algorithm the cache device uses to decide what content will be served, and it is the part we are interested in hacking or subverting.


Cache manipulation

The following diagram demonstrates how someone can potentially manipulate the web cache to extract sensitive information:


The legitimate user in Step 1 interacts with the web cache system (aka. the web server and the front-end web cache) and submits/retrieves sensitive content (which should not be cached in the first place). The hacker assesses the rules the cache server uses to store local copies of user content (e.g. identifies through experimentation which URL paths are stored in the cache server), and then starts retrieving the cached sensitive information.

Web caching is a core design feature of the HTTP protocol meant to minimize network traffic while improving the perceived responsiveness of the system as a whole. Caches can be found at every level of a content's journey from the original server to the browser. [6]

Web caching works by caching the HTTP responses for requests according to certain rules. Subsequent requests for cached content can then be fulfilled from a cache closer to the user.
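
One of the rules a cache can consult is the Cache-Control response header. The following is a minimal sketch of that single rule, assuming a shared cache that refuses to store responses marked private or no-store. The function names are hypothetical, and real caches weigh many more signals (status code, HTTP method, vendor-specific configuration) and can be configured to override these headers.

```python
def parse_cache_control(header_value):
    """Return the set of lower-cased directive names in a Cache-Control value."""
    directives = set()
    for part in header_value.split(","):
        # Drop any "=value" suffix, e.g. "max-age=0" becomes "max-age".
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def shared_cache_may_store(header_value):
    """Simplified rule: a shared cache must not store private or no-store responses."""
    return not ({"private", "no-store"} & parse_cache_control(header_value))


print(shared_cache_may_store("public, max-age=3600"))
print(shared_cache_may_store("private, max-age=0"))
```

Under this simplified rule the first call reports that the response may be stored and the second that it may not.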


What usually is cached?


Certain content lends itself more readily to caching than other content. Very cache-friendly content for most sites includes:
  • Logos and brand images
  • Non-rotating images in general (navigation icons, for example)
  • Style sheets
  • General Javascript files
  • Downloadable Content
  • Media Files
Other content needs more care before it is cached:
  • HTML pages
  • Rotating images
  • Frequently modified Javascript and CSS
  • Content requested with authentication cookies [6]
Putting things in perspective

To understand the importance/complexity of the attack, it helps to note that high-traffic systems (e.g. media content servers) use multiple cache servers. Usually these types of systems assign web cache servers to whole regions (e.g. USA region cache, EU region cache). These regions might be whole countries or even continents. Therefore the significance of the impact depends on the following two factors:
  • The scope of the vulnerable cache servers
  • The content exposed through the cache servers
The following diagram demonstrates the issue:


The following diagram demonstrates a complicated infrastructure on cache management:



Note: In order for an attacker to attack the system she would have to assess the set of the rules of all the intermediate cache proxies.

Web cache criteria 

Web caching is achieved through "web cache keys". A web cache key is an identifier of a resource located on the web server. As a case study we will refer to the Akamai community posts to see how web cache keys are configured.

The following section is based on a community post describing the concept of the Akamai Cache Key. This information is deduced from several Akamai configuration settings posted in the past. The issues discussed are:
  • How does the Edge Server know which file needs to be cached?
  • How does the Edge Server retrieve the cached object from the “Cache Store”?
Note1: Content is cached in the so-called “Cache Store”. The “Cache Store” is either the memory (RAM) or the hard disk of a given Edge Server.

Note2: An Akamai Edge server is a cache server delivering content. To retrieve an object from the Akamai Platform, users must connect to an Akamai Edge server first. The server must apply a set of rules to the request, and then either locate the object in its cache or retrieve it from the origin. [12]

Note3: Also see sources [9] and [10].

The following diagram demonstrates a simple topology of an Akamai network:



To store an object in the Edge Server “Cache Store” we need to create the “Cache Key” first. The EdgeSuite Configuration Guide mentions that the Akamai Edge Server forms the “Cache Key” based on parts of the "Request ARL". [11]

The ARL (Akamai Resource Locator) is similar to a URL. The primary function of an ARL is to direct an end user’s request for an object to the Akamai network [13]. The ARL also contains the object’s caching properties. The difference is that the ARL is specifically defined for objects to be served via the Akamai Network. There are two types of ARLs:
  1. ARL v1: This is the original ARL used in the earlier days of Akamai. It contains instructions for the Edge Server coded into its structure.
  2. ARL v2: Instead of coding all instructions into the URL as ARL v1 does, ARL v2 references a configuration file hosted on the Edge Server.
ARL Components which form the Cache Key:
  • Typecode
  • Forward [fwd] path (origin server, pathname, filename and extension)
  • Query string (Optional)
  • Secure Network Delivery Indicator
  • HTTP Method (GET, HEAD, etc.)
Note: The following description applies mainly to ARL v2; we are not going to elaborate on ARL v1, as it is not used that often nowadays.

The following diagram breaks down the ARL format:



The following section demonstrates the web cache keys using sample HTTP requests:

Request:

GET /products.jsp?productId=1 HTTP/1.1
host: shop.edgegate.de
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: */*
Pragma: akamai-x-get-cache-key

Response:

HTTP/1.1 200 OK
Content-Type: text/html; charset=iso-8859-1
Server: Google Frontend
Cache-Control: private, max-age=0
Expires: Thu, 17 Dec 2015 00:00:06 GMT
Date: Thu, 17 Dec 2015 00:00:06 GMT
Content-Length: 1127
X-Cache-Key: /L/1168/78685/1m/edgegatecpinotossi.appspot.com/products.jsp?productId=1
Connection: keep-alive

Note: The value of the X-Cache-Key response header designates the web cache key.

The following table explains the values used as web cache keys:

Name          Value
Typecode      L
Serial        1168
CPCode        78685
TTL           1m
fwd Path      edgegatecpinotossi.appspot.com/products.jsp
Query String  ?productId=1
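
As a rough illustration of the table above, the X-Cache-Key value can be split into its components with a few lines of Python. This is only a sketch: the field layout is assumed from the single example in this post, and the names AkamaiCacheKey and parse_cache_key are invented here, not part of any Akamai tooling.

```python
from typing import NamedTuple


class AkamaiCacheKey(NamedTuple):
    typecode: str
    serial: str
    cpcode: str
    ttl: str
    fwd_path: str
    query_string: str


def parse_cache_key(value):
    """Split a cache key of the assumed form /typecode/serial/cpcode/ttl/fwd-path[?query]."""
    parts = value.lstrip("/").split("/", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected cache key layout: {value!r}")
    typecode, serial, cpcode, ttl, rest = parts
    # Everything after the first "?" is treated as the (optional) query string.
    fwd_path, sep, query = rest.partition("?")
    return AkamaiCacheKey(typecode, serial, cpcode, ttl, fwd_path, sep + query)


key = parse_cache_key(
    "/L/1168/78685/1m/edgegatecpinotossi.appspot.com/products.jsp?productId=1"
)
for name, value in key._asdict().items():
    print(f"{name:13} {value}")
```

Comparing the parsed fields of two responses is a quick way to see which parts of a request actually change the key and which do not.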

Before the attack: Reconnaissance

Before progressing with any type of cache manipulation, it is worth the trouble to review the route to the targeted web server. Running a query on Robtex for google.com will give us a lot of information that can be used to see whether a cache proxy is in use.

Below you can see an extract of the output in Robtex (https://www.robtex.com/dns-lookup/google.com#owhois):



Note: Other manual tools can also be used to see whether there is a cache proxy in front of the web service.
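
One manual check is to look at the response headers for hints that a cache sits in front of the origin. The sketch below assumes you have already collected the response headers into a dictionary (for example by copying them from a proxy tool). The header list is a non-exhaustive assumption, since header names vary by vendor, and the sample values are made up for illustration.

```python
# Headers that commonly hint at a cache or CDN in the path (non-exhaustive).
CACHE_HINT_HEADERS = (
    "age",
    "via",
    "x-cache",
    "x-cache-hits",
    "x-served-by",
    "x-varnish",
    "cf-cache-status",
)


def cache_hints(headers):
    """Return the subset of response headers that suggest a caching layer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in CACHE_HINT_HEADERS if name in lowered}


# Made-up sample response headers, for illustration only.
sample_headers = {
    "Content-Type": "text/html",
    "Age": "120",
    "X-Cache": "HIT",
    "Via": "1.1 varnish",
}
print(cache_hints(sample_headers))
```

An empty result does not prove that no cache is present; many deployments strip or rename these headers.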


Finally the attack: Web Cache deception

Web cache deception occurs when the target website is configured to be "flexible" about what kinds of paths (aka. URLs) it can handle. For more information on what a URL is see https://www.rfc-editor.org/info/rfc1738 . This makes sense from a usability perspective, e.g. a product that tolerates certain types of input becomes more user friendly. It also has to do with how each software vendor interprets the RFC that defines the URL structure.

In particular, the issue arises when requests to a path that doesn't exist (say /x/y/z) are treated as equivalent to requests to a parent path that does exist (say /x). For example, what happens if you get a request for the nonexistent path /newsfeed/foo? Depending on how your website is configured, it might just treat such a request as equivalent to a request to /newsfeed. For example, if you're running the Django web framework, the following configuration would do just that because the regular expression ^newsfeed/ matches both newsfeed/ and newsfeed/foo (Django routes omit the leading /): [14]

from django.conf.urls import url
patterns = [url(r'^newsfeed/', ...)]
And here's where the problem lies. If your website does this, then a request to /newsfeed/foo.jpg will be treated the same as a request to /newsfeed. But a web cache, seeing the .jpg file extension, will think that it's OK to cache this response, because most web cache proxies store files with image extensions by default. [14]
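
The mismatch between the two components can be reproduced without Django, using only the regular expression from the route above. This is a sketch under stated assumptions: the set of cached extensions is a made-up cache rule, and the function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Same pattern as the Django route (Django matches without the leading "/").
NEWSFEED_ROUTE = re.compile(r"^newsfeed/")

# Assumed cache rule: store any response whose path ends in a static extension.
CACHED_EXTENSIONS = {".jpg", ".png", ".gif", ".css", ".js"}


def origin_serves_newsfeed(path):
    """True if the permissive route would answer this path with the newsfeed page."""
    return NEWSFEED_ROUTE.match(path.lstrip("/")) is not None


def cache_would_store(path):
    """True if the extension-based cache rule would store the response."""
    return PurePosixPath(path).suffix.lower() in CACHED_EXTENSIONS


def deception_risk(path):
    """Dynamic (possibly private) content that the cache treats as a static file."""
    return origin_serves_newsfeed(path) and cache_would_store(path)


for candidate in ("/newsfeed/", "/newsfeed/foo", "/newsfeed/foo.jpg"):
    print(candidate, deception_risk(candidate))
```

Only the last path is both served as the newsfeed by the origin and stored by the cache, which is exactly the combination the attack needs.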

Below we can see a schematic analysis of the issue:


Note: In the diagram above we can see how a malicious user can request the home page of the user. At this point it is assumed that the home page contains sensitive information and requires some kind of login. In this example it is also assumed that the cache server stores local copies of the site images.

It is also worth saying that this is a simplified scenario, and that someone who would like to perform a more complicated attack would have to:
  • Understand the scope of the cache server e.g. region cache server.
  • Understand the cache rules of the cache server e.g. Akamai ARL etc.
  • Identify target content of interest e.g. sensitive content etc.   
Note: It is also worth mentioning that identifying how both the web server and the cache server "understand" the URL structure is important, e.g. by experimenting with malicious paths such as mangled backslashes. This also relates to what browsers consider acceptable.


Finally the attack: Web Cache poisoning 

The objective of web cache poisoning is to send a request that causes a harmful response that gets saved in the cache and served to other users. The following diagram shows the process to follow:

James Kettle (aka. @albinowax) has done an amazing job documenting the vulnerability and has written about multiple scenarios and ways to exploit it. More specifically, he described the following scenarios:
  • Selective Poisoning
  • DOM Poisoning
  • Hijacking Mozilla SHIELD
  • Route poisoning
  • Hidden Route Poisoning
  • Chaining Unkeyed Inputs
  • Open Graph Hijacking
  • Local Route Poisoning
  • Internal Cache Poisoning
  • Drupal Open Redirect
  • Persistent redirect hijacking
  • Nested cache poisoning
  • Cross-Cloud Poisoning
A simple example of web cache poisoning is the following: assume that the X-Forwarded-Host HTTP header is not part of the cache key (an unkeyed input) but is still used by the application. We can inject our own value through it, and that value is then echoed back in a response that the cache stores.

This is taken from https://portswigger.net/blog/practical-web-cache-poisoning :

Request:

GET /en?cb=1 HTTP/1.1
Host: www.redhat.com
X-Forwarded-Host: canary

Response:

HTTP/1.1 200 OK
Cache-Control: public, no-cache


<meta property="og:image" content="https://canary/cms/social.png" />

In the example above we saw that the value of the X-Forwarded-Host header was echoed back in the HTML body. The application used the header to generate an Open Graph URL inside a meta tag. In this scenario we can assume that this can be turned into an XSS, HTML injection or other type of client-side injection attack.
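
The mechanics can be simulated with a toy model. This is only a sketch: ToyCache and origin are hypothetical stand-ins, the cache is assumed to key on host and path only, and Cache-Control handling is ignored entirely. The point is that the header influences the response without being part of the key.

```python
class ToyCache:
    """Toy cache keyed on (host, path) only; request headers are unkeyed."""

    def __init__(self, origin):
        self._origin = origin
        self._store = {}

    def handle(self, host, path, headers=None):
        headers = headers or {}
        key = (host, path)  # X-Forwarded-Host is NOT part of the cache key
        if key not in self._store:
            # Miss: forward the request, headers included, and store the response.
            self._store[key] = self._origin(host, path, headers)
        return self._store[key]


def origin(host, path, headers):
    # Hypothetical application that trusts X-Forwarded-Host when building URLs.
    image_host = headers.get("X-Forwarded-Host", host)
    return f'<meta property="og:image" content="https://{image_host}/cms/social.png" />'


cache = ToyCache(origin)

# The attacker sends the poisoning request first.
attacker_view = cache.handle("www.redhat.com", "/en?cb=1", {"X-Forwarded-Host": "canary"})

# A later visitor sends a clean request but receives the stored, poisoned response.
victim_view = cache.handle("www.redhat.com", "/en?cb=1")
print(victim_view)
```

In this model, adding the header to the cache key or ignoring it in the application would both prevent the poisoned copy from reaching the second visitor.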

Defending against Web Cache attacks

The best way to defend against this attack is to ensure that your website isn't so permissive, and never treats requests to nonexistent paths as equivalent to requests to valid parent paths. In addition:
  • Use the same URL to refer to the same items: Since caches key off of both the host and the path to the content requested, ensure that you refer to your content in the same way on all of your pages. The previous recommendation makes this significantly easier. [6]
  • Fingerprint cache items: For static content like CSS and Javascript files, it may be appropriate to fingerprint each item (per user session). This means adding a unique identifier to the filename (often a hash of the file) so that if the resource is modified, the new resource name can be requested, causing the requests to correctly bypass the cache. [6]
  • Write your own custom cache rules: A web cache server has to be aware of the nature of the application content, e.g. it should not cache dynamic content of a banking application.
  • Avoid taking input from headers and cookies: Filter HTTP headers and cookies by running integrity checks.
  • Disable the cache if it is not required: Lots of services don't require caching, but allow it because it is enabled by default.
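
The fingerprinting recommendation above can be sketched as follows, assuming the fingerprint is a truncated SHA-256 hash of the file content inserted before the extension. The function name fingerprinted_name and the eight-character length are arbitrary choices for this illustration; build tools usually do this automatically.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename, content, digest_len=8):
    """Insert a short content hash before the extension, e.g. site.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = fingerprinted_name("css/site.css", b"body { color: black; }")
new_name = fingerprinted_name("css/site.css", b"body { color: navy; }")
print(old_name)
print(new_name)
```

Because the name changes whenever the content changes, a modified file is requested under a new URL and any stale cached copy is simply never asked for again.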
Tools for cache poisoning/deception 

The following section lists tools that can be used to test for cache poisoning and deception:
  • param-miner: This extension identifies hidden, unlinked parameters. It's particularly useful for finding web cache poisoning vulnerabilities.[3]
  • Burp Suite Free/Pro: Intruder component [16]
References: