Hacking "Temporal Locality"


The reason for this blog post is to analyse certain types of attacks that relate to cache manipulation and that have recently resurfaced in various BlackHat and Defcon presentations. More specifically, we are interested in the following types of attacks:

  • Web Cache Poisoning Attacks 
  • Web Cache Deception Attacks
About the cache

Many people fail to understand what exactly a Web cache is, and therefore I am going to invest a lot of time analysing and explaining what a cache is from a hacker's/security professional's perspective, when conducting a pentest or simply hacking a site.

The cache

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster [1]. The data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere [1], so data may be replicated to other locations within the system that serves the content. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
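To make hits, misses and the "temporal locality" in the title concrete, here is a minimal sketch of an in-memory cache that counts both. The least-recently-used (LRU) eviction policy, the capacity of 2 and the `slow_origin` helper are assumptions for illustration; real products such as Varnish or a CDN use their own eviction and expiry logic.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Tiny least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.store = OrderedDict()  # key -> cached content, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key: str, fetch: Callable[[str], str]) -> str:
        if key in self.store:
            # Cache hit: serve the stored copy and mark it as most recently used.
            self.hits += 1
            self.store.move_to_end(key)
            return self.store[key]
        # Cache miss: go to the slower source, then keep a copy for next time.
        self.misses += 1
        value = fetch(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_origin(path: str) -> str:
    # Hypothetical stand-in for recomputing a result or reading a slower data store.
    return f"<content of {path}>"


cache = LRUCache(capacity=2)
requests = ["/logo.png", "/style.css", "/logo.png", "/logo.png", "/app.js", "/style.css"]
for path in requests:
    cache.get(path, slow_origin)
print(cache.hits, cache.misses, round(cache.hit_ratio(), 2))  # 2 4 0.33
```

The repeated requests for /logo.png close together in time are what produce the hits; /style.css is requested again only after it has been evicted, so it misses twice.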

Some companies host their own cache using software like Varnish, and others opt to rely on a Content Delivery Network (CDN) like Cloudflare, with caches scattered across geographical locations. Also, some popular web applications and frameworks like Drupal have a built-in cache. [3]

In the diagram above we have a simplified scenario, where the user has two different paths:

  • 1 blue - 2 blue - 3 yellow and 4 yellow 
  • 1 blue - 2 blue - 3 orange 
The path to be followed (aka. the user's flow of interaction with the target web system) depends on the cache device's internal decision process. Simplistically speaking, this is the algorithm the cache device uses to decide what content will be served, and it is the part we are interested in hacking or subverting.
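A rough sketch of that decision process follows. It assumes (our reading, since the diagram is not reproduced here) that the orange path is a response served straight from the cache and the yellow path is a round trip to the origin. The `CacheDevice` class and the extension-based storage rule are hypothetical, not the logic of any specific product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical cache rule: only responses for these extensions are stored.
CACHEABLE_EXTENSIONS = (".png", ".jpg", ".css", ".js")


@dataclass
class CacheDevice:
    origin: Callable[[str], str]  # the web server sitting behind the cache
    store: Dict[str, str] = field(default_factory=dict)

    def handle(self, path: str) -> Tuple[str, str]:
        """Return (route taken, body) for a request that has reached the cache."""
        if path in self.store:
            # Short path: the response is served straight from the cache.
            return "served-from-cache", self.store[path]
        # Long path: the request is forwarded to the origin and the answer comes back.
        body = self.origin(path)
        if path.endswith(CACHEABLE_EXTENSIONS):
            # The storage decision is the part an attacker tries to influence.
            self.store[path] = body
        return "forwarded-to-origin", body


device = CacheDevice(origin=lambda p: f"origin:{p}")
print(device.handle("/logo.png"))  # ('forwarded-to-origin', 'origin:/logo.png')
print(device.handle("/logo.png"))  # ('served-from-cache', 'origin:/logo.png')
print(device.handle("/account"))   # ('forwarded-to-origin', 'origin:/account')
```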

Cache manipulation

The following diagram demonstrates how someone can potentially manipulate the web cache to extract sensitive information:

The legitimate user in Step 1 interacts with the web cache system (aka. the web server and the front-end web cache system) and submits/retrieves sensitive content (which should not be cached in the first place). The hacker assesses the rules the cache server uses to store local copies of user content (e.g. identifies through experimentation which URL paths are being stored by the cache server), and then starts retrieving the cached sensitive information.
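The "experimentation" step usually means requesting the same URL twice and checking whether the second response came from the cache. A minimal sketch, assuming the cache exposes one of a few debugging headers that many caches and CDNs use (`X-Cache`, `CF-Cache-Status`, a custom `X-Cache-Status`, or the standard `Age`); every product differs, so treat the result as a hint, not proof.

```python
CACHE_STATUS_HEADERS = ("x-cache", "cf-cache-status", "x-cache-status")


def cache_status(headers: dict) -> str:
    """Classify a response as 'hit', 'miss' or 'unknown' from common cache headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in CACHE_STATUS_HEADERS:
        value = lowered.get(name, "").upper()
        if "HIT" in value:
            return "hit"
        if "MISS" in value:
            return "miss"
    # A non-zero Age header means a cache has been holding this response for a while.
    age = lowered.get("age", "").strip()
    if age.isdigit() and int(age) > 0:
        return "hit"
    return "unknown"


def path_gets_cached(first: dict, second: dict) -> bool:
    """Same URL requested twice: no hit first, then a hit, suggests the path is cached."""
    return cache_status(first) != "hit" and cache_status(second) == "hit"
```

For example, `path_gets_cached({"X-Cache": "MISS"}, {"X-Cache": "HIT"})` returns `True`, while two misses in a row suggest the path is not being stored.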

Web caching is a core design feature of the HTTP protocol meant to minimize network traffic while improving the perceived responsiveness of the system as a whole. Caches can be found at every level of a content's journey from the original server to the browser. [6]

Web caching works by caching the HTTP responses for requests according to certain rules. Subsequent requests for cached content can then be fulfilled from a cache closer to the user.
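One of the most important of those rules is the `Cache-Control` response header. The sketch below is a simplified reading of the HTTP caching specification (RFC 9111) for a shared cache such as a proxy or CDN: `no-store` and `private` forbid storage, and responses to requests carrying `Authorization` need `public`, `s-maxage` or `must-revalidate`. It deliberately ignores status codes, `Expires`, `Vary` and heuristic freshness, and the function names are our own.

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(cache_control: str, request_has_authorization: bool = False) -> bool:
    """Simplified check of whether a shared cache (proxy/CDN) may store a response."""
    d = parse_cache_control(cache_control)
    # no-store forbids any cache; private forbids shared caches (the qualified
    # private="field" form is treated like plain private here, a simplification).
    if "no-store" in d or "private" in d:
        return False
    # Responses to authenticated requests need explicit permission to be shared.
    if request_has_authorization and not any(
        k in d for k in ("public", "s-maxage", "must-revalidate")
    ):
        return False
    return True
```

With this check, `private, max-age=0` is never stored by a shared cache, whereas `public, no-cache` may be stored (it only has to be revalidated before reuse).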

What is usually cached?

Certain content lends itself more readily to caching than others. Some very cache-friendly content for most sites is:
  • Logos and brand images
  • Non-rotating images in general (navigation icons, for example)
  • Style sheets
  • General Javascript files
  • Downloadable Content
  • Media Files
Some items that you have to be careful about caching are:
  • HTML pages
  • Rotating images
  • Frequently modified Javascript and CSS
  • Content requested with authentication cookies [6]
Putting things in perspective

To understand the importance and complexity of the attack, it helps to explain that high-traffic systems (e.g. media content servers) use multiple cache servers. These types of systems usually assign web cache servers to whole regions (e.g. a USA region cache, an EU region cache). These regions might be whole countries or even continents. Therefore, the significance of the impact depends on the following two factors:
  • The scope of the vulnerable cache servers
  • The content exposed through the cache servers
The following diagram demonstrates the issue:

The following diagram demonstrates a complicated cache management infrastructure:

Note: In order for an attacker to attack the system, she would have to assess the rule sets of all the intermediate cache proxies.
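A first step in finding those intermediate proxies is the standard `Via` response header, to which each intermediary is supposed to append an entry (RFC 9110, section 7.6.3), although many proxies omit or strip it. A minimal parsing sketch; it does not handle commas inside comments, and the sample values are hypothetical.

```python
from typing import Dict, List


def parse_via(header_value: str) -> List[Dict[str, str]]:
    """Split a Via header into one entry per intermediary, in the order they were added."""
    hops = []
    for entry in header_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry is "<protocol> <received-by> [(comment)]".
        protocol, _, rest = entry.partition(" ")
        received_by, _, comment = rest.strip().partition(" ")
        hops.append({"protocol": protocol, "received_by": received_by, "comment": comment.strip()})
    return hops


for hop in parse_via("1.1 varnish (Varnish/6.0), 1.1 edge.example.net"):
    print(hop)
```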

Web cache criteria 

Web caching is achieved through "web cache keys". A web cache key is the identifier the cache derives from parts of the request (e.g. host, path and query string) to store and look up its copy of a resource located on the web server. As a case study, we will refer to Akamai community posts to see how web cache keys are configured.
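A minimal sketch of how a cache might build such a key, assuming a hypothetical configuration in which the method, the `Host` header, the path and the query string are keyed and every other header is ignored. The ignored headers are the "unkeyed inputs" that web cache poisoning abuses.

```python
from urllib.parse import urlsplit

# Hypothetical cache configuration: the only header that is part of the key.
KEYED_HEADERS = ("host",)


def cache_key(method: str, url: str, headers: dict) -> tuple:
    """Build a cache key from the method, keyed headers, path and query string."""
    parts = urlsplit(url)
    lowered = {name.lower(): value for name, value in headers.items()}
    keyed = tuple(lowered.get(name, "") for name in KEYED_HEADERS)
    return (method.upper(), *keyed, parts.path, parts.query)


a = cache_key("GET", "/products.jsp?productId=1", {"Host": "shop.edgegate.de", "User-Agent": "Firefox"})
b = cache_key("GET", "/products.jsp?productId=1", {"Host": "shop.edgegate.de", "User-Agent": "curl"})
print(a == b)  # True: User-Agent is unkeyed, so both requests share one cache entry
```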

The following section is based on a community post describing the concept of the Akamai Cache Key. This information is deduced from several Akamai configuration settings posted in the past. The issues discussed are:
  • How does the Edge Server know which file needs to be cached?
  • How does the Edge Server retrieve the cached object from the “Cache Store”?
Note 1: Content is cached in the so-called "Cache Store", which is either the memory (RAM) or the hard disk of a given Edge Server.

Note 2: An Akamai Edge server is a cache server delivering content. To retrieve an object from the Akamai Platform, users must connect to an Akamai Edge server first. The server must apply a set of rules to the request, and then either locate the object in its cache or retrieve it from the origin. [12]

Note3: Also see sources [9] and [10].

The following diagram demonstrates a simple topology of an Akamai network:

To store an object in the Edge Server "Cache Store", we need to create the "Cache Key" first. The EdgeSuite Configuration Guide mentions that the Akamai Edge Server forms the "Cache Key" based on parts of the "Request ARL". [11]

The ARL (Akamai Resource Location) is similar to a URL. The primary function of an ARL is to direct an end user's request for an object to the Akamai network [13]. The ARL also contains the object's caching properties. The difference is that the ARL is specifically defined for objects to be served via the Akamai Network. There are two types of ARLs:
  1. ARL v1: This is the original ARL used in the earlier days of Akamai. It contains instructions for the Edge Server coded into its structure.
  2. ARL v2: Instead of coding all instructions into the URL as ARL v1 does, ARL v2 references a configuration file hosted on the Edge Server.
ARL Components which form the Cache Key:
  • Typecode
  • Forward [fwd] path (origin server, pathname, filename and extension)
  • Query string (Optional)
  • Secure Network Delivery Indicator
  • HTTP Method (GET, HEAD, etc.)
Note: The following description applies mainly to ARL v2; we will not elaborate on ARL v1, as it is not used very often nowadays.

The following diagram breaks down the ARL format:

The following section demonstrates the web cache keys using sample HTTP requests:


GET /products.jsp?productId=1 HTTP/1.1
host: shop.edgegate.de
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: */*
Pragma: akamai-x-get-cache-key


HTTP/1.1 200 OK
Content-Type: text/html; charset=iso-8859-1
Server: Google Frontend
Cache-Control: private, max-age=0
Expires: Thu, 17 Dec 2015 00:00:06 GMT
Date: Thu, 17 Dec 2015 00:00:06 GMT
Content-Length: 1127
X-Cache-Key: /L/1168/78685/1m/edgegatecpinotossi.appspot.com/products.jsp?productId=1
Connection: keep-alive
Note: The value of the X-Cache-Key header designates the web cache key.

The following table explains the values used as web cache keys:

Component      Value
fwd Path       edgegatecpinotossi.appspot.com/products.jsp
Query String   ?productId=1
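The probe and the split above can be scripted. A minimal sketch, assuming the key layout seen in the sample response (`/<typecode>/<field>/<field>/<field>/<fwd path>[?<query>]`); treating the first segment as the typecode follows the component list earlier and is an assumption, and the meaning of the other prefix segments is left uninterpreted. The request is only built here, not sent.

```python
import urllib.request


def cache_key_probe(url: str) -> urllib.request.Request:
    # Same debug Pragma value as in the sample request; send it with urlopen() if authorized.
    return urllib.request.Request(url, headers={"Pragma": "akamai-x-get-cache-key"})


def parse_x_cache_key(key: str) -> dict:
    """Split an X-Cache-Key value into typecode, other prefix fields, fwd path and query."""
    path, sep, query = key.partition("?")
    segments = path.lstrip("/").split("/", 4)
    if len(segments) != 5:
        raise ValueError(f"unexpected cache key layout: {key!r}")
    typecode, *prefix, fwd_path = segments
    return {
        "typecode": typecode,
        "prefix": prefix,  # remaining configuration fields, not interpreted here
        "fwd_path": fwd_path,
        "query_string": sep + query,
    }


parsed = parse_x_cache_key("/L/1168/78685/1m/edgegatecpinotossi.appspot.com/products.jsp?productId=1")
print(parsed["fwd_path"], parsed["query_string"])
```

The `fwd_path` and `query_string` values it extracts are the two rows of the table above.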

Before the attack: Reconnaissance

Before progressing with any type of cache manipulation, it is worth reviewing the route to the targeted web server. Running a query for google.com on Robtex, for example, will give us a lot of information that can help us see whether a cache proxy is used.

Below you can see an extract of the output in Robtex (https://www.robtex.com/dns-lookup/google.com#owhois):

Note: Other manual tools can also be used to check whether there is a cache proxy in front of the web service.
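One such manual check is the DNS CNAME chain: many CDN customers point their hostname at a CDN-owned name. A minimal sketch using the standard library resolver; the suffix list is illustrative and non-exhaustive, and a CDN that does not use CNAMEs (or a self-hosted cache) will not show up this way.

```python
import socket
from typing import Iterable, List, Set

# Illustrative, non-exhaustive CNAME suffixes used by some CDNs.
CDN_CNAME_SUFFIXES = {
    "akamaiedge.net": "Akamai",
    "edgekey.net": "Akamai",
    "edgesuite.net": "Akamai",
    "cloudfront.net": "Amazon CloudFront",
    "fastly.net": "Fastly",
}


def guess_cdn(hostnames: Iterable[str]) -> Set[str]:
    """Return the CDN names whose suffixes match any of the given hostnames."""
    found = set()
    for name in hostnames:
        name = name.lower().rstrip(".")
        for suffix, cdn in CDN_CNAME_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                found.add(cdn)
    return found


def resolve_names(host: str) -> List[str]:
    # gethostbyname_ex returns (canonical name, alias list, IP address list).
    canonical, aliases, _ips = socket.gethostbyname_ex(host)
    return [canonical, *aliases]


# Example (performs a real DNS lookup): print(guess_cdn(resolve_names("www.example.com")))
```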

Finally the attack: Web Cache deception

Web cache deception occurs when the target website is configured to be "flexible" about what kinds of paths (aka. URLs) it can handle. For more information on what a URL is, see https://www.rfc-editor.org/info/rfc1738 . This makes sense from a usability perspective: a product that tolerates certain types of input is more user friendly. It also has to do with how each software vendor interprets the RFC related to URL structure.

In particular, the issue arises when requests to a path that doesn't exist (say /x/y/z) are treated as equivalent to requests to a parent path that does exist (say /x). For example, what happens if you get a request for the nonexistent path /newsfeed/foo? Depending on how your website is configured, it might just treat such a request as equivalent to a request to /newsfeed. If you're running the Django web framework, for instance, the following configuration would do just that, because the regular expression ^newsfeed/ matches both newsfeed/ and newsfeed/foo (Django routes omit the leading /): [14]

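from django.urls import re_path  # django.conf.urls.url was removed in Django 4.0

# Django only picks up a list named urlpatterns; "..." stands for the newsfeed view.
urlpatterns = [re_path(r'^newsfeed/', ...)]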
And here's where the problem lies. If your website does this, then a request to /newsfeed/foo.jpg will be treated the same as a request to /newsfeed. But a web cache, seeing the .jpg file extension, will think that it's OK to cache this response, because web cache proxies commonly store files with image extensions by default. [14]
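The mismatch between the two components can be simulated in a few lines. A minimal sketch, assuming a hypothetical origin that routes with the same `^newsfeed/` pattern and a hypothetical cache that stores any path ending in a static-file extension, keyed on the path only.

```python
import re
from typing import Dict, Optional

# Origin routing modelled on the Django example: only the start is anchored.
NEWSFEED_ROUTE = re.compile(r"^newsfeed/")

# Hypothetical cache rule: store anything that looks like a static file.
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js")

cache: Dict[str, str] = {}


def origin_view(path: str, session_user: str) -> Optional[str]:
    # Like Django, match the route without the leading "/".
    if NEWSFEED_ROUTE.match(path.lstrip("/")):
        return f"<private newsfeed of {session_user}>"
    return None  # 404 Not Found


def cache_would_store(path: str) -> bool:
    return path.lower().endswith(STATIC_EXTENSIONS)


def fetch_through_cache(path: str, session_user: str) -> Optional[str]:
    if path in cache:
        return cache[path]  # served to whoever asks, regardless of session
    body = origin_view(path, session_user)
    if body is not None and cache_would_store(path):
        cache[path] = body
    return body


# 1. The victim (logged in as alice) is lured into opening the crafted URL.
fetch_through_cache("/newsfeed/foo.jpg", "alice")
# 2. The attacker requests the same URL and receives alice's cached page.
print(fetch_through_cache("/newsfeed/foo.jpg", "mallory"))  # <private newsfeed of alice>
```

Neither component is "wrong" on its own terms; the leak comes from the origin and the cache disagreeing about what /newsfeed/foo.jpg is.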

Below we can see a schematic analysis of the issue:

Note: In the diagram above we can see how a malicious user can retrieve the victim's home page from the cache. At this point it is assumed that the home page contains sensitive information and requires some kind of login. In this example it is also assumed that the cache server stores local copies of the site's images.

It is also worth saying that this is a simplified scenario, and that someone who would like to perform a more complicated attack would have to:
  • Understand the scope of the cache server e.g. region cache server.
  • Understand the cache rules of the cache server e.g. Akamai ARL etc.
  • Identify target content of interest e.g. sensitive content etc.   
Note: It is also worth mentioning that identifying how both the web server and the cache server "understand" the URL structure is important, e.g. by experimenting with malicious paths such as mangled backslashes. This also relates to what browsers consider acceptable.
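A small helper can generate the candidate paths for that experimentation. A minimal sketch: the separators below (slash, semicolon, backslash and their percent-encoded forms) are illustrative choices, and the probe file name `cachetest` is hypothetical. Which variants the origin resolves to the sensitive page while the cache treats them as static files is exactly what has to be tested on each target.

```python
from typing import List

# Illustrative separators; how each one is parsed differs between servers and caches.
SEPARATORS = ("/", ";", "%2F", "%3B", "\\", "%5C")


def deception_candidates(sensitive_path: str, extension: str = ".css") -> List[str]:
    """Append a fake static file name to a sensitive path using several separators."""
    base = sensitive_path.rstrip("/")
    probe = "cachetest" + extension  # hypothetical file name; should not exist on the origin
    return [f"{base}{sep}{probe}" for sep in SEPARATORS]


for candidate in deception_candidates("/account/home"):
    print(candidate)
```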

Finally the attack: Web Cache poisoning 

The objective of web cache poisoning is to send a request that causes a harmful response that gets saved in the cache and served to other users. The following diagram shows the process to follow:

James Kettle (aka. @albinowax) has done an amazing job documenting the vulnerability and wrote about multiple scenarios and ways to exploit it. More specifically, he described the following scenarios:
  • Selective Poisoning
  • DOM Poisoning
  • Hijacking Mozilla SHIELD
  • Route poisoning
  • Hidden Route Poisoning
  • Chaining Unkeyed Inputs
  • Open Graph Hijacking
  • Local Route Poisoning
  • Internal Cache Poisoning
  • Drupal Open Redirect
  • Persistent redirect hijacking
  • Nested cache poisoning
  • Cross-Cloud Poisoning
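A simplified version of an attack scenario is the following: assume that the application uses the X-Forwarded-Host HTTP header to build part of its response, but the cache does not include that header in the cache key (it is an "unkeyed" input). We can then inject our own value through the header, have the response that echoes it stored by the cache, and have it served to every other user whose request maps to the same cache key.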

This is taken from https://portswigger.net/blog/practical-web-cache-poisoning :


GET /en?cb=1 HTTP/1.1
Host: www.redhat.com
X-Forwarded-Host: canary

HTTP/1.1 200 OK
Cache-Control: public, no-cache

<meta property="og:image" content="https://canary/cms/social.png" />

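In the example above, the value of the unkeyed X-Forwarded-Host header was echoed back in the HTML body: the application used the header to generate an Open Graph URL inside a meta tag. The cb=1 query parameter acts as a cache buster, so the test response is stored under its own cache key instead of the one served to regular visitors. In this scenario, we can assume that the reflection could be converted into an XSS, HTML injection or other type of client-side injection attack.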
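The whole poisoning sequence can be modelled with a toy shared cache. A minimal sketch, assuming a hypothetical cache keyed on host, path and query only, and a hypothetical origin that reflects X-Forwarded-Host unescaped, as in the example above; `www.example.com` is a placeholder host.

```python
from typing import Dict, Optional, Tuple

# Toy shared cache: the key is (host, path, query); all headers are unkeyed.
cache: Dict[Tuple[str, str, str], str] = {}


def origin(host: str, forwarded_host: Optional[str]) -> str:
    # Vulnerable behaviour modelled on the example: X-Forwarded-Host is trusted
    # and reflected into the page without validation or encoding.
    base = forwarded_host or host
    return f'<meta property="og:image" content="https://{base}/cms/social.png" />'


def fetch(host: str, path: str, query: str = "", headers: Optional[Dict[str, str]] = None) -> str:
    headers = headers or {}
    key = (host, path, query)  # X-Forwarded-Host is not part of the key
    if key not in cache:
        cache[key] = origin(host, headers.get("X-Forwarded-Host"))
    return cache[key]


# The attacker primes a cache-busted key (cb=1) with the injected header...
fetch("www.example.com", "/en", "cb=1", {"X-Forwarded-Host": "canary"})
# ...and any later request for that key, with no special header, gets the poisoned copy.
print(fetch("www.example.com", "/en", "cb=1"))
# <meta property="og:image" content="https://canary/cms/social.png" />
```

Dropping the cache buster would poison the key used by every visitor of /en, which is why testers add one while experimenting.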

Defending against Web Cache attacks

The best way to defend against web cache deception is to ensure that your website isn't so permissive, and never treats requests to nonexistent paths (say /x/y/z) as equivalent to requests to valid parent paths (say /x). Also:
  • Use the same URL to refer to the same items: Since caches key off of both the host and the path to the content requested, ensure that you refer to your content in the same way on all of your pages. [6]
  • Fingerprint cache items: For static content like CSS and Javascript files, it may be appropriate to fingerprint each item. This means adding a unique identifier to the filename (often a hash of the file) so that if the resource is modified, the new resource name can be requested, causing the requests to correctly bypass the cache. [6]
  • Write your own cache rules: A web cache server has to be aware of the application's content and nature, e.g. it should not cache dynamic content in a banking application.
  • Avoid taking input from headers and cookies: If the application must use a header such as X-Forwarded-Host, either include it in the cache key or have the cache strip it before it reaches the application; otherwise, strictly validate header and cookie values.
  • Disable caching if it is not required: Lots of services don't require caching, but because it is enabled by default they allow it anyway.
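Two of these defences are sketched below. `fingerprint_filename` inserts a content hash into a static file name; `safe_to_cache` is a hypothetical cache-side rule that stores a response only if the origin allows it and the actual Content-Type matches what the URL's extension implies, similar in spirit to the extension/content-type check some CDNs offer (Cloudflare calls its version Cache Deception Armor). Both are illustrations, not any product's implementation. On the origin side, anchoring the route at both ends (e.g. `^newsfeed/$` in the Django example) removes the permissive matching itself.

```python
import hashlib
import mimetypes
from pathlib import PurePosixPath


def fingerprint_filename(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a content hash into a file name: static/app.css -> static/app.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(filename)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def safe_to_cache(path: str, content_type: str, cache_control: str) -> bool:
    """Cache only if the origin allows it AND the body type matches the extension."""
    directives = {d.strip().split("=")[0].lower() for d in cache_control.split(",") if d.strip()}
    if directives & {"no-store", "private"}:
        return False
    expected, _encoding = mimetypes.guess_type(path)
    actual = content_type.split(";")[0].strip().lower()
    # e.g. /newsfeed/foo.jpg answered with text/html is refused.
    return expected is not None and expected == actual


print(safe_to_cache("/newsfeed/foo.jpg", "text/html; charset=utf-8", "public, max-age=60"))  # False
print(safe_to_cache("/static/logo.png", "image/png", "public, max-age=86400"))  # True
```
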
Tools for cache poisoning/deception 

The following tools can help identify and exploit web cache poisoning issues:
  • param-miner: This extension identifies hidden, unlinked parameters. It's particularly useful for finding web cache poisoning vulnerabilities.[3]
  • Burp Suite Free/Pro: Intruder component [16]
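The core idea behind header guessing in tools like param-miner can be sketched in a few lines: send each candidate header with a unique canary value and check whether the canary appears in the response. This is a simplified illustration, not param-miner's actual implementation; the candidate header list is illustrative, and `send` is a caller-supplied function (hypothetical here) that issues the request, ideally with a cache buster in the URL, and returns the body.

```python
import secrets
from typing import Callable, Dict, Iterable, List

# A few header names commonly tested for unkeyed reflection (illustrative only).
CANDIDATE_HEADERS = ("X-Forwarded-Host", "X-Host", "X-Forwarded-Scheme", "X-Original-URL")


def find_reflected_headers(
    send: Callable[[Dict[str, str]], str],
    candidates: Iterable[str] = CANDIDATE_HEADERS,
) -> List[str]:
    """Return the candidate headers whose canary value shows up in the response body."""
    reflected = []
    for name in candidates:
        canary = "canary" + secrets.token_hex(4)  # unique per header, avoids false matches
        if canary in send({name: canary}):
            reflected.append(name)
    return reflected
```

A reflected header is only a candidate: it still has to be confirmed as unkeyed (i.e. the cached response is served to requests without it) before it becomes a poisoning vector.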


Hacker’s Elusive Thoughts The Web


The reason for this blog post is to advertise my book. First of all, I would like to thank all the readers of my blog for their support and feedback on making my articles better. After 12+ years in the penetration testing industry, the time has come for me to publish my book and transfer my knowledge to all the interested people who like hacking and want to learn as much as possible. At the end of this post you will also find a sample chapter.

About The Author

Gerasimos is a security consultant holding an MSc in Information Security, a CREST (CRT), a CISSP, an ITILv3, a GIAC GPEN and a GIAC GWAPT accreditation. Working alongside diverse and highly skilled teams, Gerasimos has been involved in countless comprehensive security tests and web application secure development engagements for global web applications and network platforms, counting more than 14 years in web application and application security architecture.

As his career progressed, Gerasimos has participated in various projects, providing leadership and accountability for assigned IT security projects, security assurance activities, and technical security reviews and assessments, and has conducted validations and technical security testing against pre-production systems as part of overall validations.

Where You Can Buy The Book

This book can be bought from Leanpub. Leanpub is a unique publishing platform that provides a way to write, publish and sell in-progress and completed ebooks. Anyone can sign up for free and use Leanpub's writing and publishing tools to produce a book and put it up for sale in the Leanpub bookstore with one click. Authors are paid a royalty of 90% minus 50 cents per transaction with no constraints: they own their work and can sell it elsewhere for any price.

Authors and publishers can also upload books they have created using their own preferred book production processes and then sell them in the Leanpub bookstore, taking advantage of Leanpub's high royalty rates and in-progress publishing features.

For more information about buying the book, see: https://leanpub.com/hackerselusivethoughtstheweb

Why I Wrote This Book

I wrote this book to share my knowledge with anyone who wants to learn about Web Application security, understand how to formalize a Web Application penetration test and build a Web Application penetration testing team.

The main goal of the book is to:
  • Give you some interesting ideas and help you build a comprehensive penetration testing framework, which you can easily adapt to your specific needs.
  • Help you understand why you need to write your own tools.
  • Help you gain a better understanding of some not so well documented attack techniques.
The main goal of the book is not to:
  • Provide you with a tool kit to perform Web Application penetration tests.
  • Provide you with complex attacks that you will not be able to understand.
  • Provide you with up to date information on the latest attacks.

Who This Book Is For 

This book is written to help hacking enthusiasts become better and standardize their hacking methodologies and techniques, so that they know clearly what to do and why when testing Web Applications. This book will also be very helpful to the following professionals:

1. Web Application developers.
2. Professional Penetration Testers.
3. Web Application Security Analysts.
4. Information Security professionals.
5. Hiring Application Security Managers.
6. Managing Information Security Consultants.

How This Book Is Organised  

Almost all chapters are written so that you do not need to read them sequentially to understand the concepts presented, although it is recommended to do so. The following section gives you an overview of the book:

Chapter 1: Formalising Web Application Penetration Tests -
This chapter is a gentle introduction to the world of penetration testing, and attempts to give a realistic view of the current landscape. More specifically, it attempts to provide you with information on how to compose a Penetration Testing team and make the team as efficient as possible, and on why writing tools and choosing the proper tools is important.

Chapter 2: Scanning With Class -

The second chapter focuses on helping you understand the difference between automated and manual scanning from the tester's perspective. It will show you how to write custom scanning tools with the use of Python. This part of the book also contains chunks of Python code demonstrating how to write tools and design your own scanner.

Chapter 3: Payload Management -

This chapter focuses on explaining two things: a) what a Web payload is from a security perspective, and b) why it is important to obfuscate your payloads.

Chapter 4: Infiltrating Corporate Networks Using XXE -

This chapter focuses on explaining how to exploit and elevate an XML External Entity (XXE) Injection vulnerability. The main purpose of this chapter is not to show you how to exploit an XXE vulnerability, but to broaden your mind on how you can combine multiple vulnerabilities to infiltrate your target, using an XXE vulnerability as an example.

Chapter 5: Phishing Like A Boss -

This chapter focuses on explaining how to perform phishing attacks using social engineering and Web vulnerabilities. The main purpose of this chapter is to help you broaden your mind on how to combine multiple security issues to perform phishing attacks.

Chapter 6: SQL Injection Fuzzing For Fun And Profit -

This chapter focuses on explaining how to perform and automate SQL injection attacks through obfuscation using Python. It also explains why SQL injection attacks happen and what is the risk of having them in your web applications.

Sample Chapter Download
From the following link you will be able to download a sample chapter from my book:

Sample Book Download