Bypassing Cloudflare: AI-Assisted WAF Fingerprinting and Why the Orange Shield Is a Filter, Not a Perimeter
Offensive recon • WAF evasion • April 2026
waf • cloudflare • recon • llm-assisted • red-team
A WAF is not a perimeter. Every time an engagement starts with a target proudly sitting behind Cloudflare and the suits asking if that "covers us," I have to bite my tongue. Cloudflare is a filter. It inspects traffic that routes through it. If you can talk to the origin directly, or if you can make your traffic look indistinguishable from a real Chrome 124 on Windows 11, the filter never fires.
This post is about the two halves of that bypass in 2026: origin discovery (so you can skip the WAF entirely) and fingerprint cloning (so that when you cannot skip it, you blend in). And because everyone wants to know where the LLMs plug in, I will tell you exactly where they actually earn their keep — and where they are a distraction.
How Cloudflare Actually Identifies You
Cloudflare's detection stack has layers. Understanding them is the whole engagement:
- IP reputation and ASN. Datacenter ranges, known VPN exits, and Tor exits start the request with a negative score.
- TLS fingerprinting. JA3, JA4, and JA4+ hash your Client Hello: cipher suite order, supported groups, extension order, ALPN, signature algorithms. Python requests has a fingerprint. curl has a fingerprint. Chrome 124 on Windows has a fingerprint. Cloudflare knows all of them.
- HTTP/2 fingerprinting. Frame order, SETTINGS values, HEADERS pseudo-header ordering. Akamai has been using this since 2020; Cloudflare followed.
- Header entropy and consistency. If your User-Agent claims Chrome but you sent Accept-Language before Accept-Encoding in a non-Chrome order, that is a tell. If you sent Sec-CH-UA-Full-Version-List with a Firefox UA, that is another: Firefox does not ship that header.
- Canvas, WebGL, and JS challenges. The managed challenge and the JS challenge execute code in the browser and return a signed token. Headless leaks (navigator.webdriver, missing plugin arrays, headless Chrome string in UA) get caught here.
- Behavioral. Mouse entropy, scroll patterns, time-to-interaction. This is the slowest layer but the hardest to fake.
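The header consistency layer is the easiest one to reason about offline. Below is a minimal sketch of the two tells mentioned above, written as a shell function that reads a captured header block on stdin. The function name and the finding labels are my own, not anything Cloudflare publishes, and the real detection logic is certainly broader than these two checks.

```shell
# Sketch: flag User-Agent / header contradictions in a captured header block.
# Reads raw "Name: value" lines on stdin and prints one finding per line.
# check_header_consistency and its labels are illustrative names only.
check_header_consistency() (
    headers=$(tr -d '\r')
    # Pull the User-Agent value (header names are case-insensitive).
    ua=$(printf '%s\n' "$headers" | grep -i '^user-agent:' | head -n 1 | cut -d: -f2-) || true

    # Firefox does not send Sec-CH-UA client hints.
    if printf '%s\n' "$ua" | grep -q 'Firefox/' &&
       printf '%s\n' "$headers" | grep -qi '^sec-ch-ua'; then
        echo "firefox-ua-with-client-hints"
    fi

    # Chrome sends Accept-Encoding before Accept-Language.
    if printf '%s\n' "$ua" | grep -q 'Chrome/'; then
        enc=$(printf '%s\n' "$headers" | grep -in '^accept-encoding:' | head -n 1 | cut -d: -f1) || true
        lang=$(printf '%s\n' "$headers" | grep -in '^accept-language:' | head -n 1 | cut -d: -f1) || true
        if [ -n "$enc" ] && [ -n "$lang" ] && [ "$lang" -lt "$enc" ]; then
            echo "chrome-ua-with-non-chrome-accept-order"
        fi
    fi
)
```

Defenders can run the same check over logged requests; an empty result only means these two tells are absent, not that the request is clean.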
Origin IP Discovery: The Actual Win
Most real engagements end here. You do not bypass Cloudflare; you route around it. The October 2025 /.well-known/acme-challenge/ zero-day was fun, but the long-term winners are the same techniques that have worked since 2018 and still do in 2026:
Passive DNS and certificate transparency
# Historical DNS records
curl -s "https://api.securitytrails.com/v1/history/${DOMAIN}/dns/a" \
    -H "APIKEY: $ST_API"

# Certificate transparency logs: catch the cert before CF fronted it
curl -s "https://crt.sh/?q=%25.${DOMAIN}&output=json" \
    | jq -r '.[].common_name' | sort -u

# Censys: find hosts serving the target cert SHA256
censys search "services.tls.certificates.leaf_data.fingerprint: ${CERT_SHA256}"
Half the time the origin is in a cloud subnet that still serves the cert directly. Validate by pinning the hostname to the candidate IP with --resolve, which sets both the SNI and the Host header without touching DNS:
curl -vk --resolve "${TARGET}:443:${CANDIDATE_IP}" \
    "https://${TARGET}/" -H "User-Agent: Mozilla/5.0"
If the response matches what you see through Cloudflare and there is no cf-ray header, you are at the origin.
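The header half of that check is easy to script. This is a rough sketch, assuming you saved the response headers (for example with curl's -D option); classify_response is a name I made up, and the header list is just the obvious Cloudflare markers, not an exhaustive set.

```shell
# Sketch: decide whether a header dump came back through Cloudflare's edge.
# Input: raw response headers on stdin. Output: "edge" or "possible-origin".
# classify_response is an illustrative helper, not part of any tool.
classify_response() (
    headers=$(tr -d '\r')
    if printf '%s\n' "$headers" \
        | grep -qiE '^(cf-ray|cf-cache-status):|^server:[[:space:]]*cloudflare'; then
        echo "edge"
    else
        # No Cloudflare markers: a lead only, still compare the body.
        echo "possible-origin"
    fi
)
```

Typical use would be something like `curl -sk -o /dev/null -D - --resolve "${TARGET}:443:${CANDIDATE_IP}" "https://${TARGET}/" | classify_response`. A missing cf-ray is necessary, not sufficient: another CDN or a default vhost will also come back as "possible-origin".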
The usual suspects for IP leakage
- Mail servers. dig mx, then check SPF TXT records. Companies front their web through CF but send mail from the origin network.
- Subdomain sprawl. Prefixes such as dev., staging., old., direct., and origin. are often not proxied. amass, subfinder, and crtfinder remain the workhorses.
- Favicon hash pivot. Compute the MurmurHash3 of the base64-encoded favicon from the CF-fronted site (Shodan indexes this hash, not a SHA), then search Shodan with http.favicon.hash:${MURMUR3}.
- Misconfigured DNS providers. Free-tier DNS accidentally exposing A records the customer thought were internal.
- Webhooks, error reports, XML-RPC. Anywhere the app itself reaches out to the internet and leaks an IP header.
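For the mail-server angle, the tedious part is reading SPF records by eye. Here is a small sketch that pulls the ip4 mechanisms out of TXT output; spf_ip4 is a hypothetical helper name, and it deliberately ignores include: and ip6: mechanisms, which you would have to follow separately.

```shell
# Sketch: extract ip4 mechanisms from SPF TXT record text on stdin.
# Prints one address or CIDR per line. spf_ip4 is an illustrative name.
spf_ip4() (
    # Drop the quotes dig prints around TXT strings, split on spaces,
    # then keep only ip4: mechanisms (with an optional "+" qualifier).
    tr -d '"' | tr ' ' '\n' | sed -n 's/^[+]\{0,1\}ip4:\(.*\)$/\1/p'
)
```

Used as `dig +short TXT "$DOMAIN" | spf_ip4`. On the defending side, the same output tells you which of your own address ranges are published to anyone who asks.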
Where AI Actually Helps
Most LLM-assisted WAF bypass content online is nonsense. Throwing "generate an XSS that evades Cloudflare" at a frontier model yields the same stale payloads that got baked into the managed ruleset in 2023. The model has no feedback loop with the target, so it is guessing against the ruleset it saw in training data.
Where an LLM genuinely helps:
1. Header set generation for fingerprint cloning
Given a captured browser request, an LLM can produce a set of header permutations that preserve the UA's semantic coherence (no conflicting client hints, correct header ordering for that browser family) faster than you can script the rules. I use it to generate the consistency constraints, then feed those into curl-impersonate or a custom HTTP/2 client. The model does not send traffic; it produces the permutation space.
2. WAF rule reverse engineering from responses
Send 500 mutated payloads, capture the responses (block page, 403, 429, pass), feed the (payload, response) pairs to the model, ask it to hypothesize what substrings are being matched. It is significantly better than regex-mining by hand. Treat its hypotheses as leads, not conclusions.
3. Sqlmap tamper script synthesis
Give the model a target parameter, a block message, and a working-in-isolation payload, ask for a tamper chain. This is what nowafpls and friends do deterministically; the model just makes the chain wider.
What the model does not do is bypass the JS challenge. It cannot run V8 in its head. Every serious bypass in 2026 still goes through curl-impersonate, Camoufox, SeleniumBase with undetected-chromedriver, or a fortified Playwright build. The LLM is a combinatorics engine around those tools.
Putting It Together: A Clean Bypass Flow
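#!/bin/bash
TARGET="${1:?usage: $0 <target-hostname>}"

# Stage 1: origin discovery (subdomains not fronted by Cloudflare)
subfinder -d "$TARGET" -silent | httpx -silent -tech-detect \
    | grep -v "Cloudflare" > non_cf_subs.txt

# Stage 2: cert pivot
# Caveat: fetched through the proxy, this is whatever cert the edge serves;
# the pivot only helps when the origin serves that same cert.
cert_sha=$(echo | openssl s_client -connect "$TARGET:443" -servername "$TARGET" 2>/dev/null \
    | openssl x509 -fingerprint -sha256 -noout | cut -d= -f2 | tr -d : | tr 'A-F' 'a-f')
# Assumes the Censys CLI prints a JSON array of hits with an "ip" field.
censys search "services.tls.certificates.leaf_data.fingerprint:${cert_sha}" \
    | jq -r '.[].ip' > origin_candidates.txt

# Stage 3: validate each candidate by pinning the hostname to its IP.
# A 200 is only a lead: confirm the response carries no cf-ray header.
found=0
while read -r ip; do
    code=$(curl -sk --resolve "$TARGET:443:$ip" \
        -o /dev/null -w "%{http_code}" "https://$TARGET/" \
        -H "User-Agent: Mozilla/5.0")
    [ "$code" = "200" ] && { echo "ORIGIN: $ip"; found=1; }
done < origin_candidates.txt

# Stage 4: only if no origin was found, fall through to fingerprint cloning
[ "$found" -eq 0 ] && curl-impersonate-chrome -s "https://$TARGET/" --compressed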
Defender Notes
If you are on the blue side, the mitigations are not new but they are still not deployed at most of the orgs I see:
- Authenticated Origin Pulls (mTLS). The origin only accepts connections presenting a Cloudflare-signed client cert. Every "I found the origin IP" report dies here.
- Cloudflare Tunnel. No public origin IP at all.
- Firewall the origin to Cloudflare IP ranges only. The absolute minimum. Rotate the origin IP after onboarding so historical DNS records do not leak it.
- Disable non-standard ports on the origin. Cloudflare WAF rule 8e361ee4328f4a3caf6caf3e664ed6fe blocks non-80/443 at the edge; the origin should not even listen.
- Header secret. Require a custom header carrying a pre-shared secret, injected at the edge by a Cloudflare Transform Rule, and reject any request at the origin that lacks it. Stops the attacker-owned-Cloudflare-account bypass.
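The "firewall the origin" item is the one people skip because nobody wants to hand-maintain the list. A minimal sketch, assuming an iptables-based host firewall: the function below turns a list of CIDRs on stdin into allow rules for 80/443 plus a final drop. It only prints the commands so you can review them first; cf_allowlist_rules is my own name, and the rule layout is one reasonable choice, not the only one.

```shell
# Sketch: generate iptables commands that allow web ports only from the
# CIDRs given on stdin (one per line). Prints commands; does not run them.
# cf_allowlist_rules is an illustrative helper name.
cf_allowlist_rules() (
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r cidr || [ -n "$cidr" ]; do
        # Skip blank lines and comments.
        case "$cidr" in ''|'#'*) continue ;; esac
        echo "iptables -A INPUT -p tcp -m multiport --dports 80,443 -s $cidr -j ACCEPT"
    done
    # Anything else reaching the web ports is dropped.
    echo "iptables -A INPUT -p tcp -m multiport --dports 80,443 -j DROP"
)
```

Feed it Cloudflare's published list, e.g. `curl -s https://www.cloudflare.com/ips-v4 | cf_allowlist_rules`, and review the output before applying it. IPv6 needs the same treatment with ip6tables and the ips-v6 list, and the ranges change occasionally, so regenerate on a schedule.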
Closing
The WAF industry wants you to believe that a slider in a dashboard is security. It is not. It is a filter in front of the thing that is actually exposed. If you do not harden the origin with mTLS and IP allow-lists, you have an orange proxy and a footgun in the same shape.
And if you are reading this as a defender: the next time a penetration test report comes back clean because "everything is behind Cloudflare," send it back. Ask for a retest that assumes origin disclosure. That is the test you actually wanted.
elusive thoughts • securityhorror.blogspot.com