Showing posts with label AI Security. Show all posts

10/04/2026

AI Vulnerability Research Goes Mainstream: The End of Attention Scarcity

The security industry just hit an inflection point, and most people haven't noticed yet.

For decades, vulnerability research was a craft. You needed deep expertise in memory layouts, compiler internals, protocol specifications, and the patience to trace inputs through code paths that no sane person would willingly read. The barrier to entry wasn't just skill — it was attention. Elite researchers could only focus on so many targets. Everything else got a free pass by obscurity.

That free pass just expired.

The Evidence Is In

In February 2026, Anthropic's Frontier Red Team published results from pointing Claude Opus 4.6 at well-tested open source codebases — projects with millions of hours of fuzzer CPU time behind them. The model found over 500 validated high-severity vulnerabilities. Some had been sitting undetected for decades.

No custom tooling. No specialised harnesses. No domain-specific prompting. Just a frontier model, a virtual machine with standard developer tools, and a prompt that amounted to: find me bugs.

Thomas Ptacek, writing in his now-viral essay "Vulnerability Research Is Cooked", summarised it bluntly:

You can't design a better problem for an LLM agent than exploitation research. Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code.

And Nicholas Carlini — the Anthropic researcher behind the findings — demonstrated that the process is almost embarrassingly simple. Loop over source files in a repository. Prompt the model to find exploitable vulnerabilities in each one. Feed the reports back through for verification. The success rate on that pipeline: almost 100%.

Why LLMs Are Uniquely Good at This

Traditional vulnerability discovery tools — fuzzers, static analysers, symbolic execution engines — are powerful but fundamentally limited. Fuzzers throw random inputs at code and wait for crashes. Coverage-guided fuzzers do it smarter, but they still can't reason about what they're looking at.

LLMs can. And the reasons are structural:

Capability                    | Traditional Tools              | LLM Agents
------------------------------|--------------------------------|-------------------------------------------
Bug class knowledge           | Encoded in rules/signatures    | Internalised from training corpus
Cross-component reasoning     | Limited to call graphs         | Semantic understanding of interactions
Patch gap analysis            | Not possible                   | Reads git history, finds incomplete fixes
Algorithm-level understanding | None                           | Can reason about LZW, YAML parsing, etc.
Fatigue                       | Infinite runtime, no reasoning | Infinite runtime with reasoning

The Anthropic results illustrate this perfectly. In one case, Claude found a vulnerability in GhostScript by reading the git commit history — spotting a security fix, then searching for other code paths where the same fix hadn't been applied. No fuzzer does that. In another, it exploited a subtle assumption in the CGIF library about LZW compression ratios, requiring conceptual understanding of the algorithm to craft a proof-of-concept. Coverage-guided fuzzing wouldn't catch it even with 100% branch coverage.
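No fuzzer does the GhostScript trick, but the mechanical half of it is easy to automate; the model supplies the reasoning. A hedged sketch (the keyword list and helper names are mine, not Anthropic's): harvest likely security fixes from git history and recover the lines each fix deleted, then hand those patterns to the model and ask where else the same mistake survives.

```python
import subprocess

FIX_KEYWORDS = ("overflow", "cve", "security", "use-after-free", "out-of-bounds")

def security_commits(repo: str, limit: int = 500) -> list[str]:
    """Hashes of recent commits whose messages suggest a security fix."""
    log = subprocess.run(
        ["git", "-C", repo, "log", f"-{limit}", "--format=%H %s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.split()[0] for line in log.splitlines()
            if any(k in line.lower() for k in FIX_KEYWORDS)]

def removed_lines(repo: str, commit: str) -> list[str]:
    """The lines a fix deleted, i.e. what the vulnerable code looked like."""
    diff = subprocess.run(
        ["git", "-C", repo, "show", commit, "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l[1:].strip() for l in diff.splitlines()
            if l.startswith("-") and not l.startswith("---")]
```

Each recovered pattern then goes back to the model with one question: where else in this tree does the same unfixed code appear?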

The Attention Scarcity Model Is Dead

Here's the part that should keep you up at night.

The entire security posture of the modern internet has rested on a single load-bearing assumption: there aren't enough skilled researchers to look at everything. Chrome gets attention because it's a high-value target. Your hospital's PACS server doesn't, because nobody with elite skills cares enough to audit it.

As Ptacek puts it:

In a post-attention-scarcity world, successful exploit developers won't carefully pick where to aim. They'll just aim at everything. Operating systems. Databases. Routers. Printers. The inexplicably networked components of my dishwasher.

The cost of elite-level vulnerability research just dropped from "hire a team of specialists for six months" to "spin up 100 agent instances overnight." And unlike human researchers, agents don't need Vyvanse, don't get bored, and don't demand stock options.

What Wordfence Is Seeing

This isn't theoretical anymore. Wordfence reported in April 2026 that AI-assisted vulnerability research is now producing meaningful results in the WordPress ecosystem — one of the largest and most target-rich attack surfaces on the web. Researchers are using frontier models to audit plugins and themes at a pace that was previously impossible.

The WordPress ecosystem is a perfect canary for what's coming everywhere else. Thousands of plugins, maintained by small teams or solo developers, many with no dedicated security review process. The same pattern applies to npm packages, PyPI libraries, and every other open source ecosystem.

The Defender's Dilemma

The optimistic reading is that defenders can use these same capabilities. Anthropic is already contributing patches to open source projects. Bruce Schneier noted the trajectory in February. The ZeroDayBench paper is building standardised benchmarks for measuring agent capabilities in this space.

But here's the asymmetry that matters: defenders need to find and fix every bug. Attackers only need one.

And the operational challenges are stacking up:

  • Report volume: Open source maintainers were already drowning in AI-generated slop reports. Now they'll face a steady stream of valid high-severity findings. The 90-day disclosure window may not survive this.
  • Patch velocity: Finding bugs is now faster than fixing them. Many critical targets — routers, medical devices, industrial control systems — require physical access to patch.
  • Regulatory risk: Legislators who don't understand the nuance of dual-use security research may respond to the inevitable wave of AI-discovered exploits with incoherent regulation that disproportionately hamstrings defenders.
  • Closed source is no longer a defence: LLMs can reason from decompiled code and assembly as effectively as source. Security through obscurity was always weak — now it's nonexistent.

What This Means for Security Teams

If you're running a security programme in 2026, here's the reality check:

  1. Assume your code will be audited by AI. Not "might be" — will be. Every open source dependency you use, every API endpoint you expose, every parser you've written. Act accordingly.
  2. Integrate AI into your own security testing. If you're still relying solely on annual pentests and quarterly SAST scans, you're operating on 2023 assumptions in a 2026 threat landscape.
  3. Invest in patch velocity. The bottleneck has shifted from finding bugs to fixing them. Your mean-time-to-remediate just became your most critical security metric.
  4. Watch the regulation space. The political response to AI-discovered vulnerabilities will matter as much as the technical response. Get involved in the policy conversation before the suits write rules that make defensive research illegal.
  5. Memory safety isn't optional anymore. The migration to Rust, Go, and other memory-safe languages was already important. With AI agents capable of finding every remaining memory corruption bug in your C/C++ codebase, it's now existential.

The Bottom Line

We're witnessing a phase transition in offensive security. The craft of vulnerability research — built over three decades of accumulated expertise, tribal knowledge, and hard-won intuition — is being commoditised in real time. The models aren't replacing the top 1% of researchers (yet). But they're replacing the other 99% of the work, and that 99% is where most real-world exploits come from.

The boring bugs. The overlooked code paths. The parsers nobody audited because they weren't glamorous enough. That's where the next wave of breaches will originate — and AI agents are already finding them faster than humans can patch them.

The question isn't whether AI will transform vulnerability research. It already has. The question is whether defenders can scale their response fast enough to keep up.

Based on what I'm seeing? It's going to be close.



06/04/2026

How CLI Automation Becomes an Exploitation Surface


Securing Skill Templates Against Malicious Inputs

There’s a familiar lie in engineering: it’s just a wrapper. Just a thin layer over a shell command. Just a convenience script. Just a little skill template that saves time.

That lie ages badly.

The moment a CLI tool starts accepting dynamic input from prompts, templates, files, issue text, documentation, emails, or model-generated content, it stops being “just a wrapper” and becomes an exploitation surface. Same shell. Same filesystem. Same credentials. New attack path.

This is where teams get sloppy. They see automation and assume efficiency. Attackers see trust transitivity and start sharpening knives.

The Real Problem Isn’t the CLI

The shell is not new. Unsafe composition is.

Most modern automation stacks don’t fail because Bash suddenly became more dangerous. They fail because developers bolt natural language, templates, or tool-chaining onto CLIs without rethinking trust boundaries.

Typical failure pattern:

  • untrusted input enters a template
  • the template becomes a command, argument list, config file, or follow-up instruction
  • the downstream CLI executes it with local privileges
  • everyone acts surprised when the blast radius includes tokens, source code, mailboxes, build agents, or production infra

That’s not innovation. That’s command injection wearing a startup hoodie.

Where Skill Templates Go Rotten

Skill templates are especially risky because they look structured. People assume structure means safety. It doesn’t.

A template can become dangerous when it interpolates:

  • shell fragments
  • filenames and paths
  • environment variables
  • markdown or HTML pulled from external sources
  • model output
  • repo-controlled metadata
  • ticket text
  • email content
  • generated “fix” commands

The exploit doesn’t need to look like raw shell metacharacters either. Sometimes the payload is more subtle:

  • extra flags that alter command behaviour
  • path traversal into sensitive files
  • output poisoning that changes downstream steps
  • hostile content designed to influence an LLM operator
  • malformed config that flips a benign action into a destructive one

The attack surface grows fast when one template feeds another system that assumes the first one already validated things.

That assumption gets people wrecked.

The New Indirect Input Problem

The most interesting attacks won’t come from a user typing rm -rf /.

They’ll come from content the system was trained to trust.

A repo README.
A changelog.
A copied stack trace.
An issue comment.
A pasted email.
A support ticket.
A generated summary.
A model-produced remediation step.

Once your CLI pipeline starts consuming semi-trusted text from upstream sources, indirect influence becomes the game. The attacker no longer needs direct shell access. They just need to place hostile content somewhere your workflow ingests it.

That is the part too many AI-assisted CLI workflows still don’t understand.

Why LLMs Make This Worse

LLMs don’t introduce shell injection from scratch. They industrialise bad judgment around it.

They normalise three dangerous behaviours:

  1. trusting generated commands because they sound competent
  2. flattening trust boundaries between user intent and executable output
  3. encouraging automation pipelines to consume text that was never safe to execute

A model can turn ambiguity into action far too quickly. It can also produce commands, file edits, or workflow suggestions with just enough confidence to bypass human skepticism.

That turns review into theater.

If a human is approving commands they don’t fully parse because the assistant “usually gets it right,” the system is already compromised in spirit, even before it is compromised in practice.

Common Design Mistakes

Here’s the usual pile of bad decisions:

1. Raw string interpolation into shell commands

If your template builds commands with string concatenation, you are already in the danger zone.

2. Treating model output as trusted intent

Model output is untrusted text. Full stop.

3. Letting repo content steer execution

If documentation, issue text, or config comments can influence command generation, you need to model that as an adversarial input path.

4. Inheriting excessive privileges

If the tool can access secrets, SSH keys, mailboxes, or production contexts, the blast radius becomes unacceptable fast.

5. Chaining tools without preserving trust metadata

When one tool’s output becomes another tool’s instruction set, you need taint awareness. Most stacks don’t have it.

6. Approval gates that review strings instead of semantics

Humans are bad at spotting danger in dense command lines, especially under time pressure.

Defensive Design That Actually Helps

Now the useful part.

Use structured argument passing

Do not compose raw shell commands unless you absolutely have to. Prefer direct process execution with separated arguments.

Bad:

tool "$USER_INPUT"

Worse:

sh -c "tool $USER_INPUT"

Safer design means avoiding shell interpretation entirely whenever possible.
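In Python, for instance, the distinction is the difference between shell=True and an argument list. A minimal sketch, using echo as a stand-in for the wrapped tool:

```python
import subprocess

def run_unsafe(user_input: str) -> str:
    # DANGEROUS: input is spliced into a shell command line, so
    # 'hello; cat ~/.ssh/id_rsa' injects a second command.
    return subprocess.run(f"echo {user_input}", shell=True,
                          capture_output=True, text=True).stdout

def run_safe(user_input: str) -> str:
    # SAFER: argument list, no shell. Metacharacters arrive as literal text.
    return subprocess.run(["echo", user_input],
                          capture_output=True, text=True).stdout
```

With the argument-list form, a payload like `hello; echo pwned` is printed verbatim; with shell=True, the second command actually runs.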

Treat model output as hostile until validated

If an LLM suggests a command, file path, or remediation step, validate it against policy before execution. Don’t confuse articulate output with trustworthy output.

Lock templates to explicit allowlists

If a template only needs three safe flags, allow three safe flags. Not “anything that looks reasonable.”
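A sketch of that discipline in Python. The flag names here are hypothetical; the point is the shape of the check:

```python
ALLOWED_FLAGS = {"--dry-run", "--verbose", "--output"}  # the three flags this template needs

def validate_flags(args: list[str]) -> list[str]:
    """Reject any flag outside the allowlist; raise, don't silently filter."""
    for arg in args:
        if arg.startswith("-") and arg.split("=", 1)[0] not in ALLOWED_FLAGS:
            raise ValueError(f"flag not allowed: {arg}")
    return args
```

Raising instead of filtering matters: silently dropping a hostile flag hides the fact that someone tried to smuggle one in.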

Preserve taint boundaries

Track whether content came from:

  • user input
  • external files
  • repo content
  • model output
  • network sources

If you lose provenance, you lose control.
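Provenance tracking doesn't need heavy machinery; even a tagged wrapper type forces the question at the point of use. A minimal Python sketch (the source labels are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A string that remembers where it came from."""
    value: str
    source: str  # "user", "file", "repo", "model", or "network"

TRUSTED = {"user"}  # everything else must pass validation before execution

def require_trusted(item: Tainted) -> str:
    if item.source not in TRUSTED:
        raise PermissionError(f"untrusted {item.source!r} input: validate first")
    return item.value
```

The wrapper doesn't make anything safe on its own; it makes it impossible to forget the question.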

Sandbox like you mean it

A sandbox is only useful if it meaningfully restricts:

  • filesystem scope
  • network egress
  • credential access
  • host escape paths
  • high-risk binaries

A fake sandbox is just delayed regret.

Design approval as policy, not vibes

Don’t ask humans to bless giant strings. Ask systems to enforce rules:

  • block dangerous binaries
  • require confirmation for write/delete/network actions
  • restrict sensitive paths
  • forbid chained shells unless explicitly approved
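Rules like those are straightforward to encode. A toy Python policy gate; the binary and path lists are examples, not a complete policy:

```python
import shlex

DANGEROUS_BINARIES = {"rm", "dd", "mkfs", "sh", "bash", "curl", "wget"}
SENSITIVE_PREFIXES = ("/etc", "/root", "~/.ssh")
CHAIN_TOKENS = {"|", "&&", "||", ";"}

def policy_check(command: str) -> tuple[bool, str]:
    """Return (needs_confirmation, reason). Rules, not vibes."""
    argv = shlex.split(command)
    if not argv:
        return True, "empty command"
    if argv[0] in DANGEROUS_BINARIES:
        return True, f"dangerous binary: {argv[0]}"
    if any(a.startswith(SENSITIVE_PREFIXES) for a in argv[1:]):
        return True, "touches a sensitive path"
    if CHAIN_TOKENS & set(argv):
        return True, "chained shell constructs"
    return False, "ok"
```

The human still approves flagged commands, but now they approve a named reason instead of parsing a dense string under time pressure.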

Minimise inherited secrets

If your CLI workflow doesn’t need cloud creds, don’t give it cloud creds. Same for mail access, SSH agents, API tokens, and browser sessions.

Least privilege still works. Shocking, I know.

A Better Mental Model

Stop thinking of CLI automation as a helper.

Think of it as a junior operator with:

  • partial understanding
  • variable reliability
  • access to tooling
  • exposure to hostile content
  • no native sense of trust boundaries unless you build them in

That framing makes the security work obvious.

Would you let an eager junior SRE run commands copied from issue comments, emails, and AI summaries directly on systems with production credentials?

If not, stop letting your automation do it.

Final Thought

The next wave of exploitation won’t always target the shell directly. It will target the systems that prepare, enrich, template, summarise, and bless what reaches the shell.

That’s the real story.

CLI tooling didn’t become dangerous because it got more powerful. It became dangerous because people surrounded it with layers that convert untrusted text into trusted action.

Same old mistake. New suit.

04/04/2026

Browser-Use Agents and Server-Side Request Forgery: Old Vulns, New Vectors


SSRF is not new. It’s been on the OWASP Top 10 since 2021, it’s been in every pentester’s playbook for a decade, and it’s the reason you’re not supposed to let user input control outbound HTTP requests from your server. We know how to prevent it. We know how to test for it. We’ve written the cheat sheets, the detection rules, the WAF signatures.

And then we gave AI agents a browser and told them to “go look things up.”

SSRF is back, and this time it’s wearing a trench coat made of natural language.

The Old SSRF: A Quick Refresher

Classic SSRF is straightforward: an application takes a URL from user input and makes a server-side request to it. The attacker supplies http://169.254.169.254/latest/meta-data/ instead of a legitimate URL. The server dutifully fetches AWS credentials from the instance metadata service and hands them to the attacker. Game over.

Defences are well-understood: validate URLs against allowlists, block private IP ranges, resolve DNS before making the request to prevent rebinding, restrict egress at the network level. This is AppSec 101.
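Those defences reduce to a few lines. A minimal Python sketch of the classic gate, with the caveat that resolving up front only helps if the client then connects to the address it resolved:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_target(url: str) -> bool:
    """Classic SSRF gate: resolve first, then refuse non-public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # is_global is False for private, loopback, and link-local ranges,
        # which covers 169.254.169.254, the cloud metadata address.
        if not ip.is_global:
            return False
    return True
```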

But those defences assumed something: that URLs would arrive as URLs, in URL-shaped fields, through parseable HTTP parameters.

That assumption no longer holds.

The New Vector: AI Agents as SSRF Proxies

An AI agent with browsing capabilities is, architecturally, an SSRF vulnerability by design. Its entire purpose is to receive instructions in natural language and make HTTP requests to arbitrary destinations. The “user input” isn’t a URL parameter — it’s a sentence like “check the internal admin dashboard” or “fetch this document for me.”

The agent dutifully translates that into an HTTP request. And if nobody told it that http://localhost:8080/admin is off-limits, it will happily go there.

This isn’t theoretical. Let me walk you through what’s already happening.

Real-World Evidence: It’s Already Being Exploited

1. Pydantic AI — CVE-2026-25580 (CVSS 8.6)

In February 2026, Pydantic AI — a widely-used framework for building AI agents — disclosed CVE-2026-25580, a textbook SSRF vulnerability in its URL download functionality. The download_item() helper fetched content from URLs without validating that the target was a public address.

Any application accepting message history from untrusted sources (chat interfaces, Vercel AI SDK integrations, AG-UI protocol implementations) was vulnerable. An attacker could submit a message with a file attachment pointing at:

http://169.254.169.254/latest/meta-data/iam/security-credentials/

And the server would fetch AWS IAM credentials and return them. Multiple model integrations were affected — OpenAI, Anthropic, Google, xAI, Bedrock, and OpenRouter all had download paths that could be abused.

The fix? Comprehensive SSRF protection: blocking private IPs, always blocking cloud metadata endpoints, validating redirect targets, resolving DNS before requests. Standard SSRF defences that should have been there from day one. The fact that a framework built specifically for AI agents shipped without basic SSRF protection tells you everything about the current state of agent security.

2. Tencent Xuanwu Lab — Server-Side Browser Kill Chains

Tencent’s Xuanwu Lab published a white paper on AI web crawler security in February 2026 that reads like a horror story. They tested server-side browsers across multiple AI products and found remote code execution vulnerabilities in every single one. The affected products collectively serve over a billion users.

Their four documented attack cases expose a pattern:

Case | Entry Point                       | Bypass Method                                | Impact
-----|-----------------------------------|----------------------------------------------|-----------------------
1    | AI search with URL allowlist      | 302 redirect via allowlisted site            | RCE, no sandbox
2    | AI reading + sharing + screenshot | Chained features to bypass domain allowlist  | SSRF to cloud metadata
3    | URL access with script filtering  | <img onerror> bypassed <script> filter       | RCE via N-day chain
4    | Hidden backend indexing crawler   | No bypass needed — no defences               | RCE, no sandbox

Case 4 is particularly grim: a hidden backend crawler that batch-fetched URLs users had queried — invisible to frontend security, undocumented, running an outdated browser with no sandbox. The attacker didn’t even need to bypass anything.

The Xuanwu team puts it bluntly: “When you launch a browser instance, you are not starting a simple web browsing tool — you are launching a ‘micro operating system.’ A vulnerability in any single component could lead to remote code execution.”

3. Unit 42 — Indirect Prompt Injection as SSRF Delivery Mechanism

Palo Alto’s Unit 42 published research in March 2026 documenting web-based indirect prompt injection (IDPI) attacks observed in the wild. Not proof-of-concept. Not lab demos. Production attacks.

Their taxonomy maps the full kill chain from SSRF’s perspective:

  • Forced internal requests: Embedded prompts in web pages instructing agents to access http://localhost, internal services, and cloud metadata endpoints
  • Unauthorized transactions: Prompts directing agents to visit Stripe payment URLs and PayPal links to initiate financial transactions
  • Data exfiltration: Instructions to collect environment variables, credentials, and contact lists — then exfiltrate via URL-encoded requests
  • Data destruction: Commands to rm -rf and fork bombs targeting backend infrastructure

The delivery methods are creative: zero-width Unicode characters, CSS-hidden text, Base64-encoded payloads assembled at runtime, SVG encapsulation, HTML attribute cloaking. 85% of the jailbreaks were social engineering — framing destructive commands as “security updates” or “compliance checks.”

The kicker: one attacker embedded 24 separate prompt injection attempts in a single page, using different delivery methods for each one. If even one bypasses the model’s safety filters, the attack succeeds.

4. Browserbase — “One Malicious <div> Away From Going Rogue”

Browserbase’s February 2026 analysis frames the problem with precision: “Every webpage an agent visits is a potential vector for attack.” They cite the PromptArmor research on Google’s Antigravity IDE, where an indirect prompt injection hidden in 1-point font inside an “implementation guide” successfully exfiltrated environment variables by encoding them as URLs and sending them via the browser agent’s own network requests.

That’s SSRF triggered by reading a document. The URL didn’t arrive as a URL. It arrived as invisible text on a web page.

Why Traditional SSRF Defences Fail Against Agents

The fundamental problem: SSRF defences are designed to protect applications, not autonomous decision-makers.

Traditional Defence                | Why It Fails With Agents
-----------------------------------|--------------------------------------------------------------------------
URL allowlists                     | Agents generate URLs dynamically from natural language — no static list covers the infinite space of valid requests
Input validation on URL parameters | The “input” is a sentence, not a URL. The URL is constructed internally by the agent
WAF signatures                     | Natural language payloads don’t match traditional SSRF patterns
DNS pre-resolution                 | Only works if you control the HTTP client — many agent frameworks use browsers that handle DNS independently
Egress filtering                   | Agent needs internet access to function — blocking egress breaks the core use case
IP blocklists                      | Only effective if applied at the HTTP client level before the request is made — agents using embedded browsers bypass application-layer controls

The Tencent Xuanwu research adds another dimension: even when enterprises implement URL allowlists, they’re trivially bypassed. A 302 redirect from an allowlisted domain to an attacker-controlled page defeats the entire scheme. The SSRF isn’t in the first request — it’s in the redirect chain that follows.

The Attack Surface Is Bigger Than You Think

SSRF in the context of browser-use agents isn’t just about fetching cloud metadata. The attack surface includes:

  • Cloud metadata services: AWS IMDSv1 (169.254.169.254), GCP, Azure, Alibaba Cloud — stealing IAM roles, service account tokens, API keys
  • Internal APIs and admin panels: Accessing unauthenticated internal services that trust requests from within the network perimeter
  • Database ports: Probing internal MySQL:3306, Redis:6379, PostgreSQL:5432 — extracting data from services that don’t require auth on localhost
  • Container orchestration: Accessing Kubernetes API servers, Docker sockets, etcd — pivoting to full cluster compromise
  • Other agents: In multi-agent architectures, a compromised agent can SSRF into other agents’ API endpoints, creating cascading compromise
  • Data exfiltration via URL encoding: The PromptArmor/Antigravity technique — embedding stolen data in outbound URL parameters, effectively using the agent as a covert channel

The Xuanwu team found that server-side browser containers were often deployed in the same network segment as production databases, task schedulers, and model inference nodes. Zero network isolation. Once the browser was compromised, lateral movement was trivial.

What Actually Works

If you’re deploying agents with browsing capabilities, here’s what you need — not principles, but concrete controls:

1. Network Isolation (Non-Negotiable)

Browser agents must run in isolated network zones. Egress to the internet: allowed. Access to internal services, metadata endpoints, private IP ranges: blocked at the infrastructure level. Kubernetes NetworkPolicies, separate VPCs, cloud security groups. This is the single most effective control — if the agent can’t reach 169.254.169.254, stealing metadata credentials is off the table regardless of what the LLM is tricked into doing.

2. SSRF Protection at the HTTP Client Level

Every HTTP request the agent makes should pass through a hardened client that:

  • Resolves DNS before connecting (prevents rebinding)
  • Blocks private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16)
  • Always blocks cloud metadata endpoints, even if “allow-local” is configured
  • Validates every redirect target, not just the initial URL
  • Restricts protocols to http:// and https:// only

Pydantic AI’s post-CVE fix is a good reference implementation.
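Those requirements can be sketched around the standard library: an opener that refuses to follow redirects silently, and a loop that re-validates every hop. This is illustrative only; the host list and redirect limit are assumptions, not Pydantic AI's actual values.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse

METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}
MAX_REDIRECTS = 5

def check_hop(url: str) -> None:
    """Raise unless this hop points at a public http(s) address."""
    p = urlparse(url)
    if p.scheme not in ("http", "https"):
        raise ValueError(f"blocked scheme: {p.scheme!r}")
    host = p.hostname or ""
    if host in METADATA_HOSTS:
        raise ValueError("metadata endpoint is always blocked")
    for info in socket.getaddrinfo(host, None):
        if not ipaddress.ip_address(info[4][0]).is_global:
            raise ValueError(f"non-public address behind {host!r}")

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # surface redirects as errors instead of following silently

def hardened_fetch(url: str) -> bytes:
    opener = urllib.request.build_opener(_NoRedirect)
    for _ in range(MAX_REDIRECTS):
        check_hop(url)  # validate EVERY hop, not just the first URL
        try:
            with opener.open(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            loc = err.headers.get("Location")
            if err.code in (301, 302, 303, 307, 308) and loc:
                url = urljoin(url, loc)  # re-enter the loop, re-validated
                continue
            raise
    raise ValueError("too many redirects")
```

The key design choice is disabling automatic redirect handling: the moment your HTTP library follows a 302 for you, the validation you ran on the first URL is worthless.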

3. Browser Sandboxing (Never Disable It)

The Tencent research found that multiple AI products disabled Chrome’s sandbox (--no-sandbox) to resolve container compatibility issues. This is catastrophic. Fix the container configuration instead: add the required seccomp profiles, grant CAP_SYS_ADMIN if necessary, configure user namespaces properly. The sandbox is the last line of defence against RCE — removing it turns every browser vulnerability into a full server compromise.

4. Instance Isolation

Each browsing task should use an independent, ephemeral browser instance that’s destroyed after completion. This prevents cross-task contamination, stops persistent compromise, and eliminates credential leakage between sessions. Browserbase’s approach of dedicated VMs per session with automatic teardown is the right model.

5. Attack Surface Reduction

Disable everything the agent doesn’t need: WebGL, WebRTC, PDF plugins, extensions. If performance allows, run with --jitless to eliminate the V8 JIT compiler — which accounts for roughly 23% of Chrome’s high-severity CVEs. Tencent’s analysis shows that disabling WebGL/GPU and JIT alone eliminates nearly 40% of browser vulnerability surface.

6. Runtime Behaviour Control

Tencent open-sourced SEChrome, a protection layer that monitors browser process system calls and enforces allowlists for file access, process execution, and network requests. Even if an attacker achieves RCE inside the browser, they can’t read sensitive files, execute arbitrary commands, or access the network beyond permitted destinations. Every tested exploit was blocked.

The Uncomfortable Truth

We’re deploying AI agents that have the browsing capabilities of a human user, the network access of a server-side application, and the security boundaries of neither. Every web page they visit is a potential attack payload. Every URL they construct is a potential SSRF. Every redirect they follow is a potential pivot point.

SSRF wasn’t “solved” in traditional web applications — it was managed through layers of controls that assumed a predictable request flow. AI agents break that assumption completely. The request flow is generated by a language model interpreting natural language from potentially hostile sources.

The good news: the defences exist. Network isolation, sandboxing, SSRF-hardened HTTP clients, instance isolation, runtime behaviour control. None of this is novel engineering. It’s applying established security patterns to a new deployment model.

The bad news: most agent deployments aren’t implementing any of it.

Old vulns don’t retire. They just find new hosts.


