Elusive Thoughts: Vulnerability Research

03/05/2026

Software Supply Chain Failures: The OWASP Category That Eats Everything

// ELUSIVE THOUGHTS — APPSEC / OWASP
Posted by Jerry, May 2026
OWASP Top 10 2025 added Software Supply Chain Failures as a top-level category. The change reflects what every working application security professional has been seeing for two years: the supply chain is the dominant attack vector, and it is structurally distinct enough from "vulnerable components" to deserve its own category.
The numbers behind the elevation are not subtle.
Sonatype's State of the Software Supply Chain 2024 reported more than 700,000 malicious packages found across npm, PyPI, and Maven since 2019, with a 156 percent year-over-year jump. Indusface's State of Application Security 2026 reports 6.29 billion attacks targeting website vulnerabilities in 2025, up 56 percent year-over-year. The median time to weaponization of a disclosed vulnerability is now under five days. 54 percent of critical vulnerabilities face active exploitation within the first week of disclosure.
This is the category. This is what is happening.
// what makes supply chain different from "vulnerable components"
The previous OWASP category, A06:2021 Vulnerable and Outdated Components, focused on the use of components with known vulnerabilities. The fix was conceptually clear: keep components updated, scan for known CVEs, replace deprecated libraries.
Supply chain failures are a superset that includes scenarios where the component is not "vulnerable" in any classical sense, because it was deliberately weaponized:
SCENARIO 1 — MAINTAINER ACCOUNT COMPROMISE
An attacker steals or socially engineers credentials to a package maintainer's account. They push a malicious version under the legitimate maintainer's identity. The Axios npm compromise of March 2026, attributed to North Korean threat actor UNC1069, used patient social engineering of the lead maintainer to gain account access. The Bitwarden CLI npm compromise used a similar pattern.
SCENARIO 2 — BUILD PIPELINE INJECTION
The malicious code is injected during the build process, not in the source. The Trivy GitHub Action compromise modified release tags after the build had completed, redirecting downstream consumers to attacker-controlled artifacts. Source review would not catch this. The artifact in the registry differed from the source in the repository.
SCENARIO 3 — TYPOSQUATTING AND DEPENDENCY CONFUSION
Attackers register packages with names similar to legitimate ones (requets vs requests, colorama-py vs colorama) or names matching internal company packages on public registries. PyPI removed hundreds of malicious typosquats per month throughout 2024 according to Checkmarx and Phylum tracking. The pattern continues in 2026.
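A rough sketch of how a lookalike check for this scenario might work, assuming you keep your own set of dependency names you trust. The function names, the distance threshold of 1, and the affix rule are illustrative assumptions, not how PyPI or any of the trackers named above detect typosquats.

```python
from __future__ import annotations

import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Lowercase and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalike_of(candidate: str, known: set[str], max_distance: int = 1) -> str | None:
    """Return the trusted name that `candidate` resembles, or None.

    Assumption: `known` is a set of names your organization already trusts.
    """
    cand = normalize(candidate)
    known_norm = {normalize(k) for k in known}
    if cand in known_norm:
        return None  # exact match with a trusted name, nothing to flag
    for target in sorted(known_norm):
        if edit_distance(cand, target) <= max_distance:
            return target
        # Affix variants such as "colorama-py" or "py-colorama".
        if cand.startswith(target + "-") or cand.endswith("-" + target):
            return target
    return None
```

With `known = {"requests", "colorama"}`, both `requets` and `colorama-py` are flagged against the names they imitate. A check like this is noisy on short names, so it is better treated as a triage signal than as a block rule.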
SCENARIO 4 — TRANSITIVE DEPENDENCY POISONING
The infected package is not a direct dependency. The PyTorch Lightning compromise propagated through pyannote-audio, infecting consumers who never directly installed Lightning. The further the malicious component is from the consumer's direct dependency declaration, the harder it is to detect with manual review.
SCENARIO 5 — TAG MUTATION
GitHub Actions and similar systems allow tags to be reassigned to point at different commits. An attacker who compromises the publishing pipeline can force-push tags to point at malicious code. Every workflow that references the action by tag silently runs the malicious code on next execution.
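To gauge exposure to tag mutation, one option is to list every workflow reference that is not a full commit SHA. The sketch below is regex-based rather than a real YAML parser, assumes each `uses:` key sits on its own line, and treats only a 40-character hex ref as pinned. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Matches "uses: owner/repo@ref", with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    findings = []
    for ref in USES_RE.findall(workflow_text):
        ref = ref.strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            findings.append(ref)
    return findings
```

Running this over the text of each file under `.github/workflows/` gives a list of references that a force-pushed tag or branch could silently redirect.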
SCENARIO 6 — MODEL WEIGHT TAMPERING
The AI model supply chain is the newest layer. HuggingFace incidents through 2024 and 2025 demonstrated that model weights can be manipulated to embed backdoors that activate on specific inputs. The OWASP LLM Top 10 covers this under its supply chain category, which overlaps with the new core OWASP category.
SCENARIO 7 — TOOL DESCRIPTION INJECTION
For LLM agent ecosystems, malicious instructions can be embedded in tool descriptions that the agent processes during MCP server registration. The MCPTox benchmark found that more than 60 percent of popular agents are susceptible to this class of attack. The compromise is in the metadata, not the code, which makes traditional code review insufficient.
// the defensive playbook
The defensive techniques against this category are mostly known. Adoption is the gap. The list below is what produces real reduction in supply chain risk, ordered by effort-to-value ratio:
Lockfile and integrity hashes everywhere. package-lock.json, yarn.lock, poetry.lock, Pipfile.lock, Gemfile.lock, go.sum, Cargo.lock. No exceptions. Every CI job that installs dependencies must use the lockfile. Most ecosystems support integrity hashes — use them.
Pin GitHub Actions to a full commit SHA, not a version tag or branch. Tags and branches are mutable. SHAs are not. The diff between uses: aquasecurity/trivy-action@master and uses: aquasecurity/trivy-action@a3e4f... is the difference between a vulnerable workflow and a hardened one.
Sigstore verification on package install where the ecosystem supports it. npm audit signatures. PyPI attestations. Cosign for container images. The verification is fast and the cost is low. The benefit is non-trivial.
Behavioral analysis at install time. Socket, Phylum, Snyk Reachability, JFrog Curation, Checkmarx Supply Chain. These tools execute or sandbox new packages and flag suspicious behaviors — unexpected network calls, filesystem access, postinstall scripts that match known malware patterns. Catches attacks that signature-based tools miss.
Internal proxy with quarantine period. Net-new dependencies — packages your organization has never used before — go through a quarantine period during which they are scanned, behaviorally analyzed, and reviewed before being available to developers. Most malicious packages are caught in the first 24 to 72 hours after publication. A quarantine period eats most of the risk window.
SBOM generation in CI for every release. The starting point for vulnerability triage and supply chain analysis. Required by EU CRA. Useful regardless.
Namespace ownership for internal packages. Register your internal package names as stubs on the public registry. Prevents dependency confusion attacks where an attacker publishes a public package matching your internal name.
Egress control on build runners. The build runner has unrestricted internet access by default. Constraining its outbound network destinations to known package registries and known internal services eliminates an entire class of exfiltration paths.
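Disable install-time script execution where feasible. npm install --ignore-scripts. pip install --only-binary :all: so that only prebuilt wheels are installed and no setup.py or build backend runs (PEP 517 build isolation on its own still executes the package's build code). Some legitimate packages break, requiring an allowlist. The remaining attack surface is much smaller.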
Provenance attestations on your published packages. npm publish --provenance generates SLSA-style provenance metadata that downstream consumers can verify. Free signal that protects your users.
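For the first item in the list, a CI gate could confirm that the lockfile actually carries integrity hashes. This is a minimal sketch for npm only, assuming the `packages` map used by lockfile versions 2 and 3. The function name is hypothetical, and the rule that root, linked, and bundled entries carry no hash of their own is an assumption worth checking against your own lockfiles.

```python
from __future__ import annotations

import json


def entries_missing_integrity(lock_text: str) -> list[str]:
    """Return package-lock.json entries that have no integrity hash."""
    lock = json.loads(lock_text)
    missing = []
    for path, meta in lock.get("packages", {}).items():
        # Skip the root project, workspace links and bundled dependencies
        # (assumed not to carry their own integrity field).
        if path == "" or meta.get("link") or meta.get("inBundle"):
            continue
        if not meta.get("integrity"):
            missing.append(path)
    return sorted(missing)
```

A CI job could fail when the returned list is non-empty, alongside installing with `npm ci` so that the lockfile is what actually gets installed. Git and tarball dependencies show up in the list by design, since they are exactly the entries that bypass registry hashes.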
// the part that is not technical
The honest takeaway from every supply chain incident I have read post-mortems on: the open source supply chain is held together by individual humans who notice things. Andres Freund noticing 500ms of unexplained latency and discovering the XZ backdoor. The crypto developer who noticed an anomalous Lottie transaction. The Sansec engineer who spotted the polyfill.io rewrite. The Lightning maintainers who discovered their PyPI compromise via user reports.
Tools narrow the attack surface. Tools do not eliminate it. The durable defense is a team that has time to investigate anomalies. The unfashionable, unscalable, irreplaceable component of supply chain security is human attention and engineering judgment.
The investment that produces the highest return: give your senior engineers explicit budget for "weird things in the build." The next XZ-class incident will be caught by someone paying attention. Make sure that someone exists in your organization, and that their attention is not consumed by dashboards.
// the bottom line
Software Supply Chain Failures earned its OWASP Top 10 spot the hard way. The category is not going to shrink. The attack surface keeps expanding: new package ecosystems, new model registries, new agent tool catalogs.
The defensive playbook is mostly known. The work is adoption. The teams that close their supply chain gaps in 2026 will read about other people's incidents in 2027. The teams that do not will be in the news.

$ end_of_post.sh — what's your organization's biggest supply chain gap? honest answers welcome.

10/04/2026

AI Vulnerability Research Goes Mainstream: The End of Attention Scarcity

The security industry just hit an inflection point, and most people haven't noticed yet.

For decades, vulnerability research was a craft. You needed deep expertise in memory layouts, compiler internals, protocol specifications, and the patience to trace inputs through code paths that no sane person would willingly read. The barrier to entry wasn't just skill — it was attention. Elite researchers could only focus on so many targets. Everything else got a free pass by obscurity.

That free pass just expired.

The Evidence Is In

In February 2026, Anthropic's Frontier Red Team published results from pointing Claude Opus 4.6 at well-tested open source codebases — projects with millions of hours of fuzzer CPU time behind them. The model found over 500 validated high-severity vulnerabilities. Some had been sitting undetected for decades.

No custom tooling. No specialised harnesses. No domain-specific prompting. Just a frontier model, a virtual machine with standard developer tools, and a prompt that amounted to: find me bugs.

Thomas Ptacek, writing in his now-viral essay "Vulnerability Research Is Cooked", summarised it bluntly:

You can't design a better problem for an LLM agent than exploitation research. Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code.

And Nicholas Carlini — the Anthropic researcher behind the findings — demonstrated that the process is almost embarrassingly simple. Loop over source files in a repository. Prompt the model to find exploitable vulnerabilities in each one. Feed the reports back through for verification. The success rate on that pipeline: almost 100%.
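
The loop described above can be sketched in a few lines. This illustrates the shape of the process only and is not Anthropic's actual pipeline: `ask_model` is a hypothetical stand-in for whatever model client you use, and the prompt wording, the NONE/CONFIRMED reply convention, and the file suffixes are all assumptions.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple

# Prompt wording is an assumption made for this sketch.
FIND_PROMPT = "Find exploitable vulnerabilities in this file. Reply NONE if there are none.\n\n"
VERIFY_PROMPT = "Check this report against the source. Reply CONFIRMED or REJECTED.\n\n"


def iter_sources(root: str, suffixes: Tuple[str, ...] = (".c", ".h")) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every source file under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield str(path), path.read_text(errors="replace")


def triage(files: Iterable[Tuple[str, str]],
           ask_model: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Ask for findings per file, then feed each report back for verification."""
    confirmed = []
    for name, source in files:
        report = ask_model(FIND_PROMPT + source).strip()
        if not report or report.upper() == "NONE":
            continue
        verdict = ask_model(VERIFY_PROMPT + "Report:\n" + report + "\n\nSource:\n" + source)
        if verdict.strip().upper().startswith("CONFIRMED"):
            confirmed.append((name, report))
    return confirmed
```

Usage would look like `triage(iter_sources("path/to/repo"), ask_model)`. The second model call is the part that matters: it is the verification pass that separates validated findings from noise.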

Why LLMs Are Uniquely Good at This

Traditional vulnerability discovery tools — fuzzers, static analysers, symbolic execution engines — are powerful but fundamentally limited. Fuzzers throw random inputs at code and wait for crashes. Coverage-guided fuzzers do it smarter, but they still can't reason about what they're looking at.

LLMs can. And the reasons are structural:

Capability	Traditional Tools	LLM Agents
Bug class knowledge	Encoded in rules/signatures	Internalised from training corpus
Cross-component reasoning	Limited to call graphs	Semantic understanding of interactions
Patch gap analysis	Not possible	Reads git history, finds incomplete fixes
Algorithm-level understanding	None	Can reason about LZW, YAML parsing, etc.
Fatigue	Infinite runtime, no reasoning	Infinite runtime with reasoning

The Anthropic results illustrate this perfectly. In one case, Claude found a vulnerability in GhostScript by reading the git commit history — spotting a security fix, then searching for other code paths where the same fix hadn't been applied. No fuzzer does that. In another, it exploited a subtle assumption in the CGIF library about LZW compression ratios, requiring conceptual understanding of the algorithm to craft a proof-of-concept. Coverage-guided fuzzing wouldn't catch it even with 100% branch coverage.

The Attention Scarcity Model Is Dead

Here's the part that should keep you up at night.

The entire security posture of the modern internet has been load-bearing on a single assumption: there aren't enough skilled researchers to look at everything. Chrome gets attention because it's a high-value target. Your hospital's PACS server doesn't, because nobody with elite skills cares enough to audit it.

As Ptacek puts it:

In a post-attention-scarcity world, successful exploit developers won't carefully pick where to aim. They'll just aim at everything. Operating systems. Databases. Routers. Printers. The inexplicably networked components of my dishwasher.

The cost of elite-level vulnerability research just dropped from "hire a team of specialists for six months" to "spin up 100 agent instances overnight." And unlike human researchers, agents don't need Vyvanse, don't get bored, and don't demand stock options.

What Wordfence Is Seeing

This isn't theoretical anymore. Wordfence reported in April 2026 that AI-assisted vulnerability research is now producing meaningful results in the WordPress ecosystem — one of the largest and most target-rich attack surfaces on the web. Researchers are using frontier models to audit plugins and themes at a pace that was previously impossible.

The WordPress ecosystem is a perfect canary for what's coming everywhere else. Thousands of plugins, maintained by small teams or solo developers, many with no dedicated security review process. The same pattern applies to npm packages, PyPI libraries, and every other open source ecosystem.

The Defender's Dilemma

The optimistic reading is that defenders can use these same capabilities. Anthropic is already contributing patches to open source projects. Bruce Schneier noted the trajectory in February. The ZeroDayBench paper is building standardised benchmarks for measuring agent capabilities in this space.

But here's the asymmetry that matters: defenders need to find and fix every bug. Attackers only need one.

And the operational challenges are stacking up:

Report volume: Open source maintainers were already drowning in AI-generated slop reports. Now they'll face a steady stream of valid high-severity findings. The 90-day disclosure window may not survive this.
Patch velocity: Finding bugs is now faster than fixing them. Many critical targets — routers, medical devices, industrial control systems — require physical access to patch.
Regulatory risk: Legislators who don't understand the nuance of dual-use security research may respond to the inevitable wave of AI-discovered exploits with incoherent regulation that disproportionately hamstrings defenders.
Closed source is no longer a defence: LLMs can reason from decompiled code and assembly as effectively as source. Security through obscurity was always weak — now it's nonexistent.

What This Means for Security Teams

If you're running a security programme in 2026, here's the reality check:

Assume your code will be audited by AI. Not "might be" — will be. Every open source dependency you use, every API endpoint you expose, every parser you've written. Act accordingly.
Integrate AI into your own security testing. If you're still relying solely on annual pentests and quarterly SAST scans, you're operating on 2023 assumptions in a 2026 threat landscape.
Invest in patch velocity. The bottleneck has shifted from finding bugs to fixing them. Your mean-time-to-remediate just became your most critical security metric.
Watch the regulation space. The political response to AI-discovered vulnerabilities will matter as much as the technical response. Get involved in the policy conversation before the suits write rules that make defensive research illegal.
Memory safety isn't optional anymore. The migration to Rust, Go, and other memory-safe languages was already important. With AI agents capable of finding every remaining memory corruption bug in your C/C++ codebase, it's now existential.
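
If mean-time-to-remediate is going to be the headline metric, it helps to pin down how it is computed. A minimal sketch, assuming each finding is a hypothetical `(reported, fixed)` pair of datetimes with `fixed` set to `None` while the issue is still open:

```python
from datetime import datetime
from statistics import mean
from typing import List, Optional, Tuple

Finding = Tuple[datetime, Optional[datetime]]  # (reported, fixed or None)


def remediation_days(findings: List[Finding]) -> List[float]:
    """Days from report to fix for every finding that has been fixed."""
    return [(fixed - reported).total_seconds() / 86400
            for reported, fixed in findings if fixed is not None]


def mttr_days(findings: List[Finding]) -> Optional[float]:
    """Mean time to remediate in days, or None if nothing is fixed yet."""
    days = remediation_days(findings)
    return mean(days) if days else None


def open_count(findings: List[Finding]) -> int:
    """Findings that are still unfixed and therefore invisible to MTTR."""
    return sum(1 for _, fixed in findings if fixed is None)
```

Tracking `open_count` next to the mean matters because MTTR only counts what has already been fixed. A growing backlog of unpatched findings can sit behind a healthy-looking average.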

The Bottom Line

We're witnessing a phase transition in offensive security. The craft of vulnerability research — built over three decades of accumulated expertise, tribal knowledge, and hard-won intuition — is being commoditised in real time. The models aren't replacing the top 1% of researchers (yet). But they're replacing the other 99% of the work, and that 99% is where most real-world exploits come from.

The boring bugs. The overlooked code paths. The parsers nobody audited because they weren't glamorous enough. That's where the next wave of breaches will originate — and AI agents are already finding them faster than humans can patch them.

The question isn't whether AI will transform vulnerability research. It already has. The question is whether defenders can scale their response fast enough to keep up.

Based on what I'm seeing? It's going to be close.
