Grepping the Robot: AppSec Review for AI-Generated Code
Half the code shipping to production in 2026 has an LLM's fingerprints on it. Cursor, Copilot, Claude Code, and the quietly terrifying "I asked ChatGPT and pasted it in" workflow. The code compiles. The tests pass. The security review is an afterthought.
AI-generated code fails in characteristic, greppable ways. Once you know the patterns, review gets fast. Here's the working list I use when auditing AI-heavy codebases.
Failure Class 1: Hallucinated Imports (Slopsquatting)
LLMs invent package names. They sound right, they're spelled right, and they don't exist — or worse, they exist because an attacker registered the hallucinated name and put a payload in it. This is "slopsquatting," and it's the supply chain attack tailor-made for the AI era.
What to grep for:
# Python
grep -rhE "^(from|import) [A-Za-z0-9_.]+" --include="*.py" . | sort -u
# Cross-reference against your lockfile.
# Any import that isn't pinned is a candidate.
# Node
jq -r '(.dependencies + .devDependencies) | keys[]' package.json
# check each name against the npm registry creation date;
# anything <30 days old warrants a look
Red flags: packages with no GitHub repo, no download history, recently published, or names that are almost-but-not-quite popular libraries (python-requests instead of requests, axios-http instead of axios).
Failure Class 2: Outdated API Patterns
Training data lags reality. LLMs cheerfully suggest deprecated crypto, old auth flows, and APIs that were marked "do not use" two years before the model was trained.
Common offenders:
- `md5`/`sha1` for anything remotely security-related.
- `pickle.loads` on anything that isn't purely local.
- Old `jwt` libraries with known algorithm-confusion bugs.
- Deprecated `crypto.createCipher` in Node (not `createCipheriv`).
- Python 2-era `urllib` patterns without TLS verification.
- Old OAuth 2.0 implicit flow (no PKCE).
Grep starter:
grep -rnE "hashlib\.(md5|sha1)\(" .
grep -rnE "pickle\.loads" .
grep -rnE "createCipher\(" .
grep -rnE "verify\s*=\s*False" .
grep -rnE "rejectUnauthorized\s*:\s*false" .
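When one of these hits, the fix is usually mechanical. A sketch of the modern stdlib replacements for the Python offenders:

```python
import hashlib
import hmac
import json

# sha256 instead of md5/sha1 for anything integrity- or identity-adjacent
digest = hashlib.sha256(b"payload").hexdigest()

# json instead of pickle across any trust boundary:
# json.loads can't execute code, pickle.loads can
data = json.loads('{"user": "alice"}')

# a keyed HMAC, not a bare hash, when you need authenticity
tag = hmac.new(b"server-side-key", b"message", hashlib.sha256).hexdigest()
```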
Failure Class 3: Placeholder Secrets That Shipped
AI code generators love producing "working" examples with placeholder values that look like real config. Developers paste them in, forget to replace them, and commit.
Classic artifacts:
- `SECRET_KEY = "your-secret-key-here"`
- `API_TOKEN = "sk-placeholder"`
- `DEBUG = True` in production configs.
- Example JWT secrets like `"change-me"`, `"supersecret"`, `"dev"`.
- Hardcoded `localhost` DB credentials that got promoted when the file was copied.
Grep:
grep -rnE "(secret|key|token|password)\s*=\s*[\"'](change|your|placeholder|dev|test|example|supersecret)" .
grep -rnE "DEBUG\s*=\s*True" .
And obviously, run something like gitleaks or trufflehog on the history. AI-generated code increases the base rate of this mistake significantly.
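The grep above also wraps into a tiny CI check in a few lines. A sketch, with an illustrative (not exhaustive) pattern list:

```python
import re

# Placeholder-value patterns -- illustrative, extend with your own offenders
PLACEHOLDER = re.compile(
    r"""(secret|key|token|password)\s*=\s*["'](change|your|placeholder|dev|test|example|supersecret)""",
    re.IGNORECASE,
)

def flag_placeholder_secrets(text: str) -> list[str]:
    """Return every placeholder-looking assignment found in a blob of source."""
    return [m.group(0) for m in PLACEHOLDER.finditer(text)]
```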
Failure Class 4: SQL Injection via F-Strings
Every LLM knows you shouldn't concatenate SQL. Every LLM does it anyway when you ask for "a quick script." The modern flavor is Python f-strings:
cur.execute(f"SELECT * FROM users WHERE id = {user_id}")
Or its cousins:
cur.execute("SELECT * FROM users WHERE name = '" + name + "'")
db.query(`SELECT * FROM logs WHERE user='${req.query.user}'`)
Grep is your friend:
grep -rnE "execute\(f[\"']" .
grep -rnE "execute\([\"'].*\+.*[\"']" .
grep -rnE "query\(\`.*\\\$\{" .
AI tools default to "getting the query to run" and rarely volunteer parameterization unless asked. If you see raw string construction anywhere near a DB driver, stop and re-review.
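The fix is parameterization, which every driver supports. A minimal `sqlite3` sketch showing hostile input going inert as a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))

# Bound parameters are data, never SQL: the injection attempt below is
# compared against the id column as one literal value and matches nothing.
hostile = "1 OR 1=1"
cur.execute("SELECT name FROM users WHERE id = ?", (hostile,))
leaked = cur.fetchall()

cur.execute("SELECT name FROM users WHERE id = ?", (1,))
legit = cur.fetchall()
```

`leaked` comes back empty; `legit` returns alice's row. Same shape of API in psycopg, mysqlclient, and every Node driver worth using.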
Failure Class 5: Missing Input Validation
The model ships "working" endpoints. "Working" means it returns 200. It does not mean it rejects malformed, oversized, or malicious input.
What I check:
- Every Flask/FastAPI/Express handler: is there a schema validator (`pydantic`, `zod`, `joi`)? Or is it just `request.json["whatever"]`?
- Every file upload: size limit? MIME check? Extension allowlist? Or is it `save(request.files["file"])`?
- Every redirect: is the target validated against an allowlist, or echoed from the query string?
- Every template render: is user input going into a template with autoescape off?
LLMs skip validation because it's boring and it wasn't in the prompt. You have to ask for it explicitly, which means most codebases don't have it.
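In a real codebase you'd reach for `pydantic` or `zod`; as a dependency-free illustration of what "validated" actually means, a hand-rolled check might look like this (the field names and limits are illustrative):

```python
MAX_NAME = 64
MAX_SIZE = 10 * 1024 * 1024  # 10 MiB upload cap -- pick your own limit

def validate_upload(payload: dict) -> dict:
    """Reject malformed, oversized, or wrongly-typed input instead of
    trusting request.json blindly. Collects all failures into one error."""
    errors = []
    name = payload.get("name")
    if not isinstance(name, str) or not (1 <= len(name) <= MAX_NAME):
        errors.append("name must be a 1-64 character string")
    size = payload.get("size")
    if not isinstance(size, int) or not (0 < size <= MAX_SIZE):
        errors.append("size must be a positive int <= 10 MiB")
    if errors:
        raise ValueError("; ".join(errors))
    return {"name": name, "size": size}
```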
Failure Class 6: Overly Permissive Defaults
Ask an AI for a CORS config and you'll get allow_origins=["*"]. Ask for an S3 bucket and you'll get a public policy "so we can test it." Ask for a Dockerfile and you'll get USER root.
AI generators optimize for "this works on the first try." Security defaults break things on the first try. So the generator trades your security posture for a green checkmark.
Grep + manual review targets:
grep -rnE "allow_origins.*\*" .
grep -rnE "Access-Control-Allow-Origin.*\*" .
grep -rnE "^USER root" Dockerfile*
grep -rnE "chmod\s+777" .
grep -rnE "IAM.*\*:\*" .
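These greps compose naturally into a pre-commit or CI gate. A sketch (patterns illustrative; tune for your stack):

```python
import re

# (pattern, finding) pairs for risky defaults -- extend per stack
RISKY_DEFAULTS = [
    (re.compile(r"allow_origins\s*=\s*\[\s*['\"]\*"), "wildcard CORS origin"),
    (re.compile(r"chmod\s+777"), "world-writable permissions"),
    (re.compile(r"^USER\s+root\b", re.MULTILINE), "container runs as root"),
]

def lint_defaults(text: str) -> list[str]:
    """Return a finding label for each risky default present in the text."""
    return [label for pattern, label in RISKY_DEFAULTS if pattern.search(text)]
```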
Failure Class 7: SSRF in Helper Functions
"Fetch a URL and return its contents" is a common AI-generated utility. It almost never has SSRF protection. It takes a URL, passes it to requests.get, and returns the body. Point it at http://169.254.169.254/ and you've just exfiltrated cloud credentials.
Patterns to flag:
grep -rnE "requests\.(get|post)\(.*user" .
grep -rnE "urlopen\(.*req" .
grep -rnE "fetch\(.*req\.(query|body|params)" .
Any helper that takes a URL from user input and fetches it needs: scheme allowlist, host allowlist or deny-list, resolve-and-check for internal IPs, and ideally a separate egress proxy. AI-generated versions have none of these.
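A sketch of the resolve-and-check step in stdlib Python. Note this alone isn't airtight: DNS rebinding between check and fetch still needs handling, e.g. by pinning the resolved IP for the actual request:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_url(url: str) -> bool:
    """Reject non-http(s) schemes and hosts that resolve to loopback,
    private, or link-local addresses (169.254.169.254 included)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_loopback or addr.is_private or addr.is_link_local:
            return False
    return True
```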
Failure Class 8: Auth That "Checks"
This is the subtle one. The model produces auth middleware that reads a token, decodes it, and does nothing. Or it uses jwt.decode without verify=True. Or it trusts the alg field from the token header.
Concrete tells:
- `jwt.decode(token, options={"verify_signature": False})`
- Comparing tokens with `==` instead of `hmac.compare_digest`.
- Role checks that string-match on client-supplied values without re-fetching from the DB.
- Session middleware that doesn't check expiration.
These slip past review because the code looks like auth. It has tokens and decodes and middleware. It just doesn't actually authenticate.
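To make "actually authenticate" concrete, here's a minimal HMAC-signed token where verification is a hard gate rather than decoration. This is a toy format, not a JWT replacement, and the key is illustrative:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"example-server-side-key"  # illustrative -- load from config in reality

def sign(payload: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    tag = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"

def verify(token: str) -> dict:
    body, tag = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # constant-time compare, and failure is an exception -- not a log line
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))
```

The tell in AI-generated auth is the absence of that `raise`: the token gets decoded, the claims get read, and nothing ever refuses.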
The AI Code Review Cheat Sheet
| Failure class | Fast grep |
|---|---|
| Hallucinated imports | Cross-reference against lockfile & registry age |
| Weak crypto | `md5\|sha1\|createCipher\|pickle\.loads` |
| Placeholder secrets | `secret.*=.*"(your\|change\|supersecret)` |
| SQL injection | ``execute\(f\|execute\(.*\+\|query\(`.*\$\{`` |
| Missing validation | Handlers without schema libs in imports |
| Permissive defaults | `allow_origins.*\*\|USER root\|777` |
| SSRF | `requests\.get\(.*user\|urlopen\(.*req` |
| Broken auth | `verify_signature.*False\|==.*token` |
The Workflow
- Run the greps above on every PR tagged as "AI-assisted" or from a repo you know uses Cursor/Copilot heavily. Most issues surface immediately.
- Verify every third-party package against the registry. Pin versions. Require approval for new dependencies.
- Read the handler code with a paranoid eye. Assume no validation, no auth, no limits. Confirm each of those exists before approving.
- Run Semgrep with AI-code-focused rulesets — there are several public ones now. They won't catch everything but they catch a lot.
- Don't let the tests lull you. AI-generated tests cover the happy path. They don't cover malformed input, auth bypass, or edge cases. Adversarial tests must be human-written.
The Meta-Lesson
AI doesn't write insecure code because it's malicious. It writes insecure code because it optimizes for "functional" over "defensive," and because its training data is full of tutorials that prioritize clarity over hardening. The result is a predictable, well-documented, highly greppable set of failure modes.
Learn the patterns. Build the muscle memory. In a world where half your codebase was written by a language model, your grep is your scalpel.
Trust the code to do what it says. Verify it doesn't do what it shouldn't.