Elusive Thoughts: The Da Vinci Cod(e) Review

Introduction

This article is going to talk about performing Web Application security code reviews the proper way (also known as my way). The best way to perform a Web Application security code review is to have at your disposal both the Web Application itself (deployed and running on a Web Server) and its source code, because that lets you verify your findings in real time (e.g. exploit a Cross Site Scripting issue immediately after you identify it in the code). Ideally this happens within a CI/CD pipeline: your SAST tool flags a finding, and you spin up a local or staging instance to validate whether that finding is actually exploitable. That feedback loop is where the real security value lives.

But first, let's define what a security source code review is. A security code review is a systematic examination of a Web Application's source code, intended to find and fix security mistakes overlooked during the initial development phase and thereby improve the overall security of the software. Reviews take various forms, such as pair programming, informal walkthroughs, and formal inspections. They are often performed by independent contractors or by an internal security team. Hiring an independent third party adds value because it gives the company the chance to have its code examined by someone who is brought in only at the last stage of the development process, has no "emotional attachment to the code", and therefore has a unique perspective on the subject.

In a modern DevSecOps context, security code review is no longer a one-off event. It is a continuous activity embedded in the Software Development Lifecycle (SDLC). Pull Request (PR) reviews, automated SAST scans triggered on every commit, and periodic deep-dive manual reviews all coexist. The key shift in recent years is treating security code review as a living process rather than a checkpoint gate.

Types of code review

Code review practices fall into three main categories: 1) pair programming, 2) formal code review and 3) lightweight code review. Formal code review, such as a Fagan inspection, involves a careful and detailed process with multiple participants and multiple phases. Formal code reviews are the traditional method of review, in which software developers attend a series of meetings and review code line by line, usually using printed copies of the material. Formal inspections are extremely thorough and have been proven effective at finding defects in the code under review. Lightweight code review typically requires less overhead than formal code inspections, though it can be equally effective when done properly.

Lightweight reviews are often conducted as part of the normal development process:

Over-the-shoulder – One developer looks over the author's shoulder as the latter walks through the code.
Email pass-around – The source code management system automatically emails code to reviewers after a check-in is made.
Pair Programming – Two authors develop code together at the same workstation, as is common in Extreme Programming.
Tool-assisted code review – Authors and reviewers use specialized tools designed for peer code review (e.g. GitHub Pull Requests, GitLab Merge Requests, Gerrit).

A fourth category has emerged in recent years: AI-assisted code review. Tools like GitHub Copilot, Semgrep Assistant, and Snyk Code now provide real-time security feedback directly in the IDE or during PR review. These AI-driven tools can detect patterns, suggest fixes, and even auto-remediate certain vulnerability classes. However, the same fundamental rule applies — they still require human validation.

Important note: Tools can be used to perform this task but they always need human verification. Tools do not understand context, which is the keystone of security code review. Tools are good at assessing large amounts of code and pointing out possible issues but a person needs to verify every single result to determine if it is a real issue, if it is actually exploitable, and calculate the risk to the enterprise. This is even more critical in the age of AI-generated code — studies suggest that AI-generated code contains security vulnerabilities in approximately 25-40% of cases, often including SQL injection, XSS, and insecure authentication patterns. Your SAST tool will flag them, but only a human reviewer can determine whether the context makes them exploitable.

What is the most important thing in a code review

The most important thing is applying proper threat modeling. Threat modeling is an approach for analyzing the security of an application: a structured approach that enables you to identify, quantify, and address the security risks associated with it. Threat modeling is not a way of reviewing code, but it does complement the security code review process. Including threat modeling in the SDLC helps ensure that applications are developed with security built in from the very beginning. This, combined with the documentation produced as part of the threat modeling process, gives the reviewer a greater understanding of the system: it shows where the entry points to the application are and which threats are associated with each entry point.

In 2020, a group of threat modeling practitioners, researchers and authors published the Threat Modeling Manifesto — a document that distills the collective knowledge of the community into values and principles, similar to the Agile Manifesto. The Manifesto anchors all threat modeling around four fundamental questions:

What are we building? — Understand the system through diagrams (DFDs, architecture diagrams).
What can go wrong? — Identify threats using frameworks like STRIDE, PASTA, LINDDUN, or Attack Trees.
What are we going to do about it? — Define countermeasures and mitigations.
Did we do a good enough job? — Validate and iterate on the threat model.

These four questions should be your north star regardless of which specific methodology you choose. The concept of threat modeling is not new but there has been a clear mindset change in recent years. Modern threat modeling looks at a system from a potential attacker's perspective, as opposed to a defender's viewpoint. The industry has also recognized that threat modeling is not just a security team exercise — it requires cross-functional collaboration involving developers, architects, business analysts, DevOps engineers, and security professionals.

Threat Modeling Frameworks: A Modern Overview

There are several established frameworks for threat modeling. Choosing the right one depends on your organization's maturity, the type of system being analyzed, and whether your primary concern is security, privacy, or business risk. Below is an overview of the most relevant frameworks in 2025.

STRIDE

STRIDE is the most widely adopted threat modeling framework, originally developed by Microsoft. It categorizes threats into six types:

Spoofing — Can an attacker impersonate another user or system?
Tampering — Can an attacker modify data in transit or at rest?
Repudiation — Can an attacker deny having performed an action?
Information Disclosure — Can sensitive data leak to unauthorized parties?
Denial of Service — Can an attacker degrade or disrupt service availability?
Elevation of Privilege — Can an attacker gain unauthorized access to higher-level functions?

STRIDE is a strong fit for teams that are new to threat modeling. It is easy to teach, quick to adopt, and integrates well into agile development practices. You apply it by creating a Data Flow Diagram (DFD) of your system, then systematically asking "can this element be affected by Spoofing? Tampering?" and so on for each element. Microsoft made STRIDE a core component of their Security Development Lifecycle (SDL), which they credit as one of the reasons for the increased security of their products.
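The per-element walk described above can be sketched as a lookup table. The mapping below follows the STRIDE-per-element chart commonly published alongside Microsoft's SDL guidance (repudiation on a data store is usually considered only when the store holds audit logs). The class name, method name, and example element are hypothetical; this is a sketch of the idea, not Microsoft's tooling.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

class StridePerElement {
    // The six STRIDE threat categories.
    enum Stride {
        SPOOFING, TAMPERING, REPUDIATION,
        INFORMATION_DISCLOSURE, DENIAL_OF_SERVICE, ELEVATION_OF_PRIVILEGE
    }

    // The four basic Data Flow Diagram element types.
    enum ElementType { EXTERNAL_ENTITY, PROCESS, DATA_STORE, DATA_FLOW }

    // STRIDE-per-element chart as commonly published for Microsoft's SDL.
    // Repudiation on a data store is usually only relevant when it holds logs.
    static final Map<ElementType, EnumSet<Stride>> APPLICABLE = Map.of(
        ElementType.EXTERNAL_ENTITY, EnumSet.of(Stride.SPOOFING, Stride.REPUDIATION),
        ElementType.PROCESS, EnumSet.allOf(Stride.class),
        ElementType.DATA_STORE, EnumSet.of(Stride.TAMPERING, Stride.REPUDIATION,
            Stride.INFORMATION_DISCLOSURE, Stride.DENIAL_OF_SERVICE),
        ElementType.DATA_FLOW, EnumSet.of(Stride.TAMPERING,
            Stride.INFORMATION_DISCLOSURE, Stride.DENIAL_OF_SERVICE));

    // One review question per applicable category, in STRIDE order.
    static List<String> questions(String elementName, ElementType type) {
        List<String> out = new ArrayList<>();
        for (Stride s : APPLICABLE.get(type)) {
            out.add("Can " + elementName + " be affected by " + s + "?");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical DFD element: the flow carrying credentials to the auth service.
        questions("Login form -> Auth service flow", ElementType.DATA_FLOW)
            .forEach(System.out::println);
    }
}
```

Running the questions for every element of the DFD gives the reviewer a checklist that is exhaustive for the diagram but still has to be answered by a human who knows the context.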

Limitation: STRIDE focuses exclusively on security threats. It does not address privacy concerns, business risk alignment, or attacker motivation. It is also a static framework that works best at design time — it does not inherently adapt to runtime threat intelligence.

PASTA (Process for Attack Simulation and Threat Analysis)

PASTA is a seven-stage, risk-centric threat modeling methodology created by Tony UcedaVélez and Marco M. Morana. Unlike STRIDE, which focuses on categorizing threat types, PASTA takes a holistic view by considering both business impact and technical risk. It connects technical threats directly to business objectives, making it particularly valuable in enterprise environments where security decisions must be justified by business value.

The seven stages of PASTA are:

Define Objectives — Identify business objectives, security requirements, compliance requirements, and data classification for the application in scope.
Define Technical Scope — Map all system components, their relationships, interdependencies, and the attack surface.
Application Decomposition — Break down the system into data flows, processes, trust boundaries, user roles, and permissions.
Threat Analysis — Identify threat actors, their motivations, and create Attack Trees to model how they could achieve their goals.
Vulnerability Analysis — Correlate threats with known vulnerabilities using data from vulnerability scanners, penetration test reports, and threat intelligence feeds.
Attack Modeling — Simulate attack scenarios to test the viability of identified threats against existing countermeasures.
Risk and Impact Analysis — Calculate residual risk, prioritize findings by business impact, and define remediation strategies.

PASTA is ideal for mature organizations that want to link their security activities to broader business risk. It provides the depth and structure needed for high-assurance systems, especially in finance, healthcare, and critical infrastructure. The key advantage of PASTA over STRIDE is its collaborative, cross-functional nature — it brings together developers, architects, business analysts, risk professionals, and SOC team members in a way that STRIDE's developer-centric approach does not.

Limitation: PASTA is more complex to execute and requires a higher level of expertise. The accuracy of the methodology depends heavily on the availability and quality of data regarding the system and its architecture. It is not a "quick start" framework — expect a significant investment of time and stakeholder coordination.

Attack Trees

Attack Trees are a complementary technique that can be used alongside STRIDE, PASTA, or any other framework. Originally formalized by Bruce Schneier in 1999 (building on earlier work by Edward Amoroso and the NSA), Attack Trees provide a visual, hierarchical representation of how an attacker might achieve a specific goal.

The structure is simple:

The root node represents the attacker's goal (e.g., "Steal user credentials").
Child nodes represent the different ways to achieve that goal.
Nodes are connected using AND/OR logic: OR nodes represent alternatives (any one path suffices), AND nodes represent steps that must all be completed.
Each node can carry additional metadata: likelihood, cost, required skill level, detectability.

For example, an Attack Tree for "Bypass Authentication" might look like:

Bypass Authentication [ROOT - OR]
├── Brute Force Password [OR]
│   ├── Online brute force (if no rate limiting)
│   └── Offline brute force (if password hashes leaked)
├── Credential Stuffing [OR]
│   └── Use credentials from previous breaches
├── Session Hijacking [OR]
│   ├── Steal session cookie via XSS
│   └── Session fixation attack
├── Exploit Password Reset [OR]
│   ├── Predictable reset tokens
│   └── Account takeover via email compromise
└── SQL Injection on Login [OR]
    └── Bypass authentication via tautology (e.g. ' OR '1'='1)
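The AND/OR semantics described above lend themselves to simple computation. The sketch below assumes each leaf carries a single "attacker cost" value: an OR node costs as much as its cheapest child, and an AND node costs the sum of its children. The node names echo the tree above, but the AND split of the XSS branch and every cost figure are invented for illustration.

```java
import java.util.List;

class AttackTree {
    enum Gate { LEAF, AND, OR }

    // A tree node; leaves carry an assumed cost, inner nodes derive theirs.
    record Node(String goal, Gate gate, double leafCost, List<Node> children) {
        static Node leaf(String goal, double cost) { return new Node(goal, Gate.LEAF, cost, List.of()); }
        static Node and(String goal, Node... kids) { return new Node(goal, Gate.AND, 0, List.of(kids)); }
        static Node or(String goal, Node... kids) { return new Node(goal, Gate.OR, 0, List.of(kids)); }
    }

    // Cheapest total cost for the attacker to achieve this node's goal.
    static double minCost(Node n) {
        return switch (n.gate()) {
            case LEAF -> n.leafCost();
            // Every step of an AND node must be completed.
            case AND -> n.children().stream().mapToDouble(AttackTree::minCost).sum();
            // Any single child of an OR node is enough.
            case OR -> n.children().stream().mapToDouble(AttackTree::minCost)
                         .min().orElse(Double.POSITIVE_INFINITY);
        };
    }

    // Hypothetical costs (arbitrary units) attached to part of the tree above.
    static Node example() {
        return Node.or("Bypass Authentication",
            Node.or("Brute Force Password",
                Node.leaf("Online brute force (no rate limiting)", 8),
                Node.leaf("Offline brute force (leaked hashes)", 20)),
            Node.leaf("Credential stuffing", 5),
            Node.and("Steal session cookie via XSS",
                Node.leaf("Find an XSS flaw", 3),
                Node.leaf("Lure a logged-in victim", 4)),
            Node.leaf("SQL injection on login", 2));
    }

    public static void main(String[] args) {
        System.out.println("Cheapest attack cost: " + minCost(example()));
    }
}
```

With these numbers the SQL injection leaf is the cheapest path. Removing it (for example, after switching the login query to a PreparedStatement) raises the cheapest path to credential stuffing at 5, which is how a tree like this helps prioritize the next defense.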

Attack Trees are powerful because they go beyond graphical representation — they provide tactical insights that enable targeted defenses. They also serve as excellent communication tools for presenting security risks to leadership or non-technical stakeholders. Within PASTA specifically, Attack Trees are created during the fourth stage (Threat Analysis) to model how identified threat actors might achieve their goals.

Tip: Tools like OWASP Threat Dragon, Microsoft Threat Modeling Tool, IriusRisk, and Devici can help you build and maintain Attack Trees as living documents that evolve with your application.

LINDDUN (Privacy Threat Modeling)

LINDDUN is a privacy-focused threat modeling framework developed by researchers at KU Leuven. While STRIDE addresses security (confidentiality, integrity, availability), LINDDUN addresses privacy-specific concerns that traditional security frameworks overlook. With regulations like GDPR, CCPA, and HIPAA becoming stricter, privacy threat modeling is no longer optional for applications handling personal data.

LINDDUN stands for:

Linking — Can an adversary combine data to learn more about an individual?
Identifying — Can the identity of a data subject be determined?
Non-repudiation — Can a user be unable to deny an action (sometimes a privacy threat, not just a security feature)?
Detecting — Can an adversary detect that a user is using a system?
Data Disclosure — Can personal data leak to unauthorized parties?
Unawareness — Are users insufficiently informed about data collection and processing?
Non-compliance — Does the system fail to comply with privacy regulations and best practices?

LINDDUN follows the same four fundamental questions as the Threat Modeling Manifesto and can operate alongside STRIDE on a single DFD, enabling organizations to perform comprehensive privacy and security analysis without duplicating work. It comes in multiple flavors: LINDDUN GO (a gamified card-based approach for lean brainstorming sessions), LINDDUN PRO (a systematic, exhaustive approach starting from DFD analysis), and LINDDUN MAESTRO (an advanced approach with enriched system descriptions).

When to use it: If your application processes personal data — user profiles, health records, financial information, location data — you should be running LINDDUN alongside STRIDE. The cost of privacy violations (both regulatory fines and loss of user trust) now frequently exceeds the cost of traditional security breaches.

Choosing Your Framework

You do not have to pick only one framework. Many security teams begin with STRIDE to cover general threats, then layer in LINDDUN for privacy analysis. PASTA can be introduced later as the organization matures and seeks deeper insights into how threats connect to business objectives. Attack Trees can be used within any of these frameworks to drill deeper into specific attack scenarios. The most effective programs evolve from lightweight models into more integrated, cross-functional strategies.

The Threat Modeling Process

Regardless of which framework you choose, the threat modeling process can be decomposed into three high-level steps:

Step 1: Decompose the Application:

Create use-cases to understand how the application is used.
Identify entry points (APIs, web forms, file uploads, message queues, webhooks).
Identify assets (databases, secrets, PII, session tokens, cryptographic keys).
Identify trust levels and trust boundaries between external entities.
Map data flows using DFDs or sequence diagrams.

Note: This stage has to do with understanding the context of the Web Application and its surrounding entities. In modern architectures, this includes microservices communication, API gateways, third-party integrations, cloud provider boundaries, and container orchestration layers.
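The decomposition can be captured in a small data model. The sketch below (all class, element, and zone names are hypothetical) records each element with the trust zone it lives in and lists the data flows that cross a trust boundary, since those flows usually mark the entry points where input validation and authentication deserve the closest review.

```java
import java.util.List;
import java.util.stream.Collectors;

class TrustBoundaries {
    // A DFD element placed in a named trust zone (e.g. "internet", "dmz", "internal").
    record Element(String name, String zone) {}

    // A directed data flow between two elements.
    record Flow(Element from, Element to, String data) {
        // A flow crosses a trust boundary when its endpoints live in different zones.
        boolean crossesBoundary() { return !from.zone().equals(to.zone()); }
    }

    // Returns only the flows that cross a trust boundary.
    static List<Flow> boundaryCrossings(List<Flow> flows) {
        return flows.stream().filter(Flow::crossesBoundary).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Element browser = new Element("Browser", "internet");
        Element web = new Element("Web app", "dmz");
        Element cache = new Element("Session cache", "dmz");
        Element db = new Element("Users DB", "internal");
        List<Flow> flows = List.of(
            new Flow(browser, web, "login form (USER, PASSWORD)"),
            new Flow(web, cache, "session token"),
            new Flow(web, db, "credential lookup"));
        boundaryCrossings(flows).forEach(f ->
            System.out.println(f.from().name() + " -> " + f.to().name() + ": " + f.data()));
    }
}
```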

Figures (not reproduced): the Business Architecture (Business Owner's Perspective) and the Business Architecture Behavior of an example Web Application, "XYZ-Match".

Business Architecture: lists the entities important to the business. A business entity can be a person, a thing, or a concept that is part of or interacts with the business process (Proforma 2003). In the "XYZ-Match" example, the business entities are Investors, Entrepreneurs, and the "XYZ-Match" web system.

Business Architecture Behavior: lists the processes through which the business operates. In the "XYZ-Match" example, "Investor listing information to Venture Capital Directory" is one such business process.

Step 2: Determine and rank threats using your chosen categorization methodology:

Authentication and Identity Management
Authorization and Access Control
Session Management
Input Validation and Output Encoding
Data Protection in Storage and Transit (encryption at rest, TLS, key management)
Auditing, Logging, and Monitoring
Configuration Management and Secrets Management
Error Handling and Exception Management
Supply Chain and Dependency Security

Note: This stage is about mapping the vulnerabilities to a category. Threat listing is an important part of a Web Application code audit, and threat lists based on the STRIDE model are useful for identifying threats against attacker goals. Categorizing and grouping the Web Application threats helps you see which security controls have the majority of the problems: such a control is like a blinking LED that says "Hey, I have multiple problems, save me please, I am a poor cod dying out, save me".
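Grouping is easy to automate once findings are tagged with a category. The sketch below assumes a hypothetical Finding record (file, line, category) rather than any particular SAST export format, counts findings per control category, and reports the category with the most findings, i.e. the "blinking LED".

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

class FindingTriage {
    // Hypothetical finding shape; real tool output (e.g. SARIF) carries far more detail.
    record Finding(String file, int line, String category) {}

    // Number of findings per control category, sorted by category name.
    static Map<String, Long> countByCategory(List<Finding> findings) {
        return findings.stream().collect(
            Collectors.groupingBy(Finding::category, TreeMap::new, Collectors.counting()));
    }

    // The category with the most findings, if there are any findings at all.
    static Optional<String> hottestCategory(List<Finding> findings) {
        return countByCategory(findings).entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<Finding> findings = List.of(
            new Finding("LoginServlet.java", 42, "Input Validation and Output Encoding"),
            new Finding("SearchDao.java", 17, "Input Validation and Output Encoding"),
            new Finding("SessionFilter.java", 88, "Session Management"),
            new Finding("ReportDao.java", 23, "Input Validation and Output Encoding"));
        System.out.println(countByCategory(findings));
        System.out.println("Most problems: " + hottestCategory(findings).orElse("none"));
    }
}
```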

Step 3: Determine countermeasures and mitigation.

Note: Such countermeasures can be identified using threat-countermeasure mapping lists. The risk mitigation strategy might involve evaluating these threats in terms of the business impact they pose and then reducing that risk.

The objective of risk management should be to reduce the impact that the exploitation of a threat can have on the application (not necessarily to eliminate the risk!). This is done by responding to each threat with a risk strategy. In general, there are five options for responding to a threat:

Do nothing: for example, hoping for the best.
Informing about the risk: for example, warning user population about the risk.
Mitigate the risk: for example, by putting countermeasures in place.
Accept the risk: for example, after evaluating the impact of the exploitation (business impact).
Transfer the risk: for example, through contractual agreements and insurance.

The decision of which strategy is most appropriate depends on the impact an exploitation of a threat can have, the likelihood of its occurrence, and the costs for transferring or avoiding it.
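The likelihood/impact trade-off can be made explicit with a simple scoring scheme. The sketch below is one possible approach, not a standard: it assumes 1 to 5 ratings for likelihood and impact, multiplies them into a score, and maps score bands plus a "mitigation costs more than it saves" flag onto some of the response options listed above. The thresholds are arbitrary and would need calibrating to your own risk appetite.

```java
class RiskDecision {
    // Response options from the list above. "Do nothing" is deliberately left out:
    // this sketch always picks an explicit, documented response.
    enum Response { INFORM, MITIGATE, ACCEPT, TRANSFER }

    // Risk score = likelihood x impact, each rated 1 (low) to 5 (high).
    static int score(int likelihood, int impact) {
        if (likelihood < 1 || likelihood > 5 || impact < 1 || impact > 5) {
            throw new IllegalArgumentException("ratings must be between 1 and 5");
        }
        return likelihood * impact;
    }

    // Hypothetical policy; the thresholds 15, 8 and 4 are arbitrary examples.
    static Response choose(int likelihood, int impact, boolean mitigationTooCostly) {
        int s = score(likelihood, impact);
        if (s >= 15) {
            // High risk: fix it, or transfer it (contract, insurance) if fixing is uneconomic.
            return mitigationTooCostly ? Response.TRANSFER : Response.MITIGATE;
        }
        if (s >= 8) {
            // Medium risk: fix it unless the fix costs more than the exposure.
            return mitigationTooCostly ? Response.ACCEPT : Response.MITIGATE;
        }
        if (s >= 4) {
            // Lower risk: make users and stakeholders aware of it.
            return Response.INFORM;
        }
        // Low risk: accept and document the decision.
        return Response.ACCEPT;
    }

    public static void main(String[] args) {
        // Hypothetical rating for SQL injection on the login form: likelihood 4, impact 5.
        System.out.println(choose(4, 5, false));
        System.out.println(choose(2, 2, false));
    }
}
```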

In practice, the three steps above translate into the following checklist.

Define the application requirements:

Identify business objectives
Identify user roles that will interact with the application
Identify the data the application will manipulate
Identify the use cases for operating on that data that the application will facilitate

Model the application architecture:

Model the components of the application
Model the service roles that the components will act under
Model any external dependencies (third-party APIs, open-source libraries, cloud services)
Model the calls from roles, to components and eventually to the data store for each use case

Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing
Assign risk values and determine the risk responses
Determine the countermeasures to implement based on your chosen risk responses
Continually update the threat model based on the emerging security landscape — threat modeling is not a one-time activity, it must evolve as the application, its dependencies, and the threat landscape change.
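The data access control matrix mentioned above can be written down directly in code, which gives the reviewer something concrete to compare each use case against. The roles, asset names, and operations below are hypothetical, loosely based on the "XYZ-Match" example; anything not explicitly granted is denied.

```java
import java.util.Map;
import java.util.Set;

class AccessMatrix {
    enum Role { INVESTOR, ENTREPRENEUR, ADMIN }
    enum Op { CREATE, READ, UPDATE, DELETE }

    // Helper so every matrix cell has the same static type, Set<Op>.
    private static Set<Op> ops(Op... allowed) { return Set.of(allowed); }

    // Role -> asset -> permitted operations. Missing entries mean "deny".
    static final Map<Role, Map<String, Set<Op>>> MATRIX = Map.of(
        Role.INVESTOR, Map.of(
            "investor_listing", ops(Op.CREATE, Op.READ, Op.UPDATE),
            "venture_directory", ops(Op.READ)),
        Role.ENTREPRENEUR, Map.of(
            "venture_directory", ops(Op.READ)),
        Role.ADMIN, Map.of(
            "investor_listing", ops(Op.values()),
            "venture_directory", ops(Op.values())));

    // Default-deny lookup: unknown roles, assets, or operations are refused.
    static boolean isAllowed(Role role, String asset, Op op) {
        return MATRIX.getOrDefault(role, Map.of())
                     .getOrDefault(asset, Set.of())
                     .contains(op);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Role.INVESTOR, "investor_listing", Op.UPDATE));
        System.out.println(isAllowed(Role.ENTREPRENEUR, "investor_listing", Op.READ));
    }
}
```

During the review, each use case from the requirements can be checked against this matrix: any code path that lets a role perform an operation the matrix does not grant is an authorization finding.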

Modern Tools for Security Code Auditing

The tooling landscape for security code review has evolved dramatically. In the early days, tools like Graudit (a grep-based signature scanner), RATS, and findstr were the go-to options. While these still have educational value for understanding how pattern matching works, modern SAST (Static Application Security Testing) tools have moved far beyond simple regex-based detection.

Here is the current state of the art:

Semgrep

Semgrep is a lightweight, open-source static analysis tool that uses semantic pattern matching rather than simple text matching. This means it understands code structure — not just text patterns — enabling fast processing while maintaining accuracy. Developers can write custom rules in YAML that look like the code they want to find, making rule creation intuitive.

semgrep scan --config=auto /path/to/code

Key strengths: fast scanning (no compilation required), supports 20+ languages, integrates into CI/CD pipelines, IDE plugins, and PR checks. The commercial Semgrep AppSec Platform adds cross-file/cross-function dataflow analysis, SCA (Software Composition Analysis), secrets detection, and an AI-powered assistant for triage and autofix.

Note: In January 2025, after Semgrep changed its open-source licensing, 10+ competing vendors forked the community edition into "Opengrep." If you are evaluating Semgrep, be aware of this licensing shift and consider whether Opengrep-based alternatives (like Aikido Security) better fit your needs.

CodeQL (GitHub Advanced Security)

CodeQL is a semantic code analysis engine that compiles source code into a queryable relational database representing the AST, data flow graph, and control flow graph. Users write queries in QL (a Datalog-derived declarative language) to traverse this database. It is extremely powerful for deep, customized vulnerability hunting — CodeQL variant analysis has been used to identify over 400 CVEs in open-source projects.

codeql database create mydb --language=java --source-root=/path/to/code
codeql database analyze mydb codeql/java-queries:codeql-suites/java-security-and-quality.qls --format=sarif-latest --output=results.sarif

Key strengths: deep semantic analysis, powerful custom query language, native GitHub integration with Copilot Autofix for AI-powered fix suggestions. Free for open-source projects.

Limitation: Steep learning curve — requires specialized knowledge of QL. Requires code compilation to build the database, making it slower than Semgrep for quick scans. Best suited for security research and deep auditing rather than fast-moving DevSecOps workflows.

Other Notable Tools

Snyk Code — Developer-focused SAST with real-time IDE scanning and AI-trained detection engine. Strong on AI-generated code pattern detection.
SonarQube — Combines SAST with code quality checks. Good for teams that want security and maintainability in one platform.
Checkmarx / Veracode / OpenText (Fortify) — Enterprise-grade SAST platforms with deep scanning, compliance reporting, and legacy language support.
Bandit — Open-source Python-specific SAST tool. Lightweight and great for Python-heavy shops.
OWASP Dependency-Check / Trivy — SCA tools for scanning third-party dependencies for known vulnerabilities.
GitGuardian / Gitleaks — Secrets detection tools that scan repos for exposed credentials, API keys, and tokens.

Important: No single tool covers everything. The modern AppSec stack typically combines SAST + SCA + Secrets Detection + DAST + IaC scanning. The goal is layered defense — each tool catches what the others miss. And remember the golden rule: a tool is only as good as the human reviewing its output.

Cod(e) reviewing for SQL Injection

SQL Injection remains one of the most critical and prevalent vulnerabilities, consistently appearing in the OWASP Top 10. Despite decades of awareness, it persists because developers still make the same fundamental mistake: constructing SQL queries through string concatenation with untrusted input.

Use parameterized queries (PreparedStatement in Java) instead of dynamically built SQL statements: every SQL statement should treat user input strictly as bound variables, and the statement should be precompiled before the actual input values are substituted for those variables. In addition, validate all external input. A simple way to think about SQL injection during a security code review is in terms of multiple layers of defense across the whole Web Application: input validation should occur at the Web Application input filter, at the framework/ORM layer, and at the database layer itself. Additional layers of defense can be added through a Web Application Firewall (WAF) and a Database Activity Monitor.

Figure (not reproduced): yes/no flowchart of the SQL injection flow.

Note: This is a simplified SQL Injection threat model. In practice, the decision tree branches further when you consider second-order injection, blind SQLi, and out-of-band channels.

The Vulnerable Code (Java — what NOT to do)

// VULNERABLE: SQL Injection via string concatenation
// This is the classic mistake — user input directly embedded in SQL

String username = request.getParameter("USER");       // From HTTP request — UNTRUSTED
String password = request.getParameter("PASSWORD");   // From HTTP request — UNTRUSTED

// DANGER: Direct concatenation of user input into SQL query
String sql = "SELECT User_id, Username FROM USERS WHERE Username = '"
    + username + "' AND Password = '" + password + "'";

Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);

// An attacker entering: Username = ' OR '1'='1' --
// Produces: SELECT User_id, Username FROM USERS
//           WHERE Username = '' OR '1'='1' --' AND Password = ''
// Result: Authentication bypass — returns all users

When SQL statements are created dynamically as the software executes, there is an opportunity for a security breach, because the input data can truncate, malform, or expand the original SQL query. Here, request.getParameter() retrieves the data for the SQL query directly from the HTTP request without any validation (minimum/maximum length, permitted characters, malicious characters). This lets an attacker supply SQL as the payload and alter what the statement does.
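Input validation of the kind described above (length limits plus a set of permitted characters) is best done with an allowlist rather than by hunting for "malicious" characters. The sketch below assumes a hypothetical username policy of 3 to 32 characters drawn from letters, digits, dot, underscore, and hyphen. It is a second layer of defense and does not replace parameterized queries.

```java
import java.util.regex.Pattern;

class UsernameValidator {
    // Hypothetical policy: 3-32 characters; letters, digits, '.', '_' or '-' only.
    private static final Pattern ALLOWED = Pattern.compile("^[A-Za-z0-9._-]{3,32}$");

    // True only if the entire input matches the allowlist (null is rejected).
    static boolean isValid(String username) {
        return username != null && ALLOWED.matcher(username).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"alice", "' OR '1'='1' --", "ab", "bob.smith-99"}) {
            System.out.println(candidate + " -> " + isValid(candidate));
        }
    }
}
```

Passwords are a poor fit for this kind of character allowlist; they should be accepted as entered and protected by parameterization and proper hashing instead.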

The Secure Code (Java — PreparedStatement / Parameterized Query)

// SECURE: Parameterized query using PreparedStatement
// The SQL structure is precompiled; user input is ALWAYS treated as data

String username = request.getParameter("USER");
String password = request.getParameter("PASSWORD");

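// The '?' placeholders ensure input can never alter the query structure.
// (Comparing a plaintext Password column only mirrors the vulnerable example;
// real systems store salted password hashes, e.g. bcrypt, and verify them in code.)
String sql = "SELECT User_id, Username FROM USERS WHERE Username = ? AND Password = ?";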

try (PreparedStatement pstmt = connection.prepareStatement(sql)) {

    pstmt.setString(1, username);  // Bound as data, not SQL code
    pstmt.setString(2, password);  // Bound as data, not SQL code

    try (ResultSet rs = pstmt.executeQuery()) {
        if (rs.next()) {
            int userId = rs.getInt("User_id");
            String loggedUser = rs.getString("Username");
            // Authentication successful
        } else {
            // Authentication failed
        }
    }
} catch (SQLException e) {
    logger.error("Database error during authentication", e);
    // NEVER expose stack traces or SQL errors to the user
}

The PreparedStatement precompiles the SQL query with placeholder markers (?). When setString() is called, the JDBC driver ensures the input is treated strictly as a string literal — it can never be interpreted as SQL code. Even if an attacker enters ' OR '1'='1, the database will literally search for a user with that exact string as their username, which will return nothing.

Modern Alternative: Using an ORM (JPA/Hibernate)

In modern Java applications, you often interact with the database through an ORM rather than raw JDBC. Here is how the same query looks using JPA (Java Persistence API):

// SECURE: JPA Named Query — parameterized by default

@Entity
@NamedQuery(
    name = "User.findByCredentials",
    query = "SELECT u FROM User u WHERE u.username = :username AND u.password = :password"
)
public class User { ... }

// Usage:
TypedQuery<User> query = entityManager.createNamedQuery("User.findByCredentials", User.class);
query.setParameter("username", request.getParameter("USER"));
query.setParameter("password", request.getParameter("PASSWORD"));
List<User> results = query.getResultList();

Warning: ORMs are not automatically safe. If you use string concatenation to build JPQL or HQL queries, you are just as vulnerable. The rule is the same: always use parameterized queries regardless of the abstraction layer.

Python Example (for comparison)

# VULNERABLE — string formatting with untrusted input
cursor.execute(f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'")

# SECURE — parameterized query
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username, password))

What to grep for during code review (SQL Injection indicators)

When performing a manual code review or writing custom SAST rules, look for these patterns:

Statement.execute( or Statement.executeQuery( combined with string concatenation (+)
"SELECT ... " + variable or "INSERT ... " + variable or "UPDATE ... " + variable
String.format() used to build SQL queries
f"SELECT ..." (Python f-strings in SQL context)
cursor.execute("... %s ..." % variable) (Python old-style string formatting — NOT the same as parameterized %s)
$"SELECT ..." (C# string interpolation in SQL context)
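A crude, Graudit-style, line-based scanner for some of these indicators fits in a few lines of regex. The patterns below are illustrative assumptions written for this sketch; they will produce both false positives and false negatives and are no substitute for the semantic matching done by tools such as Semgrep. Their only purpose is to show what plain pattern matching looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SqliGrep {
    // A named, case-insensitive indicator pattern.
    record Indicator(String name, Pattern pattern) {
        static Indicator of(String name, String regex) {
            return new Indicator(name, Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        }
    }

    // One hit per (line, indicator) pair; line numbers are 1-based.
    record Hit(int lineNumber, String indicator, String line) {}

    // Illustrative regexes for some of the indicators listed above.
    static final List<Indicator> INDICATORS = List.of(
        // "SELECT ..." + variable (Java string concatenation)
        Indicator.of("java-concat", "\"\\s*(SELECT|INSERT|UPDATE|DELETE)\\b[^\"]*\"\\s*\\+"),
        // String.format("SELECT ...", ...)
        Indicator.of("string-format", "String\\.format\\(\\s*\"\\s*(SELECT|INSERT|UPDATE|DELETE)\\b"),
        // Python f-string in SQL context: f"SELECT ..."
        Indicator.of("python-fstring", "\\bf\"\\s*(SELECT|INSERT|UPDATE|DELETE)\\b"),
        // Python old-style formatting: execute("... %s ..." % variable)
        Indicator.of("python-percent", "execute\\(\\s*\"[^\"]*%s[^\"]*\"\\s*%"),
        // C# interpolation in SQL context: $"SELECT ..."
        Indicator.of("csharp-interp", "\\$\"\\s*(SELECT|INSERT|UPDATE|DELETE)\\b"));

    // Checks every line against every indicator.
    static List<Hit> scan(List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Indicator ind : INDICATORS) {
                if (ind.pattern().matcher(lines.get(i)).find()) {
                    hits.add(new Hit(i + 1, ind.name(), lines.get(i).trim()));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
            "String sql = \"SELECT User_id FROM USERS WHERE Username = '\" + username + \"'\";",
            "String safe = \"SELECT User_id FROM USERS WHERE Username = ?\";",
            "cursor.execute(f\"SELECT * FROM users WHERE username = '{username}'\")",
            "cursor.execute(\"SELECT * FROM users WHERE username = %s\", (username,))");
        scan(code).forEach(h -> System.out.println(h.lineNumber() + " [" + h.indicator() + "] " + h.line()));
    }
}
```

Note that the parameterized lines (the `?` placeholder and the `%s` passed with a separate tuple) are not flagged, which is exactly the distinction the reviewer has to make by hand when a pattern is less clear-cut.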

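A minimal Semgrep rule that flags this concatenate-then-execute pattern in Java might look like the following. Treat it as a deliberately narrow starting point rather than a complete detector: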

# semgrep-rule: java-sql-injection.yaml
rules:
  - id: java-sqli-string-concat
    patterns:
      - pattern: |
          String $QUERY = "..." + $INPUT + "...";
          ...
          $STMT.executeQuery($QUERY);
    message: >
      Potential SQL injection: user input concatenated into SQL query.
      Use PreparedStatement with parameterized queries instead.
    severity: ERROR
    languages: [java]

Epilogue

Educating developers to write secure code is the paramount goal of a secure code review. Approaching code review from this standpoint is the most effective way to promote and improve code quality. Part of the education process is giving developers the knowledge they need to write better code, for example by providing a controlled set of rules against which they can compare their code. Modern SAST tools like Semgrep and CodeQL embody this philosophy: they provide immediate, contextual feedback in the developer's IDE and PR workflow, turning every code review into a learning opportunity.

The landscape has changed dramatically since the days of grep-based scanning. We now have AI-assisted remediation, cross-function taint analysis, reachability-based SCA, and privacy-specific threat modeling frameworks. But the fundamental truth remains: tools augment human judgment, they do not replace it. The best security code review is one where a knowledgeable reviewer uses the right tools to focus their attention on what matters, validates findings in context, and communicates risk in business terms that drive action.

Threat model your applications. Use STRIDE for security, LINDDUN for privacy, PASTA when you need to tie threats to business impact, and Attack Trees to drill into specific attack scenarios. Embed SAST in your pipeline. Review every finding with human eyes. And never, ever concatenate user input into a SQL query.

Elusive Thoughts, 06/11/2012