06/11/2012

The Da Vinci Cod(e) Review

Introduction

This article is about performing Web Application security code reviews the proper way (also known as my way). The best approach is to have at your disposal both the Web Application itself (deployed and running on a web server) and its source code, because you can then verify your findings in real time (e.g. exploit a Cross Site Scripting issue immediately after you identify it in the code). Ideally this happens within a CI/CD pipeline: your SAST tool flags a finding, and you spin up a local or staging instance to validate whether that finding is actually exploitable. That feedback loop is where the real security value lives.


But first, let's define what a security source code review is. A security code review is a systematic examination of a Web Application's source code, intended to find and fix security mistakes overlooked in the initial development phase and so improve the overall security of the software. Reviews are done in various forms such as pair programming, informal walkthroughs, and formal inspections. They are often done by independent contractors or an internal security team. Hiring an independent third party to perform the code review adds value because it gives the company the chance to have its code examined by a person who has been engaged in the last stage of the development process, has no "emotional attachments to the code", and therefore has a unique perspective on the subject.

In a modern DevSecOps context, security code review is no longer a one-off event. It is a continuous activity embedded in the Software Development Lifecycle (SDLC). Pull Request (PR) reviews, automated SAST scans triggered on every commit, and periodic deep-dive manual reviews all coexist. The key shift in recent years is treating security code review as a living process rather than a checkpoint gate.

Types of code review

Code review practices fall into three main categories: 1) pair programming, 2) formal code review and 3) lightweight code review. Formal code review, such as a Fagan inspection, involves a careful and detailed process with multiple participants and multiple phases. Formal code reviews are the traditional method of review, in which software developers attend a series of meetings and review code line by line, usually using printed copies of the material. Formal inspections are extremely thorough and have been proven effective at finding defects in the code under review. Lightweight code review typically requires less overhead than formal code inspections, though it can be equally effective when done properly.

Lightweight reviews are often conducted as part of the normal development process:
  1. Over-the-shoulder – One developer looks over the author's shoulder as the latter walks through the code.
  2. Email pass-around – The source code management system emails code to reviewers automatically after a check-in is made.
  3. Pair Programming – Two authors develop code together at the same workstation, as is common in Extreme Programming.
  4. Tool-assisted code review – Authors and reviewers use specialized tools designed for peer code review (e.g. GitHub Pull Requests, GitLab Merge Requests, Gerrit).
A fourth category has emerged in recent years: AI-assisted code review. Tools like GitHub Copilot, Semgrep Assistant, and Snyk Code now provide real-time security feedback directly in the IDE or during PR review. These AI-driven tools can detect patterns, suggest fixes, and even auto-remediate certain vulnerability classes. However, the same fundamental rule applies — they still require human validation.

Important note: Tools can be used to perform this task but they always need human verification. Tools do not understand context, which is the keystone of security code review. Tools are good at assessing large amounts of code and pointing out possible issues but a person needs to verify every single result to determine if it is a real issue, if it is actually exploitable, and calculate the risk to the enterprise. This is even more critical in the age of AI-generated code — studies suggest that AI-generated code contains security vulnerabilities in approximately 25-40% of cases, often including SQL injection, XSS, and insecure authentication patterns. Your SAST tool will flag them, but only a human reviewer can determine whether the context makes them exploitable.

What is the most important thing in a code review

The most important thing is applying proper threat modeling. Threat modeling is a structured approach to analyzing the security of an application: it enables you to identify, quantify, and address the security risks associated with it. Threat modeling is not an approach to reviewing code, but it does complement the security code review process. The inclusion of threat modeling in the SDLC can help to ensure that applications are being developed with security built in from the very beginning. This, combined with the documentation produced as part of the threat modeling process, can give the reviewer a greater understanding of the system. It allows the reviewer to see where the entry points to the application are and which threats are associated with each entry point.

In 2020, a group of threat modeling practitioners, researchers and authors published the Threat Modeling Manifesto — a document that distills the collective knowledge of the community into values and principles, similar to the Agile Manifesto. The Manifesto anchors all threat modeling around four fundamental questions:
  1. What are we working on? Understand the system through diagrams (DFDs, architecture diagrams).
  2. What can go wrong? Identify threats using frameworks like STRIDE, PASTA, LINDDUN, or Attack Trees.
  3. What are we going to do about it? Define countermeasures and mitigations.
  4. Did we do a good enough job? Validate and iterate on the threat model.
These four questions should be your north star regardless of which specific methodology you choose. The concept of threat modeling is not new but there has been a clear mindset change in recent years. Modern threat modeling looks at a system from a potential attacker's perspective, as opposed to a defender's viewpoint. The industry has also recognized that threat modeling is not just a security team exercise — it requires cross-functional collaboration involving developers, architects, business analysts, DevOps engineers, and security professionals.

Threat Modeling Frameworks: A Modern Overview

There are several established frameworks for threat modeling. Choosing the right one depends on your organization's maturity, the type of system being analyzed, and whether your primary concern is security, privacy, or business risk. Below is an overview of the most relevant frameworks in 2025.

STRIDE

STRIDE is the most widely adopted threat modeling framework, originally developed by Microsoft. It categorizes threats into six types:
  • Spoofing — Can an attacker impersonate another user or system?
  • Tampering — Can an attacker modify data in transit or at rest?
  • Repudiation — Can an attacker deny having performed an action?
  • Information Disclosure — Can sensitive data leak to unauthorized parties?
  • Denial of Service — Can an attacker degrade or disrupt service availability?
  • Elevation of Privilege — Can an attacker gain unauthorized access to higher-level functions?
STRIDE is a strong fit for teams that are new to threat modeling. It is easy to teach, quick to adopt, and integrates well into agile development practices. You apply it by creating a Data Flow Diagram (DFD) of your system, then systematically asking "can this element be affected by Spoofing? Tampering?" and so on for each element. Microsoft made STRIDE a core component of their Security Development Lifecycle (SDL), which they credit as one of the reasons for the increased security of their products.
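
The "ask each question for each element" walk-through described above can be sketched in a few lines of Python. This is only an illustration: the DFD elements are hypothetical, and the mapping of categories to element types follows the commonly cited STRIDE-per-element chart, which your team may well tune differently.

```python
# STRIDE-per-element sketch. All element names below are hypothetical.
STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information Disclosure",
    "D": "Denial of Service",
    "E": "Elevation of Privilege",
}

# Categories usually considered for each DFD element type.
# (Repudiation on a data store mostly matters when the store holds logs.)
APPLICABLE = {
    "external_entity": "SR",
    "process": "STRIDE",
    "data_store": "TRID",
    "data_flow": "TID",
}

def stride_questions(dfd):
    """Yield (element name, threat category) pairs to walk through in a review."""
    for name, kind in dfd:
        for letter in APPLICABLE[kind]:
            yield name, STRIDE[letter]

# Hypothetical DFD for a login feature.
dfd = [
    ("Browser user", "external_entity"),
    ("Login servlet", "process"),
    ("Users table", "data_store"),
    ("HTTP login request", "data_flow"),
]

for element, threat in stride_questions(dfd):
    print(f"{element}: can it be affected by {threat}?")
```

The output is just a checklist. Deciding whether each question describes a real threat is still the reviewer's job.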

Limitation: STRIDE focuses exclusively on security threats. It does not address privacy concerns, business risk alignment, or attacker motivation. It is also a static framework that works best at design time — it does not inherently adapt to runtime threat intelligence.

PASTA (Process for Attack Simulation and Threat Analysis)

PASTA is a seven-stage, risk-centric threat modeling methodology created by Tony UcedaVélez and Marco M. Morana. Unlike STRIDE, which focuses on categorizing threat types, PASTA takes a holistic view by considering both business impact and technical risk. It connects technical threats directly to business objectives, making it particularly valuable in enterprise environments where security decisions must be justified by business value.

The seven stages of PASTA are:
  1. Define Objectives — Identify business objectives, security requirements, compliance requirements, and data classification for the application in scope.
  2. Define Technical Scope — Map all system components, their relationships, interdependencies, and the attack surface.
  3. Application Decomposition — Break down the system into data flows, processes, trust boundaries, user roles, and permissions.
  4. Threat Analysis — Identify threat actors, their motivations, and create Attack Trees to model how they could achieve their goals.
  5. Vulnerability Analysis — Correlate threats with known vulnerabilities using data from vulnerability scanners, penetration test reports, and threat intelligence feeds.
  6. Attack Modeling — Simulate attack scenarios to test the viability of identified threats against existing countermeasures.
  7. Risk and Impact Analysis — Calculate residual risk, prioritize findings by business impact, and define remediation strategies.
PASTA is ideal for mature organizations that want to link their security activities to broader business risk. It provides the depth and structure needed for high-assurance systems, especially in finance, healthcare, and critical infrastructure. The key advantage of PASTA over STRIDE is its collaborative, cross-functional nature — it brings together developers, architects, business analysts, risk professionals, and SOC team members in a way that STRIDE's developer-centric approach does not.

Limitation: PASTA is more complex to execute and requires a higher level of expertise. The accuracy of the methodology depends heavily on the availability and quality of data regarding the system and its architecture. It is not a "quick start" framework — expect a significant investment of time and stakeholder coordination.

Attack Trees

Attack Trees are a complementary technique that can be used alongside STRIDE, PASTA, or any other framework. Originally formalized by Bruce Schneier in 1999 (building on earlier work by Edward Amoroso and the NSA), Attack Trees provide a visual, hierarchical representation of how an attacker might achieve a specific goal.

The structure is simple:
  • The root node represents the attacker's goal (e.g., "Steal user credentials").
  • Child nodes represent the different ways to achieve that goal.
  • Nodes are connected using AND/OR logic: OR nodes represent alternatives (any one path suffices), AND nodes represent steps that must all be completed.
  • Each node can carry additional metadata: likelihood, cost, required skill level, detectability.
For example, an Attack Tree for "Bypass Authentication" might look like:

Bypass Authentication [ROOT - OR]
├── Brute Force Password [OR]
│   ├── Online brute force (if no rate limiting)
│   └── Offline brute force (if password hashes leaked)
├── Credential Stuffing [OR]
│   └── Use credentials from previous breaches
├── Session Hijacking [OR]
│   ├── Steal session cookie via XSS
│   └── Session fixation attack
├── Exploit Password Reset [OR]
│   ├── Predictable reset tokens
│   └── Account takeover via email compromise
└── SQL Injection on Login [OR]
    └── Bypass authentication via tautology (e.g. ' OR '1'='1)

Attack Trees are powerful because they go beyond graphical representation — they provide tactical insights that enable targeted defenses. They also serve as excellent communication tools for presenting security risks to leadership or non-technical stakeholders. Within PASTA specifically, Attack Trees are created during the fourth stage (Threat Analysis) to model how identified threat actors might achieve their goals.
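
To make the AND/OR logic concrete, here is a minimal Python sketch that computes the cheapest path through a small tree. The cost numbers are invented for illustration, and the "Exploit Password Reset" branch is modeled as an AND node (unlike the tree above) purely to show both node types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str = "LEAF"          # "LEAF", "OR" or "AND"
    cost: float = 0.0           # attacker effort for a leaf (hypothetical units)
    children: List["Node"] = field(default_factory=list)

def cheapest(node):
    """Return the minimum attacker cost needed to achieve this node."""
    if node.kind == "LEAF":
        return node.cost
    child_costs = [cheapest(child) for child in node.children]
    # OR: any one child suffices. AND: every child must be completed.
    return min(child_costs) if node.kind == "OR" else sum(child_costs)

tree = Node("Bypass Authentication", "OR", children=[
    Node("Brute Force Password", "OR", children=[
        Node("Online brute force", cost=50),
        Node("Offline brute force", cost=80),
    ]),
    Node("Session Hijacking", "OR", children=[
        Node("Steal session cookie via XSS", cost=30),
        Node("Session fixation attack", cost=40),
    ]),
    Node("Exploit Password Reset", "AND", children=[
        Node("Compromise victim mailbox", cost=60),
        Node("Trigger reset email", cost=5),
    ]),
])

print(cheapest(tree))
```

With these made-up numbers the XSS path is the cheapest, which is the kind of result that tells you where to spend review time first.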

Tip: Tools like OWASP Threat Dragon, Microsoft Threat Modeling Tool, IriusRisk, and Devici can help you build and maintain Attack Trees as living documents that evolve with your application.

LINDDUN (Privacy Threat Modeling)

LINDDUN is a privacy-focused threat modeling framework developed by researchers at KU Leuven. While STRIDE addresses security (confidentiality, integrity, availability), LINDDUN addresses privacy-specific concerns that traditional security frameworks overlook. With regulations like GDPR, CCPA, and HIPAA becoming stricter, privacy threat modeling is no longer optional for applications handling personal data.

LINDDUN stands for:
  • Linking — Can an adversary combine data to learn more about an individual?
  • Identifying — Can the identity of a data subject be determined?
  • Non-repudiation — Can a user be unable to deny an action (sometimes a privacy threat, not just a security feature)?
  • Detecting — Can an adversary detect that a user is using a system?
  • Data Disclosure — Can personal data leak to unauthorized parties?
  • Unawareness — Are users insufficiently informed about data collection and processing?
  • Non-compliance — Does the system fail to comply with privacy regulations and best practices?
LINDDUN follows the same four fundamental questions as the Threat Modeling Manifesto and can operate alongside STRIDE on a single DFD, enabling organizations to perform comprehensive privacy and security analysis without duplicating work. It comes in multiple flavors: LINDDUN GO (a gamified card-based approach for lean brainstorming sessions), LINDDUN PRO (a systematic, exhaustive approach starting from DFD analysis), and LINDDUN MAESTRO (an advanced approach with enriched system descriptions).

When to use it: If your application processes personal data (user profiles, health records, financial information, location data), you should be running LINDDUN alongside STRIDE. The cost of privacy violations (both regulatory fines and loss of user trust) can rival or exceed the cost of traditional security breaches.

Choosing Your Framework

You do not have to pick only one framework. Many security teams begin with STRIDE to cover general threats, then layer in LINDDUN for privacy analysis. PASTA can be introduced later as the organization matures and seeks deeper insights into how threats connect to business objectives. Attack Trees can be used within any of these frameworks to drill deeper into specific attack scenarios. The most effective programs evolve from lightweight models into more integrated, cross-functional strategies.

The Threat Modeling Process

Regardless of which framework you choose, the threat modeling process can be decomposed into three high-level steps:

Step 1: Decompose the Application:
  • Create use-cases to understand how the application is used.
  • Identify entry points (APIs, web forms, file uploads, message queues, webhooks).
  • Identify assets (databases, secrets, PII, session tokens, cryptographic keys).
  • Identify trust levels and trust boundaries between external entities.
  • Map data flows using DFDs or sequence diagrams.
Note: This stage has to do with understanding the context of the Web Application and its surrounding entities. In modern architectures, this includes microservices communication, API gateways, third-party integrations, cloud provider boundaries, and container orchestration layers.
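
One way to capture the result of this decomposition is as plain data, so that flows crossing a trust boundary can be listed automatically. The sketch below is an assumption about how you might model it: the component names and trust zones are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    trust_zone: str   # e.g. "internet", "dmz", "internal"

@dataclass(frozen=True)
class DataFlow:
    source: Component
    target: Component
    data: str

def boundary_crossings(flows):
    """Return the flows whose endpoints sit in different trust zones."""
    return [f for f in flows if f.source.trust_zone != f.target.trust_zone]

# Hypothetical decomposition of a login path.
browser = Component("Browser", "internet")
gateway = Component("API gateway", "dmz")
auth = Component("Auth service", "internal")
db = Component("User database", "internal")

flows = [
    DataFlow(browser, gateway, "credentials"),
    DataFlow(gateway, auth, "credentials"),
    DataFlow(auth, db, "user lookup"),
]

for f in boundary_crossings(flows):
    print(f"{f.source.name} -> {f.target.name} carries {f.data}")
```

The flows it prints are the places where input validation and authentication checks deserve the closest reading during the review.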

The following images show the Business Architecture (Business Owner's Perspective) and Business Architecture Behavior of a Web Application:


Note: Lists the entities important to the business. Business entities can be a person, a thing or a concept that is part of or interacts with the business process (Proforma 2003). In the example of "XYZ-Match", the business entities include the following: Investors, Entrepreneurs, "XYZ-Match" web system.


Note: Lists the processes in which the business operates. In the example of "XYZ-Match", "Investor listing information to Venture Capital Directory" is one such business process.

Step 2: Determine and rank threats using your chosen categorization methodology:
  • Authentication and Identity Management
  • Authorization and Access Control
  • Session Management
  • Input Validation and Output Encoding
  • Data Protection in Storage and Transit (encryption at rest, TLS, key management)
  • Auditing, Logging, and Monitoring
  • Configuration Management and Secrets Management
  • Error Handling and Exception Management
  • Supply Chain and Dependency Security
Note: This stage is about mapping the vulnerabilities to a category. Threat listing is an important part of a Web Application code audit. Threat lists based on the STRIDE model are useful for identifying threats with regard to attacker goals. Categorizing and grouping the Web Application threats helps you see which security controls have the majority of the problems. It is like a blinking LED that says "Hey I have multiple problems, save me please I am a poor cod dying out, save me".
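
The grouping idea is easy to automate once findings carry a category label. A minimal sketch, assuming a hypothetical list of findings tagged with the categories from the list above:

```python
from collections import Counter

# Hypothetical review findings: (description, category).
findings = [
    ("Login form builds SQL by concatenation", "Input Validation and Output Encoding"),
    ("Search page reflects query unencoded", "Input Validation and Output Encoding"),
    ("Session ID not rotated after login", "Session Management"),
    ("Stack trace returned to user", "Error Handling and Exception Management"),
    ("Comment field stored without encoding", "Input Validation and Output Encoding"),
]

def rank_categories(findings):
    """Count findings per category, most affected category first."""
    return Counter(category for _, category in findings).most_common()

for category, count in rank_categories(findings):
    print(f"{count:2d}  {category}")
```

The category at the top of the list is the "blinking LED": the control that needs attention first.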

Step 3: Determine countermeasures and mitigation.

Note: Such countermeasures can be identified using threat-countermeasure mapping lists. The risk mitigation strategy might involve evaluating these threats from the business impact that they pose and reducing the risk.

The objective of risk management should be to reduce the impact that the exploitation of a threat can have on the application (not necessarily to mitigate the risk!). This can be done by responding to a threat with a risk mitigation strategy. In general there are five options for responding to threats:
  1. Do nothing: for example, hoping for the best.
  2. Informing about the risk: for example, warning user population about the risk.
  3. Mitigate the risk: for example, by putting countermeasures in place.
  4. Accept the risk: for example, after evaluating the impact of the exploitation (business impact).
  5. Transfer the risk: for example, through contractual agreements and insurance.
The decision of which strategy is most appropriate depends on the impact an exploitation of a threat can have, the likelihood of its occurrence, and the costs for transferring or avoiding it.
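
As a rough illustration of how impact, likelihood and cost can drive that decision, the sketch below compares an expected loss with the cost of each treatment. The function, its parameters and the "appetite" threshold are all assumptions made for this example, not a standard formula.

```python
def choose_response(likelihood, impact, mitigation_cost, transfer_cost, appetite):
    """Pick a risk response from expected loss and the cost of each option.

    likelihood: probability of exploitation per year (0..1).
    impact, mitigation_cost, transfer_cost: amounts in the same currency.
    appetite: expected loss the business is willing to carry (hypothetical policy value).
    """
    expected_loss = likelihood * impact
    if expected_loss <= appetite:
        return "accept"
    # Otherwise take the cheaper treatment, provided it costs less than the loss.
    cost, response = min((mitigation_cost, "mitigate"), (transfer_cost, "transfer"))
    if cost < expected_loss:
        return response
    return "accept"

print(choose_response(0.3, 100_000, 5_000, 12_000, 1_000))
print(choose_response(0.01, 20_000, 5_000, 3_000, 500))
print(choose_response(0.05, 1_000_000, 80_000, 20_000, 1_000))
```

Real decisions involve more than three numbers, but writing the rule down forces the business to state its risk appetite explicitly.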

  • Define the application requirements:
    • Identify business objectives
    • Identify user roles that will interact with the application
    • Identify the data the application will manipulate
    • Identify the use cases for operating on that data that the application will facilitate
  • Model the application architecture:
    • Model the components of the application
    • Model the service roles that the components will act under
    • Model any external dependencies (third-party APIs, open-source libraries, cloud services)
    • Model the calls from roles, to components and eventually to the data store for each use case
  • Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing
  • Assign risk values and determine the risk responses
  • Determine the countermeasures to implement based on your chosen risk responses
  • Continually update the threat model based on the emerging security landscape — threat modeling is not a one-time activity, it must evolve as the application, its dependencies, and the threat landscape change.

Modern Tools for Security Code Auditing

The tooling landscape for security code review has evolved dramatically. In the early days, tools like Graudit (a grep-based signature scanner), RATS, and findstr were the go-to options. While these still have educational value for understanding how pattern matching works, modern SAST (Static Application Security Testing) tools have moved far beyond simple regex-based detection.

Here is the current state of the art:

Semgrep

Semgrep is a lightweight, open-source static analysis tool that uses semantic pattern matching rather than simple text matching. This means it understands code structure — not just text patterns — enabling fast processing while maintaining accuracy. Developers can write custom rules in YAML that look like the code they want to find, making rule creation intuitive.

semgrep scan --config=auto /path/to/code

Key strengths: fast scanning (no compilation required), supports 20+ languages, integrates into CI/CD pipelines, IDE plugins, and PR checks. The commercial Semgrep AppSec Platform adds cross-file/cross-function dataflow analysis, SCA (Software Composition Analysis), secrets detection, and an AI-powered assistant for triage and autofix.

Note: In January 2025, after Semgrep changed its open-source licensing, 10+ competing vendors forked the community edition into "Opengrep." If you are evaluating Semgrep, be aware of this licensing shift and consider whether Opengrep-based alternatives (like Aikido Security) better fit your needs.

CodeQL (GitHub Advanced Security)

CodeQL is a semantic code analysis engine that compiles source code into a queryable relational database representing the AST, data flow graph, and control flow graph. Users write queries in QL (a Datalog-derived declarative language) to traverse this database. It is extremely powerful for deep, customized vulnerability hunting — CodeQL variant analysis has been used to identify over 400 CVEs in open-source projects.

codeql database create mydb --language=java --source-root=/path/to/code
codeql database analyze mydb codeql/java-queries:codeql-suites/java-security-and-quality.qls --format=sarif-latest --output=results.sarif

Key strengths: deep semantic analysis, powerful custom query language, native GitHub integration with Copilot Autofix for AI-powered fix suggestions. Free for open-source projects.

Limitation: Steep learning curve, since writing custom queries requires specialized knowledge of QL. For compiled languages such as Java, building the database typically involves compiling the code, which makes it slower than Semgrep for quick scans. Best suited for security research and deep auditing rather than fast-moving DevSecOps workflows.
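
The results.sarif file written by the analyze command above is plain JSON, so a first triage pass can be scripted. The sketch below counts results per rule and level. It relies only on the standard SARIF fields (runs, results, ruleId, level), and the embedded sample log is hand-written with hypothetical file paths.

```python
import json
from collections import Counter

def summarize_sarif(sarif):
    """Count results per (ruleId, level) across all runs of a SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "unknown")
            level = result.get("level", "warning")  # SARIF default level
            counts[(rule, level)] += 1
    return counts

# Tiny hand-written SARIF fragment (illustrative rule IDs and paths).
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "ExampleScanner"}},
    "results": [
      {"ruleId": "java/sql-injection", "level": "error",
       "message": {"text": "Query built from user input"},
       "locations": [{"physicalLocation": {
           "artifactLocation": {"uri": "src/Login.java"},
           "region": {"startLine": 42}}}]},
      {"ruleId": "java/sql-injection", "level": "error",
       "message": {"text": "Query built from user input"},
       "locations": [{"physicalLocation": {
           "artifactLocation": {"uri": "src/Search.java"},
           "region": {"startLine": 17}}}]},
      {"ruleId": "java/xss", "level": "warning",
       "message": {"text": "Unencoded output"},
       "locations": [{"physicalLocation": {
           "artifactLocation": {"uri": "src/Profile.java"},
           "region": {"startLine": 88}}}]}
    ]
  }]
}
""")

for (rule, level), count in summarize_sarif(sample).most_common():
    print(f"{count:3d}  {level:8s} {rule}")
```

For a real scan you would load the file instead, for example with json.load on an open handle to results.sarif. The summary only tells you where to look; each result still needs a human verdict.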

Other Notable Tools

  • Snyk Code — Developer-focused SAST with real-time IDE scanning and AI-trained detection engine. Strong on AI-generated code pattern detection.
  • SonarQube — Combines SAST with code quality checks. Good for teams that want security and maintainability in one platform.
  • Checkmarx / Veracode / OpenText (Fortify) — Enterprise-grade SAST platforms with deep scanning, compliance reporting, and legacy language support.
  • Bandit — Open-source Python-specific SAST tool. Lightweight and great for Python-heavy shops.
  • OWASP Dependency-Check / Trivy — SCA tools for scanning third-party dependencies for known vulnerabilities.
  • GitGuardian / Gitleaks — Secrets detection tools that scan repos for exposed credentials, API keys, and tokens.

Important: No single tool covers everything. The modern AppSec stack typically combines SAST + SCA + Secrets Detection + DAST + IaC scanning. The goal is layered defense — each tool catches what the others miss. And remember the golden rule: a tool is only as good as the human reviewing its output.

Cod(e) reviewing for SQL Injection

SQL Injection remains one of the most critical and prevalent vulnerabilities, consistently appearing in the OWASP Top 10. Despite decades of awareness, it persists because developers still make the same fundamental mistake: constructing SQL queries through string concatenation with untrusted input.

Use parameterized queries (PreparedStatements in Java) instead of dynamic SQL statements. Validate all external input, and ensure that every SQL statement treats user input as bound variables, with the statement precompiled before the actual inputs are substituted for the variables. A simplified way of thinking about SQL injection in a security code review is to emphasize multiple layers of defense throughout the whole Web Application system. Input validation should occur at the Web Application input filter, the framework/ORM layer, and the database layer itself. Additional layers of defense can be added through a Web Application Firewall (WAF) and a Database Activity Monitor.

The following picture shows a yes/no flow chart explaining an SQL injection flow:


Note: This is a simplified SQL Injection threat model. In practice, the decision tree branches further when you consider second-order injection, blind SQLi, and out-of-band channels.

The Vulnerable Code (Java — what NOT to do)

// VULNERABLE: SQL Injection via string concatenation
// This is the classic mistake — user input directly embedded in SQL

String username = request.getParameter("USER");       // From HTTP request — UNTRUSTED
String password = request.getParameter("PASSWORD");   // From HTTP request — UNTRUSTED

// DANGER: Direct concatenation of user input into SQL query
String sql = "SELECT User_id, Username FROM USERS WHERE Username = '"
    + username + "' AND Password = '" + password + "'";

Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);

// An attacker entering: Username = ' OR '1'='1' --
// Produces: SELECT User_id, Username FROM USERS
//           WHERE Username = '' OR '1'='1' --' AND Password = ''
// Result: Authentication bypass — returns all users

When SQL statements are dynamically created as software executes, there is an opportunity for a security breach as the input data can truncate, malform, or expand the original SQL query. The request.getParameter() retrieves the data for the SQL query directly from the HTTP request without any data validation (min/max length, permitted characters, malicious characters). This error gives rise to the ability to input SQL as the payload and alter the functionality of the statement.
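
The missing validation mentioned above (length limits, permitted characters) could look something like the following allow-list check. The policy itself (3 to 32 characters from a small character set) is an assumption for illustration. Your application's rules will differ.

```python
import re

# Hypothetical policy: 3 to 32 characters; letters, digits, dot, underscore, hyphen.
USERNAME_RE = re.compile(r"[A-Za-z0-9._-]{3,32}")

def is_valid_username(value):
    """Allow-list check: length bounds and permitted characters only."""
    return isinstance(value, str) and USERNAME_RE.fullmatch(value) is not None

print(is_valid_username("alice_01"))
print(is_valid_username("' OR '1'='1' --"))
```

Validation like this is one layer of defense, not the fix. Fields that must accept quotes (a surname such as O'Brien, free-text comments) cannot be protected by character filtering alone, which is why the query itself has to be parameterized.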

The Secure Code (Java — PreparedStatement / Parameterized Query)

// SECURE: Parameterized query using PreparedStatement
// The SQL structure is precompiled; user input is ALWAYS treated as data

String username = request.getParameter("USER");
String password = request.getParameter("PASSWORD");

// The '?' placeholders ensure input can never alter the query structure
String sql = "SELECT User_id, Username FROM USERS WHERE Username = ? AND Password = ?";

try (PreparedStatement pstmt = connection.prepareStatement(sql)) {

    pstmt.setString(1, username);  // Bound as data, not SQL code
    pstmt.setString(2, password);  // Bound as data, not SQL code

    try (ResultSet rs = pstmt.executeQuery()) {
        if (rs.next()) {
            int userId = rs.getInt("User_id");
            String loggedUser = rs.getString("Username");
            // Authentication successful
        } else {
            // Authentication failed
        }
    }
} catch (SQLException e) {
    logger.error("Database error during authentication", e);
    // NEVER expose stack traces or SQL errors to the user
}

The PreparedStatement precompiles the SQL query with placeholder markers (?). When setString() is called, the JDBC driver ensures the input is treated strictly as a string literal — it can never be interpreted as SQL code. Even if an attacker enters ' OR '1'='1, the database will literally search for a user with that exact string as their username, which will return nothing.

Modern Alternative: Using an ORM (JPA/Hibernate)

In modern Java applications, you often interact with the database through an ORM rather than raw JDBC. Here is how the same query looks using JPA (Java Persistence API):

// SECURE: JPA Named Query — parameterized by default

@Entity
@NamedQuery(
    name = "User.findByCredentials",
    query = "SELECT u FROM User u WHERE u.username = :username AND u.password = :password"
)
public class User { ... }

// Usage:
TypedQuery<User> query = entityManager.createNamedQuery("User.findByCredentials", User.class);
query.setParameter("username", request.getParameter("USER"));
query.setParameter("password", request.getParameter("PASSWORD"));
List<User> results = query.getResultList();

Warning: ORMs are not automatically safe. If you use string concatenation to build JPQL or HQL queries, you are just as vulnerable. The rule is the same: always use parameterized queries regardless of the abstraction layer.

Python Example (for comparison)

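# VULNERABLE: string formatting with untrusted input
cursor.execute(f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'")

# SECURE: parameterized query; the driver binds the values as data.
# The placeholder style depends on the DB-API driver: %s for psycopg2 and
# MySQL drivers, ? for sqlite3.
cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username, password))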
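
To see the difference between the two calls in action, here is a self-contained demo using Python's built-in sqlite3 module with an in-memory database. The table layout and the sample user are made up, and the password is stored in plain text only to mirror the article's example query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
conn.execute("INSERT INTO users (username, password) VALUES ('alice', 's3cret')")

def login_vulnerable(username, password):
    # VULNERABLE: input is spliced into the SQL text
    sql = f"SELECT user_id FROM users WHERE username = '{username}' AND password = '{password}'"
    return conn.execute(sql).fetchall()

def login_safe(username, password):
    # SECURE: sqlite3 uses '?' placeholders; values are bound as data
    sql = "SELECT user_id FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchall()

payload = "' OR '1'='1' --"
print("vulnerable:", login_vulnerable(payload, ""))   # tautology matches every row
print("safe:      ", login_safe(payload, ""))         # payload is just an odd username
```

The vulnerable function returns alice's row without knowing her password, while the parameterized one returns an empty list for the same input.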

What to grep for during code review (SQL Injection indicators)

When performing a manual code review or writing custom SAST rules, look for these patterns:
  • Statement.execute( or Statement.executeQuery( combined with string concatenation (+)
  • "SELECT ... " + variable or "INSERT ... " + variable or "UPDATE ... " + variable
  • String.format() used to build SQL queries
  • f"SELECT ..." (Python f-strings in SQL context)
  • cursor.execute("... %s ..." % variable) (Python old-style string formatting — NOT the same as parameterized %s)
  • $"SELECT ..." (C# string interpolation in SQL context)
A Semgrep rule to detect Java SQL injection looks like this:

# semgrep-rule: java-sql-injection.yaml
rules:
  - id: java-sqli-string-concat
    patterns:
      - pattern: |
          String $QUERY = "..." + $INPUT + "...";
          ...
          $STMT.executeQuery($QUERY);
    message: >
      Potential SQL injection: user input concatenated into SQL query.
      Use PreparedStatement with parameterized queries instead.
    severity: ERROR
    languages: [java]
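
When no SAST tool is at hand, the grep-style indicators listed earlier can be approximated with a short script. The two regular expressions below are deliberately crude heuristics written for this example. Expect both false positives and misses.

```python
import re

# Heuristic: a string literal that starts with a SQL verb, followed by '+' concatenation.
SQL_CONCAT = re.compile(r'"\s*(SELECT|INSERT|UPDATE|DELETE)\b[^"]*"\s*\+', re.IGNORECASE)
# Heuristic: a Python f-string that starts with a SQL verb.
SQL_FSTRING = re.compile(r'\bf"\s*(SELECT|INSERT|UPDATE|DELETE)\b', re.IGNORECASE)

def scan(source):
    """Return (line_number, line) pairs that match either heuristic."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if SQL_CONCAT.search(line) or SQL_FSTRING.search(line):
            hits.append((number, line.strip()))
    return hits

# Three sample lines: concatenation (Java), a parameterized query, an f-string (Python).
sample = '''String sql = "SELECT User_id FROM USERS WHERE Username = '" + username + "'";
String ok = "SELECT User_id FROM USERS WHERE Username = ?";
cursor.execute(f"SELECT * FROM users WHERE username = '{username}'")
'''

for number, line in scan(sample):
    print(f"line {number}: {line}")
```

Lines 1 and 3 of the sample are flagged, and the parameterized query on line 2 is not. Every hit is only a pointer for the reviewer to follow up by hand.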

Epilogue

Educating developers to write secure code is the paramount goal of a secure code review. Taking code review from this standpoint is the only way to promote and improve code quality. Part of the education process is to empower developers with the knowledge in order to write better code. This can be done by providing developers with a controlled set of rules which the developer can compare their code to. Modern SAST tools like Semgrep and CodeQL embody this philosophy — they provide immediate, contextual feedback in the developer's IDE and PR workflow, turning every code review into a learning opportunity.

The landscape has changed dramatically since the days of grep-based scanning. We now have AI-assisted remediation, cross-function taint analysis, reachability-based SCA, and privacy-specific threat modeling frameworks. But the fundamental truth remains: tools augment human judgment, they do not replace it. The best security code review is one where a knowledgeable reviewer uses the right tools to focus their attention on what matters, validates findings in context, and communicates risk in business terms that drive action.

Threat model your applications. Use STRIDE for security, LINDDUN for privacy, PASTA when you need to tie threats to business impact, and Attack Trees to drill into specific attack scenarios. Embed SAST in your pipeline. Review every finding with human eyes. And never, ever concatenate user input into a SQL query.


03/11/2012

Crypto for pentesters


Introduction

The purpose of this paper is to emphasize the importance of cryptography, focus on the RSA asymmetric cryptographic algorithm, and explain:

  • What is cryptography
  • Why cryptography is important
  • History of Cryptography
  • Mathematical RSA operations
  • How to perform an RSA brute-force

What is Cryptography

Cryptography (or cryptology; from Greek κρυπτός, kryptos, "hidden, secret"; and γράφω, gráphō, "I write", or -λογία, -logia, respectively) is the practice and study of hiding information. Modern cryptography intersects the disciplines of mathematics, computer science, and engineering. Applications of cryptography include ATM cards, computer passwords, and electronic commerce. [2]

Until recently cryptography referred mostly to encryption, which is the process of converting ordinary information (plaintext) into unintelligible gibberish (i.e. cipher-text). [4] Decryption is the reverse, in other words, moving from unreadable cipher-text back to plaintext. A cipher is a pair of algorithms that perform the encryption and the reverse operation, decryption. [4] The operation of a cipher is controlled both by the algorithm and, in each instance, by a key. The key is a secret parameter (ideally known only to the communicants) for a specific message exchange context. [4]

In order for two parties to exchange a cryptographic message, both must have one or two secret keys (depending on whether they use a symmetric or an asymmetric algorithm) and a known mathematical cryptographic algorithm (both parties must know the details of the algorithm). The following diagram shows the process.



Picture2: Simple encryption decryption operation (if the cryptography used is symmetric then key1=key2) [3]

Note: Secret keys are very important, as ciphers without variable size keys can be trivially broken with only the knowledge of the cipher used and are therefore not useful at all in most cases.

History of Cryptography

Before the modern era, cryptography was concerned solely with message confidentiality (i.e., encryption) — conversion of messages from a comprehensible form into an incomprehensible one and back again at the other end, rendering it unreadable by interceptors or eavesdroppers without secret knowledge (namely the key needed for decryption of that message). Encryption was used to (attempt to) ensure secrecy in communications, such as those of spies, military leaders, and diplomats. [6]

The earliest forms of secret writing required little more than local pen and paper analogs, as most people could not read. More literacy, or literate opponents, required actual cryptography. The main classical cipher types are transposition ciphers, which rearrange the order of letters in a message (e.g., 'hello world' becomes 'ehlol owrdl' in a trivially simple rearrangement scheme), and substitution ciphers, which systematically replace letters or groups of letters with other letters or groups of letters (e.g., 'fly at once' becomes 'gmz bu podf' by replacing each letter with the one following it in the Latin alphabet). [6]
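The substitution example above ('fly at once' becomes 'gmz bu podf') can be reproduced with a few lines of Python. This is only a sketch of that one scheme: the function name shift_cipher and the choice to leave spaces and other non-letter characters untouched are assumptions made for the illustration.

```python
import string


def shift_cipher(text: str, shift: int = 1) -> str:
    """Replace each lowercase Latin letter with the one `shift` places later."""
    alphabet = string.ascii_lowercase
    offset = shift % 26
    shifted = alphabet[offset:] + alphabet[:offset]
    # Characters outside a-z (such as spaces) are left unchanged.
    return text.translate(str.maketrans(alphabet, shifted))


encrypted = shift_cipher("fly at once", 1)
decrypted = shift_cipher(encrypted, -1)  # shifting back recovers the message
```

Because the only secret is the shift, there are just 25 useful keys to try, which is why such ciphers are of historical interest only.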

Asymmetric and Symmetric Cryptography  

Cryptography consists of two main categories (some people might claim more, but for the purposes of this paper two categories are enough). The first category is symmetric cryptography and the second category is asymmetric cryptography.

Symmetric Cryptography

Symmetric Cryptography uses cryptographic algorithms that use identical cryptographic keys for both decryption and encryption (this is not entirely true, but again for the purposes of this paper we accept it as a fact). The Symmetric Cryptography keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. [5] The following simple mathematical relationships describe the relation between decryption and encryption in Symmetric Cryptography.

Encryption ( k , Plaintext ) = Cipher (1) 

Decryption ( k , Cipher ) = Plaintext (2) 

Where k is a secret shared value and Plaintext is the data input we want to convert into cipher. From relationships (1) and (2) we can conclude that:

Plaintext = Decryption ( k , Cipher ) = Decryption ( k , Encryption ( k , Plaintext )) (3) 
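Relationship (3) can be checked with a toy symmetric cipher. The sketch below XORs the data with a repeating key; this construction is an assumption chosen only because it fits in a few lines, it is not one of the algorithms named in this paper, and it is not secure. What it does show is that the same k drives both directions.

```python
from itertools import cycle


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR every data byte with the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


k = b"shared-secret"                 # hypothetical shared secret
plaintext = b"attack at dawn"

cipher = xor_cipher(k, plaintext)    # relationship (1): Encryption(k, Plaintext)
recovered = xor_cipher(k, cipher)    # relationship (2): Decryption(k, Cipher)
```

Since XOR with the same key is its own inverse, recovered equals plaintext, which is exactly relationship (3).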

Note: Among the most popular and well-respected symmetric algorithms are Twofish, Serpent, AES (Rijndael), Blowfish, CAST5, RC4, TDES, and IDEA.





Picture3: Symmetric cryptography [7]

Asymmetric Cryptography


Asymmetric Cryptography, also known as Public key cryptography, is a cryptographic category which involves the use of asymmetric key algorithms. The asymmetric key algorithms are used to create a mathematically related key pair: a secret private key and a published public key. [6]

The following simple mathematical relationships can describe the relation between decryption and encryption in Asymmetric Cryptography.

Encryption ( k1 , Plaintext ) =  Cipher (4)

Decryption ( k2 , Cipher ) =  Plaintext (5)

Where k2 is a secret, non-shared value, k1 is a non-secret, shared value and Plaintext is the data input we want to convert into cipher. From relationships (4) and (5) we can conclude that:

Plaintext = Decryption ( k2 , Cipher ) = Decryption ( k2 , Encryption ( k1 , Plaintext )) (6)

In relationship (6) we can see that recovering the Plaintext is a more complex operation that depends on both keys. Public key cryptography is used widely. It is the approach which is employed by most cryptosystems. It underlies such Internet standards as Transport Layer Security (TLS), PGP, and GPG. The most famous asymmetric algorithm is RSA. In cryptography, RSA (which stands for Rivest, Shamir and Adleman, who first publicly described it) is an algorithm for public-key cryptography. RSA is widely used in electronic commerce, and is believed to be secure when proper secret keys are used.






Picture4: Asymmetric cryptography [7]


Asymmetric and Symmetric Cryptography revisited

Unlike Symmetric Cryptography, Asymmetric Cryptography uses a different key for encryption than for decryption. That is, a user who knows the encryption key of an asymmetric algorithm can encrypt messages, but cannot derive the decryption key and cannot decrypt messages encrypted with that key. This is the most obvious difference between Symmetric and Asymmetric Cryptography. Another difference lies in the mathematical properties of each type of algorithm and the way the mathematical algorithm is implemented in a hardware or software device (further explanation of these differences is out of the scope of this paper).

Why RSA is important

The RSA cryptographic algorithm is commercialized by a company that is also named RSA. The company RSA is the security division of EMC (EMC acquired RSA for its security products in 2006). RSA Laboratories is the research center of RSA, The Security Division of EMC, and the security research group within the EMC Innovation Network. The group was established in 1991 at RSA Data Security, the company founded by the inventors of the RSA public-key cryptosystem. Through its applied research program and academic connections, RSA Laboratories provides state-of-the-art expertise in cryptography and data security for the benefit of RSA and EMC. [10]

Represented by the equation "c = m^e mod n," the RSA algorithm is widely considered the standard for encryption and the core technology that secures the vast majority of the e-business conducted on the Internet. The U.S. patent for the RSA algorithm (# 4,405,829, "Cryptographic Communications System And Method") was issued to the Massachusetts Institute of Technology (MIT) on September 20, 1983, licensed exclusively to RSA Security, and expired on September 20, 2000. [9]

For nearly two decades, more than 800 companies spanning a range of global industries have turned to RSA Security as a trusted, strategic partner that can provide the proven, time-tested encryption implementations and resources designed to speed time to market. These companies, including nearly 200 so far in 2000, rely on RSA BSAFE® security software for its encryption implementation and value-added services for a broad range of B2B, B2C and wireless applications. [9]

The math behind RSA

The RSA algorithm involves the three following steps:
  1. Key generation.  
  2. Encryption. 
  3. Decryption.

Note: This is a simplistic approach of RSA.

RSA Key generation

RSA includes a public key and a private key. The public key can be known to everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted using the private key. The keys are generated the following way:


1.     Choose two distinct prime numbers p and q. In mathematics, a prime number (or a prime) is a natural number that has exactly two distinct natural number divisors: 1 and itself. That means that a prime number can only be divided evenly by 1 and itself: [13]

Example 1:  5/5 = 1

Example 2: 5/1 = 5

Put very simply, the remainder of dividing a prime number by any positive integer other than 1 and itself is never 0.

Example 3: 5/2 = 2.5 (2.5 is not an integer)

The first fifteen prime numbers are:


Example 4: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
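The list in Example 4 can be reproduced by testing each candidate for a non-trivial divisor, which is the definition of a prime given above turned into code. This is a minimal sketch using trial division; the function names are assumptions made for the illustration, and the method is far too slow for the large primes that real RSA keys need.

```python
def is_prime(n: int) -> bool:
    """Return True if n has exactly two divisors: 1 and itself."""
    if n < 2:
        return False
    divisor = 2
    # A composite number always has a divisor no larger than its square root.
    while divisor * divisor <= n:
        if n % divisor == 0:   # remainder 0 means a non-trivial divisor exists
            return False
        divisor += 1
    return True


def first_primes(count: int) -> list:
    """Collect the first `count` primes in increasing order."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes


first_fifteen = first_primes(15)
```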

Note: For security purposes, the integers p and q should be chosen uniformly at random and should be of similar bit-length. [8]



2.     Compute n = pq (a), where n is used as the modulus for both the public and private keys. For the purposes of this paper we will not use long secure integers.

Note: For the algebra to work properly, these two primes must not be equal. To make the cipher strong, these prime numbers should be large, and they should be in the form of arbitrary precision integers with a size of at least 1024 bits (bits are used when cryptography is applied in real life examples).

Example 5: From (a), n = pq (b), which means that if p = 2 and q = 3 then n = 2*3 = 6 (both 2 and 3 are prime, per Example 4).



Once we have the values n and φ (φ is computed in step 3 below), the values p and q will no longer be useful to us. However, we must ensure that nobody else will ever be able to discover these values. Destroy them, leaving no trace behind so that they cannot be used against us in the future. Otherwise, it will be very easy for an attacker to reconstruct our key pair and decipher our cipher text.

3.     Compute φ(pq) = (p − 1)(q − 1) (c) or φ(n) = (p − 1)(q − 1). (φ is Euler's totient function [11], Euler's totient function is out of the scope of this paper).

4.     Choose an integer e such that:

·      1 < e < φ (pq) or
·      1 < e < φ (n) or
·      1 < e < (p − 1) (q − 1) (d)

And e and φ (pq) have no common divisors other than 1. We randomly select a number e (the letter e is used because we will use this value during encryption) that is greater than 1, less than φ, and relatively prime to φ. Two numbers are said to be relatively prime if they have no prime factors in common. Note that e does not necessarily have to be prime.  

The value of e is used along with the value n to represent the public key used for encryption.

·      e is released as the public key exponent
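Steps 2 to 4 can be sketched in a few lines. The primes below are the toy values used in the simple example later in this paper; the helper name is_valid_public_exponent is an assumption made for this illustration. The check encodes relationship (d) plus the requirement that e and φ share no common divisor other than 1.

```python
from math import gcd

p, q = 47, 73                 # toy primes, far too small for real use
n = p * q                     # relationship (b): n = pq
phi = (p - 1) * (q - 1)       # relationship (c): phi(n) = (p - 1)(q - 1)


def is_valid_public_exponent(e: int, phi: int) -> bool:
    """Check 1 < e < phi and that e is relatively prime to phi."""
    return 1 < e < phi and gcd(e, phi) == 1


# Every acceptable exponent below 30 for this phi, for illustration only.
valid_small_exponents = [e for e in range(2, 30) if is_valid_public_exponent(e, phi)]
```

Note that 25 is accepted even though it is not prime, while the prime 23 is rejected because it divides φ = 3312. This matches the remark above that e only has to be relatively prime to φ.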

5.     Determine d (using modular arithmetic) which satisfies the relation:

d * e ≡ 1 (mod φ(n))

This is often computed using the extended Euclidean algorithm [12] (Euclidean algorithm is out of the scope of this paper).

·      d is kept as the private key exponent.

Note: We calculate the unique value d (to be used during decryption) such that, when d * e is divided by φ, the remainder of the division is 1. The mathematical notation for this is d * e ≡ 1 (mod φ).

In mathematical jargon, we say that d is the multiplicative inverse of e modulo φ [15]. The value of d is to be kept secret. If you know the value of φ, the value of d can be easily obtained from e using a technique known as the extended Euclidean algorithm. If you know n (which is public), but not p or q (which have been destroyed), then the value of φ is very hard to determine. The public key consists of the modulus n and the public (or encryption) exponent e. The private key consists of the modulus n and the private (or decryption) exponent d, which must be kept secret.
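Although the details of the algorithm are out of scope here, a short sketch may make step 5 less abstract. The code below is one common iterative formulation of the extended Euclidean algorithm; the function names extended_gcd and private_exponent are assumptions made for this illustration. It is applied to the toy values e = 425 and φ = 3312 from the simple example later in this paper.

```python
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def private_exponent(e: int, phi: int) -> int:
    """Return d with (d * e) % phi == 1, the inverse of e modulo phi."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e and phi are not relatively prime, so no inverse exists")
    return x % phi   # normalise a possibly negative coefficient into 0..phi-1


d = private_exponent(425, 3312)
```

The coefficient x satisfies e*x + φ*y = 1, so reducing that equation modulo φ gives e*x ≡ 1, which is exactly the relation d must satisfy.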

RSA Encryption


RSA Cryptographic User A transmits the public key (n, e) to RSA Cryptographic User B and keeps the private key (n, d) secret. User B then wishes to send a secret message M to User A. User B first turns M into an integer m, such that 0 < m < n, by using an agreed-upon reversible procedure (known to both Users A and B), and then computes the cipher text c corresponding to: [9]

c = m^e (mod n)

Note: The encryption is now complete. User A is the only person who can decrypt c and recover the secret integer m.


RSA Decryption

RSA Cryptographic User A can recover m from c by using the private key exponent d in the following computation:

m = c^d (mod n)

Note: Given m, User A can recover the original message M by reversing the agreed-upon procedure.
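Both operations, c = m^e (mod n) for encryption and m = c^d (mod n) for decryption, are modular exponentiations. Computing m^e in full and only then reducing it would produce enormous intermediate numbers, so implementations typically reduce after every multiplication. The sketch below uses the square-and-multiply method as one way of doing that; the function names are assumptions made for this illustration, and the sample numbers are the toy values from the simple example later in this paper. Python's built-in three-argument pow performs the same computation.

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by square-and-multiply."""
    if exponent < 0 or modulus <= 0:
        raise ValueError("exponent must be >= 0 and modulus must be > 0")
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # current lowest bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus    # square for the next bit
        exponent >>= 1
    return result


def encrypt(m: int, e: int, n: int) -> int:
    """Textbook RSA encryption: c = m^e (mod n)."""
    return mod_pow(m, e, n)


def decrypt(c: int, d: int, n: int) -> int:
    """Textbook RSA decryption: m = c^d (mod n)."""
    return mod_pow(c, d, n)


c = encrypt(707, 425, 3431)
m = decrypt(c, 1769, 3431)
```

The loop runs once per bit of the exponent, so even exponents that are thousands of bits long need only a few thousand multiplications.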


RSA Encryption/Decryption Simple example

For the purposes of this paper we are going to use a very simple number example. Based on Example 4 we have to:



1.     Choose p = 47 and q = 73. Based on mathematical relationship (b), n = pq => n = 3431. From relationship (c) we also conclude that:

(c) =>  φ(pq) = (p − 1)(q − 1) or φ(n) = φ(pq) = φ(3431) = (47-1)(73-1) = 46*72 = 3312 =>  φ = 3312


2.     Now that we have n and φ, we should discard p and q, and destroy any trace of their existence.

3.     Next, we randomly select a number e such that 1 < e < φ and e is coprime [16] to 3312 (which is φ). We choose e = 425.

4.     Then the modular inverse of e modulo φ is calculated to be the following: d = 1769 (indeed, 425 * 1769 = 751825 = 227 * 3312 + 1).

5.     We now keep d private and make e and n public.

6.     Assume that we have plaintext data represented by the following simple number:

Plaintext = 707

7.     The encrypted data is computed by c = m^e (mod n) as follows:

Cipher text = 707^425(mod 3431) = 2142

8.     The cipher text value cannot be easily reverted back to the original plaintext without knowing d (or, equivalently, knowing the values of p and q). With larger bit sizes, this task rapidly becomes more difficult. If, however, you are privy to the secret information that d = 1769, then the plaintext is easily retrieved using m = c^d (mod n) as follows:

Plaintext = 2142^1769(mod 3431) = 707
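The whole worked example can be replayed in a few lines of Python. This is a sketch of textbook RSA with the toy numbers above, not a usable implementation: there is no padding and the key is trivially small. It relies on the built-in pow, which accepts a modulus as its third argument and, from Python 3.8 on, an exponent of -1 to compute a modular inverse.

```python
# Key generation (steps 1 to 5)
p, q = 47, 73
n = p * q                     # modulus: 3431
phi = (p - 1) * (q - 1)       # Euler's totient of n: 3312
e = 425                       # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent (requires Python 3.8+)

# Encryption (step 7) and decryption (step 8)
plaintext = 707
cipher_text = pow(plaintext, e, n)
recovered = pow(cipher_text, d, n)

print(n, phi, d, cipher_text, recovered)   # prints: 3431 3312 1769 2142 707
```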


Why RSA is hard to break

The security of the RSA cryptosystem is based on two mathematical problems:

  1. The problem of factoring large numbers
  2. The RSA problem.

In number theory, integer factorization or prime factorization is the breaking down of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer. [17] In cryptography, the RSA problem summarizes the task of performing an RSA private-key operation given only the public key. [18] Full decryption of an RSA cipher text is thought to be infeasible on the assumption that both of these problems are hard, because no efficient algorithm is known for solving them.
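The introduction promised a look at how an RSA brute-force works, and the factoring problem is the natural place to show it. The sketch below attacks the toy public key (n = 3431, e = 425) from the simple example by trial division: once p and q are found, φ and d follow immediately. The function names are assumptions made for this illustration. The attack succeeds here only because n is tiny; the same loop is hopeless against a modulus built from primes of the sizes recommended earlier.

```python
from typing import Tuple


def factor_by_trial_division(n: int) -> Tuple[int, int]:
    """Return the smallest prime factor of n and its cofactor."""
    if n < 4:
        raise ValueError("n is too small to be a product of two primes")
    if n % 2 == 0:
        return 2, n // 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 2            # even candidates were ruled out above
    raise ValueError("n has no non-trivial factors (it is prime)")


def recover_private_exponent(n: int, e: int) -> int:
    """Rebuild d from the public key alone by factoring n."""
    p, q = factor_by_trial_division(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)        # modular inverse, Python 3.8+


p, q = factor_by_trial_division(3431)
d = recover_private_exponent(3431, 425)
message = pow(2142, d, 3431)      # decrypt the intercepted cipher text
```

Trial division needs on the order of the square root of n steps, so every extra bit in the primes roughly doubles the work; this is why key size matters so much.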


Appendix

C

Cryptography: Is the practice and study of hiding information. Encryption, its best-known part, is the process of converting ordinary information (plaintext) into unintelligible gibberish.



Cipher (cipher text): Is the unintelligible gibberish produced by encryption



Coprime: In mathematics, two integers a and b are said to be coprime or relatively prime if they have no common positive factor other than 1 or, equivalently, if their greatest common divisor is 1. [16]

D



Decryption: Is the reverse process of encryption

M

Multiplicative inverse: In mathematics, a multiplicative inverse or reciprocal for a number x, denoted by 1/x or x^−1, is a number which when multiplied by x yields the multiplicative identity, 1. [15]

P

Plaintext: Is ordinary readable information

Problem of factoring large numbers: In number theory, integer factorization or prime factorization is the breaking down of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer. [17]

Prime numbers: In mathematics, a prime number (or a prime) is a natural number that has exactly two distinct natural number divisors: 1 and itself. [13]


References

[2]: Liddell and Scott's Greek-English Lexicon. Oxford University Press. (1984)



Pattern-based policy as code: governance that holds the gate
