31/05/2026

Pattern-based policy as code: governance that holds the gate

// appsec · infrastructure · policy as code

Most organizations already know what good infrastructure looks like. There is a wiki page, a control matrix, maybe a signed-off standard with a version number on it. The problem is never the document. The problem is the distance between that document and the thing your pipeline actually checks before a change ships.

That distance is where the incidents live. A workload lands in a region nobody approved for that data class. A security group quietly opens ingress wider than anyone intended. A required owner tag is missing, so when the bill spikes nobody knows whose resource it is. Encryption is assumed and never configured. None of these are exotic. They are the boring, repeatable gaps that manual review keeps missing because manual review does not scale to the rate at which teams provision today.

Policy as code closes the distance by turning control intent into preventive checks that run inside the delivery workflow. The part people get wrong is how they organize those checks. Build rules one service at a time and you end up with a library that nobody can read, where the same control is expressed five different ways across five repositories. The fix is to stop organizing by service and start organizing by pattern.

Why per-service rules rot

The first ten rules are easy. You write a check for object storage, a check for your security groups, a check for your databases, and it feels productive. Then the library grows. A new service shows up, someone copies the nearest existing rule, tweaks it, and now you have two near-identical checks that drift apart over the next six months. A compliance engineer asks "where do we enforce encryption" and the honest answer is "in about nine places, inconsistently."

The core issue is that service-shaped rules do not map to how humans reason about controls. A GRC reviewer thinks in terms of intent. Is sensitive data protected? Is access constrained? Is exposure limited? They do not think in terms of aws_s3_bucket_policy resource blocks. When your policy library is shaped like the cloud provider's API instead of like your control intent, every conversation between security and engineering needs a translation layer, and translation layers leak.

The five patterns that cover most of it

Reorganize the library around recurring control intent and the picture gets sharper. A small set of patterns covers the large majority of what a pre-deployment gate needs to enforce. These are the ones I keep coming back to.

  • Required metadata. Tags and ownership fields used for support routing, cost allocation, data classification, and automation. If a resource cannot be traced back to a team, it should not deploy.
  • Allowed configuration. Approved regions, accepted instance classes, sanctioned deployment boundaries. The set of settings you have decided are inside the lines.
  • Exposure restriction. Anything that makes a resource more reachable than intended. Public ingress, internet-facing endpoints in the wrong environment, overly permissive network paths.
  • Protection enforcement. Baseline safeguards that should never be optional. Encryption at rest and in transit, logging, deletion protection.
  • Privilege constraint. Identity and trust definitions that need tighter validation. Wildcard principals, broad assume-role trust, permissions that grant more than the workload needs.
Fig 1. Recurring control patterns the policy library is organized around

The win is shared language. A compliance engineer calls it mandatory metadata, a platform engineer calls it the tagging standard, and the pattern name lets them point at the same rule without arguing about vocabulary. Coverage becomes something you can reason about at a glance, because every rule belongs to a named intent rather than floating as a one-off.
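Because every rule in this post carries a stable identifier with a pattern prefix, coverage can be counted rather than eyeballed. A minimal Python sketch, assuming identifiers follow the PREFIX-NAME-NN shape used in the Rego examples. The PROT, EXP and PRIV prefixes come from those examples; META and CONF are hypothetical placeholders for the other two patterns.

```python
# Hypothetical mapping from rule-ID prefix to control pattern.
# PROT, EXP and PRIV match the examples in this post; META and CONF are assumed.
PATTERN_BY_PREFIX = {
    "META": "required metadata",
    "CONF": "allowed configuration",
    "EXP": "exposure restriction",
    "PROT": "protection enforcement",
    "PRIV": "privilege constraint",
}


def pattern_of(rule_id):
    """Map an identifier such as 'EXP-INGRESS-01' to its pattern name."""
    prefix = rule_id.split("-", 1)[0]
    return PATTERN_BY_PREFIX.get(prefix, "unclassified")


def coverage(rule_ids):
    """Count rules per pattern; patterns with no rules stay at zero."""
    counts = {name: 0 for name in PATTERN_BY_PREFIX.values()}
    for rule_id in rule_ids:
        name = pattern_of(rule_id)
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    print(coverage(["PROT-TRANSIT-01", "EXP-INGRESS-01", "PRIV-TRUST-01"]))
```

A zero next to a pattern is a visible gap, which is much harder to spot in a library organized by service.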

Where the preventive layer actually sits

Be honest about scope. A pre-deployment policy gate is one layer, not the whole stack. Open Policy Agent evaluates a proposed change before it exists. It reads a Terraform plan rendered to JSON and decides whether the change matches your patterns. That is the preventive layer, and it is the cheapest place to catch a mistake because nothing has been created yet.

It does not replace runtime governance. After resources exist you still need drift detection, continuous configuration monitoring, organization-level guardrails, and findings aggregation, whether those come from your cloud provider's native services or from tools you bolt on, and they keep doing their job after deploy. Note that tools such as Checkov and cfn-guard are mainly run against templates and plans, so in many setups they overlap with the preventive layer rather than watching live resources. Admission controllers such as Kyverno sit at the Kubernetes deploy boundary and can also report on resources that already exist. OPA in the pipeline is the thing that stops the bad change from becoming a runtime problem in the first place. The two layers are complementary and you want both.

The mental model: organization guardrails set the outer fence, the pipeline gate inspects every change against your patterns before merge, and post-deploy monitoring watches what already exists. A miss at one layer should be caught at another. Defense in depth applies to governance, not just to attacks.
Fig 2. Where the pre-deployment gate sits in a layered governance model
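For reference, this is roughly what the preventive layer consumes. `terraform show -json` emits a `resource_changes` array in which each entry has an `address`, a `type`, and a `change` object holding `actions`, `before`, and `after`; values Terraform cannot know until apply are left out of `after` and flagged in `after_unknown`. The Python sketch below walks that structure with the same create-or-update filter the Rego examples use. The plan literal is hand-written and heavily trimmed, not real tool output.

```python
import json

# A hand-written, heavily trimmed stand-in for `terraform show -json tfplan`.
PLAN_JSON = """
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "before": null,
                "after": {"bucket": "acme-logs"}}},
    {"address": "aws_s3_bucket.old", "type": "aws_s3_bucket",
     "change": {"actions": ["delete"], "before": {"bucket": "acme-old"},
                "after": null}},
    {"address": "aws_s3_bucket.assets", "type": "aws_s3_bucket",
     "change": {"actions": ["no-op"], "before": {"bucket": "acme-assets"},
                "after": {"bucket": "acme-assets"}}}
  ]
}
"""


def mutating(actions):
    """Same intent as the Rego helper: the change creates or updates something.

    A replacement shows up as ["delete", "create"] and therefore counts.
    """
    return "create" in actions or "update" in actions


def changed_resources(plan, resource_type):
    """Yield (address, after) for resources of one type the plan creates or updates."""
    for rc in plan.get("resource_changes", []):
        if rc["type"] == resource_type and mutating(rc["change"]["actions"]):
            yield rc["address"], rc["change"]["after"]


if __name__ == "__main__":
    plan = json.loads(PLAN_JSON)
    for address, after in changed_resources(plan, "aws_s3_bucket"):
        print(address, after)
```

Only the created bucket is selected; the deleted and unchanged ones are ignored, which is the scope a pre-deployment gate should judge.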

Wiring it into the pipeline

The sequence is the same regardless of which CI system you run. Validate the code, generate a plan, render the plan to JSON, evaluate it against the shared policy library, keep the result as an artifact, and let that result feed both the automated gate and any human approval for higher-risk environments.

Fig 3. Policy evaluation inside a gated delivery workflow

Keep two gate types distinct in your head. A quality gate is automated pass or fail against defined criteria. An approval gate controls promotion into a protected environment. The mistake is making a human the first line that notices a missing tag or a disallowed region. By the time a change reaches manual approval, the machine should already have flagged the mechanical failures. Humans are expensive and inconsistent at spotting a wildcard in a JSON blob. OPA is not. Put OPA in the automated layer and let its output inform the approval, never the other way around.
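The ordering can be written down as a small decision function. A hedged Python sketch, assuming the deny messages have already been collected from `opa eval` and that the set of protected environment names below is a hypothetical placeholder: mechanical failures block before any human sees the change, and a clean result in a protected environment still waits for approval.

```python
from dataclasses import dataclass, field

# Hypothetical names; substitute your own protected environments.
PROTECTED_ENVIRONMENTS = {"staging", "production"}


@dataclass
class GateResult:
    status: str  # "blocked", "awaiting-approval" or "pass"
    denies: list = field(default_factory=list)


def evaluate_gate(denies, environment):
    """Quality gate first, approval gate second."""
    if denies:
        # Automated failure: a reviewer should never be the first to see these.
        return GateResult("blocked", sorted(denies))
    if environment in PROTECTED_ENVIRONMENTS:
        # Clean report; promotion still needs a human, who sees the same evidence.
        return GateResult("awaiting-approval")
    return GateResult("pass")
```

A human approver only ever sees changes that have already passed the mechanical checks, which is the "never the other way around" rule in code form.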

Here is a GitHub Actions job that runs the gate on every pull request. The shape ports cleanly to GitLab CI or anything else. Only the surrounding syntax changes.

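name: iac-policy-gate
on: [pull_request]

jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # The default wrapper script can add extra lines to stdout,
          # which would corrupt plan.json written by `terraform show -json`.
          terraform_wrapper: false
      # OPA is not preinstalled on GitHub-hosted runners.
      - uses: open-policy-agent/setup-opa@v2
      # `plan` also needs provider credentials, configured in an earlier step.
      - run: terraform init -input=false && terraform validate
      - run: terraform plan -input=false -out tfplan
      - run: terraform show -json tfplan > plan.json
      - name: evaluate policy
        run: |
          opa eval --format pretty --data policies \
            --input plan.json "data.patterns"
          opa eval --format json --data policies \
            --input plan.json "data.patterns" > policy-report.json
      # Keep the report even when the enforce step below fails the job.
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: policy-report
          path: policy-report.json
      # opa eval exits 0 even when deny messages exist; --fail-defined
      # makes any non-empty result fail the step.
      - name: enforce policy
        run: |
          opa eval --fail-defined --format pretty --data policies \
            --input plan.json "data.patterns[_][_].deny[_]"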

Structure the library by intent

Lay the directory out the way teams talk about controls, not the way the provider names resources. Pattern directories on top, shared helpers and message text factored out, tests and realistic fixtures alongside.

policies/
├── patterns/
│   ├── metadata/      # required tags, ownership
│   ├── configuration/ # approved regions, classes
│   ├── exposure/      # ingress, public reachability
│   ├── protection/    # encryption, logging
│   └── privilege/     # trust, identity constraints
├── shared/
│   ├── helpers.rego
│   └── messages.rego
├── fixtures/
└── tests/

A few worked examples follow. Each one targets a single pattern, returns a deny with a stable identifier in the message, and evaluates Terraform plan JSON. The resource types are provider-specific by necessity, but the pattern they encode is portable. Adapt the type names to your stack and the logic carries over.

Protection enforcement: refuse plaintext transport

The goal is to make sure object storage rejects requests that arrive over an unencrypted channel. The check has two halves: a bucket policy must carry an explicit deny for non-TLS requests, and a bucket created or updated with no such policy at all is itself a failure. An explicit deny is the stronger construction because, in AWS policy evaluation, an explicit Deny overrides any Allow, wherever that Allow is granted.

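package patterns.protection.transit

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Deny any bucket the plan creates or updates unless a bucket policy
# in the same plan carries an explicit Deny for non-TLS requests.
deny contains msg if {
    change := input.resource_changes[_]
    change.type == "aws_s3_bucket"
    mutating(change.change.actions)
    not bucket_has_tls_deny(change.change.after.bucket)

    msg := sprintf(
        "PROT-TRANSIT-01: bucket %q has no policy denying plaintext transport",
        [change.address],
    )
}

mutating(actions) if actions[_] == "create"
mutating(actions) if actions[_] == "update"

bucket_has_tls_deny(name) if {
    pol := input.resource_changes[_]
    pol.type == "aws_s3_bucket_policy"
    # Any policy that exists after apply counts, including an unchanged
    # ("no-op") one; a deleted policy has a null "after" and cannot match.
    # Values only known at apply time are absent from "after": a policy
    # whose bucket is aws_s3_bucket.<name>.id on a new bucket will not
    # match here, and the bucket will be reported.
    pol.change.after.bucket == name

    doc := json.unmarshal(pol.change.after.policy)
    stmt := doc.Statement[_]
    stmt.Effect == "Deny"
    stmt.Condition.Bool["aws:SecureTransport"] == "false"
}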

When you adapt this, decide up front whether you require one exact policy shape or accept several equivalent forms. A strict rule is easier to reason about and easier to defend in an audit. A loose rule produces fewer false positives when teams already enforce the same outcome through a different structure. Make that trade-off deliberately rather than discovering it through a flood of failed pipelines.
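The trade is easiest to see with both matchers side by side. A Python sketch, not part of the Rego above: the strict version accepts only the string "false" that the Rego rule checks for, while the loose version also accepts a JSON boolean and a list of values, shapes a policy document can take depending on how it was written. Which forms your teams actually emit is an assumption to verify against real plans.

```python
def strict_tls_deny(stmt):
    """Only the exact shape the Rego rule checks: a Deny with the string 'false'."""
    condition = stmt.get("Condition", {}).get("Bool", {})
    return stmt.get("Effect") == "Deny" and condition.get("aws:SecureTransport") == "false"


def loose_tls_deny(stmt):
    """Also accept a JSON boolean or a list of values, normalized to lowercase strings."""
    if stmt.get("Effect") != "Deny":
        return False
    value = stmt.get("Condition", {}).get("Bool", {}).get("aws:SecureTransport")
    values = value if isinstance(value, list) else [value]
    # str(False).lower() == "false", so a JSON boolean normalizes too.
    return any(str(v).lower() == "false" for v in values if v is not None)
```

The loose matcher fails fewer pipelines; the strict one is easier to explain to an auditor. Whichever you pick, record the choice next to the rule.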

Exposure restriction: no public ingress on sensitive ports

This catches security group rules that open management and datastore ports to the whole internet before they deploy. A network rule is a direct statement of intended reachability, which makes it exactly the right place to enforce early. Do not try to reason about whether some downstream control might narrow the blast radius later. If the rule says 0.0.0.0/0 on a sensitive port, that is the intent on the page, and the gate should reject it.

package patterns.exposure.ingress

import future.keywords.contains
import future.keywords.if
import future.keywords.in

watched_ports := {22, 3389, 5432, 6379, 9200}
open_cidr := "0.0.0.0/0"

deny contains msg if {
    change := input.resource_changes[_]
    change.type == "aws_security_group"
    mutating(change.change.actions)

    rule := change.change.after.ingress[_]
    rule.cidr_blocks[_] == open_cidr

    port := watched_ports[_]
    covers(rule, port)

    msg := sprintf(
        "EXP-INGRESS-01: %q exposes port %d to the public internet",
        [change.address, port],
    )
}

# A rule covers a port when the port falls inside from_port..to_port.
covers(rule, port) if {
    rule.from_port <= port
    rule.to_port >= port
}

# Protocol "-1" means all traffic and is declared with ports 0..0,
# so the range check alone would let it through.
covers(rule, _) if rule.protocol == "-1"

mutating(actions) if actions[_] == "create"
mutating(actions) if actions[_] == "update"

Repositories model security groups two ways, with inline ingress blocks and with standalone rule resources. The rule above reads only the inline shape, so a complete check needs a second rule for the standalone resources. When you adapt this, settle which ports count as sensitive in your environment, whether IPv6 exposure needs the same treatment, and how approved exceptions get recorded so a legitimate bastion does not trip the gate every run.
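A sketch of covering both shapes, written in Python for readability rather than as Rego: the rules are first flattened into (address, cidr, from_port, to_port, protocol) tuples, then checked once. It assumes the AWS provider attribute names for inline `ingress` blocks on `aws_security_group`, for the older `aws_security_group_rule` resource (with `type = "ingress"`), and for the newer `aws_vpc_security_group_ingress_rule` (which uses `cidr_ipv4` and `ip_protocol`). Protocol "-1" and, as an assumption, "all" are treated as covering every port, because such rules carry a 0-to-0 or empty port range that a plain range comparison would miss. IPv6 is deliberately left out, matching the open question above.

```python
OPEN_CIDR = "0.0.0.0/0"
WATCHED_PORTS = {22, 3389, 5432, 6379, 9200}
ALL_TRAFFIC = {"-1", "all"}  # assumed spellings of "every protocol, every port"


def ingress_rules(plan):
    """Flatten inline and standalone ingress rules into (address, cidr, from, to, protocol)."""
    for rc in plan.get("resource_changes", []):
        actions, after = rc["change"]["actions"], rc["change"]["after"]
        if after is None or not ("create" in actions or "update" in actions):
            continue
        if rc["type"] == "aws_security_group":
            for rule in after.get("ingress") or []:
                for cidr in rule.get("cidr_blocks") or []:
                    yield (rc["address"], cidr, rule.get("from_port"),
                           rule.get("to_port"), rule.get("protocol"))
        elif rc["type"] == "aws_security_group_rule" and after.get("type") == "ingress":
            for cidr in after.get("cidr_blocks") or []:
                yield (rc["address"], cidr, after.get("from_port"),
                       after.get("to_port"), after.get("protocol"))
        elif rc["type"] == "aws_vpc_security_group_ingress_rule":
            yield (rc["address"], after.get("cidr_ipv4"), after.get("from_port"),
                   after.get("to_port"), after.get("ip_protocol"))


def exposed(plan):
    """Deny messages for watched ports reachable from 0.0.0.0/0, across all three shapes."""
    msgs = set()
    for address, cidr, lo, hi, proto in ingress_rules(plan):
        if cidr != OPEN_CIDR:
            continue
        all_ports = str(proto) in ALL_TRAFFIC
        for port in WATCHED_PORTS:
            in_range = lo is not None and hi is not None and lo <= port <= hi
            if all_ports or in_range:
                msgs.add(f"EXP-INGRESS-01: {address} exposes port {port} to the public internet")
    return msgs
```

Normalizing first keeps one copy of the port logic, so the inline and standalone shapes cannot drift apart the way copied per-resource rules do.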

Privilege constraint: no wildcard trust

This inspects role trust documents for principals that allow a broader set of callers than the environment intends. A wildcard principal is the prohibited pattern, and it hides in three forms: a bare "*", an AWS field set to "*", and an AWS array that contains "*" among other entries. Treating the wildcard as the thing to reject gives reviewers a result they can read in one line.

package patterns.privilege.trust

import future.keywords.contains
import future.keywords.if
import future.keywords.in

deny contains msg if {
    change := input.resource_changes[_]
    change.type == "aws_iam_role"
    mutating(change.change.actions)

    doc := json.unmarshal(change.change.after.assume_role_policy)
    stmt := statements(doc)[_]
    stmt.Effect == "Allow"
    wildcard_principal(stmt)

    msg := sprintf(
        "PRIV-TRUST-01: %q trusts a wildcard principal; name explicit accounts or service principals",
        [change.address],
    )
}

# IAM accepts Statement as a single object or as an array; normalize to an
# array so a lone statement is not silently skipped.
statements(doc) := doc.Statement if is_array(doc.Statement)
statements(doc) := [doc.Statement] if is_object(doc.Statement)

wildcard_principal(stmt) if stmt.Principal == "*"
wildcard_principal(stmt) if stmt.Principal.AWS == "*"
wildcard_principal(stmt) if stmt.Principal.AWS[_] == "*"

mutating(actions) if actions[_] == "create"
mutating(actions) if actions[_] == "update"

The real design question is what least privilege means for your trust model. The simple version rejects a single prohibited pattern, which is what the rule above does. The stronger version validates trust against an allowlist of approved principals and conditions, which is more work to maintain but far harder to quietly subvert. Start with the prohibition and grow toward the allowlist as the library matures.
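The allowlist version can be sketched as follows, in Python rather than Rego to keep it short. The approved principals are made-up placeholders (111122223333 is the example account ID used in AWS documentation), and the sketch checks only the principal map on Allow statements, ignoring `Condition` blocks that a production allowlist would probably also pin down. Anything not explicitly approved fails, so a wildcard is rejected without needing its own rule.

```python
import json

# Hypothetical allowlist; real entries would come from your own account inventory.
APPROVED_PRINCIPALS = {
    "AWS": {"arn:aws:iam::111122223333:root"},
    "Service": {"ec2.amazonaws.com", "lambda.amazonaws.com"},
}


def as_list(value):
    """IAM allows a single value or a list in most positions; normalize to a list."""
    return value if isinstance(value, list) else [value]


def unapproved_principals(assume_role_policy):
    """Return principals in Allow statements that are not on the allowlist."""
    doc = json.loads(assume_role_policy)
    bad = []
    for stmt in as_list(doc.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            # A bare "*" (or a missing principal) can never be allowlisted.
            bad.append(str(principal))
            continue
        for kind, values in principal.items():
            approved = APPROVED_PRINCIPALS.get(kind, set())
            bad.extend(v for v in as_list(values) if v not in approved)
    return bad
```

An empty list means every trusted caller is one you named on purpose; anything else is a finding that names the exact principal to remove or approve.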

Keep the evidence

Policy results that vanish into pipeline logs are worthless three weeks later when an auditor asks why a change shipped. In a mature workflow the evaluation output is retained as a validation artifact attached to the change record. That artifact tells a reviewer whether the change is ready, shows exactly which controls failed and why during exception handling, and stays with the change for later audit conversations. At minimum it should identify the pipeline run, the scope evaluated, the policy package and version, the checks that ran, and the verdict for each. The JSON output from opa eval already carries the checks and their results; the run identifier, scope, and policy version have to be attached around it. Upload it, version it, and stop reconstructing history from log scrollback.
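A sketch of such a record in Python, assuming the `opa eval --format json` output has already been reduced to a mapping from check name to deny messages. That reduction and every field name here are illustrative choices, not a standard schema. `GITHUB_RUN_ID` and `GITHUB_SHA` are default environment variables on GitHub Actions; other CI systems expose equivalents under different names.

```python
import json
import os
from datetime import datetime, timezone


def evidence_record(results, scope, policy_version, env=None):
    """Build a retained artifact: run identity, scope, policy version, per-check verdicts."""
    env = os.environ if env is None else env
    checks = [
        {"check": name, "verdict": "fail" if msgs else "pass", "messages": sorted(msgs)}
        for name, msgs in sorted(results.items())
    ]
    return {
        "run_id": env.get("GITHUB_RUN_ID", "local"),
        "commit": env.get("GITHUB_SHA", "unknown"),
        "scope": scope,
        "policy_version": policy_version,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
        "verdict": "fail" if any(c["verdict"] == "fail" for c in checks) else "pass",
    }


if __name__ == "__main__":
    record = evidence_record(
        {
            "patterns.exposure.ingress": [
                'EXP-INGRESS-01: "aws_security_group.web" exposes port 22 to the public internet'
            ],
            "patterns.privilege.trust": [],
        },
        scope="plan.json",
        policy_version="policies@v1.4.0",  # hypothetical tag
    )
    print(json.dumps(record, indent=2))
```

Recording passing checks alongside failing ones matters: an auditor asking "was this control evaluated?" needs the pass, not just the absence of a fail.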

Test the policy like it is software, because it is

Policy as code is code, and untested code that gates production is a liability with extra steps. The first few rules feel trivial to verify by eye. The real work begins once the library is large and several teams depend on shared helpers. Treat it accordingly. Every policy needs cases that pass on valid input and cases that fail on the input you expect to reject. Shared helpers need their own regression coverage so a change in one place does not silently break ten rules. Fixtures should look like real Terraform plans, not toy snippets, because the gap between a tidy fixture and a messy real plan is where false negatives breed.
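To show what a table of passing and failing cases looks like, here is a Python sketch for the required-metadata pattern, which the worked examples above do not cover. In the Rego library itself the equivalent would be test rules run by `opa test` against fixture inputs; this version only illustrates the table shape. The tag keys are hypothetical, and reading `tags_all` first is an assumption worth checking: in the AWS provider that attribute merges provider-level default tags into the resource's own `tags`.

```python
REQUIRED_TAGS = ("owner", "cost-center", "data-class")  # hypothetical keys


def missing_tags(after):
    """Required tag keys absent or empty on a resource's planned 'after' state."""
    tags = after.get("tags_all") or after.get("tags") or {}
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


# Each case: (description, planned attributes, expected missing keys).
CASES = [
    ("fully tagged passes",
     {"tags": {"owner": "payments", "cost-center": "cc-12", "data-class": "internal"}}, []),
    ("default tags via tags_all count",
     {"tags": {"owner": "payments"},
      "tags_all": {"owner": "payments", "cost-center": "cc-12", "data-class": "internal"}}, []),
    ("missing owner fails",
     {"tags": {"cost-center": "cc-12", "data-class": "internal"}}, ["owner"]),
    ("empty owner fails",
     {"tags": {"owner": "", "cost-center": "cc-12", "data-class": "internal"}}, ["owner"]),
    ("no tags at all fails", {"tags": None}, list(REQUIRED_TAGS)),
]


def run_cases():
    """Return a description of every case whose result differs from its expectation."""
    failures = []
    for description, after, expected in CASES:
        got = missing_tags(after)
        if got != expected:
            failures.append(f"{description}: expected {expected}, got {got}")
    return failures


if __name__ == "__main__":
    problems = run_cases()
    print("all cases passed" if not problems else "\n".join(problems))
```

The "empty owner" and "no tags at all" rows are the kind of edge a tidy fixture skips and a real plan contains.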

The stakes are trust. The moment developers stop believing the results, whether from noisy false positives or from a miss that should have been caught, they start routing around the gate, and a gate that gets bypassed is decoration. Accuracy is not a nicety here. It is the whole reason the mechanism survives.

Roll it out in phases, not all at once

You do not need broad coverage on day one, and reaching for it is how rollouts die. Start in advisory mode so teams see results without being blocked. Pick two or three high-confidence patterns to begin with, usually required metadata, approved regions, and public exposure, because those are unambiguous and the failures are obvious. Run the policies against existing pipelines and read the output for accuracy before anything blocks.

Once the output is stable and the failures are genuinely useful, switch that small set to enforcement and wire the validation artifact into the approval flow. Establish who owns the shared packages and how exceptions get requested and recorded. Only then expand. Formalize versioning for the shared library, grow pattern coverage based on what teams actually hit, and connect the pre-deployment gate to your post-deploy monitoring so the two layers reinforce each other instead of duplicating effort.
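Advisory mode and per-pattern promotion can be expressed as a mapping from rule prefix to mode. A minimal Python sketch, assuming deny messages start with the rule identifier as in the examples here; the mapping itself is hypothetical and mirrors the suggested starting set of metadata, regions, and exposure. Advisory findings are printed but never change the exit code, and a prefix the mapping does not know yet defaults to advisory, so new rules start out non-blocking.

```python
# Hypothetical rollout state: a prefix moves to "enforce" once its output is trusted.
MODES = {
    "META": "enforce",   # required metadata
    "CONF": "enforce",   # approved regions and other allowed configuration
    "EXP": "enforce",    # public exposure
    "PROT": "advisory",
    "PRIV": "advisory",
}


def gate_exit_code(denies, modes=MODES):
    """Print every finding; return 1 only if an enforced rule denied."""
    blocking = False
    for msg in sorted(denies):
        # Unknown prefixes start advisory until someone promotes them.
        mode = modes.get(msg.split("-", 1)[0], "advisory")
        print(f"[{mode}] {msg}")
        blocking = blocking or mode == "enforce"
    return 1 if blocking else 0
```

In a CI step the script would end with `sys.exit(gate_exit_code(denies))`, where `denies` is the list of messages pulled from the opa eval report. Promoting a pattern then becomes a one-line, reviewable change to the mapping.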

The point

Policy as code narrows the distance between what your organization says it expects and what its delivery system verifies. Pattern-based structure is what keeps that library legible as it grows, because it maps to control intent instead of provider APIs. Add retained artifacts and clear ownership and the whole thing becomes a repeatable way to turn a control document into something the pipeline enforces on every change. The runtime layer still matters. It just should not be the first place anyone finds out the rules were never followed.


// elusive thoughts · adapted and generalized from a pattern-based policy-as-code writeup · examples are illustrative, test before you enforce
