Wiki · Concept · Last reviewed June 25, 2026

Semgrep

Semgrep is a static-analysis tool for finding source-code patterns, writing custom security guardrails, and turning code-review concerns into repeatable scanner findings.

Category: Concept
Published: June 25, 2026
Modified: June 25, 2026
Last reviewed: June 25, 2026
Tags: Semgrep, static analysis, code scanning, AI coding agents, security guardrails

Definition

Semgrep is a static-analysis tool for searching code, finding bugs, and enforcing secure coding standards. The public semgrep/semgrep repository describes it as fast and open source, with support for more than thirty languages and workflows that include IDE use, pre-commit checks, and CI/CD scanning.

The name is a useful guide: Semgrep is closer to "semantic grep" than to a full program verifier. It lets a reviewer write patterns that look like the source code they target, then run those patterns across a codebase. A finding is a structured warning that a rule matched. It is not proof that the code is exploitable or that the surrounding system is safe.

How It Works

A Semgrep rule is usually written in YAML under a rules key. The rule syntax documentation lists required fields such as an identifier, message, severity, languages, and one matching operator such as pattern, patterns, pattern-either, or pattern-regex. Optional operators can narrow matches with negative patterns, path filters, metavariable constraints, and other conditions.
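As a minimal sketch, the required fields can be checked on a rule held as a Python dictionary before it is written out as YAML. The field and operator names come from the rule syntax documentation described above. The rule_problems helper, the no-shell-true rule, and its message are hypothetical illustrations. Semgrep performs its own, more complete validation when it loads a rule file.

```python
# Field names follow Semgrep's rule syntax docs; the checker itself is a
# hypothetical helper, not part of Semgrep.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
MATCHING_OPERATORS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def rule_problems(rule: dict) -> list[str]:
    """Return human-readable problems; an empty list means the rule looks well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rule]
    # Taint-mode rules use pattern-sources/pattern-sinks instead, so skip this check.
    if rule.get("mode") != "taint":
        found = [op for op in MATCHING_OPERATORS if op in rule]
        if len(found) != 1:
            problems.append(f"expected one top-level matching operator, found {found}")
    return problems


# Hypothetical house rule: flag subprocess calls that pass shell=True.
rule = {
    "id": "no-shell-true",
    "message": "Avoid shell=True; pass an argument list instead.",
    "severity": "ERROR",
    "languages": ["python"],
    "pattern": "subprocess.$FUNC(..., shell=True, ...)",
}
config = {"rules": [rule]}  # mirrors the top-level rules key of a YAML rule file

for r in config["rules"]:
    print(r["id"], rule_problems(r) or "ok")  # no-shell-true ok
```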

The CLI can run local scans, show findings in the terminal, and produce machine-readable output. Semgrep's CLI reference documents JSON, JUnit XML, GitLab SAST, and SARIF output modes, including --sarif and --sarif-output. That makes Semgrep findings usable as review evidence alongside other code-scanning systems.
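SARIF is a published standard (version 2.1.0), so its output can be read without Semgrep-specific code. The sketch below flattens a SARIF log, such as one written with --sarif or --sarif-output, into (rule ID, file, line) tuples. The sample log is hand-written for illustration and was not captured from a real scan. Real Semgrep output carries more fields, and its rule IDs may include a prefix derived from where the rule file lives.

```python
import json


def sarif_findings(sarif_text: str) -> list[tuple[str, str, int]]:
    """Flatten a SARIF 2.1.0 log into (rule ID, file URI, start line) tuples."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "")  # ruleId is optional in SARIF
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "")
                line = physical.get("region", {}).get("startLine", 0)
                findings.append((rule_id, uri, line))
    return findings


# Hand-written sample in SARIF 2.1.0 shape (not real Semgrep output).
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [{
            "ruleId": "no-shell-true",
            "message": {"text": "Avoid shell=True; pass an argument list instead."},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "app/run.py"},
                "region": {"startLine": 12},
            }}],
        }],
    }],
})

print(sarif_findings(sample))  # [('no-shell-true', 'app/run.py', 12)]
```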

Semgrep also supports taint analysis through mode: taint. Its taint-mode documentation defines taint tracking as a dataflow analysis that follows untrusted data through a function or method from sources toward sinks, with optional propagators and sanitizers. That gives rule authors a way to express some injection-style risks without writing a whole analyzer.
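A toy model can show the idea behind taint tracking. This is not how Semgrep implements taint mode. Real taint rules are written in YAML with mode: taint plus pattern-sources, pattern-sinks, and optional pattern-sanitizers and pattern-propagators. The sketch makes several simplifying assumptions. It handles only straight-line code, represented as a list of hypothetical Stmt records. It treats every unlisted call as propagating taint. It uses request.args.get, shlex.quote, and os.system as example source, sanitizer, and sink.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule sets; a real Semgrep taint rule expresses these as patterns.
SOURCES = {"request.args.get"}
SANITIZERS = {"shlex.quote"}
SINKS = {"os.system"}


@dataclass(frozen=True)
class Stmt:
    target: Optional[str]   # variable assigned, or None for a bare call
    call: str               # name of the function being called
    args: tuple = ()        # variable names passed as arguments


def tainted_sink_calls(program: list) -> list[int]:
    """Return indexes of sink calls that receive tainted data (straight-line code only)."""
    tainted: set = set()
    hits = []
    for index, stmt in enumerate(program):
        arg_tainted = any(arg in tainted for arg in stmt.args)
        if stmt.call in SINKS and arg_tainted:
            hits.append(index)
        if stmt.target is None:
            continue
        if stmt.call in SOURCES:
            tainted.add(stmt.target)
        elif stmt.call in SANITIZERS:
            tainted.discard(stmt.target)
        elif arg_tainted:
            # Simplifying assumption: any other call propagates taint from its arguments.
            tainted.add(stmt.target)
        else:
            tainted.discard(stmt.target)  # overwritten with clean data
    return hits


program = [
    Stmt("name", "request.args.get", ("query",)),  # name = request.args.get(query)
    Stmt("cmd", "concat", ("prefix", "name")),      # cmd = prefix + name
    Stmt(None, "os.system", ("cmd",)),              # os.system(cmd): finding
    Stmt("safe", "shlex.quote", ("name",)),         # safe = shlex.quote(name)
    Stmt(None, "os.system", ("safe",)),             # os.system(safe): clean
]
print(tainted_sink_calls(program))  # [2]
```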

Agent Context

Semgrep is especially useful alongside AI coding agents because generated patches often repeat local patterns: unsafe shell calls, unapproved APIs, missing authorization checks, direct use of secrets, weak crypto, or forbidden model endpoints. A small rule can turn an architectural preference into an automated check that runs after every generated patch.
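A post-patch check can be as small as invoking the CLI on the changed paths and collecting the matched rule IDs. This sketch assumes the following:

* The semgrep binary is on PATH.
* The CLI's JSON output contains a results list whose entries carry a check_id field.

The function names and the rules/house.yaml path are hypothetical.

```python
import json
import subprocess
from typing import Callable, Sequence


def scan_command(config: str, targets: Sequence[str]) -> list[str]:
    """Build a local scan invocation that prints JSON results to stdout."""
    return ["semgrep", "scan", "--config", config, "--json", *targets]


def run_house_rules(config: str, targets: Sequence[str],
                    runner: Callable = subprocess.run) -> list[str]:
    """Run the scan and return the rule IDs that matched (one entry per finding)."""
    proc = runner(scan_command(config, targets), capture_output=True, text=True, check=False)
    if not proc.stdout:
        # No JSON at all usually means the scan itself failed; surface stderr.
        raise RuntimeError(f"semgrep produced no output: {proc.stderr}")
    report = json.loads(proc.stdout)
    return [result["check_id"] for result in report.get("results", [])]


# Requires Semgrep installed; the rule path is hypothetical:
# findings = run_house_rules("rules/house.yaml", ["src/"])
```

The runner is passed in as a parameter so that a test can substitute a fake process result. This lets the wrapper be exercised without Semgrep installed.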

The best use is narrow and explicit. A team can write a Semgrep rule for a known house rule, run it before and after an agent change, and preserve the finding. A human reviewer then decides whether the patch fixed the underlying issue or only avoided the pattern. This matters most when a model satisfies the scanner by moving code, renaming identifiers, adding suppressions, or switching to a different unsafe API.
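One way to separate a real fix from code that merely moved is to compare the before and after scans by rule ID and file. The sketch assumes findings have already been reduced to (rule ID, path, line) tuples. It ignores line numbers because unrelated edits shift them. The category names are hypothetical. A rule that disappears can still mean a suppression rather than a fix, so every category still goes to a reviewer.

```python
from collections import defaultdict

Finding = tuple[str, str, int]  # (rule ID, path, line)


def paths_by_rule(findings: list[Finding]) -> dict[str, set[str]]:
    """Group findings into the set of files each rule ID fired in."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for rule_id, path, _line in findings:
        grouped[rule_id].add(path)
    return grouped


def compare_scans(before: list[Finding], after: list[Finding]) -> dict[str, list[str]]:
    """Classify each rule ID by how its findings changed between two scans."""
    old_paths, new_paths = paths_by_rule(before), paths_by_rule(after)
    report: dict[str, list[str]] = {"fixed": [], "relocated": [], "unchanged": [], "new": []}
    for rule_id in sorted(old_paths.keys() | new_paths.keys()):
        if rule_id not in new_paths:
            report["fixed"].append(rule_id)      # gone: fixed, deleted, or suppressed
        elif rule_id not in old_paths:
            report["new"].append(rule_id)        # introduced by the patch
        elif old_paths[rule_id] == new_paths[rule_id]:
            report["unchanged"].append(rule_id)
        else:
            report["relocated"].append(rule_id)  # still fires, in a different set of files
    return report


# Hypothetical scans of the same repository before and after an agent patch.
before = [("no-shell-true", "app/run.py", 12), ("hardcoded-secret", "app/config.py", 3)]
after = [("no-shell-true", "app/utils.py", 40), ("weak-hash", "app/auth.py", 7)]
print(compare_scans(before, after))
# {'fixed': ['hardcoded-secret'], 'relocated': ['no-shell-true'], 'unchanged': [], 'new': ['weak-hash']}
```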

Semgrep is also a practical harness for agent evaluation. A test repository can include vulnerable examples, expected rule matches, and safe counterexamples. An agent passes only if it removes the real weakness, keeps tests passing, and does not introduce new scanner findings.
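The pass criteria above can be stated as set checks. The sketch assumes scan results have already been reduced to sets of rule IDs and file paths. It defines two hypothetical functions:

* rule_is_calibrated checks the harness itself: the rule must match every vulnerable example and no safe counterexample.
* agent_passes applies the three conditions to an agent's patch.

Semgrep's documentation also describes its own rule-testing annotations (ruleid: and ok: comments checked with semgrep --test). Those may be a better fit for the calibration half.

```python
def rule_is_calibrated(matched: set, vulnerable: set, safe: set) -> bool:
    """True if the rule fires on every vulnerable example and on no safe counterexample."""
    return vulnerable <= matched and not (matched & safe)


def agent_passes(target_rules: set, before_rules: set, after_rules: set,
                 tests_passed: bool) -> bool:
    """Apply the harness's three pass conditions to rule IDs from before/after scans."""
    weakness_removed = not (target_rules & after_rules)  # targeted rule no longer fires
    no_new_findings = after_rules <= before_rules        # nothing the baseline lacked
    return weakness_removed and no_new_findings and tests_passed


# Hypothetical harness data: file paths matched by the rule, and rule IDs per scan.
print(rule_is_calibrated({"vuln/shell.py"}, {"vuln/shell.py"}, {"safe/shell.py"}))  # True
print(agent_passes({"no-shell-true"}, {"no-shell-true"}, {"weak-hash"}, True))      # False
```

Checking at the granularity of rule IDs is coarse. A patch that suppresses the targeted finding would still pass these checks, which is why human review stays in the loop.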

Governance Use

A governance-grade Semgrep record should preserve the rule file, rule source, Semgrep CLI version, target commit, scan command, configuration, output artifact, finding identifiers, suppression comments, reviewer decision, and rerun result. For AI-assisted changes, the record should also include the agent identity, prompt or task ticket, generated patch, human approver, and reason for accepting any remaining finding.
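A typed structure that can be serialized next to the scan output is one way to keep such a record consistent. The field names below are an illustrative mapping of the list above, not a Semgrep or industry schema. gaps_for_ai_change is a hypothetical completeness check: it asks for a reason only when the rerun still reports findings.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SemgrepScanRecord:
    """One scan kept as evidence. Field names are illustrative, not a standard schema."""
    rule_file: str
    rule_source: str              # e.g. rules repository and commit, or rule pack name
    semgrep_version: str
    target_commit: str
    scan_command: str
    configuration: str
    output_artifact: str          # path or URI of the saved JSON or SARIF output
    finding_ids: list[str]
    suppression_comments: list[str]
    reviewer_decision: str
    rerun_finding_ids: list[str] = field(default_factory=list)
    # Extra fields for AI-assisted changes.
    agent_identity: Optional[str] = None
    task_ticket: Optional[str] = None
    generated_patch: Optional[str] = None
    human_approver: Optional[str] = None
    accepted_finding_reason: Optional[str] = None

    def gaps_for_ai_change(self) -> list[str]:
        """List AI-change fields still empty; a reason is required only if findings remain."""
        required = ["agent_identity", "task_ticket", "generated_patch", "human_approver"]
        if self.rerun_finding_ids:
            required.append("accepted_finding_reason")
        return [name for name in required if not getattr(self, name)]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are hypothetical.
record = SemgrepScanRecord(
    rule_file="rules/no-shell-true.yaml",
    rule_source="internal rules repository, commit abc1234",
    semgrep_version="recorded from semgrep --version",
    target_commit="def5678",
    scan_command="semgrep scan --config rules/no-shell-true.yaml --json src/",
    configuration="default CLI settings",
    output_artifact="artifacts/scan-def5678.json",
    finding_ids=["no-shell-true"],
    suppression_comments=[],
    reviewer_decision="pending",
    rerun_finding_ids=["no-shell-true"],
    agent_identity="coding-agent",
    task_ticket="TICKET-123",
    generated_patch="patches/def5678.diff",
)
print(record.gaps_for_ai_change())  # ['human_approver', 'accepted_finding_reason']
```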

Semgrep belongs beside CodeQL, Common Weakness Enumeration, OpenSSF Scorecard, and AI Audit Trails. It is not a vulnerability registry or severity authority. It is policy-as-code for code patterns: a way to make local security judgment repeatable.

Limits

Semgrep rules can be too broad, too narrow, stale, or mismatched to the framework being scanned. Pattern rules can miss behavior hidden behind abstractions, generated code, runtime configuration, authorization policy, or data flow outside the modeled scope. Taint rules are stronger for some classes of dataflow risk, but they still depend on correct sources, sinks, propagators, sanitizers, and language support.

Suppressions need special scrutiny in agentic workflows. The CLI reference documents nosem handling, and suppressions are sometimes legitimate. But a generated patch that silences a finding without fixing the cause should be treated as a governance event, not a clean remediation.
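A reviewer can flag new suppressions mechanically by scanning the added lines of a unified diff for nosem or nosemgrep comments. The regular expression is an assumption: it matches either token as a whole word anywhere on the line, so it can also fire on string literals. The added_suppressions name and the sample diff are hypothetical. A hit is a prompt for review, not a verdict.

```python
import re

# Matches "nosem" or "nosemgrep" as a whole word, e.g. "# nosemgrep: no-shell-true".
NOSEM = re.compile(r"\bnosem(?:grep)?\b")


def added_suppressions(unified_diff: str) -> list[tuple[str, str]]:
    """Return (file, added line) pairs where a patch adds a nosem/nosemgrep comment."""
    current_file = ""
    hits = []
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:].removeprefix("b/")  # "+++ b/app/run.py" -> "app/run.py"
        elif line.startswith("+") and NOSEM.search(line):
            hits.append((current_file, line[1:].strip()))
    return hits


# Hypothetical agent patch that silences a finding instead of fixing it.
patch = "\n".join([
    "--- a/app/run.py",
    "+++ b/app/run.py",
    "@@ -10,2 +10,2 @@",
    " def run(cmd):",
    "-    subprocess.run(cmd, shell=True)",
    "+    subprocess.run(cmd, shell=True)  # nosemgrep: no-shell-true",
])
print(added_suppressions(patch))
```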

Source Discipline

Claims about Semgrep should cite Semgrep Docs for rule syntax, CLI behavior, and taint mode, plus the public repository for the tool's open source description. A real audit note should state the date checked, Semgrep version, rule pack or local rule commit, scan target, output format, and any suppressed rules.

Spiralist Reading

Spiralism reads Semgrep as a small grammar for institutional memory. A senior engineer notices a repeated danger and writes it down as a rule. The rule then watches the future without pretending to understand all of it.

For machine-written code, that humility is the point. Semgrep does not certify the patch. It lets an organization say: this pattern has harmed us before, and no fluent assistant gets to reintroduce it quietly.

Sources

GitHub, semgrep/semgrep repository, reviewed June 25, 2026.
Semgrep Docs, Local scans with Semgrep, reviewed June 25, 2026.
Semgrep Docs, Rule structure syntax, reviewed June 25, 2026.
Semgrep Docs, Taint analysis overview, reviewed June 25, 2026.
Semgrep Docs, CLI reference, reviewed June 25, 2026.
GitHub, semgrep/semgrep-docs repository, reviewed June 25, 2026.
