Semgrep - More Than Just a Glorified Grep

Written by Nikos Vaggalis

Tuesday, 26 May 2020

Introducing a tool to search through code for flaws where plain regexes fall flat and using Static Application Security Testing would be overkill.

Semgrep proclaims itself as:

"a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search".

It isn't just a glorified grep, though. It occupies a space somewhere in between grep and a SAST tool - more expressive than grep, but not as hard to tweak and learn as a SAST.

An example that goes beyond simple grepping for a pattern is looking for a file handle that is opened but never closed. That is, I want to know that after a $FILE = open(...) there is also a $FILE.close() somewhere further along in the flow of the code.

Here grep falls short. Regular expressions only take you so far when a match has to span multiple lines, and grep cannot work in the broader context or follow the flow of the code.
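To make the limitation concrete, here is a minimal sketch in Python of what a line-by-line regex search sees. The regex, the helper name grep_open and the two sample snippets are made up for illustration; the point is only that a per-line search reports the same hit whether or not a close() follows.

```python
import re

# A line-oriented pattern of the kind you might hand to grep (illustrative only).
OPEN_RE = re.compile(r"^\s*(\w+)\s*=\s*open\(")

# Two hypothetical snippets: one never closes the file, the other does.
LEAKY = "f = open('data.txt')\nprint(f.read())\n"
CLOSED = "f = open('data.txt')\nprint(f.read())\nf.close()\n"


def grep_open(source):
    """Return the lines a line-by-line regex search would report."""
    return [line for line in source.splitlines() if OPEN_RE.search(line)]


# Both snippets yield the same single hit: the search cannot tell them apart.
print(grep_open(LEAKY))
print(grep_open(CLOSED))
```

Telling the two snippets apart requires knowing what comes after the match, which is exactly the context a per-line search throws away.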

The simplest, but still extremely powerful, construct that Semgrep offers here is the ellipsis operator. With a rule written in YAML like the following, it can find the missing $FILE.close() call:

rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR

This rule looks for files that are opened but never closed. It does so by matching the open(...) pattern and discarding any match that is followed by a close() on the same variable.

The $FILE metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows for any arguments to be passed to open and any sequence of code statements in-between the open and close calls.

We don't care how open is called or what happens up to a close call, we just need to make sure close is called.
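For a feel of what such a check involves, the idea can be roughly approximated with Python's standard ast module. This is a sketch under assumptions and not how Semgrep works internally: the function names are hypothetical, only the statement list in which the open() appears is inspected, and cases such as with blocks or else branches are ignored.

```python
import ast


def _opened_name(stmt):
    """Return NAME if stmt has the shape `NAME = open(...)`, else None."""
    if (isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Name)
            and stmt.value.func.id == "open"):
        return stmt.targets[0].id
    return None


def _closes(stmt, name):
    """True if stmt contains a call of the shape `name.close(...)`."""
    for node in ast.walk(stmt):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "close"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == name):
            return True
    return False


def unclosed_opens(source):
    """Return (name, line) for each open() with no later close() in its block."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):  # skip nodes without a statement list
            continue
        for i, stmt in enumerate(body):
            name = _opened_name(stmt)
            if name and not any(_closes(later, name) for later in body[i + 1:]):
                findings.append((name, stmt.lineno))
    return findings


LEAKY = "def read(p):\n    f = open(p)\n    return f.read()\n"
CLOSED = "def read(p):\n    f = open(p)\n    d = f.read()\n    f.close()\n    return d\n"

print(unclosed_opens(LEAKY))   # reports the open() on line 2
print(unclosed_opens(CLOSED))  # reports nothing
```

The YAML rule expresses the same intent in a few lines, without any tree-walking code, which is the convenience Semgrep is after.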

Another example provided is looking for lines with a call to setcookie() but catering for all instances of the function since it can accept a variable number of arguments.

Semgrep's rules can be as simple as $X == $X, which looks for bogus equality checks such as if (node.id == node.id) where the coder actually meant if node.id == 'node.id', but they can also be more complex, like the $FILE = open(...) example already covered.
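The reason $X == $X needs more than a text search is that both sides must be the same expression, whatever that expression is. A rough Python analogue using the standard ast module follows; it is an illustration of the idea only, and the function name and sample code are hypothetical.

```python
import ast


def self_comparisons(source):
    """Return the line numbers of `X == X` comparisons with identical sides."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)
                # ast.dump omits line/column info by default, so two
                # structurally identical expressions dump to the same string.
                and ast.dump(node.left) == ast.dump(node.comparators[0])):
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = (
    "if node.id == node.id:\n"
    "    pass\n"
    "if node.id == other.id:\n"
    "    pass\n"
)

print(self_comparisons(SAMPLE))  # only the comparison on line 1 is reported
```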

What's even better is that there's a whole rule registry where you can find all sorts of rules to check your code against in the supported languages of Python, JavaScript, Go and Java.

Example rules on Java:
java.jax-rs.security.jax-rs-path-traversal.jax-rs-path-traversal

Message
Detected a potential path traversal. A malicious actor could control the location of this file, to include going backwards in the directory with '../'. To address this, ensure that user-controlled variables in file paths are sanitized. You may also consider using a utility method such as: org.apache.commons.io.FilenameUtils.getName(...) to only retrieve the file name from the path.

Rule Pattern:

- pattern-either:
  - pattern: |
      $RETURNTYPE $FUNC (..., @PathParam(...) $TYPE $VAR, ...) {
        ...
        new File(..., $VAR, ...);
        ...
      }
  - pattern: |-
      $RETURNTYPE $FUNC (..., @javax.ws.rs.PathParam(...) $TYPE $VAR, ...) {
        ...
        new File(..., $VAR, ...);
        ...
      }
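The rule above targets Java, but the remedy its message suggests, keeping only the file name from a user-supplied path, is easy to sketch in Python. The helper name safe_join and the base directory are assumptions made up for this illustration, and posixpath is used so that the result does not depend on the operating system.

```python
import posixpath


def safe_join(base_dir, user_supplied):
    """Join base_dir with only the final component of a user-supplied path."""
    # Drop any directory part, including sequences such as '../'.
    name = posixpath.basename(user_supplied)
    # basename alone still lets '', '.' and '..' through, so reject those.
    if name in ("", ".", ".."):
        raise ValueError("no usable file name in %r" % user_supplied)
    return posixpath.join(base_dir, name)


# The traversal attempt is reduced to a plain file name inside base_dir.
print(safe_join("/srv/files", "../../etc/passwd"))
```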

on Javascript:
contrib.nodejsscan.eval_yaml_deserialize.yaml_deserialize

Message
User controlled data in 'yaml.load()' function can result in Remote Code Injection.

Rule Pattern

- pattern-inside: |
    var $X = require('js-yaml');
    ...
- pattern: |
    $X.load(...)

and so on.

The great thing about such a collaborative registry is that you can leverage the expertise of people who write rules in their own domain of knowledge and submit them to the registry for others to import and reuse. Of course, the rules are customizable too.

Apart from writing rules for finding security bugs, Semgrep can also be used to enforce code-specific patterns and best practices, and to scan PRs for vulnerabilities.

At the HELLA Security conference, Drew Dennison of r2c, the maintainer of the tool, demonstrated its power by running the pattern $X == $X live against Apache's Libcloud GitHub repo, which turned up an actual bug in the codebase! He then had to open a PR to notify the maintainers of the repo.

You can do it yourself too and scan your GitHub repos through Semgrep's Live Editor at https://semgrep.live/ and its Scan option. There you can also play with examples and rules to get a feel for it.

I actually ran a scan of my own Android repo against the Java rules java.lang.correctness and java.lang.security. After a few nail-biting moments the results came out clean. What a relief!

General language rules aside, there are also domain-specific rules you can use, such as those under java.spring. This means that when an expert in that domain writes a rule and submits it to the registry you can take advantage of it right away.

The tool is distributed as binaries for macOS and as a script for Ubuntu. For all others there's also a Docker image.

So: open source, better than grep, simpler than a SAST, and with a much nicer price tag - free!


Last Updated ( Tuesday, 26 May 2020 )

Recent Articles

Recent Book Reviews

Popular Articles

More Information

Related Articles

Comments