Hacking JavaScript with JavaScript

How to use parsers and other tools to analyze JavaScript code

JavaScript appeared over 20 years ago and is still the most widely used language. It is the only programming language that runs on the most popular platform there is, the web. Desktop applications (Visual Studio Code, Discord, Slack) and popular mobile applications (Facebook, Skype, Tesla) are increasingly built with it. But do you know what the secret of its popularity is? Bug bounties and vulnerability hunting, which bring in real money.

In any movie about hackers, you will definitely see a scene where someone sits in front of a computer and types cryptic commands on a black terminal screen (unless it is a 3D UNIX interface from Jurassic Park).

That is not far from the truth. Both programmers and hackers need a terminal for most tasks, but a terminal alone will not do for analyzing web applications. The basic tools in the average hacker’s arsenal are either too unwieldy for analyzing, intercepting and manipulating browser applications, or not suited to the job at all. JavaScript and HTML are complex languages in their own right and cannot be parsed and processed with simple tools.

Here’s a great example: regular expressions don’t scale to handle JavaScript code. Last June, this tweet made the rounds among hackers on Twitter:

This regular expression tries to extract all assigned variables from a JavaScript source file. In this way, you can get an overview of the application and find weak links. The suggested regexp works great with simple code like this:

But if the code is a little more complicated, nothing will come of it:

In some particularly interesting cases, this regex will skip a variable or get the wrong value:

Another user significantly improved the accuracy of the original regular expression, at the cost of significantly greater complexity. Despite all the hard work, it still misses some variables:

https://twitter.com/d0nutptr/status/1143087965775679490

In fact, regular expressions are simply unsuitable for parsing JavaScript. Period.
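To make these failure modes concrete, here is a minimal sketch. The regular expression in it is a hypothetical stand-in written for this illustration, not the one from the tweet, and the two sample strings are invented as well. It shows both kinds of failure: a declared variable that is skipped and a "variable" that only exists inside a string.

```javascript
// Hypothetical stand-in regex: looks for `var|let|const <name> =`.
// It is NOT the regular expression from the tweet.
const naive = /(?:var|let|const)\s+(\w+)\s*=/g;

// Returns every name the regex believes is a declared variable.
function regexIdentifiers(source) {
  return Array.from(source.matchAll(naive), match => match[1]);
}

// Simple declarations are found correctly.
const simple = 'var a = 1; let b = 2;';

// Destructuring is skipped, and text inside a string literal is reported.
const tricky = 'let {c} = d; const s = "var fake = 1";';

console.log(regexIdentifiers(simple)); // [ 'a', 'b' ]
console.log(regexIdentifiers(tricky)); // [ 's', 'fake' ]
```

In the second case the real variable c is missed, while fake, which is only text inside a string, is reported as if it were declared.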

How to parse JavaScript

Half a dozen ready-to-use JavaScript parsers are available for Node.js. Parsers exist for other environments too, but if you plan to manipulate JavaScript, you won’t find better tooling in any other language. Using a parser is as easy as using any other library; the hard part is figuring out what to do with the result. These JavaScript parsers produce an abstract syntax tree (AST). It is nothing more than a large object that mirrors the structure of the JavaScript source code. If raw HTTP requests don’t make you nervous, an AST won’t scare you either.

If you want to see what an AST looks like, try AST Explorer:

ASTExplorer.net

In the AST Explorer interface, the JavaScript source is displayed on the left and the resulting AST on the right. If necessary, you can switch between parsers and compare their ASTs.
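As a rough idea of what to expect in the right-hand pane, the object below is a hand-written approximation of the Shift-format AST for a one-line program. The node type names reflect my reading of the Shift format, and location data and other details are left out, so treat it as a sketch and compare it with what AST Explorer actually shows.

```javascript
// Hand-written approximation of a Shift-format AST for:
//   const myVar = "Hello World";
// Simplified for illustration; a real parser's output may carry more fields.
const ast = {
  type: "Script",
  directives: [],
  statements: [
    {
      type: "VariableDeclarationStatement",
      declaration: {
        type: "VariableDeclaration",
        kind: "const",
        declarators: [
          {
            type: "VariableDeclarator",
            binding: { type: "BindingIdentifier", name: "myVar" },
            init: { type: "LiteralStringExpression", value: "Hello World" }
          }
        ]
      }
    }
  ]
};

// Every piece of the source is reachable as a plain property.
const declarator = ast.statements[0].declaration.declarators[0];
console.log(declarator.binding.name); // myVar
console.log(declarator.init.value);   // Hello World
```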

Parser selection

Not all ASTs are alike. Think of the HTTP protocol, which comes in versions 1.0, 1.1, 2.0 and so on: tools that work great with one version may be less convenient with another. It’s the same with parsers and ASTs. There was once a single standard, but it had so many shortcomings that programmers began to create their own solutions, which differ significantly from one another.

I use the Shift toolkit because 1) I have repeatedly fallen into AST traps and 2) it was written by people who know how to avoid them. The Shift AST was the first (and only?) AST based on a specification; that is, its authors thought about how to represent all of ECMAScript before they set about solving the parsing problem. The creators of first-generation JavaScript tools tried to tidy up the edge cases encountered in other ASTs. There were a lot of them.

To use shift-parser, import the library and call its parse method on a string of JavaScript source. If you are unfamiliar with the Node environment and do not know how to install dependencies, install Node via nvm and read up on how to use npm.

This code is all you need to create an AST for any JavaScript source:

const { parseScript } = require("shift-parser");

// javascriptSource is any string containing JavaScript source code
const ast = parseScript(javascriptSource);

Moving from AST to JavaScript

To turn an AST back into source code, you need a code generator. This is a little more complicated, since how the code is formatted is entirely a matter of preference. If you want code that is easy to read, you need a generator that at least lays the code out well. shift-codegen comes with two basic formatters and one extended one. The following example takes the AST produced by the parser and generates JavaScript source from it using shift-codegen. If formatting does not matter to you, you do not need to import or use FormattedCodeGen.

const { default: codegen, FormattedCodeGen } = require('shift-codegen'); 
console.log(codegen(ast, new FormattedCodeGen()));

AST for JavaScript manipulation

An AST can be modified by hand like any other JavaScript object:

const { parseScript } = require('shift-parser');
const { default: codegen, FormattedCodeGen } = require('shift-codegen');

const source = `const myVar = "Hello World";`

const ast = parseScript(source);

ast
  .statements[0]
  .declaration
  .declarators[0]
  .init
  .value="Hello Reader";

console.log(codegen(ast, new FormattedCodeGen()));
// > const myVar = "Hello Reader";

As you can see, reaching deep into the structure like this is awkward and error-prone. It can come in handy if you know the data structure and know it will not change, but it is not enough for building general-purpose tools. For that, we need a utility that traverses the tree. Let’s use shift-traverser.
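A small sketch of why the hard-coded path is fragile. The AST here is a hand-written, simplified approximation of what a Shift parser might produce for a script whose first statement is a function call instead of a declaration; the node names are my assumption and the foo() call is invented for the illustration.

```javascript
// Hand-written, simplified Shift-style AST for:
//   foo(); const myVar = "Hello World";
const ast = {
  type: "Script",
  directives: [],
  statements: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "IdentifierExpression", name: "foo" },
        arguments: []
      }
    },
    {
      type: "VariableDeclarationStatement",
      declaration: {
        type: "VariableDeclaration",
        kind: "const",
        declarators: [
          {
            type: "VariableDeclarator",
            binding: { type: "BindingIdentifier", name: "myVar" },
            init: { type: "LiteralStringExpression", value: "Hello World" }
          }
        ]
      }
    }
  ]
};

// The same fixed path as before: it assumes statement 0 is a declaration.
const hardCoded = tree => tree.statements[0].declaration.declarators[0].init.value;

let result;
try {
  result = hardCoded(ast);
} catch (err) {
  // statements[0] has no `declaration`, so the property chain throws.
  result = err.constructor.name;
}
console.log(result); // TypeError
```

One extra statement at the top of the file is enough to break the lookup, which is why a traversal that reacts to node types is the more robust choice.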

Tree traversal

With shift-traverser you can walk a Shift AST and manipulate nodes along the way. The library is a port of estraverse, which was designed for a different AST format, and it works the same way: you pass the AST object along with an enter method and an exit method. shift-traverser calls these methods with the current node and its parent, once when the traversal first reaches the node and once when it leaves it.

const { traverse } = require("shift-traverser");

traverse(ast, {
  enter(node, parent) {
  },
  exit(node, parent) {
  }
});

shift-traverser gives you a flexible way to walk the tree and inspect individual nodes. This lets you write an algorithm that adapts better to changes in the AST.
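To show the order in which enter and exit fire, here is a dependency-free sketch of a depth-first walker. It illustrates the idea only and is not shift-traverser’s implementation: the walk function, its rule for recognising child nodes, and the hand-written AST are all assumptions made for this example.

```javascript
// Minimal depth-first walker (a sketch, not shift-traverser itself).
function walk(node, visitor, parent = null) {
  if (visitor.enter) visitor.enter(node, parent);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // Assumption: anything with a string `type` property is an AST node.
      if (child && typeof child.type === "string") walk(child, visitor, node);
    }
  }
  if (visitor.exit) visitor.exit(node, parent);
}

// Hand-written, simplified Shift-style AST for: let a = 2;
const ast = {
  type: "Script",
  directives: [],
  statements: [{
    type: "VariableDeclarationStatement",
    declaration: {
      type: "VariableDeclaration",
      kind: "let",
      declarators: [{
        type: "VariableDeclarator",
        binding: { type: "BindingIdentifier", name: "a" },
        init: { type: "LiteralNumericExpression", value: 2 }
      }]
    }
  }]
};

// Record each callback so the visiting order is visible.
const log = [];
walk(ast, {
  enter(node) { log.push("enter " + node.type); },
  exit(node) { log.push("exit " + node.type); }
});
console.log(log.join("\n"));
```

Each node is entered before its children and exited after them, so a visitor can collect information on the way down and act on it on the way back up.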

Making it all work

At the beginning of this article, I looked at cases where regular expressions fail at parsing JavaScript code. Now let’s see whether these tools can do better. First of all, you need to find all the variables declared in the source code. They are represented as VariableDeclaration nodes, each of which contains one or more VariableDeclarator nodes. Take a look at this code: let a = 2, b = 3. Here several variables are declared on one line. A VariableDeclarator contains a required binding and an optional initial value (init). The binding can be a simple identifier (let a = 2) or an object or array pattern (let {c} = d, [a] = b;), so we need to check the binding itself as well as its properties or elements.

How do I know this? I cannot recite all the node types and their contents from memory. I use AST Explorer to inspect the nodes I need to analyze for a specific scenario.

This is what the code looks like:

const { traverse } = require("shift-traverser");

// Returns the names of all variables declared in a Shift AST.
module.exports = function(ast) {
  const ids = [];
  traverse(ast, {
    enter(node, parent) {
      if (node.type === "VariableDeclarator") {
        if (node.binding.type === "ObjectBinding") {
          // let {a, b, ...rest} = obj
          node.binding.properties.forEach(prop => ids.push(prop.binding.name));
          if (node.binding.rest) ids.push(node.binding.rest.name);
        } else if (node.binding.type === "ArrayBinding") {
          // let [a, , b, ...rest] = arr; skipped positions appear as null
          node.binding.elements.forEach(el => { if (el) ids.push(el.name); });
          if (node.binding.rest) ids.push(node.binding.rest.name);
        } else {
          // let a = 1
          ids.push(node.binding.name);
        }
      }
    }
  });
  return ids;
};
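The function above handles a single level of destructuring. As a hedged extension, the sketch below collects names recursively so that nested patterns and default values are covered too. The helper name bindingNames is hypothetical, the node type names (BindingWithDefault, BindingPropertyIdentifier, BindingPropertyProperty) follow my understanding of the Shift format, and the sample pattern is written by hand, so check the shapes in AST Explorer before relying on it.

```javascript
// Collect every name introduced by a Shift-style binding node, recursively.
function bindingNames(binding, names = []) {
  if (!binding) return names; // skipped array positions and absent rest are null
  switch (binding.type) {
    case "BindingIdentifier":
      names.push(binding.name);
      break;
    case "BindingWithDefault": // e.g. `f = 1` inside a pattern
      bindingNames(binding.binding, names);
      break;
    case "ObjectBinding":
      // Each property keeps its target in a `binding` field.
      binding.properties.forEach(prop => bindingNames(prop.binding, names));
      bindingNames(binding.rest, names);
      break;
    case "ArrayBinding":
      binding.elements.forEach(el => bindingNames(el, names));
      bindingNames(binding.rest, names);
      break;
  }
  return names;
}

// Hand-written nodes approximating the binding in:
//   let { c, d: [e, , f = 1] } = obj;
const pattern = {
  type: "ObjectBinding",
  properties: [
    {
      type: "BindingPropertyIdentifier",
      binding: { type: "BindingIdentifier", name: "c" },
      init: null
    },
    {
      type: "BindingPropertyProperty",
      name: { type: "StaticPropertyName", value: "d" },
      binding: {
        type: "ArrayBinding",
        elements: [
          { type: "BindingIdentifier", name: "e" },
          null,
          {
            type: "BindingWithDefault",
            binding: { type: "BindingIdentifier", name: "f" },
            init: { type: "LiteralNumericExpression", value: 1 }
          }
        ],
        rest: null
      }
    }
  ],
  rest: null
};

console.log(bindingNames(pattern)); // [ 'c', 'e', 'f' ]
```

Note that d does not appear in the result: it is a property key used to pick a value out of obj, not a variable that the statement declares.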

I published this as the js-identifiers package on npm, and you can use it as a command-line utility. Think of it as strings, but only for JavaScript identifiers. How does it deal with the devious code we saw earlier?

$ js-identifiers scratch.js
findsThisOne
andThisOne
andOneMore
okSoFar
butMissesThisOne
whatAboutThis
or
these
missesTabs
missesNewLines
ಠ_ಠ

It found all the identifiers and did not fall into the trap we set for it in one of the lines. Perfect!

What’s next

All it took was about 21 lines of code and a CLI wrapper. Of course, this solution is a little more complicated than a 50-byte regular expression, but it is much clearer and more effective than a tricky thousand-character one:

(?(DEFINE)(?'value'(?P>any)?))(?(DEFINE)(?'any'(?P>unbalanced_outer)*(?:(?P>paran)|(?P>curly_braces)|(?P>brackets)|(?P>string)|)+(?P>unbalanced_outer)*))(?(DEFINE)(?'any_inner'(?P>unbalanced_inner)*(?:(?P>paran)|(?P>curly_braces)|(?P>brackets)|(?P>string)|)+(?P>unbalanced_inner)*))(?(DEFINE)(?'paran'(s*(?P>any_inner)+s*)))(?(DEFINE)(?'curly_braces'{s*(?P>any_inner)+s*}))(?(DEFINE)(?'brackets'[s*(?P>any_inner)+s*]))(?(DEFINE)(?'string'((?P>string_double_quote)|(?P>string_single_quote)|(?P>string_tick))))(?(DEFINE)(?'string_double_quote'"(?P>string_context)?"))(?(DEFINE)(?'string_single_quote''(?P>string_context)?'))(?(DEFINE)(?'string_tick'`(?P>string_context)?`))(?(DEFINE)(?'string_context'(?>\[sS]|[^\])*))(?(DEFINE)(?'unbalanced_outer'[^(){}[]"'`,;]))(?(DEFINE)(?'unbalanced_inner'(?:(?P>unbalanced_outer)|[,;])))(var|let|const|G,)s+(?:(?<variable_name>w+)(?:s*=s*(?P>value))?s*);?

If you are used to working with regular expressions, using parsers and tree traversal tools to analyze JavaScript code may seem like a horror movie. But keep in mind that regular expressions are inefficient and impractical for this job, and often bring us no closer to our goal. Parsers and ASTs are powerful tools that simplify analysis and make complex transformations possible.

I gave the simplest example, which only prints lines to the terminal. Your tool can be even more useful if it works with JSON for AST manipulation and analysis. shift-query and shift-codegen can also be used from the command line to run queries and extract arbitrary code from JavaScript source. By complementing this solution with a few more tools, you get a powerful reverse engineering and hacking toolkit that requires only the command line.


The translation of the material was prepared on the eve of the start of the course “JavaScript Developer. Basic”
