How to Remove JavaScript from PDF Files

PDF files can contain embedded JavaScript that is executed when the file is opened in the browser. For example, if you download this PDF and open it, you will see the result of the script execution.

This is a standard feature of the format. All browser developers know that PDF, like HTML, is active content and may contain executable scripts.

Despite this, some security researchers periodically report script execution in a PDF hosted on a site as a variant of the Stored XSS attack.

Stored XSS

In this type of attack, the injected script is persistently stored on the target server, such as in a database, message board, visitor log, comment field, PDF file, etc. The victim retrieves and executes the malicious script when they request the stored information. Stored XSS is also sometimes referred to as Persistent or Type-II XSS.

In fact, the classic Stored XSS attack assumes that an HTML document is stored somewhere on the server and is later served to and rendered in the visitor's (victim's) browser. This is indeed a security risk, because the embedded script can read and modify cookies (via document.cookie), manipulate web platform storage (IndexedDB, localStorage, etc.), and carry out other attacks.

In the case of PDF, however, the browser imposes serious restrictions on what embedded scripts can do, so storing such files with active content is not an XSS vulnerability.

The Chromium documentation states:

PDFs have the ability to run JavaScript, typically to help validate fields when filling out a form. Note that the set of bindings provided by PDFs is more limited than what the DOM provides to HTML documents, and PDFs do not gain any additional permissions based on the domain they are loaded from (e.g., no document.cookie).

This means that the scope of script execution from a PDF is very limited: no access to cookies or storage, very limited ability to make requests (a script can only navigate the document window), and no access to the powerful capabilities of the web platform, such as the DOM and the document, window, and navigator objects.

Although JavaScript capabilities in PDF are extremely limited, PDF viewers must take care that scripts cannot accidentally escape these limits.

PDF Scripts in Browser

If the user manually disables script execution in the browser, the browser should extend this restriction to PDF documents as well.

For example, the Chromium source code contains the following fragment:

/**
 * Determine if the content settings allow PDFs to execute javascript.
 */
function configureJavaScriptContentSetting(browserApi: BrowserApi):
    Promise<BrowserApi> {
  return new Promise(resolve => {
    chrome.contentSettings.javascript.get(
        {
          'primaryUrl': browserApi.getStreamInfo().originalUrl,
          'secondaryUrl': window.location.origin,
        },
        (result) => {
          browserApi.getStreamInfo().javascript = result.setting;
          resolve(browserApi);
        });
  });
}

Firefox also has a javascript.enabled switch on the about:config page. However, it does not affect the execution of scripts in PDFs; those are controlled by a separate switch, pdfjs.enableScripting.

Chromium, in turn, ignores Content-Security-Policy (CSP) headers in PDF responses, because it renders PDFs using web technologies that the CSP might prohibit, which would cause confusion and get in developers' way.

So running JavaScript in PDFs is not a vulnerability but a standard feature of the format. At the same time, browsers, other programs, and web applications may display PDF content differently, each with its own rendering engine and its own policy, and any one of those engines may contain a vulnerability that allows potentially dangerous code to be executed.

There are dedicated applications, such as Dangerzone, for PDF sanitization, i.e. removing any scripts from the file. Dangerzone works by double conversion: it converts the document to pixels and then back to a PDF. As a side effect, the file size is also reduced.

Alternatively, questionable PDFs can be opened in engines that do not support script execution (for example, xpdf, mupdf or atril) and are therefore invulnerable to such exploits.
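Before choosing between sanitizing a file and opening it in a script-free viewer, it can help to check whether the PDF declares any active content at all. The following is a minimal sketch, not a complete parser: it counts the name tokens /JavaScript, /JS, /OpenAction and /AA in the raw bytes. The function and constant names are hypothetical, and the scan will miss names hidden inside compressed object streams, encrypted files, or names written with #xx hex escapes, so a zero count is not proof that a file is clean.

```typescript
// Name tokens that commonly indicate scripts or automatic actions in a PDF.
const ACTIVE_NAMES = ["/JavaScript", "/JS", "/OpenAction", "/AA"];

// A PDF name ends at whitespace, at a delimiter character, or at end of data.
const NAME_END = "(?=[\\x00\\t\\n\\f\\r ()<>\\[\\]{}/%]|$)";

// Map every byte to exactly one character so binary data survives decoding.
function bytesToLatin1(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Count how often each marker appears as a complete name token.
function countActiveContentMarkers(bytes: Uint8Array): Record<string, number> {
  const text = bytesToLatin1(bytes);
  const counts: Record<string, number> = {};
  for (const marker of ACTIVE_NAMES) {
    const pattern = new RegExp(marker + NAME_END, "g");
    counts[marker] = (text.match(pattern) ?? []).length;
  }
  return counts;
}

// Usage: a hand-written fragment standing in for a real file's bytes.
const sample = Uint8Array.from(
  "1 0 obj << /OpenAction << /S /JavaScript /JS (app.alert(1);) >> >> endobj",
  (c) => c.charCodeAt(0),
);
console.log(countActiveContentMarkers(sample));
```

The lookahead keeps /JS from matching inside longer names such as /JSON. In practice the bytes would come from reading the file, and any non-zero count is a reason to sanitize the document or open it in a viewer without script support.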

Note that PDF is not the only format that allows active content, i.e. the execution of JavaScript programs embedded in the document. For example, active content is also allowed in the SVG graphics format (see SVG Scripting) for interactive animations and other purposes, but here too the browser greatly limits what the scripts can do. The PostScript format also allows you to embed active content with executable code.

When displaying such files, you should be careful and ensure that the scripts do not go beyond the permitted restrictions. Alternatively, remove or block any scripts in advance.
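One lightweight way to block scripts in advance, different from the pixel-based conversion that Dangerzone performs, is to rename the script-related name tokens so that a viewer no longer recognizes them. The sketch below assumes an uncompressed, unencrypted PDF and uses hypothetical names throughout. It overwrites /JavaScript and /JS with same-length replacements, so the byte offsets recorded in the file's cross-reference table stay valid. It does not reach names inside compressed object streams or names written with #xx escapes, and it invalidates any digital signature on the file.

```typescript
// Each replacement has the same length as the name it neutralizes.
const REPLACEMENTS: ReadonlyArray<readonly [string, string]> = [
  ["/JavaScript", "/jAVAsCRIPT"],
  ["/JS", "/js"],
];

// Byte values that terminate a PDF name: whitespace and delimiters.
const NAME_TERMINATORS = new Set<number>([
  0, 9, 10, 12, 13, 32,
  ...Array.from("()<>[]{}/%", (c) => c.charCodeAt(0)),
]);

function disarmPdfScripts(
  input: Uint8Array,
): { output: Uint8Array; replaced: number } {
  const output = input.slice(); // work on a copy, leave the caller's data alone
  let replaced = 0;
  for (const [from, to] of REPLACEMENTS) {
    for (let i = 0; i + from.length <= output.length; i++) {
      // Compare the bytes at position i with the name being searched for.
      let match = true;
      for (let j = 0; j < from.length && match; j++) {
        match = output[i + j] === from.charCodeAt(j);
      }
      // Only accept a complete name token, not a prefix of a longer name.
      const next = i + from.length;
      const complete =
        next >= output.length || NAME_TERMINATORS.has(output[next]);
      if (!match || !complete) continue;
      for (let j = 0; j < to.length; j++) {
        output[i + j] = to.charCodeAt(j);
      }
      replaced++;
    }
  }
  return { output, replaced };
}

// Usage: a hand-written fragment standing in for a real file's bytes.
const doc = Uint8Array.from(
  "<< /S /JavaScript /JS (app.alert(1);) >>",
  (c) => c.charCodeAt(0),
);
const disarmed = disarmPdfScripts(doc);
console.log(disarmed.replaced, String.fromCharCode(...Array.from(disarmed.output)));
```

Because this only renames tokens, the script text itself stays in the file. When the document must be free of scripts rather than merely inert, a full re-rendering such as the one Dangerzone performs is the more thorough option.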
