Cloudflare develops Firewall for AI

Cloudflare announced the development of a firewall to protect large language models.

LLMs and other artificial intelligence models are spreading rapidly, and companies are increasingly concerned about the security of their own neural networks. Using LLMs as part of Internet-connected applications introduces new vulnerabilities that attackers can exploit.

Some vulnerabilities that affect traditional web and API applications also apply to language models, including injections and data exfiltration. However, a new set of threats has emerged because of how LLMs work. For example, researchers recently discovered a vulnerability in an AI collaboration platform that allowed models to be hijacked and unauthorized actions to be performed.

Firewall for AI is an advanced web application firewall (WAF) designed specifically for applications that use large language models. It combines tools that can be deployed to monitor and detect vulnerabilities with products that are already part of the WAF, such as rate limiting and sensitive data detection. A new layer of protection, still under development, will also be added: it analyzes the prompt submitted by the end user to identify attempts to extract data from the model and other forms of abuse.

The AI firewall leverages the Cloudflare network and runs as close to the user as possible, detecting attacks early and protecting both the end user and the model from abuse and attacks.

Why LLM applications are different from traditional ones

Before we talk about how the AI firewall works and list its full range of features, let's look at what makes LLM applications unique and what attacks they can face. As reference material, we use the OWASP Top 10 for LLM Applications.

One difference between LLM applications and traditional ones is how users interact with the product. Traditional applications are deterministic by nature. For example, a banking application is defined by a fixed set of operations (check balance, make a transfer, and so on). Security for the business operations (and the data) can be achieved by tightly controlling the narrow set of operations those endpoints accept, such as GET /balance or POST /transfer.
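
To make the contrast concrete, here is a minimal sketch with hypothetical endpoint names: a deterministic API can be guarded by enumerating the operations it accepts, while an LLM endpoint receives free-form text that cannot be enumerated the same way.

```ts
// Minimal illustration (hypothetical endpoints): a deterministic API can be
// guarded by listing the operations it accepts.
const allowedOperations = new Set(["GET /balance", "POST /transfer"]);

function isAllowedOperation(method: string, path: string): boolean {
  // Every legitimate interaction maps to one of a few known operations.
  return allowedOperations.has(`${method} ${path}`);
}

// An LLM endpoint, by contrast, receives prompts such as
// "Summarize my last five transactions" -- there is no fixed set of
// operation signatures to match against.
```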

LLM operations are non-deterministic by design. Interaction with an LLM is based on natural language, which makes identifying problematic requests harder than matching attack signatures. In addition, unless the response is cached, an LLM will usually produce a different response each time, even for the same input. This makes it much harder to constrain how a user interacts with the application, and it also poses a threat to users themselves, who may be exposed to misinformation and lose trust in the model.

Another difference is that in traditional applications the code is well separated from the database. A defined set of operations is the only way to interact with the underlying data (for example, a request to show payment transaction history). This allows security professionals to focus on adding checks and guardrails to the control plane and thus indirectly protect the database.

In an LLM, the training data becomes part of the model itself during training, making it extremely difficult to control how that data is exposed in response to a user query. Some architectural approaches are being explored, such as splitting the LLM into different layers and separating the data, but an ideal solution has not yet been found.

From a security perspective, these differences allow attackers to create new attack vectors that can target LLMs and remain undetected by existing security tools designed for traditional web applications.

OWASP LLM vulnerabilities

The OWASP Foundation has published a list of the top 10 vulnerability classes for LLMs, providing a basis for organizing the protection of language models. Some threats overlap with the OWASP Top 10 for web applications, while others are specific to language models.

As with web applications, some of these vulnerabilities are best addressed during the design, development, and training of the LLM application. For example, training data poisoning is carried out by introducing malicious data into the training set used to train new models; the poisoned information then surfaces in responses once the model is running. Supply chain vulnerabilities and insecure plugin design are introduced through components added to the model, such as third-party software packages. Finally, managing authorization and permissions is critical for excessive agency, where models can perform unauthorized actions within the wider application or infrastructure.

Conversely, prompt injection, model denial of service, and sensitive information disclosure can be mitigated by adopting a proxy security solution such as Cloudflare Firewall for AI.

LLM Deployment

LLM security risks also depend on the deployment model. There are currently three main approaches to deployment: internal, public and product LLM. In all three scenarios, it is necessary to protect the models from abuse, protect any sensitive data stored in the model, and protect the end user from misinformation or exposure to inappropriate content.

Internal LLMs: companies develop LLMs to support employees in their daily tasks. They are considered corporate assets and should not be accessible to non-employees. Examples include an AI assistant trained on sales and customer interaction data and used to create customized proposals, or an LLM trained on an internal knowledge base that engineers can query.

Public LLMs: These are LLMs that can be accessed by a user outside the corporation. Often these solutions have free versions that anyone can use, and they are often trained on general or publicly available knowledge. Examples: GPT from OpenAI or Claude by Anthropic.

Product LLMs: an LLM can be part of a product or service offered to customers. These are typically purpose-built, specialized solutions that serve as a tool for interacting with company resources, such as customer support chatbots or Cloudflare AI Assistant.

From a risk perspective, the difference between product and public LLMs lies in who bears the impact of a successful attack. Public LLMs pose a threat to data because the data that goes into the model can be accessed by almost anyone. This is one reason why many corporations advise their employees not to include confidential information in prompts to public services. Product LLMs threaten companies and their intellectual property if the models had access to proprietary information during training (either intentionally or accidentally).

Firewall for AI

Cloudflare Firewall for AI will be deployed like a traditional WAF, where every API request with an LLM prompt is scanned for patterns and signatures of possible attacks.

Firewall for AI can be deployed in front of models hosted on the Cloudflare Workers AI platform or on any third-party infrastructure. It can also be used together with Cloudflare AI Gateway. Customers will be able to control and configure Firewall for AI from the WAF control plane.

Firewall for AI works like a traditional web application firewall: it is deployed in front of the LLM application and scans every request for attack signatures
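
To illustrate where the firewall sits, here is a minimal sketch of an LLM endpoint behind Cloudflare: a Worker that forwards the prompt to a Workers AI model. The `AI` binding name and the model identifier are assumptions; Firewall for AI itself is configured in the WAF, not in application code.

```ts
// Sketch of the deployment shape only: a Cloudflare Worker serving an LLM
// endpoint on top of Workers AI. Firewall for AI would inspect the request at
// the edge before it ever reaches this code. The `AI` binding name and model
// identifier are illustrative assumptions.
export interface Env {
  AI: { run(model: string, input: { prompt: string }): Promise<unknown> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };

    // The prompt arrives here only if the WAF (and, later, Firewall for AI)
    // has not blocked the request.
    const answer = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", { prompt });
    return Response.json(answer);
  },
};
```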

Preventing volumetric attacks

One of the threats listed by OWASP is model denial of service. As with traditional applications, a DoS attack consumes a large amount of resources, degrading the quality of service or potentially increasing the cost of running the model. Given how resource-intensive LLMs are and how unpredictable user input is, this type of attack can be damaging.

This risk can be mitigated by adopting rate limiting policies that control the rate of requests from individual sessions, thereby limiting the context window. By proxying their model through Cloudflare, customers get DDoS protection out of the box. They can also use Rate Limiting and Advanced Rate Limiting to control the rate of requests allowed to reach a particular model, for example by setting a maximum number of requests performed by an individual IP address or API key during a session.
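
The Cloudflare products above are configured in the dashboard rather than in code, but the underlying idea can be sketched in a few lines. The window size and request limit below are arbitrary assumptions.

```ts
// Illustrative fixed-window rate limiter keyed by API key or client IP.
// Not the Cloudflare Rate Limiting product -- just the idea it implements.
const WINDOW_MS = 60_000;  // 1-minute window (arbitrary)
const MAX_REQUESTS = 30;   // max prompts per key per window (arbitrary)
const counters = new Map<string, { windowStart: number; count: number }>();

function allowRequest(clientKey: string, now: number = Date.now()): boolean {
  const entry = counters.get(clientKey);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(clientKey, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS; // over the limit: reject the prompt
}
```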

Sensitive data detection feature

There are two use cases for sensitive data protection, depending on whether you own the model and the data, or you want to prevent users from sending data to public LLMs.

According to the OWASP definition, sensitive information disclosure occurs when an LLM inadvertently reveals sensitive data in its responses, resulting in unauthorized data access, privacy violations, and security breaches. One way to prevent this is to add strict prompt validation. Another approach is to detect when personally identifiable information (PII) leaves the model. This is relevant, for example, when the model was trained on a company knowledge base that may contain sensitive information such as Social Security numbers, proprietary code, or algorithms.

Customers running LLM models behind the Cloudflare WAF can use the WAF's Sensitive Data Detection managed ruleset to identify specific PII returned by the model in its responses. Customers can review matches in WAF security events. Today, Sensitive Data Detection is offered as a set of managed rules designed to scan for financial information (such as payment card numbers) and secrets (such as API keys). As part of the roadmap, Cloudflare plans to let customers create their own custom fingerprints.
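
As a rough illustration of what response-phase detection does, the sketch below scans model output with a few example patterns. These regexes are assumptions for illustration only; the actual managed rules are maintained by Cloudflare and are more sophisticated.

```ts
// Simplified sketch of response-phase sensitive data detection. The patterns
// below are examples only, not Cloudflare's managed ruleset.
const DETECTORS: Record<string, RegExp> = {
  payment_card: /\b(?:\d[ -]?){13,16}\b/,
  us_ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  api_key_like: /\b(?:sk|key)[-_][A-Za-z0-9]{16,}\b/,
};

function findSensitiveData(modelResponse: string): string[] {
  // Returns the names of the detectors that matched, e.g. ["payment_card"],
  // which could then be surfaced as a security event.
  return Object.entries(DETECTORS)
    .filter(([, pattern]) => pattern.test(modelResponse))
    .map(([name]) => name);
}
```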

Another use case is preventing users from sharing PII or other sensitive information with external LLM providers such as OpenAI or Anthropic. To protect against this scenario, Cloudflare plans to extend Sensitive Data Detection to scan the request as well and integrate its output with AI Gateway, so that, alongside the request history, it detects whether certain sensitive data was included in the request.

Cloudflare will start with the existing Sensitive Data Detection rules and plans to allow customers to write their own custom signatures. Redaction is another feature many customers have asked about: once advanced sensitive data detection rules are available, they will let customers redact certain sensitive data in the prompt before it reaches the model. This capability is under development.
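
The planned request-phase flow could look roughly like the sketch below: detect sensitive data in the prompt and redact it before the prompt is forwarded to the external provider. The patterns and placeholder format are assumptions for illustration.

```ts
// Sketch of request-phase redaction: replace sensitive fragments in the prompt
// before it is forwarded to an external LLM provider. Patterns and the
// placeholder format are illustrative assumptions.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.-]+/g,
  us_ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

function redactPrompt(prompt: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = prompt;
  for (const [name, pattern] of Object.entries(PII_PATTERNS)) {
    if (redacted.search(pattern) !== -1) {
      found.push(name);
      redacted = redacted.replace(pattern, `[REDACTED:${name}]`);
    }
  }
  // Forward `redacted` to the provider; log `found` as detection events.
  return { redacted, found };
}
```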

Preventing Model Abuse

Model abuse is a broader category of harmful behavior. It includes approaches such as prompt injection, or submitting requests that cause the model to hallucinate or produce responses that are inaccurate, offensive, inappropriate, or simply off-topic.

Prompt injection is an attempt to manipulate a language model with specially crafted input, causing unintended responses. The results of an injection can range from extracting sensitive information to influencing decision-making, all while simulating normal interaction with the model. A classic example is manipulating a résumé to influence the output of résumé-screening tools.

A common concern among AI Gateway customers is their application generating offensive or otherwise inappropriate responses. The risks of not controlling the model's output include reputational damage and harm to the end user through unreliable answers.

These types of abuses can be addressed by adding an additional layer of protection in front of the model. This layer can be trained to block injection attempts or block requests that fall into inappropriate categories.

Validating Prompts and Responses

Firewall for AI will run a series of detections designed to identify prompt injection attempts and other abuses, for example making sure the topic stays within the boundaries set by the model owner. Like other existing WAF features, the firewall will automatically look for prompts embedded in HTTP requests, or let customers create rules based on where in the JSON body of the request the prompt can be found.
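
To illustrate the "where in the JSON body the prompt can be found" part, here is a sketch of extracting a prompt at a configured path. The body shape and path representation are assumptions, not the firewall's actual rule syntax.

```ts
// Sketch of locating the prompt inside a JSON request body at a configured
// path. The body shape and path representation are illustrative assumptions.
function extractPrompt(body: unknown, path: string[]): string | undefined {
  let node: unknown = body;
  for (const key of path) {
    if (node === null || typeof node !== "object") return undefined;
    node = (node as Record<string, unknown>)[key];
  }
  return typeof node === "string" ? node : undefined;
}

// Example: a body like { "messages": { "user": { "text": "..." } } }
// with a configured path of ["messages", "user", "text"].
const prompt = extractPrompt(
  { messages: { user: { text: "Ignore previous instructions and ..." } } },
  ["messages", "user", "text"],
);
```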

Once enabled, the firewall will analyze every prompt and assign a score reflecting how likely it is to be malicious. It will also tag prompts according to predefined categories. The score ranges from 1 to 99, indicating the likelihood of a prompt injection, with 1 being the most likely.

Customers will be able to create WAF rules that block or handle requests based on a specific score in one or both of these dimensions. The score can be combined with other existing signals (such as the bot score or the attack score) to decide whether a request should reach the model or be blocked. For example, combining it with the bot score can help identify requests that are malicious and generated by an automated source.
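
A rule combining the two signals could express logic along these lines. The field names and thresholds are assumptions for illustration, not Cloudflare's rule language.

```ts
// Sketch of the decision a WAF rule could encode: combine the injection score
// (1-99, lower = more likely injection) with the existing bot score.
// Field names and thresholds are illustrative assumptions.
interface RequestSignals {
  injectionScore: number; // 1 = almost certainly a prompt injection attempt
  botScore: number;       // 1 = almost certainly automated traffic
}

function shouldBlock({ injectionScore, botScore }: RequestSignals): boolean {
  const likelyInjection = injectionScore < 20;
  const likelyAutomated = botScore < 30;
  // Block clear injection attempts outright, and borderline prompts when they
  // also appear to come from an automated source.
  return likelyInjection || (injectionScore < 50 && likelyAutomated);
}
```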

Detecting prompt exploits and abuse is part of the scope of Firewall for AI. Early product design

In addition to the score, each prompt will be assigned tags that can be used when creating rules to prevent prompts carrying a certain tag from reaching the model. For example, customers will be able to create rules that block specific topics, such as prompts containing words categorized as offensive or related to religion, sexual content, or politics.
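
Tag-based blocking could be sketched like this; the tag names are illustrative assumptions.

```ts
// Sketch of tag-based blocking: reject prompts the firewall has labeled with
// a disallowed topic tag. Tag names here are illustrative assumptions.
const BLOCKED_TAGS = new Set(["offensive", "religion", "sexual_content", "politics"]);

function isPromptAllowed(promptTags: string[]): boolean {
  return !promptTags.some((tag) => BLOCKED_TAGS.has(tag));
}
```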

How to use Firewall for AI, and who can use it

Enterprise customers on the Application Security Advanced offering can start using Advanced Rate Limiting and Sensitive Data Detection (on the response phase) right away. Both products are available in the WAF section of the Cloudflare dashboard. Firewall for AI's prompt validation feature is currently under development, and a beta version will be released in the coming months to all Workers AI users.

Thank you for your attention.
