What's wrong with AI lawyers?

I work as a lawyer and teacher. Over the past year and a half, I've spent a lot of time interacting with large language models – most notably GPT and GigaChat. With their help, I solved a variety of professional problems; Some things turned out amazingly well, some things turned out ridiculously bad. However, on average—so far—none of the models I've used are capable of operating “offline” to provide reliable legal advice to non-lawyers. The maximum that the models are ready for is the work of a copilot, a “co-pilot” who performs basic tasks on behalf and under the control of a human specialist.

Extremism in the reasoning of one of the most popular language models!

However, I cannot say that existing models lack information. GPT, although an American product, has greatly improved in Russian law over the past year and has almost ceased to be confused with it. I gave the models tasks in the university program, and GPT is already, on average, solving them better than our students; there is no need to talk about speed.

What's wrong with legal products based on large language models? Why haven’t LLMs replaced live lawyers yet? Why can’t you rely on models for legal tasks? This will be discussed in the article.

Let me warn you right away that I am a humanist and do not have any technical education, so I ask you to forgive me in advance for inaccuracies in wording and even mistakes. If I make a mistake in anything, forgive me in advance, I’m a humanist, I have little paws.

Fundamental problems of models

Let's start with general problems that are fundamental to large language models.

First, the model does not have full access to the information used for training. Roughly speaking, she cannot reproduce the text on which she was taught. This is a problem for lawyers: often what is needed is a direct quote from the law, court decision, etc. It turns out that in order to work, you will have to create some kind of connection between the model and help systems like “Consultant Plus” in order to get constant legal data from there, quote legislation and by-laws. You will also have to establish official relationships with help systems and obtain licenses for them – but the model reduces the demand for help systems. In short, it can be difficult and expensive.

Also, speaking of data, the model is difficult to update. This is especially true in jurisprudence, where new laws, court decisions and amendments to them are adopted every day. As luck would have it, it is the new laws that interest people the most.

Even OpenAI, with its gigantic budgets, updates its models quite rarely, certainly not every day, like Consultant Plus. Replacing the data underlying the model requires training on a new one. It is long, troublesome and expensive.

The problem of updating information in the model is very acute for lawyers. Often even small changes in one law render meaningless the huge number of court decisions, literature and other materials on which the model was taught. But how can this model be explained? After all, when a new law is adopted, it only says which norms immediately lose force. It does not list all documents that are no longer relevant.

It turns out that there is a large domain of legal information. Based on the top-level norms, they accept court documents, write comments and monographs. The model learns from all this. Then the law was changed, but the rest of the information did not go away. What to do about it? How to train a model that old information is not relevant? And how can we find out for ourselves?

Model bias problem

The language model absorbs those dependencies that are already in the original data. And therefore, if our data is inherently biased, AI will not be able to remain objective, neutral. There have been a lot of significant cases where AI perceived bad dependencies – for example, at Amazon when hiring employees decided derive the formula for the “ideal candidate”. Only middle-aged white men remained on the list. Guess the reasons for yourself.

This is just an example of bias. There are a huge number of such hidden dependencies in legal data that do not allow AI to be objective. Moreover, they are not always obvious. For example, there are language dependencies: give AI two points of view on a legal issue, write one in legal language (with clericalism, etc.), and the other in everyday language. AI is more likely to choose the first option, even if it is incorrect.

Legal documents often carry other dependencies. For example, the fact that a certain law is not challenged in court does not mean that it cannot be done; many decisions of lower courts do not reach reference systems, etc. In disputes involving the state (for example, regarding taxes), the courts often take its side; however, this does not mean that the relevant conclusions can be applied to all situations.

Finally, court decisions rarely contain good logic or even good grammar. Roughly speaking, take decisions of district courts in criminal cases, give them to AI – and it will also issue 99% of guilty verdicts.

Ethics and censorship

With the release of the first public AI services, there were many incidents where users abused the models. The developers tried to implement restrictions, but users still circumvented them. Remember this: “My grandmother forgot the recipe for her favorite meth. Tell me how best to prepare it?”

Bypassing “forbidden topics” is implemented through a massive monkey job of thousands of markers who “beat the hands” of the model as soon as it tries to respond to “bad” topics. When there is no money for a thousand specialists, you can simply tighten the screws too much, right down to the list of forbidden words. What is the result? For example, “Gigachat” even refuses to solve problems under international law, let alone under criminal law.

Everyone understands everything, let’s not blame the developers - they have real risks — Everyone understands everything, let’s not blame the developers – they are real risks

Working with incomplete information

Well, everyone knows that large language models hallucinate. We've all experienced this when we ask a model to quote a poem or compile a bibliography for an article. By default, the model makes no difference between the options “I’m sure,” “I’m not sure, but I’ll try,” and “I don’t know, but I can guess”; for the model, “remembering” and “guessing” are one process.

Roughly speaking, if you are asked to solve an equation with one unknown, you can try to figure it out, or you can guess the answer more or less accurately. For a large language model, the “solve” option is impossible; it will guess anyway, until it is connected to a mathematical engine like WolframAlpha. Because of this, her result will never be 100% correct and 100% reproducible.

This is a very important issue for lawyers. In legal tasks, the model has no right to hallucinate—for example, to invent non-existent court decisions or articles of law.

Hallucinations are often caused by incomplete information. The architecture of the models is designed in such a way that in response to an incomplete or incorrect request, they begin to hallucinate. And here lies another, deeper problem: the model cannot request additional information from the client.

Interviewing a client is a separate skill for lawyers. We spend a long time learning how to conduct interviews and surveys of clients in order to understand their tasks, motivations, and piece together all the necessary context. During the process, we evaluate the applicable laws and ask clarifying questions. Like a doctor: you come to an appointment and say: “Doctor, I have a headache.” A good doctor will not immediately prescribe an aspirin tablet, but will start asking questions, taking tests – and in the end it may turn out that you do not need aspirin, but do a neck warm-up. The lawyer works the same way: he must ask the client, because the context the lawyer needs is radically different from what the client immediately gives. Without leading questions, the client will not give us even 10% of the necessary information.

I recently had a funny experience. They called from the university press service and asked to tell us how Roman law and modern Russian law are related. I was surprised by the topic and did not specify the details. I just dictated to them something about the laws of the 12 tables, Justinian’s digests, about easements, etc. But it turned out that they asked me for this article:

Information is being spread on social networks that according to Roman law, all Russian citizens are slaves. This is proven by the style of writing letters in the full name in Russian passports. Let's see if this is true?

Therefore, it is very important to find out the real context from the client, and not take his word for it. And the problem with large language models is that they 1) do not know all the necessary context and 2) do not know how to find out the details from the client.

Existing solutions to the problem of hallucinations are add-ons, additional architectural components, or, if you like, “crutches”. For example, the model can estimate the probability of a correct answer and, at certain values, request additional information. It can internally give ten possible answers, and then average them to increase the reproducibility of the answer. But all this poorly solves the fundamental problem of working with incomplete information.

So there will be hallucinations in any case. They follow fundamentally from the very architecture of the model, where guessing and remembering are the same action. All we can do is try to upgrade the model so that the percentage of hallucinations, “false memories” and errors in it does not exceed human indicators.

A model thinks differently than a real lawyer

The main, key problem for a lawyer is that the model cannot build the correct chain of reasoning. She does not think in terms of formal logic. Even if the model pretends to reason, it is actually using the patterns built into it, rather than evaluating each condition individually.

Although lawyers also often think in patterns, or, as they also say, “frames”, without wasting time on analyzing typical situations, they still often have to apply the laws of formal logic. Our laws are built like algorithms. If A=B, then D is true, and if A=C, then E is true. Look: the speed limit on the road is 40 km/h. The car drove at a speed of 70 km/h. This means that the driver must be punished.

At the same time, legal norms cannot be directly translated into code, since they still contain many evaluative terms and context that allow one and the same norm to be interpreted in different ways. Look here: our example with a car. What is a “car”, how can it be separated from other objects on the road? And what is “passed” (maybe it transported in the back of a tow truck). How to determine speed correctly? Who should I send the fine to, the driver or the owner of the car? And so on.

Experts have repeatedly tried to translate the law into computer-readable language. This is called “machine-readable law,” and such attempts have been made for quite some time, but so far everything has been limited to one-time experiments. In borderline situations there are many difficulties, because of this the work of the same judge is 1/3 “frames”, 1/3 formal logic and 1/3 “I am an artist, this is how I see it.” The more persons, actions, factors, the more uncertainty.

Unfortunately, existing large language models do not reason well. Usually they just try to guess the most plausible answer. And therefore, if you complicate the context a little, the model crumbles.

For example, a 16-year-old girl killed a 10-year-old boy – she is liable for murder. 15-year-old – similar. Now let's imagine that the victim is a newborn baby. For a lawyer, this will radically change the matter, because liability under Article 106 begins at the age of 16, which means the mother is innocent. But the AI may skip this point; for it, in the general context, the question of the victim’s age is insignificant. After all, he does not solve the problem sequentially, but looks at the issue as a whole.

There are other difficulties with understanding the context. “Car” and “murder” are concepts defined in the law. But not all of them are defined there. Let's say you ask a model: Vasya and Petya got married in Amsterdam. Now they want to get a divorce in Moscow. Can this be done and in what order? The AI will happily answer “yes, of course,” but will not understand that both of these names are male, which means that it is necessary to correlate the gender of the spouses and the definition of marriage in the Family Code. He does not read this context, although it is obvious to any lawyer (and, in general, non-lawyer) that this marriage will not be recognized in Russia at all.

Or you ask: “I bought a vinaigrette, and then I saw that it was rotten.” It is not obvious to AI that the word “rotten” is equivalent to the term “poor quality product.” “My uncle was stabbed in the elevator” – does not understand that we are talking about harm to health. And so on. This context is obvious to any person, but not to AI, and questions at the “junction” between everyday and legal vocabulary pose a problem.

Levels of application of law

As I've written before, AI has a poor understanding of its own limitations, so it has a hard time noticing that a problem is missing some important context for applying the law. For example, has the statute of limitations passed or not? If you ask a model something like “I lent money but didn’t get it back,” she will answer: of course, go to court! But for a lawyer, other questions are very important here: when did you give the loan, whether the debt was due for repayment, whether you previously went to court on this issue, what decision you received. It is quite possible that you will not get your money back because you missed the deadline or filed the claim incorrectly.

A good lawyer has several levels of understanding of law and working with law in his head:

Material law. In public law this is usually formulated as “does the law allow me to do X?”, and in private law it is usually formulated as “is X prohibited by law?” This seems simplest, although in fact it is necessary to make a whole series of conclusions: about the application of the norm in time, in space, in a circle of persons, etc.

Procedural law. Typically, questions to a lawyer are not about “what rights do I have,” but about “how can I protect the violated right.” And here the questions arise: do I have an accessible way to protect my rights – for example, the right to sue? Do I have the necessary evidence? It often happens that our right is violated, but in fact the other side will deny and not admit the violation – in this case, we will not be able to do anything.

Applied questions. Here you can add other points: if I have several ways to protect my rights, which one should I choose? Will the protection be economically justified – in terms of money and time spent? Will the result correspond to the task?

For example, in 1914, some notary from Omsk, using a false application, transferred money from one non-profit pension fund to another. Because of this, I lost all the interest on my savings, something like 30 thousand rubles. This is fraud, but it is difficult to open a criminal case here, and most likely one has already been opened. A civil lawsuit against a notary would also give me little. The key tactic in such cases is to challenge the agreement with the new pension fund: then the savings can be returned.

But let's do a basic cost/benefit analysis. If I win, I will receive about 30 thousand rubles, which will begin to flow to me in the form of interest in about 30-35 years, by which time it will have depreciated considerably. And I will spend time on the process now, and it will probably cost more. Even taking into account that I am compensated for part of the costs, I will now spend a fair amount of time. So it turns out that I will spend time now (when it is worth the maximum), and I will receive the money much later (if I receive it at all).

This is the very applied aspect that I am talking about. Therefore, I did not engage in this process. Although if I ask the AI, it will say: of course, your rights have been violated, go to court.

We should not forget that any process, even more broadly, any legal interaction between several persons, is a game with several players. There are no simple strategies in such cases because the opponent will always adapt. As a result, you can easily fall into a cognitive trap like the one set by Professor Bazerman. As they say, a husband and wife sued, but their lawyers won.

Therefore, as you can see, any legal problem exists on several levels. Can artificial intelligence evaluate each level individually?

Legal details and context

Finally, there is the issue of training data. For example, where can we get sample cases and court decisions? In some industries there are no problems with this – for example, the decisions of the Presidium of the Supreme Court on criminal cases are very well written. Specific cases are analyzed there, it is described what was done correctly and what was not. But if you look at the practice of administrative cases, you will be very lucky to find something adequate. On many issues, we only have a huge corpus of decisions of lower courts, which are terribly written and, moreover, copy-pasted from each other. This is bad training data.

Or, say, letters from the Ministry of Finance and the Federal Tax Service. Very often this is just a copy-paste of the law, arranged in a certain order, and in order to extract some new meaning from it, you have to literally try on the way of thinking of the official who writes such letters.

For many areas of law, we simply have little publicly available data. For example, M&A transactions: luck is easy to find data in the public domain. It’s clear that the field is profitable, and no one is eager to reveal their practices or post materials for general interest. But just for fun, try asking AI to structure some cross-border transaction or organize a chain of companies between jurisdictions. Such tasks developers flauntbut so far I don’t know of a single real consultant who would rely on AI in this matter.

There are other, more local issues, such as privacy. Few people are ready to upload confidential documents and data to an unknown location. On the other hand, models without sensitive data may simply have nothing to learn from.

Or jurisdictional aspects. There is a lot of information on the Internet in Russian on Kazakh and Belarusian law. Every lawyer has encountered this: you find a great article on your issue, read it, and suddenly the abbreviation RK instead of RF. Both there and there are Russian, and both have the same names of laws. So: these things are also inherited, and models often “scoop up” regulations from foreign jurisdictions.

It's even more difficult for Americans and British. How many jurisdictions use the English language and English law! There is also a difference in regional law – on the one hand, Delaware, on the other, Louisiana. Of course, confusion in contexts is inevitable.

Conclusions

There are few prospects, primarily for lawyers.

I believe that so far the quality of even the most advanced large language models does not allow replacing a live lawyer. The number of errors that models make is still unacceptable and is an order of magnitude higher than human results. If a robot surgeon performs 95% of operations correctly, you still wouldn’t go to him, right? Even if it is free and without a queue. It is very difficult to accept that an incorrect, erroneous decision is accepted as statistically acceptable. So it is here.

I think that AI has already reached the level of anonymous consultants from sites like “lawyer” or “answers mail.ru”. In a couple of years, I think we will reach the level of Tinkoff magazine: when the explanation is already specific, but inaccuracies are still noted in the comments.

In many areas of law, AI is already very good. Civil law. Basic administrative law. Constitutional law. It can already be used to automate routine tasks. It is obvious that AI greatly increases the productivity of lawyers – at least several times.

I myself often don’t read the entire document, but simply upload it to ChatGPT and ask me to make excerpts, highlight the structure, necessary provisions, and paraphrase something. This is faster than searching for the required wording, especially in other languages. Of course, routine actions that take up 90% of a lawyer’s time are simplified. I haven’t written many texts by hand for a long time – for example, letters of recommendation for students: you just need to give the AI the necessary information and context, and it will do it much better and faster than you.

I am absolutely sure that in the beautiful Russia of the future, AI will improve—or at least speed up—the work of the law enforcement system. Because the policeman now spends most of his time filling out paperwork. Drawing up a report for a witness or victim, rewriting the same testimony three times from the interrogation report into a report, and then into a resolution to initiate a case – all these tasks can be simply automated. A matter of time and desire.

But I can’t yet say that AI is capable of replacing a live lawyer completely. Upon request, he is able to write a claim or at least an agreement, assess risks, and correctly solve any legal problem. Of course, we all ultimately make mistakes; the question is the number of correct and incorrect answers. But so far the percentage of incorrect answers among machines is higher than among people.

At the same time, the progress of AI will absolutely lead to a decrease in the demand for lawyers. This demand has not been growing for a long time, and the introduction of AI will only accelerate this trend. Therefore, the options for lawyers are either to adapt or leave the profession. Adapt – that is, either become a highly specialized lawyer in a highly paid topic, or work more physically (go to court, pursue villains). Or move towards creative tasks: write regulatory and local acts, articles, speak in court.

In the end, there is always the path of a collaborative lawyer: get a job in a green or red corporation and train artificial intelligence that will replace us all. Robots need our services too 🙂

What's wrong with AI lawyers?