What's wrong with your code generated by large language models?

Type A: Syntax error.

Syntax errors violate the grammatical rules of the programming language being used. The Python interpreter detects them when it parses the code, before executing it. There are three secondary types of syntax error: incomplete syntax structure, incorrect indentation, and library import error.

A.1 Incomplete syntax structure. Incomplete syntax structure indicates that the generated code includes an open or partially written syntax element that has not been properly completed. This type of error includes incomplete statements, unmatched parentheses, unclosed quotes, or missing colons.
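The cases listed above can be reproduced without running any generated code by handing each snippet to Python's built-in `compile`. The snippets and the helper name `has_syntax_error` below are hypothetical illustrations, not examples from the taxonomy itself.

```python
# Hypothetical snippets, one per kind of incomplete structure named above.
INCOMPLETE_SNIPPETS = {
    "incomplete statement": "x = 1 +",
    "unmatched parenthesis": "total = sum([1, 2, 3]",
    "unclosed quote": "name = 'abc",
    "missing colon": "for i in range(3)\n    print(i)",
}


def has_syntax_error(source):
    """Return True if Python refuses to parse the source, False otherwise."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:
        return True
    return False


# Each snippet is rejected at parse time, before any code runs.
results = {label: has_syntax_error(src) for label, src in INCOMPLETE_SNIPPETS.items()}
```

A parse check like this is a cheap first filter on generated code, because it needs no test inputs.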

A.2 Incorrect indentation. Python uses indentation to define the scope of loops, conditionals, functions, and other blocks of code. Incorrect indentation disrupts this structure, causing syntax errors or inconsistent semantics. A typical instance is a block whose second line is indented to a different level than the lines that follow it in the same block, which violates the Python specification.
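A minimal sketch of the indentation case, assuming a hypothetical `total` function whose first body line is indented deeper than the rest of the block. The source is kept in a string so the surrounding example still runs.

```python
# Hypothetical generated code: the second line uses 6 spaces, later lines use 4.
MISINDENTED = (
    "def total(values):\n"
    "      result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)

# The same code with a consistent 4-space indent.
CONSISTENT = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)


def indentation_message(source):
    """Return the IndentationError message for the source, or None if it parses."""
    try:
        compile(source, "<generated>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


bad_message = indentation_message(MISINDENTED)
good_message = indentation_message(CONSISTENT)
```

`IndentationError` is a subclass of `SyntaxError`, which is why this category sits under Type A.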

A.3 Library import error. Importing a library lets a Python program reuse external code instead of reimplementing it. Common import errors include missing import statements and imports placed at the wrong level. For example, generated code may import all public functions from the library heapq with a wildcard import inside a function body, although Python only allows wildcard imports at module level.
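The heapq case can be sketched as follows. The function name `k_smallest` is a hypothetical stand-in for the generated code, which the post does not reproduce.

```python
# Hypothetical generated code: wildcard import inside a function body.
WILDCARD_IN_FUNCTION = (
    "def k_smallest(nums, k):\n"
    "    from heapq import *\n"
    "    return nsmallest(k, nums)\n"
)

# Same logic with the wildcard import moved to module level.
WILDCARD_AT_MODULE_LEVEL = (
    "from heapq import *\n"
    "def k_smallest(nums, k):\n"
    "    return nsmallest(k, nums)\n"
)

try:
    compile(WILDCARD_IN_FUNCTION, "<generated>", "exec")
    import_error = None
except SyntaxError as exc:
    import_error = type(exc).__name__

# The module-level version compiles and runs in its own namespace.
namespace = {}
exec(compile(WILDCARD_AT_MODULE_LEVEL, "<generated>", "exec"), namespace)
smallest_two = namespace["k_smallest"]([5, 1, 4, 2], 2)
```

An explicit `from heapq import nsmallest` would work at either level and is usually the clearer choice.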

Type B: Runtime error.

Runtime errors occur when code violates a runtime specification, so the failure only surfaces during execution. According to the taxonomy, there are five secondary types of runtime error: API misuse, missing definition, incorrect boundary condition check, incorrect argument, and minor errors.

B.1 API misuse. LLMs use APIs to improve code execution efficiency and achieve the desired functionality. However, misinterpreting the attributes of the calling object, calling an API incorrectly, or misidentifying the type of an argument leads to API misuse and, in turn, to runtime errors in the generated code. A typical instance is an AttributeError caused by misinterpreting the type of the variable tup.
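A minimal sketch of the `tup` case, assuming the generated code treats a tuple as if it were a list. The function names are hypothetical.

```python
def add_item_buggy(tup, item):
    # Bug: treats `tup` as a list, but tuples have no append method.
    tup.append(item)
    return tup


def add_item_fixed(tup, item):
    # Tuples are immutable, so build a new tuple instead.
    return tup + (item,)


try:
    add_item_buggy((1, 2), 3)
    api_error = None
except AttributeError as exc:
    api_error = type(exc).__name__

fixed_result = add_item_fixed((1, 2), 3)
```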

B.2 Missing definition. Python requires variables and functions to be defined before they are used in a program. However, LLMs sometimes skip the definition of frequently used variables or simple functions. A typical instance is generated code that never defines the variable MOD, which is commonly used in algorithmic problems.
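The missing `MOD` definition can be sketched like this. The function names and the modulus value `10**9 + 7` are assumptions chosen for illustration; the post does not state which value the task used.

```python
def power_of_two_buggy(n):
    # Bug: MOD is used here but never defined anywhere in the program.
    return pow(2, n) % MOD


def power_of_two_fixed(n):
    mod = 10**9 + 7  # assumed modulus, common in algorithmic problems
    return pow(2, n, mod)


try:
    power_of_two_buggy(10)
    definition_error = None
except NameError as exc:
    definition_error = type(exc).__name__

fixed_value = power_of_two_fixed(10)
```

The failure only appears when the function is called, which is why this counts as a runtime error rather than a syntax error.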

B.3 Incorrect boundary condition check. This refers to incorrect handling of edge cases or range bounds in a program. A typical instance is a program that does not check the length of a list before using it in a remainder operation, which raises a ZeroDivisionError when the list is empty.
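One way to reproduce the empty-list failure is a list rotation that reduces the shift with a remainder operation. The rotation task and function names are hypothetical; only the unchecked `% len(...)` comes from the description above.

```python
def rotate_buggy(items, k):
    # Bug: len(items) is 0 for an empty list, so the remainder fails.
    k = k % len(items)
    return items[-k:] + items[:-k]


def rotate_fixed(items, k):
    # Check the boundary case before taking the remainder.
    if not items:
        return []
    k = k % len(items)
    return items[-k:] + items[:-k]


try:
    rotate_buggy([], 2)
    boundary_error = None
except ZeroDivisionError as exc:
    boundary_error = type(exc).__name__

rotated = rotate_fixed([1, 2, 3, 4], 1)
rotated_empty = rotate_fixed([], 2)
```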

B.4 Incorrect argument. LLMs sometimes ignore the input format specified in the task description, so the number or type of arguments in the generated code does not match what the task supplies. A typical instance is a task with two inputs, where the first gives the number of items and the second contains the items to process, while the generated code declares only one parameter for the items.
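A minimal sketch of the argument mismatch, assuming a hypothetical task that passes the item count followed by the items and asks for their sum.

```python
def solve_buggy(items):
    # Bug: the task supplies two inputs (count, items) but only one is accepted.
    return sum(items)


def solve_fixed(n, items):
    # Accept both inputs in the order the task provides them.
    return sum(items[:n])


try:
    solve_buggy(3, [1, 2, 3])
    argument_error = None
except TypeError as exc:
    argument_error = type(exc).__name__

fixed_sum = solve_fixed(3, [1, 2, 3])
```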

B.5 Minor errors. Minor runtime errors include timeout errors and LLM-defined exceptions. Programs that exceed the execution time limit because of high algorithmic complexity or excessive loop iterations are marked as having a timeout error. LLM-defined exceptions are exceptions that the generated code raises for conditional branches for which the problem provides no explicit solution.
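An LLM-defined exception might look like the sketch below. The search task is hypothetical, and it assumes the task statement never says what to return when the target is absent.

```python
def find_index(items, target):
    for i, item in enumerate(items):
        if item == target:
            return i
    # The (hypothetical) task does not cover this branch, so the generated
    # code invents its own exception instead of returning a value.
    raise ValueError("target not found")


found = find_index([4, 7, 9], 7)

try:
    find_index([4, 7, 9], 5)
    llm_defined_error = None
except ValueError as exc:
    llm_defined_error = str(exc)
```

If a unit test exercises the uncovered branch, the raised exception terminates the run, so the failure is recorded as a runtime error.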

Type C: Functional error.

Functional errors cause a program to behave incorrectly or not as intended by its functional requirements (i.e., the code executes successfully but does not pass all unit tests). According to the taxonomy, there are four secondary types of functional error: misunderstanding and logical error, hallucination, input/output format error, and minor errors.

C.1 Misunderstanding and logical error. Code generation tasks involve algorithmic problems where LLMs must extract information from natural language and apply their knowledge to understand the requirements and establish the correct logic. However, when faced with complex natural language descriptions, models often have difficulty fully understanding concepts, reference relationships, and conditional branches. For example, LLMs may incorrectly interpret integer concatenation as numeric addition. Moreover, even if LLMs fully understand the problem description, translating this understanding into correct logic remains challenging.
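The concatenation-versus-addition confusion can be sketched as follows, assuming a hypothetical task that asks for two non-negative integers to be joined digit by digit.

```python
def combine_buggy(a, b):
    # Misreads "concatenate the integers" as numeric addition.
    return a + b


def combine_fixed(a, b):
    # Join the decimal digits (assumes non-negative integers).
    return int(str(a) + str(b))


buggy_value = combine_buggy(12, 34)
fixed_value = combine_fixed(12, 34)
```

Both versions run without raising an exception, so only a unit test comparing outputs reveals the error.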

C.2 Hallucination. Hallucination refers to cases where an LLM generates code that is syntactically plausible but factually incorrect or semantically meaningless. In such cases, the generated code does not meet the requirements of the task at all.

C.3 Input/output format error. Unlike a type B.4 error (i.e., incorrect argument), an input/output format error refers to an incorrect order of inputs or outputs, or to incorrect precision of the output. A typical instance is generated code that converts a floating-point output to an integer.
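A minimal sketch of the precision case, assuming a hypothetical task that expects the average of a list as a floating-point number.

```python
def average_buggy(values):
    # Bug: int() truncates the fractional part of the result.
    return int(sum(values) / len(values))


def average_fixed(values):
    # Keep the floating-point result, as the (assumed) task expects.
    return sum(values) / len(values)


buggy_average = average_buggy([1, 2])
fixed_average = average_fixed([1, 2])
```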

C.4 Minor errors. Minor functional errors include incorrect initialization, suboptimal code, and infinite loops. Incorrect initialization means that the logic of the code is correct, but incorrect initial values for some variables prevent the code from passing unit tests; a typical instance is a variable max_sum that is not correctly initialized to 0. Suboptimal code refers to cases where an LLM solves a problem with a suboptimal algorithm (for example, a greedy algorithm), so that the code may pass some unit tests but not all. An infinite loop refers to code whose loop exit condition is never met for certain inputs, so that it runs indefinitely.
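One possible reading of the max_sum case is a maximum-subarray routine where 0 is the wrong starting value. The task, the function names, and this reading of the example are all assumptions, since the post does not include the original code.

```python
def max_subarray_buggy(nums):
    max_sum = 0  # wrong start value when every element is negative
    current = 0
    for x in nums:
        current = max(x, current + x)
        max_sum = max(max_sum, current)
    return max_sum


def max_subarray_fixed(nums):
    # Start from the first element (assumes a non-empty list).
    max_sum = current = nums[0]
    for x in nums[1:]:
        current = max(x, current + x)
        max_sum = max(max_sum, current)
    return max_sum


mixed_buggy = max_subarray_buggy([1, -2, 3, 4])
mixed_fixed = max_subarray_fixed([1, -2, 3, 4])
negative_buggy = max_subarray_buggy([-3, -1, -2])
negative_fixed = max_subarray_fixed([-3, -1, -2])
```

The loop logic is identical in both versions. Only the starting value differs, and the difference shows up only on all-negative input, which is why such errors can pass some unit tests and fail others.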
