Generative AI Increases Developer Productivity by 26.08%

A research paper claims that using AI tools helps developers complete 26.08% more tasks.

An experiment was conducted on thousands of developers from Microsoft, Accenture, and an anonymous company: about half of the employees were given access to Copilot, while the others were not allowed to use the tool. A comparison of the data between the two groups shows the positive effect of writing code with the help of artificial intelligence.

The boom in generative artificial intelligence is expected to put millions of people out of work. Large language models (LLMs) and systems for generating images, voice, music, and video produce results no worse than people do. Such catastrophic consequences for the labor market were predicted even before the release of ChatGPT and Claude.

For example, a June 2018 report by the McKinsey Global Institute, the research arm of the international consulting firm McKinsey, claimed that by 2030, 400 million people (15% of the total workforce) would lose their jobs. Goldman Sachs gave a similar forecast, that 300 million jobs would be automated, in April 2023, already during the new AI “summer”.

Less alarmist research on AI (e.g., a report from the UN's International Labour Organization) talks about helping human workers, not replacing them. The scale of this help is assessed differently. One article even proclaims that ChatGPT will significantly help the fourth industrial revolution (doi:10.1016/j.ject.2023.08.001). Others argue that the macroeconomic impact of AI will be negligible, no more than 0.66% of productivity growth over the next decade (doi:10.3386/w32487).

In June 2023, McKinsey released another document that focuses on the economic potential of AI. The report notes that the overall contribution of AI to global GDP growth will be between 15% and 40%. McKinsey also identified four main areas where ¾ of the impact of artificial intelligence will come from: customer interaction, marketing, R&D, and software writing.

Indeed, many startups are trying to create AI for writing code. The AI Startups website has collected a list of 30 such organizations. Not all of them offer just another code completion system; some propose replacing human engineers entirely with autonomous agents. Among them is Devin from Cognition, presented in March 2024. The success of the product can be judged by the fact that Cognition, the startup that created Devin, has not itself stopped hiring software engineers.

However, autocompletion and code-writing systems have gained popularity among programmers. Some even claim near-universal adoption. In June 2023, GitHub survey data said that 92% of US developers use AI tools to write code. In a similar GitHub report from August 2024, the share had grown to 97%.

Other estimates of the popularity of such tools look more modest. In July 2024 alone, several reports on this topic were released. A survey by outsourcer BairesDev puts the share of developers using generative AI to write code at 72%, Stack Overflow at 62%, and Capgemini at just 46%.

The exact numbers vary, but all such reports agree that AI is helping developers significantly. The paper, published on September 5, was written by Zheyuan (also goes by Kevin) Cui of Princeton University, Mert Demirer and Tobias Salz of MIT, Sonia Jaffe of Microsoft Research, Leon Musolff of the University of Pennsylvania, and Sida Peng of Microsoft.

There are many AI-based code completion tools: GitHub Copilot, Amazon CodeWhisperer, Replit Ghostwriter, and others. The study focused only on the first of them. Randomly selected programmers were given access to Copilot, while others (the control group) worked without it. The experiments were conducted based on an analysis of developer data from three companies:

  • Microsoft. The experiment lasted for 7 months and involved 1,746 developers from the company's American offices. Of these, 50.4% were randomly selected to be in the experimental group.

    One day, members of the experimental group received an email offering them access to the new tool. The email described the productivity benefits of Copilot and its potential impact on coding tasks. However, neither the email nor any other instructions required them to use the new tool or explained how to use Copilot.

    The experiment ran from the first week of September 2022 to May 3, 2023. As the article explains, the end date was driven by increased awareness of Copilot among the control group and a desire to use it in their work.

    Letters received by the experimental and control groups

  • Accenture. Here the experiment lasted 4 months and involved developers from, as the article vaguely puts it, Southeast Asia. Judging by the company locations listed on its website, these could be offices in Malaysia, Singapore, Thailand, Indonesia, and/or the Philippines.

    61.3% of the 320 developers were given access to Copilot. As at Microsoft, the experimental group was informed about the benefits of the tool, though through a full-fledged training session rather than a letter. Another difference is that the managers of the experimental group members were asked to encourage the use of Copilot.

    The experimental group was given Copilot in the first week of July 2023. In December 2023, the control group was also allowed to use the tool, but Copilot was less popular among its members.

  • A third company, whose name is not given, is described only as an electronics manufacturer on the Fortune 100 list. In this case, Copilot was issued to all 3,054 developers, but not at the same time: some teams received the tool six weeks earlier than others. The dates for issuing invites were distributed randomly between September and October 2023.

Software development productivity is difficult to measure. What helped in the assessment was that the work process is structured and broken down into small tasks in version control systems. In simple terms, the researchers counted pull requests in GitHub, since all three companies used this service. They also counted the number of commits, builds, and the proportion of successful builds.
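Counting activity of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the event records are invented, the weekly grouping is one plausible way to bucket activity, and the function name `weekly_counts` is hypothetical rather than anything taken from the study.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: (developer id, event type, date).
# These records are invented for illustration, not taken from the study.
events = [
    ("dev_a", "pull_request", date(2022, 9, 5)),
    ("dev_a", "commit", date(2022, 9, 6)),
    ("dev_a", "commit", date(2022, 9, 7)),
    ("dev_b", "commit", date(2022, 9, 8)),
    ("dev_a", "pull_request", date(2022, 9, 14)),
]


def weekly_counts(records):
    """Count events per (developer, ISO year, ISO week, event type)."""
    counts = Counter()
    for dev, kind, day in records:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        counts[(dev, iso[0], iso[1], kind)] += 1
    return counts


counts = weekly_counts(events)
for key in sorted(counts):
    print(key, counts[key])
```

Each key of the resulting counter is one developer-week for one metric, which is the kind of unit a per-developer comparison between groups needs.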

In addition, the statistics included how much code Copilot suggested and how much of it the developer accepted. In the case of Microsoft, the researchers were also given employees' hiring dates and levels within the company, which allowed them to assess professional experience.

The results obtained (except for the share of successful builds) have high standard deviations. The paper notes that such high variability limits the precision of regression analysis of the experimental data.

                               Control group        Experimental group
Indicator                      Average   Std. dev.  Average   Std. dev.  Difference  p-value

Microsoft
  Pull requests                   0.86      1.49       0.87      1.50       0.01      0.88
  Commits                         9.43     14.86       9.36     14.80      -0.07      0.94
  Builds                          7.76     12.99       7.67     12.73      -0.09      0.91
  Share of successful builds      0.72      0.30       0.75      0.29       0.02      0.33
  Newly hired                     0.48      0.50       0.52      0.50       0.04      0.23
  Juniors                         0.55      0.50       0.61      0.49       0.06      0.03**

Accenture
  Pull requests                   0.13      0.47       0.14      0.47       0.00      0.85
  Commits                         2.56      6.00       3.64      7.25       1.08      0.01**
  Builds                          0.96      2.54       1.10      2.68       0.14      0.38
  Share of successful builds      0.51      0.37       0.54      0.38       0.03      0.40

Anonymous company
  Pull requests                   0.73      1.23       0.73      1.19      -0.00      0.99
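To get a feel for how large the spread is relative to the averages, one can compute the coefficient of variation (standard deviation divided by the mean). The sketch below uses the Microsoft control-group figures from the table; the choice of this particular measure is an assumption made here for illustration, not something the paper is stated to report.

```python
# Control-group averages and standard deviations for Microsoft,
# copied from the table above.
control = {
    "Pull requests": (0.86, 1.49),
    "Commits": (9.43, 14.86),
    "Builds": (7.76, 12.99),
    "Share of successful builds": (0.72, 0.30),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a multiple of the mean."""
    return sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in control.items()}
for name, value in cv.items():
    print(f"{name}: {value:.2f}")
```

For the three count metrics the standard deviation exceeds the mean by more than half, while for the share of successful builds it is well below the mean, which matches the exception noted in the text.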

The reluctance of the subjects themselves to adopt the new tool also leaves its mark on the results. Although Copilot is integrated into development environments and does not require any special (financial or labor) investments, its adoption is far from 100%.

  • In the first two weeks of the experiment, only 8.5% of the Microsoft experimental group started using Copilot at work. It is likely that the email simply got lost in the flow of work correspondence. On February 15 and 28, 2023, two additional reminder emails were sent out internally at Microsoft. In the following two weeks, Copilot usage in the experimental group increased to 42.5%.

    Also, 0.5% of the control group used Copilot, ignoring the experiment's restrictions. When the control group was finally allowed to use the tool, many of its members quickly adopted it.

    According to the article, by January 2024 Copilot usage in the control group was lower than in the experimental group. There is likely a typo in the article, since the percentages it gives are 75.6% and 64.0%, respectively.

  • At Accenture, Copilot adoption duly grew to 60% in the first 1–2 months, but then hardly changed. At the end of the experiment in December 2023, the control group was less interested in the tool than the experimental group. In April 2024, the share of Copilot users was 69.4% in the experimental group and 24.4% in the control group.

  • A similar situation developed in the anonymous company: immediately after the rollout of Copilot, the share of its users reached a plateau and subsequently changed little.

The data was analyzed at the level of one person-week. To evaluate the effectiveness of Copilot, the study uses the formula

y_{it} = \beta D_{it} + \mu_i + \gamma_t + \epsilon_{it}.

Here, y_{it} is the outcome for developer i in week t, β is the coefficient of interest, D_{it} is an adoption dummy that switches on after a developer first uses Copilot, μ_i is a developer fixed effect, γ_t is a week fixed effect, and ε_{it} is the error term. The coefficient was estimated using two-stage least squares. Data where developers gained access to the tool gradually was even more difficult to analyze.
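A minimal sketch of this kind of estimate on simulated data may help. Everything in it is an assumption made for illustration: the data is synthetic, the instrument is taken to be random assignment combined with the period when access was open, and the "drift" term is an invented confounder that makes the naive estimate biased. It is not the authors' code.

```python
import numpy as np


def two_way_demean(x):
    """Subtract developer (row) and week (column) means from a balanced panel."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()


def slope(x, y):
    """No-intercept least-squares slope of y on x (both already demeaned)."""
    return float((x * y).sum() / (x * x).sum())


def simulate(n_dev=4000, n_weeks=30, beta=0.25, seed=0):
    """Synthetic panel with imperfect take-up and a confounded adoption decision."""
    rng = np.random.default_rng(seed)
    assigned = rng.random(n_dev) < 0.5           # random assignment to the experimental group
    post = np.arange(n_weeks) >= n_weeks // 2    # weeks after access is granted
    keen = rng.normal(size=n_dev)                # unobserved trait that drives adoption
    adopter = assigned & (keen > 0)              # only some assigned developers adopt

    z = assigned[:, None] & post[None, :]        # assumed instrument: assigned and access open
    d = adopter[:, None] & post[None, :]         # D_it: adoption dummy
    mu = rng.normal(size=(n_dev, 1))             # developer fixed effects
    gamma = rng.normal(size=(1, n_weeks))        # week fixed effects
    drift = 0.3 * keen[:, None] * post[None, :]  # invented confounder tied to adoption
    noise = rng.normal(size=(n_dev, n_weeks))
    y = beta * d + mu + gamma + drift + noise
    return y, d, z


y, d, z = simulate()
y_t, d_t, z_t = (two_way_demean(a) for a in (y, d, z))

ols = slope(d_t, y_t)          # naive fixed-effects estimate, biased by the drift
d_hat = slope(z_t, d_t) * z_t  # stage 1: adoption predicted from assignment
tsls = slope(d_hat, y_t)       # stage 2: outcome regressed on predicted adoption

print(f"naive: {ols:.3f}  two-stage: {tsls:.3f}  true beta: 0.25")
```

Because assignment is random, it is unrelated to the unobserved trait, so the two-stage estimate lands near the true coefficient while the naive one overstates it. The two-way demeaning shortcut is exact only for a balanced panel, which this simulation has by construction.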

The β estimates were tabulated and compared with the control group mean. Standard errors were clustered at the treatment group assignment level, which varied across companies.

(β as a percentage of the control group mean; standard errors in parentheses)

Indicator                    Microsoft         Accenture          Anonymous company   Combined data
Pull requests                27.38** (12.88)   17.94 (18.72)      54.03 (42.63)       26.08** (10.3)
Commits                      18.32 (11.25)     -4.48 (21.88)      -                   13.55 (10.0)
Builds                       23.19 (14.20)     92.40*** (26.78)   -                   38.38*** (12.55)
Share of successful builds   -1.34 (4.23)      -17.40** (7.12)    -                   -5.53 (3.64)
Number of developers         1,521             316                3,030               4,867
Number of groups             690               316                432                 1,438

According to the data, Copilot increased the number of pull requests, commits, and builds at Microsoft. However, the study notes that only the increase in pull requests is statistically significant. This is likely why the pooled figure of 26.08% in the last column is cited in the paper's abstract as the productivity boost based on an analysis of thousands of developers from three companies.

The article also claims that Copilot not only helps developers do more, but does so without a drop in quality. As the study says, the build success rate has not decreased. However, the table still shows a decline of 5.53%.
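The significance of these figures can be checked roughly from the table itself by dividing each estimate by its standard error. The sketch below assumes a normal approximation for the test statistic, which is a simplification chosen here; the paper's exact procedure may differ.

```python
import math

# Point estimates and standard errors (in percent) copied from the results table.
estimates = {
    "Pull requests, Microsoft": (27.38, 12.88),
    "Pull requests, Accenture": (17.94, 18.72),
    "Pull requests, anonymous company": (54.03, 42.63),
    "Pull requests, combined": (26.08, 10.3),
    "Share of successful builds, combined": (-5.53, 3.64),
}


def two_sided_p(estimate, std_error):
    """Two-sided p-value under a normal approximation (an assumption of this sketch)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))


p_values = {name: two_sided_p(b, se) for name, (b, se) in estimates.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Under this approximation the combined pull-request estimate clears the conventional 5% threshold, while the 5.53% decline in successful builds does not, which is consistent with the study treating build quality as unchanged.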

Among other observations:

  • Microsoft employees were divided into newcomers (tenure below the median) and long-timers (tenure above it). It turned out that newcomers use Copilot more often (84.3% vs. 74.8%). Moreover, new employees are also more likely than Microsoft veterans to continue using the tool.

    As the article speculates, this is because it is often younger employees who benefit most from the new tool.

  • A similar effect is observed when Microsoft developers are divided by level. Juniors use Copilot more (82.1% versus 76.8%) than senior developers. In this case, there is no inequality in the frequency of abandoning Copilot.

  • Copilot's contribution is higher for new Microsoft employees and juniors. While for long-time employees the growth was 8 to 13%, newcomers improved by 27 to 39%. The difference by level is not so noticeable: for juniors it is 21 to 40%, for seniors 7 to 16%.

It's worth noting that GitHub Copilot was switched to the GPT-4 LLM only on November 30, 2023, after the main period of the experiment. Most of the data obtained relates to the period when code was written with the significantly more primitive GPT-3.5 model.

A preprint of the scientific article “The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers” has been published in the preprint repository Social Science Research Network (doi:10.2139/ssrn.4945566).
