This report examines the impact of ChatGPT and other AI tools in providing assistance to software programmers with autocoding, such as code generation and debugging.
The spotlight is on generative AI, which has emerged from large language models and captured worldwide attention with OpenAI's release of ChatGPT. While the underlying technology sits in a line of incremental improvements, ChatGPT's gains in usability, coupled with its currently free access, have drawn a large number of people to try it out.
STOP PRESS: As this report went to publication, OpenAI released GPT-4, its latest model, which also performs programming tasks. While an improvement over ChatGPT, it is, to quote OpenAI, “less capable than humans in many real-world scenarios.”
When OpenAI released ChatGPT in late 2022, the tool also offered assistance with coding, and developers have since found it useful for debugging code as well. Anecdotal accounts from developers writing about their experiences in developer forums suggest that ChatGPT and Copilot are good enough to be of help, but they are not error free, and developers need to check all code that is produced.
We looked at who can benefit from this technology and examined use cases in turn. The following assessment is based on the current state-of-the-art capabilities. It is important to note two points:
- This technology is continually evolving and will improve. With improvements, the use cases will broaden.
- ChatGPT, like other state-of-the-art intelligent virtual assistants, remembers a thread of conversation; a good way to use it (and one already adopted by some developers) is therefore to continually refine an output if the first iteration is not what is required. ChatGPT uses this feedback to improve its subsequent answers.
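The refine-and-retry pattern described above can be sketched as a message thread in the chat format popularized by OpenAI's API. The helper names below (`start_thread`, `add_feedback`) are illustrative, and the actual model call is omitted; this is a sketch of the conversation bookkeeping, not a working client.

```python
# Illustrative sketch of iterative refinement as a conversation thread.
# The message dictionaries follow the role/content chat format; the call
# to the model itself is omitted and would go where indicated.

def start_thread(task):
    """Begin a conversation thread with the initial coding request."""
    return [{"role": "user", "content": task}]

def add_feedback(thread, model_reply, feedback):
    """Record the model's last answer and append refinement feedback,
    so the next request carries the full conversational context."""
    thread.append({"role": "assistant", "content": model_reply})
    thread.append({"role": "user", "content": feedback})
    return thread

# Example: one refinement round on a code-generation request.
thread = start_thread("Write a Python function that deduplicates a list.")
# ... send `thread` to the model here; suppose it replies with a draft ...
thread = add_feedback(thread, "def dedupe(xs): return list(set(xs))",
                      "Preserve the original element order.")
```

Because the full thread is resent on each turn, the model can revise its earlier draft in light of the feedback rather than starting from scratch.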
Professional developers
- Pros: This group is likely to gain the most from autocoding, being well placed to judge the quality of the output; developers are advised to always check the generated output. Tools such as Copilot and ChatGPT (and others, see below) can speed up development by generating both small and large blocks of code and by helping debug code. Autocoding can also assist a developer who is learning a new programming language, where they know what they need but could use help with the details. Interrogating the tool to refine its output is a common and recommended tactic. The media-reported fears that autocoding will put professional developers out of work are an overreaction: autocoding assists developers and will not replace them.
- Cons: The introduction of autocoding could lead to lazy programming if the output is not checked. These tools do not perform custom design work and will always need a human to direct the requirements. Autocoders are also limited to common application scenarios and will be of little use for any new technology on which the AI models have not yet been trained. Finally, because the output from autocoding is not certified or approved to any level of accuracy, its use for application security requirements should be restricted to security specialists (should they choose to use it). A research paper (by Hammond Pearce, et al., see Further reading) assessed the security of Copilot code contributions and found that, out of 1,689 program outputs relevant to high-risk cybersecurity weaknesses, approximately 40% were vulnerable. Another research paper, by Owura Asare, et al., studied whether Copilot introduces the same software vulnerabilities as human developers. The authors found that "in a substantial proportion of instances Copilot did not generate code with the same vulnerabilities that human developers had introduced previously," concluding that Copilot is "not as bad as human developers at introducing vulnerabilities in code."
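To make the security concern concrete, the sketch below shows the kind of flaw a reviewer must catch in generated code: SQL built by string formatting (injectable) versus the parameterized form a security review would insist on. The functions and table are hypothetical examples, not actual tool output.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Plausible generated-looking code: works on benign input, but the
    # user-supplied string becomes part of the SQL text (injection risk).
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # Reviewed version: placeholder binding keeps data out of the SQL text.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"
leaked = find_user_unsafe(conn, payload)   # the injected clause matches every row
guarded = find_user_safe(conn, payload)    # no user has this literal name
```

Both functions pass a casual test with ordinary names, which is exactly why generated code that "looks right" still needs a security-aware review.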
Non-professional developers and business power users
This group includes individuals who might have some programming experience, but coding is not their main job function. This is the user category that no-code/low-code (NCLC) tools address.
- No code: No-code tools need to be fully automated, so how the code is generated is of crucial importance. ChatGPT uses natural language, which leaves room for ambiguity: how a question is phrased will itself drive a different answer. A no-code tool typically uses a visual programming interface to define the requirements, reducing the scope for mistakes. AI may be used under the hood to help generate the code, but it makes no sense for a no-code user to interact directly with an autocoder. Of course, as AI models mature and improve, this could change.
- Low code: The absence of a professional developer means that adequate checking of the code may not be possible. If the low-code application is limited in scope, such as creating a business form, then using autocoding as a frontend can be useful. On the other hand, it would be inadvisable for a low-code application fronted by an autocoder to have change privileges and interact with the corporate database unless the tool has built-in governance to prevent mishaps, for example, working on a replicated database. However, the use of AI-based autocoding under the hood is already in the market (see the discussion of Microsoft below).
Citizen data scientist
In this scenario, professional developers or business users without data science expertise may use autocoders to engage in various machine learning (ML) tasks such as data preparation, feature engineering, model selection, and model training, among others. Given the expertise required to build unbiased, fair, and explainable AI applications (with potential EU laws in the wings to enforce this), it would be advisable to work with an ML solution that is already designed for this user category and, in turn, these tools may make use of autocoding under the hood.
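The ML workflow stages named above can be sketched in a few lines; the fragment below is a deliberately minimal, standard-library-only illustration of the scaffold an autocoder typically produces (data preparation, a stand-in feature transform, model fitting, and held-out evaluation). The data and transform are synthetic assumptions for the sketch; real use should go through a governed ML platform as recommended.

```python
import random, statistics

# Synthetic data: y = 2x + 1 plus small Gaussian noise.
random.seed(0)
raw = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(50)]

# Data preparation: train/test split.
train, test = raw[:40], raw[40:]

# Feature engineering: centering x (stands in for real transforms).
x_mean = statistics.mean(x for x, _ in train)
def feat(x): return x - x_mean

# Model training: closed-form simple linear regression on the training set.
xs = [feat(x) for x, _ in train]
ys = [y for _, y in train]
slope = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
intercept = statistics.mean(ys)  # xs are centered, so the intercept is mean(y)

# Evaluation: mean absolute error on held-out data.
mae = statistics.mean(abs((slope * feat(x) + intercept) - y) for x, y in test)
```

A generated scaffold like this says nothing about bias, fairness, or explainability, which is why the report advises this user category toward purpose-built ML solutions.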
- Autocoding as a technology is here to stay and will become a standard developer tool, helping professional developers accelerate their work.
- The technology behind ChatGPT is good enough to be useful as a coding assistant, but the output should be checked by professional developers.
- It is most likely that autocoding technology will evolve and improve, so the advice on how to use this technology will change.
- CIOs and CTOs must address the use of autocoding within their organization to ensure application security by controlling who can make use of autocode, while reaping the benefits this technology can provide their developers.
- Generative AI-based coding is only useful for mining existing coding knowledge—asking it to create working code for a novel algorithm for which no examples exist may produce nonsensical code or “hallucinations.”
- C-level executives should begin to explore how to best exploit autocoding for their organizations and adopt a strategy for its safe use while also leveraging it for the benefit of their organization.
- Development team leaders and managers should ensure that autocoding is being used safely by their teams and that all generated code receives a review. There should be transparency in where and how this technology is being adopted.
- Developers should try out autocoding to see how it can accelerate their work. There is anecdotal evidence that this technology can act as a useful assistant in development work.
ChatGPT is only “good enough” today
The engineers at OpenAI are quoted as being surprised at the response to ChatGPT. AI-based virtual assistants are a thriving market, but the best tools have been available only at a premium price. ChatGPT’s key difference is its free availability, which has allowed a large number of people to try it out.
The best AI-based virtual assistants maintain context across a conversation thread. In the case of ChatGPT, this allows the user to continually refine a response that initially lacks accuracy in some respect. For example, a recent research paper—“An Analysis of the Automatic Bug Fixing Performance of ChatGPT”—applied ChatGPT to fixing bugs in a benchmark set, QuixBugs, which contains 40 examples of code snippets containing an error. It compared ChatGPT’s performance with standard automated program repair (APR) techniques, which do not rely on deep learning, as well as with two deep learning–based tools, Codex and CoCoNut. ChatGPT’s results were similar to Codex and CoCoNut and superior to standard APR. Figure 1 shows the results of four independent runs of ChatGPT on QuixBugs.
Figure 1: Number of occurrences of identified classes of ChatGPT answers over four independent runs of the QuixBugs benchmark set (4 × 40 = 160 total tests). Source: Dominik Sobania, et al., “An Analysis of the Automatic Bug Fixing Performance of ChatGPT”
ChatGPT performed better than non-AI techniques and shone when allowed multiple runs on a given problem, performing best when asked to repeat a response and given a hint, i.e., when it was told that the function result was not correct and shown an example of the function failing (these results are not reflected in Figure 1). This shows the power of engaging with the tool in a dialogue.
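The dialogue pattern can be illustrated with the kind of one-line defect QuixBugs contains. The `gcd` example below is illustrative of the benchmark's style rather than quoted from it: the buggy version, the hint a user would give in the follow-up prompt, and the corrected version.

```python
def gcd_buggy(a, b):
    if b == 0:
        return a
    return gcd_buggy(a % b, b)      # bug: arguments passed in the wrong order

# Hint given in the follow-up prompt: "the function result is not correct;
# gcd_buggy(35, 21) should return 7, but the call recurses with (14, 21),
# then (14, 21) again, and never terminates."

def gcd_fixed(a, b):
    if b == 0:
        return a
    return gcd_fixed(b, a % b)      # fix: recurse on (b, a % b)
```

A failing example like the one in the hint is exactly the extra information that, per the study, lifted ChatGPT's repair rate above its first-attempt performance.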
The AI technology behind OpenAI’s ChatGPT is the generative pre-trained transformer (GPT), which stacks deep neural networks in multiple blocks of the transformer architecture. Training involves two stages: an unsupervised generative stage that sets the initial parameters and a supervised stage that fine-tunes the model on a specific target. ChatGPT is based on a model from the GPT-3.5 series, which in turn derives from GPT-3, a model with 175 billion parameters.
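To make the transformer architecture mentioned above concrete, the toy sketch below implements the scaled dot-product attention at the core of each transformer block, in plain Python. Real models stack many such blocks with learned projections and billions of parameters; this omits all of that and is illustrative only.

```python
import math

def softmax(scores):
    # Shift by the max for numerical stability, then normalize to sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query vector, mix the value vectors weighted by
    softmax(q.k / sqrt(d)) similarity to each key vector."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

A query that closely matches one key pulls its output almost entirely from that key's value vector, which is how transformer layers let each token attend to the most relevant parts of its context.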
Microsoft, which has invested significantly in OpenAI and has first options on its technology, has added ChatGPT capabilities into its Power Platform, which is its NCLC solution. This includes updating its data science tool AI Builder (part of the Power Platform) with ChatGPT capabilities.
GitHub (owned by Microsoft) first released Copilot in 2021, the result of a collaboration between GitHub and OpenAI. OpenAI developed and continues to develop Codex, currently available through an API in private beta (first released in October 2021). GitHub uses Codex plus additional code from its repository to train Copilot. Copilot is a code autocompletion tool for users of the IDEs Visual Studio Code, Visual Studio, Neovim, and JetBrains; it autocompletes a block of code rather than just a single line. GitHub conducted research on developer experience with Copilot and showed a positive response (see Further reading), but for an independent view, the research by Arghavan Moradi Dakhel, et al. (see Further reading) is useful. Figure 2 shows its conclusions on Copilot’s capabilities in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems and (ii) comparing Copilot’s proposed solutions with those of human programmers (students) on a set of programming tasks. It concludes that Copilot has limitations on complex programming tasks but is useful for generating preliminary solutions for basic programming tasks.
Developers talk of Copilot saving them time searching for information, but its output needs to be evaluated by an experienced professional. The same caution applies to using ChatGPT for autocoding: it works best as an assistant, not a substitute for a programmer.
It is important to understand that generative AI applied to coding is useful because it mines existing know-how (the code base it is trained on), and most programming requirements are simply the reuse of code snippets with minor changes, which is perfect for automation. But don’t ask ChatGPT to create something truly novel that has never been done before; an extreme example is asking ChatGPT to produce an artificial general intelligence algorithm, where the output will be nonsense. This issue does not apply to the subjective creative arts, where generative AI has been used to create novel art, albeit typically in the style of existing artists.
There are legal challenges against GitHub Copilot, brought by Matthew Butterick and the Joseph Saveri Law Firm; against the AI art tools Midjourney, Stability AI, and DeviantArt, brought by the same parties; and against Stability AI, brought by Getty Images. The claims are that using openly available data on the internet, such as code on GitHub and artists’ works, to train AI systems violates the copyright on the original works. The defendants in the Copilot case are GitHub, Microsoft (owner of GitHub), and OpenAI (Microsoft is its largest investor); the case is due to be heard in May 2023. These cases are creating some uncertainty and will be watched keenly.
According to media reports, Matthew Butterick is motivated by the belief that a tool like Copilot will make programmers redundant. Omdia (like the tool creators) believes this view is misplaced. When used correctly, Copilot and similar autocode tools can improve developer productivity and code quality, and given the endemic bug problem that plagues software today, autocoding promises to be a step change in improving software generally.
There is a growing market for AI-assisted autocoding tools. Table 1 below lists some of the notable ones.
Table 1: Key players in the AI-assisted autocoding market
(The table’s row structure was not preserved in this copy; the surviving cell entries are: Carnegie Mellon University; free open source; beta in 2020; plans: Free, Enterprise; plans: Individuals, Business; PaLM and MakerSuite; March 14, 2023; active on GitHub; free open source; plans: Free, Hacker, Pro; research, available as a demo for Apex users; plans: Free, Team, Enterprise; founded in 2012 as Codota; plans: Starter, Pro, Enterprise; autocomplete plugin for Vim; active on GitHub; free open source.)
This report draws on the author’s extensive knowledge in the field of AI, as well as drawing on expert knowledge from the Omdia AI market research team.
Arghavan Moradi Dakhel, et al., GitHub Copilot AI pair programmer: Asset or Liability?, arXiv (arXiv:2206.15331 [cs.SE]) (June 30, 2022)
Dominik Sobania, et al., An Analysis of the Automatic Bug Fixing Performance of ChatGPT, arXiv (arXiv:2301.08653v1 [cs.SE]) (January 20, 2023)
Generative AI: Market Landscape 2023 (March 2023)
Hammond Pearce, et al., Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions, arXiv (arXiv:2108.09293v3 [cs.CR]) (December 16, 2021)
Owura Asare, Meiyappan Nagappan, N. Asokan, Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?, arXiv (arXiv:2204.04741 [cs.SE]) (February 14, 2023)
“Research: quantifying GitHub Copilot’s impact on developer productivity and happiness,” GitHub (retrieved on March 16, 2023)
Michael Azoff, Chief Analyst, Cloud and Data Center Practice