Google Slashes Code Migration Time With Gemini
Written by Sue Gee   
Wednesday, 22 January 2025

Google computer scientists have given details of how the company is using AI to dramatically reduce the time required for code migrations. In the case of a switch between two Java time libraries, there was an estimated time saving of 89 percent compared to doing the job manually.

Google has positioned itself as an "AI-first" company, emphasizing the importance of AI in all aspects of its business. Accordingly, it has been a pioneer in adopting LLM-based AI assistants internally, in particular Gemini, to enhance productivity and efficiency across its workforce.

Now it has shared details of this experience in a preprint paper on arXiv, titled How is Google using AI for internal code migrations?

The introduction to the paper refers to the software development challenges facing Google Product Areas such as Ads, Search, Workspace and YouTube which, in common with many Fortune 500 companies, have large, mature (20+ years) code bases and need both to maintain existing code and to adopt new frameworks. It also explains that Google uses AI technologies in software engineering internally at two levels:

1) Generic AI-based tooling for software development that is designed for all Googlers across all its Product Areas. These "built by Google for Google" technologies include code completion and code review.

2) Solutions for specific Product Areas in which Google has used LLMs in custom (or “bespoke”) ways. Examples include specific code migrations, code efficiency optimization, and test generation tasks.

The paper states:

"As opposed to mass market, or generic tools, the number of interactions may be smaller for bespoke tools, but the complexity of each interaction is often higher."

Having looked briefly at the role of generic AI tools in Google's internal software development, the paper devotes the bulk of its attention to the bespoke use of LLMs for code migration, discussing three distinct code migration exercises in which its LLM-based tooling was employed.

The first involved a codebase of 50+M lines for Google Ads and the task was to change 32-bit IDs to 64-bit IDs which existed in tens of thousands of code locations across thousands of files. According to the paper this was not a trivial task as the IDs were often generically defined (int32_t in C++ or Integer in Java) and were not easily searchable.

"The full effort, if done manually, was expected to require hundreds of software engineering years and complex crossteam coordination." 

The LLM-based workflow has three stages:

  1. An expert engineer from Ads finds the IDs they want to migrate using a combination of Code Search, Kythe and custom scripts.
  2. An LLM-based migration toolkit, prompted by an expert, runs autonomously and produces verified changes that only contain code that passes unit tests. When necessary, tests are also updated to reflect the changes to the code.
  3. The same engineer then manually checks the change and potentially updates files where the model failed or made a mistake. The changes are then sharded and sent to multiple reviewers who own the parts of the codebase affected by the change.

As a result of using this workflow, 80% of the code modifications in the change lists were purely the product of AI, specifically Gemini. 

As stated in the paper, 

"in most cases, the human needed to revert at least some changes the model made that were either incorrect or not necessary. Given the complexity and sensitive nature of the modified code, effort has to be spent in carefully rolling out each change to users." 

Even with the need to double-check Gemini's work, the time required to complete the migration was estimated to be reduced by 50 percent.

The second migration exercise was converting from the old JUnit3 testing library to JUnit4 in a huge Google codebase. Although for a human this would be a relatively simple change, doing it with purely AST-based techniques was deemed infeasible because of the many edge cases involved.

The prompt given to Gemini for this task is shown in the paper.

With assistance from a version of Gemini that was fine-tuned on the internal Google codebase, and had therefore seen some JUnit4 tests, it took just three months to migrate 5,359 files and modify 149,000 lines of code to complete the transition. Approximately 87 percent of the code generated by Gemini ended up being committed with no changes.
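
For readers whose memory of JUnit3 has faded, a minimal sketch of the kind of rewrite involved looks like this; the test class is invented for illustration, not taken from Google's codebase:

    // Before (JUnit3), AccountTest.java: tests extend TestCase and rely on
    // naming conventions for setup and test discovery.
    import junit.framework.TestCase;

    public class AccountTest extends TestCase {
        private int balance;

        protected void setUp() {
            balance = 100;
        }

        public void testDepositIncreasesBalance() {
            balance += 50;
            assertEquals(150, balance);
        }
    }

    // After (JUnit4), the same file: a plain class with annotations.
    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class AccountTest {
        private int balance;

        @Before
        public void setUp() {
            balance = 100;
        }

        @Test
        public void depositIncreasesBalance() {
            balance += 50;
            assertEquals(150, balance);
        }
    }

Simple cases like this look mechanical; it is the less regular ones, for instance tests that define suite() methods or share custom base classes, that are the kind of edge case which makes a purely AST-based rewrite hard.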

A further task was to replace thousands of occurrences of the outdated Joda-Time library in the codebase with Java's standard java.time package.

One major challenge in such a migration is that the changes are not scoped to single methods but very often require changes to classes' public interfaces and fields. The situation becomes even more complex because it is not possible simply to create one giant change and update all occurrences at once.
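
A toy example, again with invented class and field names, shows why: once a Joda type appears in a field or a public signature, every file that touches it has to move in step:

    // Before (Joda-Time), Booking.java: the Joda types leak into the API.
    import org.joda.time.DateTime;
    import org.joda.time.Duration;

    class Booking {
        private DateTime start;

        Duration timeUntilStart() {
            return new Duration(DateTime.now(), start);
        }
    }

    // After (java.time), the same file: every caller that passes or receives
    // these types has to change in the same, consistent set of edits.
    import java.time.Duration;
    import java.time.Instant;

    class Booking {
        private Instant start;

        Duration timeUntilStart() {
            return Duration.between(Instant.now(), start);
        }
    }

This is exactly the coupling that the clustering step described next is designed to handle.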

An initial step in the workflow was a clustering solution, using Kythe to categorize the potential changes. The cross-references form directed acyclic graphs (DAGs) connecting files; the call graphs tend to cluster, and the authors split them into categories by the number of files affected. Clustering also matters for getting the model to make consistent changes across files. Where there is a dependency, ideally all of the affected files should be migrated in one prompt because, were they split up, Gemini would not know between inference invocations whether a referenced file had already been migrated. The alternative is to show the already-migrated file to the model to avoid inconsistencies.

Fortunately, because Gemini offers a huge context window, many of the clusters fitted into a single prompt, making it possible to get a whole cluster migrated at once. After the code changes, builds and tests were run, enabling any failures to be fixed.
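
As a very rough sketch of the clustering idea (this is generic code, not Google's Kythe-based tooling), each file can be treated as a node, with an edge wherever one file depends on another's Joda-typed API; the connected groups then become the units that are migrated in a single prompt:

    import java.util.*;

    // Generic sketch: group files that reference each other's Joda-typed APIs
    // into clusters, so each cluster can be migrated together.
    class MigrationClusters {

        // edges maps a file to the files whose Joda-typed APIs it uses
        static List<Set<String>> cluster(Map<String, Set<String>> edges) {
            Set<String> seen = new HashSet<>();
            List<Set<String>> clusters = new ArrayList<>();
            for (String file : edges.keySet()) {
                if (seen.contains(file)) continue;
                // collect the whole connected component around this file
                Set<String> component = new HashSet<>();
                Deque<String> stack = new ArrayDeque<>(List.of(file));
                while (!stack.isEmpty()) {
                    String f = stack.pop();
                    if (!component.add(f)) continue;
                    for (String g : edges.getOrDefault(f, Set.of())) stack.push(g);
                    // also follow edges pointing back at f
                    for (var e : edges.entrySet())
                        if (e.getValue().contains(f)) stack.push(e.getKey());
                }
                seen.addAll(component);
                clusters.add(component);
            }
            return clusters;
        }

        public static void main(String[] args) {
            Map<String, Set<String>> edges = Map.of(
                "Booking.java",  Set.of("Schedule.java"),
                "Schedule.java", Set.of(),
                "Report.java",   Set.of());
            // prints one cluster containing Booking.java and Schedule.java,
            // and a separate single-file cluster for Report.java
            System.out.println(cluster(edges));
        }
    }

A real implementation would also cap each cluster by what fits in the model's context window.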

Long story short, for this switch, the authors estimate a time saving of 89 percent compared to the projected manual change time.

Concluding its account of these code migrations, the paper states:

"LLMs offer a significant opportunity for assisting, modernizing and updating large codebases. They come with a lot of flexibility, and thus, a variety of code transformation tasks can be framed in a similar workflow and achieve success. This approach has the potential to radically change the way code is maintained in large enterprises. Not only can it accelerate the work of engineers, but make possible efforts that were previously infeasible due to the huge investment needed."


More Information 

How is Google using AI for internal code migrations? by Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Satish Chandra, Siddharth Taneja, and Celal Ziftci

Related Articles

The Single Issue Of 2025 - AI

Google Releases Gemini 2 And Jules Code Agent

Gemini Offers Huge Context Window

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

 
