Windows Source Now In A GIT Repository
Written by Sue Gee   
Friday, 26 May 2017

Microsoft is well on the road to transferring the Windows codebase to a single Git repo hosted on Visual Studio Team Services. This has involved scaling Git to extremely large projects and teams with a project called "Git Virtual File System".

Back in February Microsoft made a surprising announcement. It had decided to use Git, the open source version control system created by Linus Torvalds, for the Windows codebase and had embarked on GVFS, a set of enhancements to Git that, according to Brian Harry, would enable Git to scale to VERY large repos by virtualizing both the .git folder and the working directory.

Previously Windows, like other Microsoft software, used Source Depot, a proprietary version of the commercial Perforce version control system, which had been customized to cope with the sheer size of the Windows codebase.

Reporting on the transfer, Brian Harry writes on his blog:

As a refresher, the Windows code base is approximately 3.5M files and, when checked in to a Git repo, results in a repo of about 300GB.  Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily “lab builds” across 440 branches in addition to thousands of pull request validation builds.  All 3 of the dimensions (file count, repo size and activity), independently, provide daunting scaling challenges and taken together they make it unbelievably challenging to create a great experience.  Before the move to Git, in Source Depot, it was spread across 40+ depots and we had a tool to manage operations that spanned them.

The transfer has been done in stages. The first and largest switch in March involved the WindowsOneCore team of 2,000 engineers.

Harry writes:

Those 2,000 engineers worked in Source Depot on Friday, went home for the weekend and came back Monday morning to a new experience based on Git.  People on my team were holding their breath that whole weekend, praying we weren’t going to be pummeled by a mob of angry engineers who showed up Monday unable to get any work done.  In truth, the Windows team had done a great job preparing backup plans in case of mishap and, thankfully, we didn’t have to use any of them.

Much to my surprise, quite honestly, it went very smoothly and engineers were productive from day one.  We had some issues, no doubt.  For instance, Windows, because of the size of the team and the nature of the work, often has VERY large merges across branches (10,000’s of changes with 1,000’s of conflicts).  We discovered that first week that our UI for pull requests and merge conflict resolution simply didn’t scale to changes that large.  We had to scramble to virtualize lists and incrementally fetch data so the UI didn’t just hang.  We had it resolved within a couple of days and overall, sentiment that week was much better than we expected.
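
The fix Harry mentions, fetching data incrementally rather than loading a huge list in one go, can be illustrated with a minimal Python sketch. This is not Microsoft's code: `iter_in_pages`, `fetch_page` and the in-memory `conflicts` list are hypothetical stand-ins for a UI that asks a server for one page of merge conflicts at a time.

```python
from itertools import islice
from typing import Callable, Iterator, List


def iter_in_pages(
    fetch_page: Callable[[int, int], List[str]], page_size: int = 100
) -> Iterator[str]:
    """Yield items one by one, requesting a new page only when it is needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:          # an empty page means the source is exhausted
            return
        yield from page
        offset += len(page)


# Hypothetical data source standing in for a server-side list of conflicts.
conflicts = [f"conflict-{i}" for i in range(1000)]
requests = []                 # records every (offset, limit) request made


def fetch_page(offset: int, limit: int) -> List[str]:
    requests.append((offset, limit))
    return conflicts[offset:offset + limit]


# The UI only needs enough rows to fill the screen, say 120 of them.
first_screen = list(islice(iter_in_pages(fetch_page, page_size=50), 120))
print(len(first_screen), requests)
```

Only three pages are requested to show 120 rows, so the cost of opening the view depends on what is displayed rather than on the total number of conflicts.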

Another 1,000 engineers were switched a month later, on April 22nd, and as this graph shows, there was very little impact on the overall pattern of daily check-ins, which can be used as an indicator that people were getting their work done without major hitches. The orange curve shows the Git check-ins increasing over time, replacing the blue Source Depot ones, and the grey line is the total.

[Graph: daily check-ins for Git (orange), Source Depot (blue) and the total (grey)]

A further 300-400 engineers moved to Git in May, leaving about 500 still to be moved. Harry comments:

The remaining teams are currently working to deadlines and trying to figure out when is the best time to schedule their move, but I expect, in the next few months we’ll complete the full engineering team.

He also provides the following statistics to indicate the scale at which the system is operating: 

  • Over 250,000 reachable Git commits in the history for this repo, over the past 4 months.
  • 8,421 pushes per day (on average)
  • 2,500 pull requests, with 6,600 reviewers per work day (on average)
  • 4,352 active topic branches
  • 1,760 official builds per day
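
To put these averages in perspective, a few ratios can be derived from the figures quoted above. The short Python sketch below is simple arithmetic on the published numbers; the per-minute figure assumes pushes are spread evenly over a 24-hour day, which is an assumption for illustration only.

```python
# Figures quoted from Brian Harry's post.
official_builds_per_day = 1760
branches = 440
pull_requests_per_day = 2500
reviewers_per_day = 6600
pushes_per_day = 8421

# Derived ratios (plain division, nothing more).
builds_per_branch = official_builds_per_day / branches
reviewers_per_pr = reviewers_per_day / pull_requests_per_day
# Assumption: pushes averaged over all 1,440 minutes of a day.
pushes_per_minute = pushes_per_day / (24 * 60)

print(f"{builds_per_branch:.1f} builds per branch per day")
print(f"{reviewers_per_pr:.2f} reviewers per pull request")
print(f"{pushes_per_minute:.2f} pushes per minute")
```

That works out at 4 builds per branch per day, between two and three reviewers per pull request and nearly six pushes a minute around the clock.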

 

Technical details of the migration and its scale challenges are discussed by Saeed Noursalehi in a document titled Git at Scale. His explanation of what the Git Virtual File System sets out to do, and of the problems of cloning the Windows repo without it, is worth contemplating. He explains that:

GVFS virtualizes the file system beneath a Git repo to solve two main problems: 

    • Only download contents that the user needs
    • Make local Git commands consider just the files that the user cares about, and not all the files that exist in the working directory 

Our target use case is the Windows repo, which has over 3 million files in the working directory, totalling 270GB of source files. That’s 270GB in the working directory, at the tip of master. To clone this repo, you would have to download a packfile that is about 100GB in size, which would take hours. And once you’ve succeeded in cloning it, local git operations like checkout (3 hours), status (8 minutes), and commit (30 minutes) would still take way too long to run, because all of those commands are linear on the number of files. 
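
The two goals Noursalehi lists can be illustrated with a toy model in Python. This is only a conceptual sketch, not how GVFS is implemented: the `LazyWorkingTree` class, its `manifest` of paths and the `fake_fetch` downloader are all hypothetical. The idea is that every file appears to exist, contents are downloaded on first read, and `status` looks only at files that have been touched instead of scanning the whole tree.

```python
from typing import Callable, Dict, List, Set


class LazyWorkingTree:
    """Toy on-demand working directory (illustrative, not GVFS's real design)."""

    def __init__(self, manifest: Dict[str, int], fetch: Callable[[str], bytes]) -> None:
        self._manifest = dict(manifest)   # path -> size: cheap metadata known up front
        self._fetch = fetch               # hypothetical downloader, called on first read
        self._hydrated: Dict[str, bytes] = {}
        self._modified: Set[str] = set()

    def listdir(self) -> List[str]:
        # Every file is visible even though nothing has been downloaded yet.
        return sorted(self._manifest)

    def read(self, path: str) -> bytes:
        if path not in self._manifest:
            raise FileNotFoundError(path)
        if path not in self._hydrated:    # download contents only when first needed
            self._hydrated[path] = self._fetch(path)
        return self._hydrated[path]

    def write(self, path: str, data: bytes) -> None:
        self._hydrated[path] = data
        self._manifest[path] = len(data)
        self._modified.add(path)

    def status(self) -> List[str]:
        # Cost depends on the files touched, not on the size of the tree.
        return sorted(self._modified)


downloads = []


def fake_fetch(path: str) -> bytes:
    downloads.append(path)
    return b"contents of " + path.encode()


tree = LazyWorkingTree({f"src/file{i}.c": 100 for i in range(1000)}, fake_fetch)
tree.read("src/file7.c")
tree.read("src/file7.c")                  # second read is served locally
tree.write("src/file42.c", b"int main(void) { return 0; }")
print(len(tree.listdir()), downloads, tree.status())
```

In this model a tree of 1,000 files costs a single download and `status` inspects one path, whereas the commands quoted above slow down because their work grows with the number of files in the working directory.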

In Git at Scale, Noursalehi, Principal Program Manager for Visual Studio Team Services, gives a full and interesting account of why the project went ahead and how it achieved its aims. In a video from Git Merge 2017, he discusses how Microsoft is using Git internally, with a specific focus on large repositories, and describes the architecture of VSTS’s Git server together with the customizations Microsoft has had to make to both it and git.exe in order to enable Git to scale further and further.

 

 

GVFS itself is now an open source project under the MIT licence. Building it requires Visual Studio 2017 Community Edition or higher, together with the Windows 10 SDK and the .NET Framework 3.5 development tools.

 


More Information

The largest Git repo on the planet

Scaling GIT

 

Related Articles

Visual Studio To Get Git

GVFS on GitHub

Git at Scale

 

 


Last Updated ( Friday, 26 May 2017 )