Gender Differences In Coding Style
Written by Sue Gee   
Wednesday, 13 November 2024

A novel investigation into the gender gap between men and women regarding coding ability was undertaken by Dr Siân Brooke. Her conclusion? There is a difference in the Python code produced by men and women, but it is to do with coding style and doesn't affect code quality.

Dr Siân Brooke was affiliated with the Department of Methodology at the London School of Economics (LSE), and her paper, published in the February 2024 issue of the Journal of Computer-Mediated Communication, is titled Testing for Gender Differences in Python Programming Style and Quality on GitHub. It reports the results of an analysis of open source projects, looking for differences between four gender categories: feminine; masculine; ambiguous, where gender could not be determined; and anonymous, where gender was deliberately concealed.
 
To make the topic more attention-grabbing, its abstract opens with a controversial statement:
 
The underrepresentation of women in open-source software is frequently attributed to women’s lack of innate aptitude compared to men: natural gender differences in technical ability.
 
This claim is attributed to Trinkenreich et al., 2021, a literature survey of 51 articles, published between 2000 and 2021, that investigated women’s participation in OSS; the survey's abstract makes no comment about aptitude for programming. Despite my misgivings that Brooke is being an agent provocateur, there is no denying that there is a very noticeable gender imbalance when it comes to the number of programmers employed in major tech companies, as shown here in a screen from Brooke's video "Why are there so few women in tech?"
 
[Image: screen from Brooke's video showing the gender imbalance among programmers at major tech companies]
 
In looking for projects to include, Brooke used the following criteria:
  • Owned by a sole user and not an organisation.
  • The owner is the only contributor.
  • Code written in Python 3.
  • The repository was created on or after January 1, 2019.
  • Not a fork and not marked as private or archived.
  • Repository is between 0.01 and 1 gigabyte in size.
  • Repo has a minimum of 10 stars and 10 forks.
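
To make the combined effect of these criteria concrete, here is a minimal Python sketch of such a filter. It is not the code used in the paper: the Repo record and all of its field names are hypothetical, chosen only to mirror the list above, and how each value would be obtained from GitHub is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repo:
    """Hypothetical repository record; every field name is an assumption."""
    owner_is_user: bool   # True if owned by a single user, not an organisation
    contributors: int     # number of distinct contributors
    is_python3: bool      # True if the code is written in Python 3
    created: date         # repository creation date
    is_fork: bool
    is_private: bool
    is_archived: bool
    size_gb: float        # repository size in gigabytes
    stars: int
    forks: int


def meets_criteria(repo: Repo) -> bool:
    """Return True only if the repository passes every criterion in the list."""
    return (
        repo.owner_is_user
        and repo.contributors == 1
        and repo.is_python3
        and repo.created >= date(2019, 1, 1)
        and not (repo.is_fork or repo.is_private or repo.is_archived)
        and 0.01 <= repo.size_gb <= 1.0
        and repo.stars >= 10
        and repo.forks >= 10
    )
```

Because the conditions are joined with "and", a repository that fails any single test, for example one with nine stars, is excluded.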

In total 1,728 repositories consisting of 30,198 modules/“.py” files were included with this distribution across gender categories:

[Table: distribution of repositories and modules across the four gender categories]

Brooke analyzed the Python code using linting to identify subtle programming errors and unconventional coding practices, and also assessed adherence to style guidelines. She also measured gender differences in modular programming by examining the building blocks of code, such as comments, docstrings, functions, methods and classes, found in the lines of each Python file.

The first set of results reported in the paper relate to module structure and include the following findings:
  • Masculine GitHub users have more lines of code on average than other gender groups, followed by feminine users.
  • There is a significant difference between gender groupings in the count of comment and docstring lines, with masculine users having more lines of docstring on average.
  • There is a significant difference between gender groupings in the average number of methods and classes in Python modules, with masculine users using methods the most frequently and feminine users the least. There is less variation by gender for functions and classes, but anonymous and ambiguous users employ classes more regularly than gender-identified (masculine, feminine) users. The smallest variation between gender groups is with functions, with masculine and feminine users having a relatively higher average count. These findings could suggest that users whose gender is known take a more functional approach to programming in comparison to anonymous and ambiguous users, who use more classes.

The second research question asked about a gender difference in Pylint scores and style checker components and here Brooke reported "a nuanced response" that can be summarised as:

no significant gender difference in the overall quality of Python code. However, there is evidence of a gender difference in style... which indicate that it is feasible to predict the gender of users from the style of their code.

In the final element of the analysis Brooke investigated whether the structure and style of modules could be used to classify users’ inferred gender. She used a Random Forest model to predict gender identity, using structural constituents and style checker components of Python modules and concluded that:
 
the gender of Python modules can be predicted based on programming style.
 

You can read the paper for some of the finer details. For an overview, here's a video in which Brooke tackles the issue of the perceived gender gap in coding ability and provides her response based on her research:

One point that emerges from the video is that the research shows that feminine and anonymous users have comparable coding styles, which Brooke claims implies that feminine users may choose to be anonymous on technical platforms in order to avoid harassment and having their work devalued. 

Her conclusion in the video is that:

to see change we need to see social solutions like actively calling out sexism, actively hiring women and inclusive practices in programming education all the way through. It is very important that we get ahead of this very singular view that excludes women and minorities from participating in technology.

 


 

More Information

Brooke, S. (2023). Testing for Gender Differences in Python Programming Style and Quality on GitHub

Related Articles

Women In Tech - Towards Gender Parity

Celebrating International Women's Day 2016

Women In Computing (2014) 
