Talking About Languages
Written by Janet Swift   
Wednesday, 27 August 2014

An analysis of data from programming language subreddits reveals some insights into how programmers feel about the languages they use and the overlap between languages.

Project author Tobias Hermann (Dobiasd on GitHub) says:

While reading about various programming languages, I developed a hunch about how often different languages are mentioned by other communities and about the average conversational tones used by relative members.

To test his hunch, he collected and analysed all comments (about 300k) posted on submissions (about 40k) in programming language subreddits from August 2013 to July 2014, using SQLite and PRAW (the Python Reddit API Wrapper), a Python package that gives access to reddit's API.
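The article doesn't show the project's code. As a rough sketch of the storage side only, the Python below assumes a hypothetical `comments` table holding one row per comment along with the subreddit it was posted in. The PRAW download step is left out, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per comment, tagged with its subreddit.
# Fetching the comments from reddit (e.g. via PRAW) is not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    id        TEXT PRIMARY KEY,
    subreddit TEXT NOT NULL,
    body      TEXT NOT NULL
)
"""


def store_comments(conn, rows):
    """Insert (id, subreddit, body) tuples, skipping ids already stored."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO comments (id, subreddit, body) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def comments_per_subreddit(conn):
    """Return {subreddit: number of stored comments}."""
    cur = conn.execute(
        "SELECT subreddit, COUNT(*) FROM comments GROUP BY subreddit"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_comments(conn, [
        ("c1", "haskell", "Monads are just monoids..."),
        ("c2", "haskell", "Try it in Rust too."),
        ("c3", "php", "Use prepared statements with SQL."),
        ("c1", "haskell", "duplicate id, ignored"),
    ])
    print(comments_per_subreddit(conn))
```

Keying on the comment id with `INSERT OR IGNORE` means a collection script can be re-run over overlapping time windows without double-counting comments.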

The twenty-two languages covered are:

Clojure, C, C++, C#, Go, Haskell, Java, JavaScript, Lisp, Lua, Mathematica, Matlab, Objective-C, Perl, PHP, Python, Ruby, Rust, Scala, SQL, Swift, Visual Basic

One interesting analysis compares how much each language is talked about with how much it is supposedly used, according to the TIOBE index:

[Figure: language mentions compared with the TIOBE index]

I guess this illustrates what we all suspect: Haskell is talked about more often than it is used. There might also be a "name-dropping" phenomenon going on, and perhaps a bit of Visual Basic shame.
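The article doesn't say exactly how the comparison is computed. One plausible approach, assumed here rather than taken from the project, is to divide each language's share of all mentions by its share of the summed TIOBE ratings. A ratio above 1 then means "talked about more than used". The numbers below are toy values, not the project's data.

```python
def mention_to_tiobe_ratio(mentions, tiobe):
    """Compare each language's share of mentions with its share of TIOBE rating.

    mentions: {language: number of mentions in other subreddits}
    tiobe:    {language: TIOBE rating in percent}
    A ratio above 1 means the language is talked about more than its
    TIOBE rating would suggest; below 1 means less.
    """
    langs = mentions.keys() & tiobe.keys()  # only languages present in both
    total_mentions = sum(mentions[lang] for lang in langs)
    total_tiobe = sum(tiobe[lang] for lang in langs)
    return {
        lang: (mentions[lang] / total_mentions) / (tiobe[lang] / total_tiobe)
        for lang in langs
    }


# Toy numbers for illustration only, not the project's data.
ratios = mention_to_tiobe_ratio(
    mentions={"Haskell": 300, "Java": 500, "Visual Basic": 20},
    tiobe={"Haskell": 0.5, "Java": 15.0, "Visual Basic": 4.5},
)
for lang, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {r:.2f}")
```

Normalising both sides to shares keeps the comparison independent of how many comments were collected and of the absolute scale of the TIOBE percentages.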

The main achievement of the project is an interactive visualization of Mutual Mentions, which shows how often a programming language is mentioned in communities (subreddits) other than its own. It is a chord graph built with D3.js, a JavaScript library for creating documents that are driven by data.

The size of a language is set by how often the other communities talk about it in total. Each connection represents the mutual mentions of two communities. The width at each end is determined by the relative frequency with which the mentionee is referenced by the other community. So PHP talks more about SQL than SQL talks about PHP, while Python and PHP discuss each other in a balanced way.
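The sizing rules just described can be expressed as a square matrix. The sketch below is a minimal version under stated assumptions: the `(community, language)` count format and the function names are hypothetical, and each count is divided by the community's total comment count to turn it into a relative frequency. Entry `[i][j]` holds how often community j mentions language i, so each row sum gives a language's size. In recent D3 versions, `d3.chord()` accepts a matrix of this shape and sizes each group by its row sum.

```python
def chord_matrix(mention_counts, comment_totals, languages):
    """Build a square matrix for a chord graph of mutual mentions.

    mention_counts[(community, language)]: comments in `community`'s
        subreddit that mention `language` (hypothetical input format).
    comment_totals[community]: total comments in that subreddit.

    matrix[i][j] is the relative frequency with which community j mentions
    language i. Self-mentions (i == j) stay at zero.
    """
    n = len(languages)
    matrix = [[0.0] * n for _ in range(n)]
    for i, mentionee in enumerate(languages):
        for j, community in enumerate(languages):
            if i == j:
                continue
            count = mention_counts.get((community, mentionee), 0)
            matrix[i][j] = count / comment_totals[community]
    return matrix


def language_sizes(matrix, languages):
    """A language's size: how often the others talk about it, summed (row sum)."""
    return {lang: sum(row) for lang, row in zip(languages, matrix)}


# Toy counts chosen to mirror the PHP/SQL/Python example, not real data.
langs = ["PHP", "SQL", "Python"]
counts = {("PHP", "SQL"): 40, ("SQL", "PHP"): 5,
          ("PHP", "Python"): 20, ("Python", "PHP"): 10}
totals = {"PHP": 400, "SQL": 100, "Python": 200}
m = chord_matrix(counts, totals, langs)
print(m[1][0], m[0][1])  # 0.1 0.05 : PHP mentions SQL more than SQL mentions PHP
print(language_sizes(m, langs))
```

Dividing by each community's comment total matters: without it, a large subreddit would dominate every chord simply by producing more comments.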

 

[Figure: Mutual Mentions chord graph, PHP view]

 

There are lots of inferences to be drawn from this interactive graph. One commenter on Hacker News notes that it indicates how compatible or co-used two technologies are in practice.

For example: C++ programmers apparently don't mention SQL at all, while it's very popular with PHP. There is also no overlap between C++ and JavaScript programmers. Rust is obviously very influenced by C++ and Haskell, but the C++ community doesn't even know about its existence. Somewhat naturally, the Matlab and PHP communities really don't have much in common.

Dobiasd also looked at the choice of words used in each language's subreddit. This revealed that the Haskell people are obsessed with abstract concepts, that people using C and C++ consider hardware issues, that those talking about PHP are the most prone to swearing, and that those using Lisp and Clojure seem to have the most positive attitudes:

[Figure: conversational tone by language]
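The article doesn't describe how the project scored tone. A common simple technique is lexicon-based scoring: count words that appear in a positive list and a negative list, then normalise by the total number of words. The sketch below uses that swapped-in technique with tiny hypothetical word lists, so its numbers are purely illustrative.

```python
import re
from collections import Counter

# Tiny hypothetical word lists; a real analysis would need a proper lexicon.
POSITIVE = {"great", "love", "nice", "elegant", "thanks"}
NEGATIVE = {"hate", "awful", "broken", "ugly", "damn"}

WORD = re.compile(r"[a-z']+")


def tone(comments):
    """Average tone of a community: (positive - negative) words per 1000 words."""
    words = Counter()
    total = 0
    for text in comments:
        tokens = WORD.findall(text.lower())
        total += len(tokens)
        words.update(tokens)
    if total == 0:
        return 0.0  # no words, no measurable tone
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    return 1000 * (pos - neg) / total


print(tone(["I love how elegant macros are", "thanks, great answer"]))
print(tone(["this damn API is broken", "I hate it"]))
```

A community whose score sits near zero uses few words from either list, which is one way a group could come out "neither angry nor happy".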

 

At the end of this analysis, Dobiasd asks:

But what is up with the Visual Basic community? They are neither angry nor happy. They just ... are? :)

A comment on Hacker News solves the puzzle:

This is answered by the mentions relative to TIOBE graph. They use VB, but they are careful not to talk about it.

[Figure: Mutual Mentions chord graph]

 

If you click on this chord graph, you'll access the interactive version and be able to explore it one language at a time.

You can also find more of Dobiasd's analysis of programming language subreddits and their choice of words on GitHub, including an analysis of the swear words most used in each language's subreddit. I challenge you to guess before you look.

  


Last Updated (Wednesday, 05 November 2014)