Find Python Code On GitHub With Gistable
Written by Kay Ewbank   
Friday, 07 September 2018

Researchers have put together a database of Python code snippets on GitHub. Gistable lists over 10,000 Python code snippets, of which around half come with a Dockerfile to configure and execute them.

The database was developed on the basis of research carried out by a team from North Carolina State University, who were interested in the executable status of Python code snippets shared on GitHub.

The researchers wanted to know what percentage of code shared through GitHub's gist system would just work, and how much would require 'non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration'.

The problem, of course, is that code snippets can contain parse errors, or fail to execute because the environment they run in lacks the dependencies they need.

The researchers found that 75.6% of gists require this kind of non-trivial configuration. The study also suggests that:

"the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time."

The researchers scraped gist URLs from the GitHub gist UI, and collected an initial dataset of 10,259 gists containing over 1,700 unique third-party library packages. Each gist was then cloned and executed inside a Docker container based on the official Python image for Docker, and categorized by its exit status.
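The run-and-categorize step can be sketched roughly as follows. This is a minimal illustration and not the researchers' own tooling: the function names, the category labels, the `python:2.7` image tag and the `/gist` mount point are all assumptions made for the example.

```python
import subprocess
import sys


def classify(returncode, stderr):
    """Label a run by its exit status, naming the exception on failure."""
    if returncode == 0:
        return "success"
    lines = [ln for ln in stderr.strip().splitlines() if ln.strip()]
    if not lines:
        return "unknown_failure"
    # The final traceback line looks like "ImportError: No module named foo".
    return lines[-1].split(":", 1)[0].strip() or "unknown_failure"


def run_command(command, timeout=60):
    """Run a command (a list of arguments) and classify how it ended."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return classify(proc.returncode, proc.stderr)


def docker_command(gist_dir, filename, image="python:2.7"):
    """Build a `docker run` command for one snippet (image tag is assumed)."""
    return [
        "docker", "run", "--rm",
        "-v", "{0}:/gist".format(gist_dir),  # hypothetical mount point
        image, "python", "/gist/{0}".format(filename),
    ]
```

With Docker installed, `run_command(docker_command("/path/to/gist", "snippet.py"))` would run the snippet in a container. Passing `[sys.executable, "snippet.py"]` instead runs it locally, which is handy for trying the classifier without Docker.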

Fewer than 25% of gists were executable by default, with over half failing due to ImportError in Python 2. Of the gists which initially failed with ImportError, attempts to infer an environment specification worked less than 50% of the time.
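To see why such inference is unreliable, consider the obvious approach: read the missing module name from the error message and assume the package to install has the same name. The sketch below is an illustration only, not the method used in the study; the function names and the small override table are assumptions, although the mismatches listed are well-known cases where the import name differs from the name used with pip.

```python
import re

# Matches both the Python 2 form ("No module named yaml")
# and the Python 3 form ("No module named 'yaml'").
_MISSING = re.compile(r"No module named '?([A-Za-z_][\w.]*)'?")

# A few well-known cases where import name and package name differ.
KNOWN_MISMATCHES = {
    "yaml": "PyYAML",
    "bs4": "beautifulsoup4",
    "cv2": "opencv-python",
    "PIL": "Pillow",
    "sklearn": "scikit-learn",
}


def missing_module(error_line):
    """Return the top-level module named in an import error, or None."""
    match = _MISSING.search(error_line)
    return match.group(1).split(".")[0] if match else None


def guess_package(module, overrides=KNOWN_MISMATCHES):
    """Guess the package to install: use an override, else the same name."""
    return overrides.get(module, module)
```

Without the override table, `guess_package("yaml")` would return `yaml`, which is not the package that provides the module. Any real tool needs some way of mapping import names to package names, and a hand-written table like this one only covers the cases its author already knows about.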

While the researchers were mainly interested in investigating the state of online code, the work also produced the Gistable database. The idea is that it is an extensible framework that can be used for reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 of which come with a Dockerfile to configure and execute them without import errors.
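As a rough idea of what a per-snippet Dockerfile involves, the sketch below generates one from a snippet name and a list of packages. The layout is an assumption for illustration and may not match the files in the Gistable repository; the `python:2.7` base image, the `/gist` path and the function name are all hypothetical.

```python
def render_dockerfile(snippet_name, packages, base_image="python:2.7"):
    """Return Dockerfile text that installs packages and runs one snippet."""
    lines = ["FROM " + base_image]
    if packages:
        # Sorted so the same inputs always give the same file.
        lines.append("RUN pip install " + " ".join(sorted(packages)))
    lines.append("COPY {0} /gist/{0}".format(snippet_name))
    lines.append('CMD ["python", "/gist/{0}"]'.format(snippet_name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("snippet.py", ["requests", "PyYAML"]))
```

Building and running the resulting image would execute the snippet with its dependencies already installed, which is what makes the snippet reproducible on another machine.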


More Information

Gistable On GitHub

Research Abstract On Arxiv

Related Articles

GitHub Security Alerts For Python 

Python The Future Of Programming?

GitHub Adds Security Alerts 

GitHub For Unity Now Available

Microsoft Buys GitHub - Get Ready For a Bigger Devil

GitHub Marketplace Now Accepts Free Apps and Offers Free Trials

GitHub Enterprise Adds Team Discussions

 


Last Updated ( Monday, 10 September 2018 )