Too Much Py In PyPI
Written by Mike James   
Wednesday, 31 July 2019

There is a move to reduce the amount of code in the Python standard library - to remove the dead or dying batteries. The suggestion is that the PyPI package library could take the strain. New research suggests that there might be some difficulties in this.

Python is often billed as the "batteries included" language, because its standard library includes many of the features you need to get started on a project. This has many advantages. For one, you don't need to hunt around to discover which module is best for a given task. If it is in the standard library, you just use it and stick with it unless there is a good reason to switch. As a result, most Python programmers use the same module for the same task, which builds a growing wealth of community knowledge on how to get things done.

It sounds great, but apparently the effort needed to maintain the standard library is proving too much and there are "dead batteries" as well as live ones. The suggestion is that the standard library should be slimmed down and the removed modules should make their way to a repository such as PyPI.

That's a terrible idea.


Research by Ethan Bommarito and Michael J Bommarito II at the University of Michigan gives some idea of what a sprawling mess PyPI actually is:

Statistic                           Value
Number of packages                  178,952
Number of releases                  1,745,744
Number of package classifications   947,896
Number of authors                   76,997
Number of maintainers               3,047
Number of licenses                  4,610
Number of imports                   156,816,750

There is a total of 2.4TB of release packages, and it is growing fast. The compound annual growth rate is reported to be 47.31% for active packages, and the number of authors is growing by 39.3%.
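
To put a 47.31% compound annual growth rate in perspective, it helps to turn it into a doubling time. A rate r compounds as (1 + r) each year, so the time to double is ln 2 / ln(1 + r). The sketch below makes that calculation. It assumes that the reported rates stay constant, which is unlikely to hold over the long term, and it treats the 39.3% author figure as an annual rate as well. The function name is purely illustrative.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity to double at a constant compound annual rate.

    annual_rate is a fraction, e.g. 0.4731 for 47.31% per year.
    """
    # (1 + r) ** t == 2  =>  t == ln 2 / ln(1 + r)
    return math.log(2) / math.log(1 + annual_rate)


# Rates as reported in the article; assuming both are annual and constant.
active_package_rate = 0.4731
author_rate = 0.393

print(f"Active packages double roughly every {doubling_time(active_package_rate):.2f} years")
print(f"Authors double roughly every {doubling_time(author_rate):.2f} years")
```

On these assumptions, the number of active packages doubles in a little under two years.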

This is even more interesting:

We find that most packages and releases are quite small, with a median release size of 22.6KiB and a median package size of 40.0KiB. However, right-skew again appears, with the largest releases weighing in at nearly 600 megabytes and the largest packages using nearly 175 gigabytes. Together, the four “deep learning” packages tf-nightly, mxnet-cu100mkl, mxnet-cu100, and tf-nightly-gpu use approximately 500 gigabytes of PyPI storage - nearly 25% of all PyPI storage.

Also interesting is the range of licences:

License family   Share
MIT              60%
GPL              16%
BSD              11%
Apache           8%

In addition, 27% are assigned an "unknown" licence and 0.1% a proprietary one. Clearly, which package you can use will depend on its licence, and you will have to check that before you use it.

Another interesting statistic is that only 24% of packages are classified as "production ready/stable". The largest group, 28%, are in beta, and 22% are in alpha. Of course, these labels are chosen by each package's author, and what they mean varies widely. A well-implemented alpha might be preferable to a not-so-stable "production ready" package.
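
Both the licence and the development status come from metadata that authors supply themselves, and PyPI publishes this metadata through its JSON API at https://pypi.org/pypi/<name>/json. The sketch below shows one way to check a package before adopting it. It reads the "info" section's "license" field and its "Development Status ::" and "License ::" trove classifiers. The helper names and the summary dictionary's layout are hypothetical, and any of these fields can be empty. That is exactly the "unknown" case described above.

```python
import json
import urllib.request

# PyPI's per-project JSON metadata endpoint.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_metadata(name: str) -> dict:
    """Download a project's metadata from PyPI (requires network access)."""
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=name), timeout=10) as resp:
        return json.load(resp)


def summarize(metadata: dict) -> dict:
    """Extract the self-declared development status and licence hints.

    The output layout is a hypothetical choice for illustration.
    """
    info = metadata.get("info") or {}
    classifiers = info.get("classifiers") or []
    # e.g. "Development Status :: 5 - Production/Stable" -> "5 - Production/Stable"
    status = [c.split(" :: ", 1)[1] for c in classifiers
              if c.startswith("Development Status :: ")]
    # e.g. "License :: OSI Approved :: MIT License" -> "MIT License"
    licence_classifiers = [c.split(" :: ")[-1] for c in classifiers
                           if c.startswith("License :: ")]
    return {
        "name": info.get("name"),
        "status": status[0] if status else "unknown",
        "license_field": info.get("license") or None,  # empty string -> None
        "license_classifiers": licence_classifiers,
    }
```

With network access, `summarize(fetch_metadata("some-package"))` gives a quick first look. Keep in mind that it only reports what the author claims, not how stable the code actually is.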

So is PyPI an alternative to "batteries included"? It is difficult to see how it could be. Search for a package on PyPI and you will most likely get several hundred results, which leaves you asking which one to use. For example, the standard library has a basic web server that is very useful for small projects, and for IoT applications in particular. Search for "web server" on PyPI and you get 10,000 hits with no clue as to which is recommended. Filter by "production/stable" and you still get 4,228 hits. Narrow it down to HTTP servers and you get a much more manageable 36 hits, but still no way to tell which one is worth your time.
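
For comparison, this is roughly what the standard library's web server battery looks like in use. `python -m http.server 8000` serves the current directory with no code at all. For anything custom, you subclass `BaseHTTPRequestHandler` from the `http.server` module. This minimal sketch assumes Python 3.7 or later, which added `ThreadingHTTPServer`. The handler and the helper function are illustrative names, and the module itself is not recommended for production use.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text greeting."""

    def do_GET(self):
        body = b"Hello from the standard library\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with urllib.request.urlopen(f"http://{host}:{port}/", timeout=5) as resp:
        print(resp.read().decode())
    server.shutdown()
    server.server_close()
```

Everything here ships with Python itself, so there is no choice to make between thousands of candidates, and no licence or maturity label to check.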

Batteries included is a much better way than this free-for-all.

One possible solution is for the standard library to be extended by accredited modules in the PyPI library. How to do the "accrediting" is a problem to be solved.


More Information

 An Empirical Analysis of the Python Package Index (PyPI)

Related Articles

PyPI Increases Security

PyPI Granted $170,000

Python - Dead Batteries Included? 


Last Updated ( Wednesday, 31 July 2019 )