Python Now Uses Powersort
Written by Mike James   
Wednesday, 21 December 2022

Sorting algorithms - surely nothing new? However Python 3.11 has adopted a new sorting algorithm that is better than TimSort.

We all love sorting algorithms because it's where most of us cut our algorithmic teeth. It is also the cause of a deep intake of breath when you first meet the inscrutable Quicksort and conclude that surely nothing could be better. Well the truth is that practical sorting isn't typified by theoretical purity. The fact that Quicksort is on average nlogn isn't as important as how bad it can be on some inputs. Practical sorting algorithms are designed to work well, no matter what you give them to sort and are generally mixtures of other well known sorting methods.

pfsbanner

For example, Python's standard sorting method up to Python 3.11 was TimSort, invented and implemented by Tim Peters in 2002. It works by identifying runs where the data is already in order and then merging them into larger runs. The algorithm is more complicated than you might expect and it achieves a worst case performance of nlogn and a best case performance of n. TimSort is in use in Java, Android, GNU Octave, JavaScript V8, Swift and Rust - but now Python has abandoned it for something better. Well, a little bit better.

Two CS researchers at the University of Waterloo, Canada, Ian Munro and Sebastian Wild were studying the merge policy used by TimSort and noticed that it wasn't optimized. In fact it was poorly understood. Eventually they found that a known, but not well known, algorithm about searching binary trees provided an optimal solution to the merge order problem and thus Powersort was born. As this is TimSort with an optimal merge order, it is fitting that  Tim Peters was responsible for implementing it in Python. The changelog reads:

List sorting now uses the merge-ordering strategy from Munro and Wild’s powersort(). Unlike the former strategy, this is provably near-optimal in the entropy of the distribution of run lengths. Most uses of list.sort() probably won’t see a significant time difference, but may see significant improvements in cases where the former strategy was exceptionally poor. However, as these are all fast linear-time approximations to a problem that’s inherently at best quadratic-time to solve truly optimally, it’s also possible to contrive cases where the former strategy did better.

So now you know - when you use list.sort() it's Powersort that is doing the work. Will Java and the others follow? A good question that emphasizes the fact that Python is still quick on its feet as far as new developments are concerned. However, things never stand still and Sebastian Wild, who is now at the University of Liverpool, has just completed work on an improved version of Powersort that uses a four-way merge.

Who would have thought that there was so much life in the venerable topic of sorting?

python3

More Information

Powersort in official Python 3.11 release

Related Articles

Python 3.11 Released

Magic of Merging

QuickSort Exposed

Sorting Algorithms as Dances      

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

 

Banner


DuckDB And Hydra Partner To Get DuckDB Into PostgreSQL
11/11/2024

The offspring of that partnership is pg_duckdb, an extension that embeds the DuckDB engine into the PostgreSQL database, allowing it to handle analytical workloads.



Prompt Engineering Techniques To Make You An Expert
18/11/2024

Introducing a GitHub repository full of hot tips and instructions on how to build the perfect prompt presented in a collection of Jupiter Notebooks.


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info

<ASIN:1871962749>

<ASIN:B0B5522QS3>

<ASIN:1871962765>

<ASIN:1871962595>

 

 

 

 

Last Updated ( Friday, 23 December 2022 )