Pandas Reaches 1.0

Written by Kay Ewbank

Tuesday, 28 January 2020

Pandas, the data analysis library for Python, is now available as a version 1.0 release candidate. It features the addition of a new value to represent scalar missing values and a dedicated string data type.

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It was designed to provide developers with an easy way to work with structured data such as tables, matrices and time series. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.


One of the major changes to Pandas is the version of Python it supports and expects. The new release no longer supports any version of Python earlier than Python 3.6.1. The previous release had already dropped support for Python 2, so this isn't as major a change as it sounds.

A change that improves working with missing data is the addition of a new value to represent scalar missing values. Until now, the missing-value marker depended on the type of data: np.nan for float data, None or np.nan for object-dtype, and pd.NaT for datetime-like data. The new value, pd.NA, provides a “missing” indicator that can be used consistently across data types. For now, pd.NA is used by the nullable integer and boolean data types and by the new string data type.
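As a rough illustration of the behaviour described above, the following sketch builds one array of each nullable type and inspects the missing entry. The variable names are made up for the example, and it assumes a pandas version of 1.0 or later.

```python
import pandas as pd

# Nullable integer, boolean and string arrays all mark missing entries with pd.NA.
ints = pd.array([1, None, 3], dtype="Int64")
flags = pd.array([True, None, False], dtype="boolean")
words = pd.array(["a", None, "c"], dtype="string")

print(ints[1], flags[1], words[1])

# pd.NA propagates through arithmetic: the middle element stays missing.
total = ints + 1

# In boolean operations pd.NA follows three-valued logic, so the result is
# only missing when it genuinely depends on the unknown value.
or_true = pd.NA | True     # True
and_false = pd.NA & False  # False
```

This differs from np.nan, which forces integer data to be converted to float before a missing value can be stored.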

Which leads us neatly on to the next improvement, the addition of a dedicated string data type. StringDtype is an extension type dedicated to string data. Until now, strings were typically stored in object-dtype NumPy arrays. The developers say the 'string' extension type solves several issues with that approach. A StringArray can only store strings, so you can no longer accidentally end up with a mixture of strings and non-strings, as you can in an object-dtype array. In addition, the new type works with dtype-specific operations such as DataFrame.select_dtypes(), whereas object dtype offers no clear way to select just the text columns while excluding other object-dtype columns.
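A minimal sketch of the string type in use might look like the following. The column names and data are invented for the example, and the exact exception type raised for a non-string value has varied between pandas releases, so both likely candidates are caught.

```python
import pandas as pd

names = pd.Series(["ada", "grace", None], dtype="string")

# String methods keep missing entries missing instead of falling back to float NaN.
lengths = names.str.len()

# Assigning a non-string into the underlying string array is rejected.
arr = pd.array(["ada", "grace"], dtype="string")
try:
    arr[0] = 1
    rejected = False
except (TypeError, ValueError):
    rejected = True

# select_dtypes can pick out the dedicated string column while leaving
# a mixed object-dtype column and a numeric column behind.
df = pd.DataFrame(
    {
        "name": names,
        "blob": pd.Series(["x", 1, None], dtype="object"),
        "n": [1, 2, 3],
    }
)
string_cols = list(df.select_dtypes(include="string").columns)
print(string_cols)
```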

Elsewhere, the developers have added a way to define custom windows for rolling operations. You can subclass pandas.api.indexers.BaseIndexer and supply your own get_window_bounds method, which generates the start and end indexes used for each window during the rolling aggregation.
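To give an idea of what such a subclass looks like, here is a sketch of a forward-looking window. ForwardWindowIndexer is a hypothetical name chosen for the example, not something pandas ships under that name. The method signature shown includes the step parameter, which assumes pandas 1.5 or later; the 1.0 signature did not have it, and pandas checks that the parameter names match its own base class.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardWindowIndexer(BaseIndexer):
    """Hypothetical indexer: each window covers the current row and the rows after it."""

    def get_window_bounds(
        self, num_values=0, min_periods=None, center=None, closed=None, step=None
    ):
        # Window i starts at row i and ends window_size rows later,
        # clipped so that it never runs past the end of the data.
        # The step argument is ignored in this simplified sketch.
        start = np.arange(num_values, dtype=np.int64)
        end = np.minimum(start + self.window_size, num_values).astype(np.int64)
        return start, end


prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
indexer = ForwardWindowIndexer(window_size=2)

# min_periods=1 lets the final, shorter window still produce a value.
forward_sum = prices.rolling(indexer, min_periods=1).sum()
print(forward_sum.tolist())  # [3.0, 5.0, 7.0, 9.0, 5.0]
```

The window_size passed to the constructor is stored on the instance by BaseIndexer, which is why the method can read self.window_size without defining its own __init__.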


More Information

What's New In Pandas 1.0


Last Updated ( Tuesday, 28 January 2020 )
