Spark 3 Improves Python and SQL Support
Written by Kay Ewbank   
Monday, 22 June 2020

Spark 3 has been released with major improvements in Python and SQL support, along with changes to make it easier to explore data. It is also faster.

Spark is the general-purpose cluster computing framework that has native support for distributed SQL and enables streaming, graph processing, and machine learning.

Spark SQL underlies many of the actions carried out by developers using Spark, and nearly half the work in the updated release has gone into the SQL engine, improving both performance and ANSI compatibility. The SQL engine has a number of new features, including an Adaptive Query Execution (AQE) framework that improves performance and simplifies tuning by generating a better execution plan at runtime. The higher-level libraries, including structured streaming and MLlib, and the higher-level APIs, including SQL and DataFrames, have also been improved.
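As a rough illustration of what switching AQE on involves, the sketch below collects the relevant Spark SQL settings and renders them as spark-submit arguments. The configuration keys are the ones Spark 3.0 documents; the helper name to_submit_args and the choice of which options to list are assumptions made for this example, not something taken from the release announcement.

```python
# Spark SQL settings related to Adaptive Query Execution in Spark 3.0.
# AQE is off by default in 3.0, so it has to be switched on explicitly.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions once their real sizes are known.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(settings):
    """Render settings as `--conf key=value` arguments (hypothetical helper)."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints a string that can be appended to a spark-submit command line.
    print(" ".join(to_submit_args(AQE_SETTINGS)))
```

The same keys can also be set from inside a PySpark program, for example with spark.conf.set(key, value) on an existing session.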

Python is now the most widely used language on Spark and has received a lot of attention in this release, especially around pandas and Koalas. The pandas API is limited to single-node processing, so work has continued on Koalas, an implementation of the pandas API on top of Apache Spark, to make data scientists more productive when working with big data in distributed environments. With Koalas, many functions no longer have to be built by hand in PySpark in order to run efficiently across a cluster. The Koalas API coverage for pandas is now close to 80%.

Work has also continued on the PySpark APIs. There are new pandas APIs with type hints. The new pandas UDF interface uses Python type hints to indicate the UDF type, which should keep the interface easy to understand as more UDF types are added. Error handling has been improved: PySpark exceptions are simpler and more Pythonic, and the unnecessary JVM stack trace is hidden.
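A minimal sketch of the type-hinted style might look like the following. The function celsius_to_fahrenheit and the wrapper as_spark_udf are invented for this example; only pandas_udf itself comes from PySpark. The pandas function is kept separate so that it can be run without a cluster, and the wrapper assumes PySpark 3.0 or later with PyArrow installed.

```python
import pandas as pd


def celsius_to_fahrenheit(temps: pd.Series) -> pd.Series:
    """Series in, Series out: the hints describe a Series-to-Series UDF."""
    return temps * 9.0 / 5.0 + 32.0


def as_spark_udf():
    """Wrap the function as a pandas UDF (hypothetical helper)."""
    # Imported lazily so the plain pandas function works without PySpark.
    from pyspark.sql.functions import pandas_udf

    # The UDF type is inferred from the type hints on the function;
    # only the Spark SQL return type has to be stated.
    return pandas_udf(celsius_to_fahrenheit, returnType="double")


if __name__ == "__main__":
    # Runs locally on plain pandas, with no Spark session involved.
    print(celsius_to_fahrenheit(pd.Series([0.0, 100.0])).tolist())
```

On a DataFrame df with a numeric column, here assumed to be called temp_c, the wrapped function would be applied with something like df.select(as_spark_udf()(df["temp_c"])).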

Other improvements include a new UI for structured streaming, and calls to R user-defined functions that are up to 40 times faster.

 

More Information

Spark 3 Release Notes 

 

Last Updated ( Monday, 22 June 2020 )