MOOC On Apache Spark
Written by Alex Denham   
Thursday, 28 May 2015

If you want to apply data science techniques using parallel programming in Apache Spark, you'll be interested in an edX course starting Monday, June 1st that prepares you for the Spark Certified Developer exam.


CS 100.1x Introduction to Big Data with Apache Spark is a 5-week course at Intermediate level under the auspices of UC BerkeleyX, Berkeley's online course outfit, and sponsored by Databricks, a company founded by the creators of Apache Spark.

It will be taught by Anthony D Joseph, who is both Professor of Electrical Engineering and Computer Science and Technical Adviser at Databricks.

With a required effort of 5-7 hours per week (around 30 hours in total), students will learn:

  • How to use Apache Spark to perform data analysis

  • How to use parallel programming to explore data sets

  • How to apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real-world data questions

  • How to prepare for the Spark Certified Developer exam

The Spark Certified Developer exam is offered by Databricks in conjunction with O'Reilly at a cost of $300. It can be taken in person during sessions at Strata events or online from your own computer.

This certification enables you to:

 

  • Demonstrate industry recognized validation for your expertise.
  • Meet global standards required to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Become an integral part of the growing Spark developer community.

Of course, you don't have to take the certification exam and can use this MOOC simply to extend your knowledge of data science. It is part of a two-module Big Data XSeries, the other module being CS 190.1x: Scalable Machine Learning, which starts on June 29.

 


According to its rubric:

This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

Because all exercises use PySpark (part of Apache Spark), you either need experience with Python or to take a free online Python mini-course supplied by UC Berkeley.
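To give a flavour of the kind of exercise involved, here is a minimal log-mining style sketch using the PySpark RDD API. The sample log lines, and the choice of counting errors by hour, are purely illustrative assumptions and are not taken from the course materials.

# A minimal log-mining sketch with PySpark RDDs (illustrative only, not course material)
from pyspark import SparkContext

sc = SparkContext("local[*]", "LogMiningSketch")

# A few hypothetical Apache-style log lines standing in for a real data set
logs = sc.parallelize([
    "2015-06-01 10:02:11 ERROR disk full on node-7",
    "2015-06-01 10:05:43 INFO  request served in 12ms",
    "2015-06-01 11:17:02 ERROR timeout talking to node-3",
    "2015-06-01 11:18:55 WARN  retrying request",
])

# Filter to error lines and count them by hour, evaluated in parallel across the cluster
errors_per_hour = (logs
    .filter(lambda line: " ERROR " in line)
    .map(lambda line: (line.split()[1][:2], 1))   # key by the hour field of the timestamp
    .reduceByKey(lambda a, b: a + b))

print(errors_per_hour.collect())   # e.g. [('10', 1), ('11', 1)]
sc.stop()

The same filter/map/reduceByKey pattern scales unchanged from this toy list to log files read from a cluster, which is the point the course's parallel-programming exercises set out to make.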

 

 

 

 

 
