MOOC On Apache Spark
Written by Alex Denham   
Thursday, 28 May 2015

If you want to apply data science techniques using parallel programming in Apache Spark, you'll be interested in an edX course starting Monday June 1st that prepares you for the Spark Certified Developer exam.

CS 100.1x Introduction to Big Data with Apache Spark is a 5-week course at Intermediate level under the auspices of UC BerkeleyX, Berkeley's online course outfit, and sponsored by Databricks, a company founded by the creators of Apache Spark.

It will be taught by Anthony D Joseph, who is both Professor in Electrical Engineering and Computer Science and Technical Adviser at Databricks.

With a required effort of 5-7 hours per week (around 30 hours in total), students will learn how to:

  • Use Apache Spark to perform data analysis

  • Use parallel programming to explore data sets

  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real world data questions

  • Prepare for the Spark Certified Developer exam

The Spark Certified Developer exam is offered by Databricks in conjunction with O'Reilly at a cost of $300. It can be taken in person during sessions at Strata events or online from your computer.

This certification enables you to:

  • Demonstrate industry recognized validation for your expertise.
  • Meet global standards required to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Become an integral part of the growing Spark developer community.

Of course, you don't have to take this certification and can use this MOOC simply to extend your knowledge of data science. It is part of a two-module Big Data XSeries, the other module being CS 190.1x: Scalable Machine Learning, which starts on June 29.

 

According to its rubric:

This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

Because all exercises will use PySpark (part of Apache Spark), you either need experience with Python or to take a free online Python mini-course supplied by UC Berkeley.
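To give a flavor of what a log mining exercise in PySpark might look like, here is a minimal sketch. It is not taken from the course materials: the sample log lines and the names parse_status, count_status_codes and main are hypothetical, and it assumes a local PySpark installation. It counts log lines per HTTP status code using the standard RDD operations parallelize, map, filter and reduceByKey.

```python
import re
from operator import add

# Hypothetical sample data in Apache Common Log Format; not from the course.
SAMPLE_LOG = [
    '127.0.0.1 - - [01/Jun/2015:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 1043',
    '127.0.0.1 - - [01/Jun/2015:10:00:05 -0700] "GET /missing.html HTTP/1.1" 404 208',
    '10.0.0.7 - - [01/Jun/2015:10:00:09 -0700] "GET /index.html HTTP/1.1" 200 1043',
    'this line is malformed',
]

# The status code is the three-digit field right after the quoted request.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def parse_status(line):
    """Return the HTTP status code in a log line as an int, or None if absent."""
    match = STATUS_RE.search(line)
    return int(match.group(1)) if match else None


def count_status_codes(sc, lines):
    """Count log lines per status code using Spark's RDD API.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    return dict(
        sc.parallelize(lines)                       # distribute the lines
          .map(parse_status)                        # line -> status or None
          .filter(lambda status: status is not None)  # drop malformed lines
          .map(lambda status: (status, 1))          # status -> (status, 1)
          .reduceByKey(add)                         # sum the counts per status
          .collect()
    )


def main():
    # Imported here so the parsing helper can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "log-mining-sketch")
    try:
        print(count_status_codes(sc, SAMPLE_LOG))
    finally:
        sc.stop()
```

Calling main() runs the job on all local cores. The same map, filter and reduceByKey chain would apply unchanged to a much larger log file loaded with sc.textFile, which is the appeal of the parallel approach.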

 

 

 

 

 


 

Last Updated ( Wednesday, 17 August 2016 )