Take The ETH Zürich Big Data Course For Free
Written by Nikos Vaggalis   
Friday, 28 October 2022

ETH Zürich's Big Data course, taught by Professor Ghislain Fourny, is a great introduction to everything Big Data. The recorded lectures from fall 2021 are up on YouTube for everyone to enjoy.

The notion of Big Data that this course adopts is:

Information society has to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.

The course revolves around the database technologies and the most important database design principles that lay the foundations of the Big Data universe: distributed storage, syntax, data models, validation, processing, indexing, and querying, all fitted to the Big Data model. In more detail, the topics covered are:

  • physical storage: distributed file systems (HDFS), object storage (S3), key-value stores

  • logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP)

  • data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro)

  • data shapes and models (tables, trees, graphs, cubes)

  • type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)

  • an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX)

  • the most important query paradigms (selection, projection, joining, grouping, ordering, windowing)

  • paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark)

  • resource management (YARN)

  • what a data center is made of and why it matters (racks, nodes, ...)

  • underlying architectures (internal machinery of HDFS, HBase, Spark, neo4j)

  • optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing)

  • applications
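
To give a flavor of the two-stage paradigm (MapReduce) mentioned in the list above, here is a minimal single-process sketch of a word count in Python. It is not taken from the course material; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are illustrative assumptions, and a real framework would distribute both stages across many machines rather than run them in one process.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Stage 1 (map): emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as a framework would between stages."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Stage 2 (reduce): collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over a list of documents."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of the split is that each call to map_phase and each call to reduce_phase is independent of the others, which is what lets a cluster run them in parallel.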

Those subjects are spread out over 40 recorded lectures, each video up to 45 minutes in length:

  • Introduction
  • Lessons learnt (1/2/3)
  • Object storage (1/2/3)
  • Distributed file systems (1/2/3)
  • Syntax (1/2/3)
  • Wide column stores (1/2/3)
  • Data models (1/2/3)
  • Massive parallel processing I: MapReduce (1/2)
  • Resource management (1/2)
  • Massive parallel processing II: Spark (1/2/3/4)
  • Performance at large scales (1/2)
  • Document stores (1/2/3/4)
  • Querying trees (1/2/3/4)
  • Graph databases (1/2/3)
  • Data warehouses and cubes (1/2/3)
  • Wrap up (1/2)

All the videos are very interesting but I particularly liked the series on "Wide column stores". 

There is also a "Big Data for Engineers" course, which is similar to Big Data but adapted for non-Computer Scientists. Big Data itself is addressed purely to Computer Science students.

By the end, watching the whole series should have given you an overview and understanding of the Big Data landscape. Armed with this knowledge, you should be able to make informed decisions when addressing your projects' needs.


More Information

YouTube playlist

Related Articles

Brand New Data Science Courses on edX

OS-Climate - Open Source To Tackle Climate Change

Google's Cloud Spanner To Settle the Relational vs NoSQL Debate?

 

 


Last Updated ( Wednesday, 02 November 2022 )