Microsoft's Data Science for Beginners
Written by Nikos Vaggalis   
Monday, 18 October 2021

There's a new free, self-paced online course about Data Science from Microsoft's Azure Cloud Advocates. Its 20-lesson curriculum, expected to take 10 weeks to complete, is targeted at those new to Data Science. Of course, it uses Python.

 

The term Data Science is used to cover a vast array of topics. Its definition has become even broader as it overlaps with the sibling fields of applied mathematics, statistics, machine learning, AI, etc.

If asked about the difference between ML and Data Science, I would describe ML as a technology that trains models on data in order to tune an algorithm. Data Science is broader than that; it involves collecting, cleaning and aggregating data, visualizing it and using it statistically in order to reach data-driven decisions. However, as we'll discover in this course, both fields converge at certain points, such as data processing or training predictive models.

The definition that this course provides for Data Science is that it encompasses all the following processes:

1. Data Acquisition
The first step is to collect the data. While in many cases it can be a straightforward process, like data coming to a database from a web application, sometimes we need to use special techniques.

2. Data Storage
Storing the data can be challenging, especially if we are talking about big data. There are several ways data can be stored:

  • A relational database stores a collection of tables, and uses a special language called SQL to query them.
  • A NoSQL database, such as CosmosDB, does not enforce a schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs.
  • Data Lake storage is used for large collections of data in raw form.
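
To make the contrast between the first two storage styles concrete, here is a minimal sketch using only Python's standard library. It is not taken from the course: the `lessons` table, its columns and the JSON document are invented for illustration, and an in-memory SQLite database stands in for a real relational server.

```python
import json
import sqlite3

# Relational storage: tables with a fixed schema, queried with SQL.
# The table, columns and rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lessons (title TEXT, topic TEXT, week INTEGER)")
conn.executemany(
    "INSERT INTO lessons VALUES (?, ?, ?)",
    [
        ("Intro to SQL", "databases", 2),
        ("Joins", "databases", 3),
        ("Bar charts", "visualization", 5),
    ],
)
database_lessons = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM lessons WHERE topic = ? ORDER BY week", ("databases",)
    )
]
conn.close()

# Document storage: a schema-free, hierarchical JSON document.
document = json.loads(
    """
    {
      "title": "Joins",
      "topic": {"name": "databases", "kind": "relational"},
      "tags": ["sql", "beginner"]
    }
    """
)
topic_kind = document["topic"]["kind"]

print(database_lessons)
print(topic_kind)
```

The relational query filters rows by column value, while the document is navigated by following its nested keys; no schema had to be declared for the latter.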

3. Data Processing
This step converts the data from its original form into a form that can be used for visualization or model training. When dealing with unstructured data such as text or images, we may need to use some AI techniques to extract features from the data, thus converting it to structured form.

4. Visualization / Human Insights
Often, a data scientist needs to "play with data", visualizing it many times and looking for relationships. We may also use techniques from statistics to test hypotheses or establish correlation between different pieces of data.
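
As a rough illustration of looking for a relationship between two pieces of data, the sketch below computes the Pearson correlation coefficient by hand. The `pearson` helper and the sample values are assumptions made for this example, not material from the course, and a coefficient near 1 or -1 only indicates a linear relationship, not a cause.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample: hours studied against exam score.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

r = pearson(hours_studied, exam_score)
print(f"Pearson r = {r:.3f}")
```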

5. Training a predictive model
Because the ultimate goal of data science is to be able to make decisions based on data, we may want to use the techniques of Machine Learning to build a predictive model that will be able to solve our problem.
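
To give a flavour of what training a predictive model means in its simplest form, here is a hedged sketch of ordinary least-squares fitting of a straight line. The `fit_line` and `predict` helpers and the data points are hypothetical, and the course itself may well use a library rather than hand-written code.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data that lies exactly on y = 2x + 1.
inputs = [1, 2, 3, 4]
targets = [3, 5, 7, 9]

model = fit_line(inputs, targets)
forecast = predict(model, 5)
print(model, forecast)
```

"Training" here is just the computation of the slope and intercept from known examples; "prediction" is applying them to an input the model has not seen.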

The course therefore looks at each of those processes in detail. This includes:

  • Statistics and Probability Theory
    Mean, Variance and Standard Deviation; Mode, Median and Quartiles
  • Working with Data
    Relational Databases and their properties of relationships; Retrieving data, Joining data
  • Working with Non-Relational Data
    Spreadsheets, NoSQL, JSON, Document Data Stores with Azure Cosmos DB
  • Working with Tabular Data and Dataframes
    Python and the Pandas library, practiced on the real-world examples of modelling COVID spread and analyzing COVID scientific papers
  • Data Preparation and Visualizing with Matplotlib
    Cleaning data, Visualizing Quantities, Visualizing Proportions, Visualizing Distributions
  • Data Science in the Cloud with Azure
    Training models using Low Code tools, Deploying models with Azure Machine Learning Studio
  • Data Science Ethics
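
For readers unfamiliar with the statistics topics listed above (mean, variance, standard deviation, mode, median and quartiles), the following sketch computes each of them with Python's built-in `statistics` module. The sample values are invented, and the course may use other libraries for the same calculations.

```python
import statistics

# Invented sample of seven observations.
sample = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # square root of the sample variance
mode = statistics.mode(sample)          # most frequent value
median = statistics.median(sample)      # middle value of the sorted data
quartiles = statistics.quantiles(sample, n=4)  # three cut points: Q1, Q2, Q3

print(mean, variance, std_dev, mode, median, quartiles)
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population counterparts.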

The syllabus in detail:

  • Defining Data Science
  • Data Science Ethics
  • Defining Data
  • Introduction to Statistics & Probability
  • Working with Relational Data
  • Working with NoSQL Data
  • Working with Python
  • Data Preparation
  • Visualizing Quantities
  • Visualizing Distributions of Data
  • Visualizing Proportions
  • Visualizing Relationships
  • Meaningful Visualizations
  • Introduction to the Data Science lifecycle
  • Analyzing
  • Communication
  • Data Science in the Cloud
  • Data Science in the Wild

In terms of resources, it's pretty much a complete class: it includes nice sketches, supplemental videos, quizzes, step-by-step guides on how to build the projects, knowledge checks, challenges and assignments, which should be enough to get your journey started.

By way of prerequisites, it's recommended to have a basic understanding of Python and Visual Studio Code, and to be able to run code in Jupyter Notebooks.

After going through it, you'll be looking for the next steps. A good option on a familiar path is another Microsoft course, "Machine Learning for Beginners", which follows the same structure.

More Information

Data Science For Beginners

Related Articles 

Microsoft's Machine Learning for Beginners

Fly Over the Moon With Microsoft And Python

Ethics of AI - A Course From Finland 

 


Last Updated ( Monday, 18 October 2021 )