Microsoft's Data Science for Beginners

Written by Nikos Vaggalis

Monday, 18 October 2021

There's a new free, self-paced online course about Data Science from Microsoft's Azure Cloud Advocates. Its 20-lesson curriculum, expected to take 10 weeks to complete, is targeted at those new to Data Science. Of course, it uses Python.

The term Data Science is used to cover a vast array of topics. Its definition has become even broader by overlapping with the sibling fields of applied mathematics, statistics, machine learning, AI and so on.

If asked about the difference between ML and Data Science, I would describe ML as a technology that uses data to train models, tuning an algorithm in the process. Data Science is broader than that; it involves collecting, cleaning and aggregating data, visualizing it and analyzing it statistically in order to reach data-driven decisions. However, as we'll discover in this course, both fields converge at certain points, such as data processing and training predictive models.

The definition that this course provides for Data Science is that it encompasses all the following processes:

1. Data Acquisition
The first step is to collect the data. While in many cases this is a straightforward process, such as data arriving in a database from a web application, sometimes we need to use special techniques.

2. Data Storage
Storing the data can be challenging, especially if we are talking about big data. There are several ways data can be stored:

A relational database stores a collection of tables, and uses a special language called SQL to query them.
A NoSQL database, such as Cosmos DB, does not enforce a schema on the data, and allows storing more complex data, for example hierarchical JSON documents or graphs.
Data Lake storage is used for large collections of data in raw form.

3. Data Processing
This means converting the data from its original form into a form that can be used for visualization or model training. When dealing with unstructured data such as text or images, we may need to use some AI techniques to extract features from the data, thus converting it to a structured form.

4. Visualization / Human Insights
Often, a data scientist needs to "play with data", visualizing it many times and looking for relationships. We may also use techniques from statistics to test hypotheses or to demonstrate a correlation between different pieces of data.

5. Training a Predictive Model
Because the ultimate goal of data science is to be able to make decisions based on data, we may want to use the techniques of Machine Learning to build a predictive model that will be able to solve our problem.
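
To make the five processes above a little more concrete, here is a minimal sketch that walks through them on a tiny scale using only Python's standard library. It is not taken from the course: the table name, the column names and the numbers are all made up for illustration, the "insight" step is reduced to a single Pearson correlation, and a simple least-squares line stands in for a real Machine Learning model.

```python
import sqlite3

# 1. Acquisition: hypothetical observations (hours studied, exam score).
#    These numbers are invented purely for illustration.
raw_rows = [(1, 52.0), (2, 55.0), (3, 61.0), (4, 64.0), (5, 70.0)]

# 2. Storage: an in-memory relational database, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (hours REAL, score REAL)")
conn.executemany("INSERT INTO study VALUES (?, ?)", raw_rows)

# 3. Processing: read the data back and reshape it into two plain lists.
rows = conn.execute("SELECT hours, score FROM study ORDER BY hours").fetchall()
conn.close()
hours = [h for h, _ in rows]
scores = [s for _, s in rows]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# 4. Human insight: how strongly are the two columns related?
r = pearson(hours, scores)

# 5. Predictive model: fit a line and use it for an unseen input.
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept

print(f"correlation={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
print(f"predicted score after 6 hours: {predicted:.1f}")
```

In practice each step would be handled by a dedicated tool, such as Pandas for processing and Matplotlib for visualization, both of which the course covers.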

The course therefore looks at each of those processes in detail. This includes:

Statistics and Probability Theory
Mean, Variance and Standard Deviation; Mode, Median and Quartiles
Working with Data
Relational Databases and the properties of their relationships; Retrieving data, Joining data
Working with Non-Relational Data
Spreadsheets, NoSQL, JSON, Document Data Stores with Azure Cosmos DB
Working with Tabular Data and Dataframes
Python and the Pandas library, practicing on the real-world examples of modelling COVID spread and analyzing COVID scientific papers
Data Preparation and Visualizing with Matplotlib
Cleaning data, Visualizing Quantities, Visualizing Proportions, Visualizing Distributions
Data Science in the Cloud with Azure
Training models using Low Code tools, Deploying models with Azure Machine Learning Studio
Data Science Ethics
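
As a taste of the statistics topics in the list above, the following sketch computes the mean, variance, standard deviation, mode, median and quartiles with Python's built-in statistics module. The sample values are invented for illustration and the snippet is not taken from the course material; it also assumes the population (rather than sample) variance and the module's default "exclusive" method for quartiles.

```python
import statistics

# Hypothetical sample, made up for illustration only.
sample = [2, 4, 4, 5, 7, 9, 11]

avg = statistics.mean(sample)           # arithmetic mean
variance = statistics.pvariance(sample) # population variance
std_dev = statistics.pstdev(sample)     # population standard deviation
mode = statistics.mode(sample)          # most frequent value
median = statistics.median(sample)      # middle value of the sorted data

# Three cut points that split the sorted data into four equal groups.
q1, q2, q3 = statistics.quantiles(sample, n=4)

print(f"mean={avg} variance={variance:.3f} std_dev={std_dev:.3f}")
print(f"mode={mode} median={median} quartiles={q1}, {q2}, {q3}")
```

Swapping pvariance and pstdev for variance and stdev gives the sample versions, which divide by n - 1 instead of n.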

The syllabus in detail:

Defining Data Science
Data Science Ethics
Defining Data
Introduction to Statistics & Probability
Working with Relational Data
Working with NoSQL Data
Working with Python
Data Preparation
Visualizing Quantities
Visualizing Distributions of Data
Visualizing Proportions
Visualizing Relationships
Meaningful Visualizations
Introduction to the Data Science lifecycle
Analyzing
Communication
Data Science in the Cloud
Data Science in the Wild

Resource-wise, it's pretty much a complete class: it includes nice sketches, supplemental videos, quizzes, step-by-step guides on how to build the projects, knowledge checks, challenges and assignments, which should be enough to get your journey started.

By way of prerequisites, it's recommended that you have a basic understanding of Python and Visual Studio Code, and that you are able to run code in Jupyter Notebooks.

After going through it, you'll be looking for the next steps. A good option on a familiar path is to go with another Microsoft course, "Machine Learning for Beginners", which follows the same structure.

More Information

Data Science For Beginners

Microsoft's Machine Learning for Beginners

Fly Over the Moon With Microsoft And Python

Ethics of AI - A Course From Finland

Last Updated ( Monday, 18 October 2021 )
