Microsoft Releases Open Source Distributed Machine Learning Library
Written by Kay Ewbank   
Tuesday, 04 January 2022

Microsoft has released an open-source library for creating massively scalable machine learning (ML) pipelines. SynapseML, previously known as MMLSpark, unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that is usable from Python, R, Scala, and Java.

SynapseML builds on Apache Spark and SparkML, adding deep learning and data science tools to the Spark ecosystem.

It integrates SparkML pipelines with the Open Neural Network Exchange (ONNX), LightGBM, Azure Cognitive Services, Vowpal Wabbit, and OpenCV, and through these tools provides highly scalable predictive and analytical models for a variety of data sources. SynapseML also includes the HTTP on Spark project, which lets users embed web services into their SparkML models.
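To make the idea of embedding a web service in a model concrete, here is a minimal sketch in plain Python. It is not the HTTP on Spark API: the `WebServiceStage` class, the `fake_sentiment_service` function, and the column names are all hypothetical, and an in-process stub stands in for the remote endpoint so the example runs offline. It only illustrates the general shape of a pipeline stage that sends each row to a service and records the response, or an error, in new columns.

```python
import json
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# A client takes a JSON request body and returns (status_code, JSON response body).
Client = Callable[[str], Tuple[int, str]]


class WebServiceStage:
    """Hypothetical pipeline stage that calls a web service once per row."""

    def __init__(self, client: Client, input_col: str, output_col: str,
                 error_col: str = "error") -> None:
        self.client = client
        self.input_col = input_col
        self.output_col = output_col
        self.error_col = error_col

    def transform(self, rows: List[Row]) -> List[Row]:
        out: List[Row] = []
        for row in rows:
            request_body = json.dumps({"text": row[self.input_col]})
            status, body = self.client(request_body)
            if status == 200:
                # Success: store the parsed response, leave the error column empty.
                out.append({**row, self.output_col: json.loads(body),
                            self.error_col: None})
            else:
                # Failure: keep the row and record the status instead of raising.
                out.append({**row, self.output_col: None,
                            self.error_col: f"HTTP {status}"})
        return out


def fake_sentiment_service(request_body: str) -> Tuple[int, str]:
    """Stand-in for a remote endpoint (an assumption made for this sketch)."""
    text = json.loads(request_body)["text"]
    if not text:
        return 400, json.dumps({"message": "empty input"})
    score = 1.0 if "good" in text else 0.0
    return 200, json.dumps({"score": score})


stage = WebServiceStage(fake_sentiment_service, "text", "sentiment")
result = stage.transform([{"text": "good library"}, {"text": ""}])
for row in result:
    print(row)
```

Writing failures to a separate column rather than raising is a design choice in this sketch: a single bad row does not abort the whole batch.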

SynapseML simplifies building such pipelines by unifying many different ML frameworks behind a single API that is scalable, data- and language-agnostic, and works for batch, streaming, and serving applications. It is designed to help developers focus on the high-level structure of their data and tasks, not the implementation details and idiosyncrasies of different ML ecosystems and databases.

The unified API provides a standard way to use the tools, so developers can combine multiple ML frameworks where necessary. SynapseML can also train and evaluate models on single-node, multi-node, and elastically resizable clusters of computers, making it simpler to scale up as required.
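The following is a minimal sketch, in plain Python, of what a unified interface of this kind can look like. It mirrors the estimator/transformer pattern that SparkML pipelines use, but none of it is SynapseML or SparkML code: `MaxScaler`, `MeanThreshold`, and the toy `Pipeline` are hypothetical stand-ins for tools that, in practice, would come from different frameworks.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

Row = Dict[str, float]


class Transformer(Protocol):
    def transform(self, rows: List[Row]) -> List[Row]: ...


class Estimator(Protocol):
    def fit(self, rows: List[Row]) -> Transformer: ...


@dataclass
class ScaleModel:
    """Fitted stage: divides one column by a constant learned during fit."""
    column: str
    scale: float

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.column: r[self.column] / self.scale} for r in rows]


@dataclass
class MaxScaler:
    """Hypothetical estimator: learns the largest absolute value of a column."""
    column: str

    def fit(self, rows: List[Row]) -> ScaleModel:
        peak = max(abs(r[self.column]) for r in rows) or 1.0
        return ScaleModel(self.column, peak)


@dataclass
class ThresholdModel:
    """Fitted stage: labels a row 1.0 when the input reaches the threshold."""
    input_col: str
    output_col: str
    threshold: float

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: float(r[self.input_col] >= self.threshold)}
                for r in rows]


@dataclass
class MeanThreshold:
    """Hypothetical estimator standing in for a model from another framework."""
    input_col: str
    output_col: str

    def fit(self, rows: List[Row]) -> ThresholdModel:
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return ThresholdModel(self.input_col, self.output_col, mean)


@dataclass
class PipelineModel:
    stages: List[Transformer]

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


@dataclass
class Pipeline:
    stages: List[Estimator]

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted: List[Transformer] = []
        for stage in self.stages:
            model = stage.fit(rows)
            rows = model.transform(rows)  # later stages see transformed data
            fitted.append(model)
        return PipelineModel(fitted)


train = [{"x": 2.0}, {"x": 4.0}, {"x": 10.0}]
model = Pipeline([MaxScaler("x"), MeanThreshold("x", "label")]).fit(train)
scored = model.transform([{"x": 5.0}, {"x": 8.0}])
print(scored)
```

Because every stage exposes the same `fit` and `transform` methods, the pipeline does not need to know which tool sits behind each one, which is the property the article attributes to the unified API.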

Describing the new software, Mark Hamilton, a Microsoft software engineer, said that many tools in SynapseML don’t require a large labelled training dataset. Instead, SynapseML provides simple APIs for pre-built intelligent services, such as Azure Cognitive Services, to quickly solve large-scale AI challenges related to both business and research.

SynapseML lets developers embed 45 different ML services directly into their systems and databases. The latest release adds support for distributed form recognition, conversation transcription, and translation. These ready-to-use algorithms can parse a wide variety of documents, transcribe multi-speaker conversations in real time, and translate text into more than 100 languages.

SynapseML also extends Spark's Structured Streaming engine, so that jobs running on the engine can be exposed as a web service.

SynapseML is available now.

More Information

SynapseML On GitHub

SynapseML Website

Related Articles

Microsoft Open Sources Natural Language Processing Tool

More AI Tools From Microsoft

Apache Ignite Adds Spark DataFrames Support

.NET For Apache Spark Updated
