LinkedIn Open Sources Feathr Machine Learning Feature Store
Written by Kay Ewbank
Friday, 22 April 2022
LinkedIn has made Feathr open source. Feathr is the feature store LinkedIn built to simplify machine learning feature management and improve developer productivity. The developers say that at LinkedIn dozens of applications use Feathr to define features, compute them for training, deploy them in production, and share them across teams.

Feathr was developed to address a problem LinkedIn faced: preparing and managing features derived from raw data sources for use by machine learning models. LinkedIn has hundreds of ML models running in applications like Search, Feed, and Ads, and those models are powered by thousands of features about entities. Preparing and managing features for those ML applications is difficult and time-consuming.

Feature preparation pipelines are the systems and workflows that transform raw data into features for model training and inference. They bring together time-sensitive data, potentially from multiple sources. The resulting features are then joined to training labels, stored, and used by the ML applications.

Feathr makes it easier to create feature preparation pipelines. It is an abstraction layer that provides a common feature namespace for defining features and a common platform for computing, serving, and accessing them “by name” from within ML workflows.

Feathr can be used to define features based on raw data sources, including time-series data, using simple APIs. Once defined, features can be accessed by name during model training and model inference, and can also be shared across teams. Feathr automatically computes feature values and joins them to training data, using point-in-time-correct semantics to avoid data leakage. It also supports deploying features for use online in production.

Feathr’s abstraction creates producer and consumer personas for features.
Producers define features and register them into Feathr, and consumers access or import groups of features into their ML model workflows. For the consumer, Feathr acts like a software package management tool for ML features. Feathr lets feature consumers list the names of the features they want to “import” in their model, abstracting the nontrivial details about how they are sourced and computed.

Feathr is available on GitHub now.
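
To make the producer and consumer roles and the point-in-time-correct join more concrete, here is a minimal sketch in Python using pandas. It does not use Feathr's own API: `FeatureDef`, `ToyFeatureRegistry`, the column names and the sample data are all hypothetical, invented for illustration. The only real library call relied on is `pandas.merge_asof`, which for each label row picks the most recent feature value observed at or before the label's timestamp.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FeatureDef:
    """Hypothetical feature definition: a name plus a timestamped source table."""
    name: str
    source: pd.DataFrame
    key: str        # entity key column in the source
    timestamp: str  # observation-time column in the source
    value: str      # column holding the feature value


class ToyFeatureRegistry:
    """Hypothetical stand-in for a feature store, not Feathr's API."""

    def __init__(self):
        self._features = {}

    def register(self, feature):
        # Producer side: publish a feature under a shared name.
        self._features[feature.name] = feature

    def join_features(self, labels, names, key, timestamp):
        # Consumer side: request features by name and get them joined to labels.
        result = labels.sort_values(timestamp).reset_index(drop=True)
        for name in names:
            f = self._features[name]
            right = (
                f.source[[f.key, f.timestamp, f.value]]
                .rename(columns={f.key: key, f.timestamp: "_feature_ts", f.value: name})
                .sort_values("_feature_ts")
            )
            # Backward as-of join: only values observed at or before the label
            # time are eligible, so later observations cannot leak into training.
            result = pd.merge_asof(
                result, right,
                left_on=timestamp, right_on="_feature_ts",
                by=key, direction="backward",
            ).drop(columns="_feature_ts")
        return result


# Invented sample data: one raw source and one set of training labels.
views = pd.DataFrame({
    "member_id": [1, 1, 2],
    "ts": pd.to_datetime(["2022-04-05", "2022-04-15", "2022-04-18"]),
    "views_7d": [3, 8, 5],
})
labels = pd.DataFrame({
    "member_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2022-04-10", "2022-04-20", "2022-04-15"]),
    "clicked": [1, 0, 1],
})

registry = ToyFeatureRegistry()
registry.register(FeatureDef("views_7d", views, "member_id", "ts", "views_7d"))

training = registry.join_features(
    labels, ["views_7d"], key="member_id", timestamp="event_time"
)
print(training)
```

In this sample, member 2's only observation is dated after its label event, so the joined value is left missing rather than filled from the future. That is the data leakage a point-in-time-correct join is meant to prevent. The consumer only names the feature it wants and never touches the source table directly.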
Last Updated (Friday, 22 April 2022)