LinkedIn Open Sources Data Streaming Tool

Written by Kay Ewbank

Friday, 26 July 2019

LinkedIn has open-sourced its tool for streaming data between systems. Brooklin is described as a distributed service for streaming data in near real-time and at scale.

The tool has been running in production at LinkedIn since 2016, and handles thousands of data streams and over 2 trillion messages per day. It exposes a set of abstractions that mean it can be extended to support consuming and producing data to and from new systems by writing new Brooklin consumers and producers.

LinkedIn uses Brooklin as the primary solution for streaming data across various stores including Espresso and Oracle, and messaging systems including Kafka, Azure Event Hubs, and AWS Kinesis.

Brooklin has been designed for use in multi-tenancy systems, and can simultaneously power hundreds of data pipelines across different systems. Creating new data pipelines or datastreams and modifying existing ones can be accomplished with just an HTTP call to a REST endpoint. Brooklin also exposes a diagnostics REST endpoint that you can use for on-demand querying of a data stream’s status. Source and destination systems don't have to be the same, and can be freely mixed and matched. Data streams are processed concurrently and independently meaning that errors in one stream are isolated from the rest.

brooklin2

The developers of Brooklin say that because it's a dedicated service for streaming data across various environments, all of the complexities can be managed within a single service, so application developers can focus on processing the data and not on data movement. The centralized extensible framework also means organizations can enforce policies. For example, Brooklin can be configured to enforce company-wide policies, such as requiring that any data flowing in must be in JSON format, or any data flowing out must be encrypted.

Writing about the open source release, LinkedIn Engineering Manager Celia Kung said that Brooklin is used at LinkedIn as an alternative to Kafka MirrorMaker for mirroring Kafka data from one Kafka cluster to another:

"Since Brooklin was designed as a generic bridge for streaming data, we were able to easily add support for moving enormous amounts of Kafka data."

"One of the largest use cases for Brooklin as a streaming bridge at LinkedIn is to mirror Kafka data between clusters and across data centers. Kafka is used heavily at LinkedIn to store all types of data, such as logging, tracking, metrics, and much more. We use Brooklin to aggregate this data across our data centers to make it easy to access in a centralized place. We also use Brooklin to move large amounts of Kafka data between LinkedIn and Azure."

The open source release is available on GitHub.

brooklin

More Information

Brooklin On GitHub

Kafka 2 Adds Support For ACLs

Kafka Graphs Framework Extends Kafka Streams

Apache Kafka Adds New Streams API

LinkedIn Restricts Developer Access

LinkedIn Groups API

LinkedIn Developer Network Opens

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

GitHub Launches Secret Protection
10/04/2025

GitHub has announced measures making it easier to protect yourself from exposed secrets. They include a standalone version of Secret Protection and organization-wide scanning.

+ Full Story

Torsten Hoefler Awarded ACM Prize In Computing
04/04/2025

Torsten Hoefler, a professor at ETH Zurich and Chief Architect for AI and Machine Learning at the Swiss National Supercomputing Centre, has been awarded the 2024 ACM Prize in [ ... ]

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Friday, 26 July 2019 )

More Information

Related Articles

Comments