These days you hear a lot about “stream processing”, “event data”, and “real-time”, often related to technologies like Kafka, Storm, Samza, or Spark’s Streaming module. Though there is a lot of excitement, not everyone knows how to fit these technologies into their technology stack or how to put it to use in practical applications.
This guide is going to discuss our experience with real-time data streams: how to build a home for real-time data within your company, and how to build applications that make use of that data. All of this is based on real experience: we spent the last five years building Apache Kafka, transitioning LinkedIn to a fully stream-based architecture, and helping a number of Silicon Valley tech companies do the same thing.
The first part of the guide will give a high-level overview of what we came to call a “stream data platform”: a…
View original post 4,180 more words