
Real-Time Data Processing

Transforming Data in Real-Time

Paid Course

Learn how to perform data transformations on real-time event-driven data in Python by integrating distributed data pipelines with scalable, high-throughput and fault-tolerant streaming platforms.

Course Details

This course provides a hands-on exploration of Apache Kafka, the industry-standard distributed streaming platform, and shows how to integrate it with distributed data pipelines via Apache Spark and its Structured Streaming engine to build high-throughput, low-latency real-time data processing systems. It follows on from our Distributed Data Engineering course and enables experienced senior data engineers to build systems that transform, and derive actionable insight from, data in real time. Topics include real-time SQL operations, joins, deduplication and the handling of late-arriving data, that is, events with earlier timestamps that arrive after events with later timestamps.
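
To make the ideas of deduplication and late-arriving data concrete, here is a minimal pure-Python sketch. It is not the Spark Structured Streaming or Kafka API and is not taken from the course material. The names `Event`, `LateDataFilter` and `allowed_lateness` are hypothetical, and the per-event watermark update is a simplification (streaming engines typically advance the watermark less often, for example once per batch). It assumes each event carries a unique ID and an event-time timestamp in seconds.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class Event:
    event_id: str    # hypothetical unique identifier used for deduplication
    event_time: int  # event-time timestamp, in seconds


@dataclass
class LateDataFilter:
    """Accepts events unless they are duplicates or older than the watermark."""
    allowed_lateness: int                  # how far behind the newest event we tolerate
    max_event_time: Optional[int] = None   # newest event time accepted so far
    seen_ids: Set[str] = field(default_factory=set)

    def accept(self, event: Event) -> bool:
        # The watermark trails the newest accepted event time by allowed_lateness.
        if self.max_event_time is not None:
            watermark = self.max_event_time - self.allowed_lateness
            if event.event_time < watermark:
                return False  # too late: older than the watermark
        if event.event_id in self.seen_ids:
            return False      # duplicate of an event already accepted
        self.seen_ids.add(event.event_id)
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time
        return True


# Events listed in arrival order; note that event times are not monotonic.
arrivals = [
    Event("a", 100),
    Event("b", 105),
    Event("a", 100),  # duplicate ID
    Event("c", 98),   # out of order, but within the 10-second tolerance
    Event("d", 120),
    Event("e", 105),  # older than the watermark (120 - 10 = 110)
]

late_filter = LateDataFilter(allowed_lateness=10)
accepted_ids = [e.event_id for e in arrivals if late_filter.accept(e)]
print(accepted_ids)
```

In this sketch event "c" is kept even though it arrives after a newer event, because it falls within the tolerated lateness, while "e" is dropped because it arrives after the watermark has moved past its timestamp. Note that `seen_ids` grows without bound here; a production system would also use the watermark to discard old deduplication state.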

Requirements

Outcomes

  • The ability to apply data transformation techniques to event-driven data in real time.
  • The ability to integrate distributed data pipelines with distributed streaming platforms in order to process, and derive actionable insights from, data in real time. This includes real-time SQL operations, joins, deduplication and the handling of late-arriving data (events with earlier timestamps that arrive after events with later timestamps).
  • Knowledge of the industry-standard Apache Spark Structured Streaming engine and Apache Kafka distributed streaming platform.
