Monday, September 30, 2019

                               Streaming with Kafka Connect



Apache Kafka is a high-throughput distributed messaging system that has been adopted by hundreds of companies to manage their real-time data.
Companies use Kafka for many applications (real-time stream processing, data synchronisation, messaging, and more), but one of the most popular
is the ETL pipeline. Kafka is well suited to building data pipelines: it is reliable, scalable, and efficient.

Until recently, building pipelines with Kafka required significant effort: each system you wanted to connect to Kafka needed either custom code or
a separate tool, and each new tool came with its own configuration, its own assumptions about data formats, and its own approach to management
and monitoring. Data pipelines built from this hodgepodge of tools are brittle and difficult to manage.


Where does Kafka Connect fit?
Four common data-flow patterns around Kafka, with the hand-written API on the left and the higher-level option on the right:

  Source → Kafka : Producer API              or Kafka Connect Source
  Kafka  → Kafka : Consumer + Producer APIs  or Kafka Streams API
  Kafka  → Sink  : Consumer API              or Kafka Connect Sink
  Kafka  → App   : Consumer API inside the application
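
For comparison, here is the hand-written route for Source → Kafka. This is a minimal sketch assuming a broker at localhost:9092 and a topic named events (both placeholders); a Kafka Connect Source replaces code like this with pure configuration.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PlainProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Each call publishes one record to the "events" topic.
                producer.send(new ProducerRecord<>("events", "key-1", "value-1"));
            }
        }
    }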


Why Kafka Connect?

  • Programmers kept writing the same code to import data from the same kinds of sources into Kafka
  • Likewise, they kept writing the same code to export data from Kafka to the same kinds of sinks
  • Needed to achieve exactly-once semantics, fault tolerance, distribution, and ordering
  • Many ready-made connectors are available; you customise only their configuration (see the example below)
  • Forms the extract and load stages of an ETL pipeline
  • Scales easily, from small pipelines to company-wide pipelines
  • Configuration is submitted via a REST API
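
For example, the FileStreamSource connector that ships with Kafka needs only a small configuration; everything here except the connector class is a placeholder:

    name=local-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/tmp/input.txt
    topic=connect-test

In standalone mode this is passed as a properties file on the worker's command line; in distributed mode the same settings are submitted as JSON through the REST API (shown at the end of this post).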


Everything as Events in the Streaming world

In the streaming world, every piece of data is modelled as an event: a database row change, a log line, a click, a sensor reading. Each event is appended to a Kafka topic as an immutable record, and downstream systems react to the stream of events rather than polling for state.

How does Kafka Connect help in ETL?

Kafka Connect covers the two ends of the pipeline: source connectors perform the Extract step by pulling data from external systems into Kafka topics, and sink connectors perform the Load step by pushing data from topics into target systems. The Transform step happens in between, either through Connect's lightweight single-message transforms or through a stream-processing layer such as Kafka Streams.

What does a Kafka Connector do?

  • Kafka Connect loads multiple reusable connectors; a connector can be a source or a sink
  • Breaks a job into Tasks (see the sketch below)
  • Supplies configuration to those Tasks
  • Monitors and reconfigures Tasks when required
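
A minimal sketch of the source side of this contract, using the org.apache.kafka.connect API. The class names MySourceConnector and MySourceTask and the task.id key are hypothetical; the overridden methods are the real Connector interface.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.source.SourceConnector;

    public class MySourceConnector extends SourceConnector {
        private Map<String, String> config;

        @Override
        public void start(Map<String, String> props) {
            this.config = props;            // connector-level configuration
        }

        @Override
        public Class<? extends Task> taskClass() {
            return MySourceTask.class;      // the Task that actually copies data
        }

        // The connector's main job: break the work into task configurations.
        @Override
        public List<Map<String, String>> taskConfigs(int maxTasks) {
            List<Map<String, String>> configs = new ArrayList<>();
            for (int i = 0; i < maxTasks; i++) {
                Map<String, String> taskConfig = new HashMap<>(config);
                taskConfig.put("task.id", String.valueOf(i)); // hypothetical work split
                configs.add(taskConfig);
            }
            return configs;
        }

        @Override
        public void stop() { }

        @Override
        public ConfigDef config() {
            return new ConfigDef();         // declare expected config keys here
        }

        @Override
        public String version() {
            return "0.1.0";
        }
    }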


What do Tasks do?
  • A Task is responsible for actually copying data to or from the target system
  • Tasks are executed by Kafka Connect Workers; a worker is a single Java process and can run in standalone or distributed (cluster) mode
  • Tasks can be reconfigured through the REST API
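
Continuing the sketch above, the hypothetical MySourceTask shows the task contract: the worker calls poll() in a loop and writes whatever records the task returns to Kafka.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class MySourceTask extends SourceTask {
        private String topic;

        @Override
        public void start(Map<String, String> props) {
            topic = props.get("topic");     // supplied by the connector's taskConfigs()
        }

        // Called repeatedly by the worker; each returned record is written to Kafka.
        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            Thread.sleep(1000);             // stand-in for reading the external system
            Map<String, ?> partition = Collections.singletonMap("source", "demo");
            Map<String, ?> offset = Collections.singletonMap("position", 0L);
            return Collections.singletonList(
                new SourceRecord(partition, offset, topic,
                                 Schema.STRING_SCHEMA, "hello from a task"));
        }

        @Override
        public void stop() { }

        @Override
        public String version() {
            return "0.1.0";
        }
    }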


Kafka Connect REST API

A Connect worker in distributed mode exposes a REST API, by default on port 8083, for managing connectors: POST /connectors creates a connector, GET /connectors lists the running ones, GET /connectors/{name}/status reports health, PUT /connectors/{name}/config updates configuration, and DELETE /connectors/{name} removes a connector.

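A minimal sketch (Java 11+ HttpClient) that registers the FileStreamSource configuration from earlier against a worker on localhost:8083; the file path and topic name remain placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterConnector {
        public static void main(String[] args) throws Exception {
            // JSON body: the connector name plus the same settings as the
            // properties file shown earlier.
            String body = "{"
                + "\"name\": \"local-file-source\","
                + "\"config\": {"
                +   "\"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                +   "\"tasks.max\": \"1\","
                +   "\"file\": \"/tmp/input.txt\","
                +   "\"topic\": \"connect-test\""
                + "}}";

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

            // Expect 201 Created on success; 409 if a rebalance is in progress.
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }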


