What is Apache Flafka? How to use it with Flume for data ingestion [Tutorial]

Apache Kafka is an open-source distributed stream-processing platform, written in Scala and Java. It is used to publish and subscribe to messages, which are kept in sequential order within each partition of a topic. Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system with high throughput, reliability, and replication.

A Kafka cluster contains one or more servers, called Kafka brokers. Producers are processes that publish data (i.e. push messages) into Kafka topics on the brokers. Consumers are processes that subscribe to topics and pull messages from them.
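
To make the producer, topic, and consumer roles concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the `Topic` and `Consumer` classes and their methods are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory stand-in for a Kafka topic (not the real client API)."""

    def __init__(self, name, partitions=1):
        self.name = name
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(partitions)]

    def publish(self, key, value):
        """Producer side: append to a partition chosen from the key."""
        partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1


class Consumer:
    """Pulls messages and remembers its own offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition=0, max_messages=10):
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        batch = log[start:start + max_messages]
        self.offsets[partition] = start + len(batch)
        return batch


topic = Topic("kafkatest", partitions=1)
for line in ["first", "second", "third"]:
    topic.publish(key="k", value=line)

consumer = Consumer(topic)
first_batch = consumer.poll(max_messages=2)
second_batch = consumer.poll(max_messages=2)
print(first_batch, second_batch)
```

The consumer, not the topic, tracks how far it has read, which is why messages stay available to other consumers after being pulled.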

Apache Flume, on the other hand, is an open-source, distributed, reliable, and available service for collecting and moving large amounts of data into stores such as the Hadoop Distributed File System (HDFS) and HBase. Flume acts as a centralized service that ingests large volumes of streaming log data and delivers it to a store such as HDFS.

The Flume agent is responsible for moving messages from the source (i.e. the source path) to the sink (i.e. the destination path). The agent has the following components:

  • Source: Receives messages from a client or source path and passes them to the channel.
  • Sink: Delivers events to the destination store. Flume provides different sinks for different stores, such as the HDFS sink and the HBase sink.
  • Channel: Acts as an intermediate buffer between the source and the sink.
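
The interplay of the three components can be sketched in a few lines. This is a simplified model under stated assumptions, not Flume code: `MemoryChannel` and `run_agent` are hypothetical names, and the sink is just a Python list.

```python
from collections import deque


class MemoryChannel:
    """Bounded buffer between source and sink (toy model of a Flume channel)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            return False  # channel full: the source has to retry later
        self.events.append(event)
        return True

    def take(self, batch_size):
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch


def run_agent(source_lines, channel, sink, batch_size=2):
    """Move every line from the source to the sink through the channel."""
    pending = deque(source_lines)
    while pending or channel.events:
        # Source: push as many events as the channel accepts.
        while pending and channel.put(pending[0]):
            pending.popleft()
        # Sink: drain one batch and "store" it.
        sink.extend(channel.take(batch_size))


stored = []  # hypothetical sink; a real agent would write to HDFS or HBase
run_agent(["e1", "e2", "e3", "e4", "e5"], MemoryChannel(capacity=3), stored)
print(stored)
```

Because the channel is bounded, a fast source has to wait for the sink to drain events, which is the buffering role the channel plays in a real agent.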

Integrating Flume with Kafka

Flume is a data ingestion tool that moves data from one place to another. Integrated with Kafka, Flume streams high volumes of log data from a source to a destination, such as HDFS, for storage.

Deploying Flafka into Production

Using Flume with Kafka:

Kafka and Flume are separate tools. Integrating them lets you stream the data in a Kafka topic at high speed to different sinks. Here, Flume acts as the consumer and stores the data in HDFS.

1. Start the ZooKeeper server
bin/zkServer.sh start

2. Start the Kafka server
bin/kafka-server-start.sh config/server.properties

3. Create the topic in Kafka
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafkatest

4. Start a console producer for the Kafka topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatest

5. Download and install Apache Flume on your local machine, and create a configuration file for the agent, for example flume-conf.properties.

Use the Kafka source to stream data in Kafka topics to Hadoop. The Kafka source can be combined with any Flume sink, making it easy to write Kafka data to HDFS, HBase, etc.

The following is the Flume configuration:

a1.sources = r1
a1.sinks = sample
a1.channels = sample-channel

# Describe/configure the source (a Kafka source)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.channels = sample-channel
# Topic created in step 3
a1.sources.r1.topic = kafkatest
a1.sources.r1.zookeeperConnect = localhost:2181

# Use a channel which buffers events in memory
a1.channels.sample-channel.type = memory

a1.channels.sample-channel.capacity = 1000
a1.channels.sample-channel.transactionCapacity = 1000
a1.channels.sample-channel.byteCapacityBufferPercentage = 20
a1.channels.sample-channel.byteCapacity = 131072000

# properties of sample-sink
a1.sinks.sample.channel = sample-channel
a1.sinks.sample.type = hdfs
a1.sinks.sample.hdfs.writeFormat = Text
#a1.sinks.sample.hdfs.path = hdfs://namenode/flumesource/source1
a1.sinks.sample.hdfs.path = hdfs://localhost:50000/tmp/kafka/%{topic}/%y-%m-%d
a1.sinks.sample.hdfs.useLocalTimeStamp = true
#a1.sinks.sample.hdfs.filePrefix=demo
#a1.sinks.sample.hdfs.fileSuffix=.txt
a1.sinks.sample.hdfs.rollInterval=0
#a1.sinks.sample.hdfs.batchSize =1000
a1.sinks.sample.hdfs.rollSize=131072000
a1.sinks.sample.hdfs.rollCount=0
a1.sinks.sample.hdfs.idleTimeout=0
a1.sinks.sample.hdfs.maxOpenFiles = 10000
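
A few of the values in this configuration are easier to read with the arithmetic spelled out. The sketch below is illustrative Python, not Flume code, and the three helper functions are hypothetical. It assumes that byteCapacityBufferPercentage reserves that share of byteCapacity for event headers, that %{topic} is filled from an event header, and that a roll setting of 0 disables that trigger.

```python
from datetime import datetime


def channel_body_budget(byte_capacity, buffer_percentage):
    """Bytes left for event bodies after reserving the header buffer."""
    return byte_capacity * (100 - buffer_percentage) // 100


def expand_path(template, headers, timestamp):
    """Resolve the %y, %m, %d and %{header} escapes used in hdfs.path."""
    for escape in ("%y", "%m", "%d"):
        template = template.replace(escape, timestamp.strftime(escape))
    for name, value in headers.items():
        template = template.replace("%{" + name + "}", value)
    return template


def should_roll(bytes_written, events_written, roll_size, roll_count):
    """Decide whether to close the current file; 0 disables a trigger."""
    by_size = roll_size > 0 and bytes_written >= roll_size
    by_count = roll_count > 0 and events_written >= roll_count
    return by_size or by_count


# byteCapacity = 131072000 (125 MiB) with a 20% buffer leaves 100 MiB.
budget = channel_body_budget(131072000, 20)

path = expand_path(
    "hdfs://localhost:50000/tmp/kafka/%{topic}/%y-%m-%d",
    {"topic": "kafkatest"},
    datetime(2019, 3, 7),
)

print(budget)  # 104857600
print(path)    # hdfs://localhost:50000/tmp/kafka/kafkatest/19-03-07
print(should_roll(131072000, 10, roll_size=131072000, roll_count=0))  # True
```

With rollCount and rollInterval set to 0, only rollSize closes a file in this model, so each HDFS file grows to roughly 125 MiB before a new one is started.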

6. Start the Flume agent to copy the data into the HDFS sink
bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties -Dflume.root.logger=DEBUG,console --name a1 -Xmx512m -Xms256m

What are the best practices for Flafka?

As a producer

Use a Flume source to read the data and a Kafka sink to write it to a Kafka topic.

Here is the Flume configuration file that makes the agent act as a Kafka producer:

# Name the components of agent a1
a1.sources = r1
a1.sinks = sample
a1.channels = sample-channel

# Source: run a command and turn each output line into an event
a1.sources.r1.type = exec
a1.sources.r1.command = cat /home/indium/dek.csv
a1.sources.r1.logStdErr = true
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = sample-channel

# Channel: buffer events in memory
a1.channels.sample-channel.type = memory
a1.channels.sample-channel.capacity = 1000
a1.channels.sample-channel.transactionCapacity = 100

# Sink: publish events to a Kafka topic
a1.sinks.sample.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sample.topic = sample_topic
a1.sinks.sample.brokerList = localhost:9092
a1.sinks.sample.requiredAcks = 1
a1.sinks.sample.batchSize = 20
a1.sinks.sample.channel = sample-channel
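
A mistyped agent or channel name in a file like this is easy to miss. As a rough aid, the following sketch parses the properties text and checks that every source and sink points at a declared channel. It is a hypothetical helper written for this tutorial, not part of Flume, and it only understands simple `key = value` lines.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems with source/sink channel references."""
    declared = set(props.get(agent + ".channels", "").split())
    problems = []
    for source in props.get(agent + ".sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        problems += [f"source {source}: unknown channel {c}"
                     for c in used if c not in declared]
    for sink in props.get(agent + ".sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel not in declared:
            problems.append(f"sink {sink}: unknown channel {channel}")
    return problems


config = """
a1.sources = r1
a1.sinks = sample
a1.channels = sample-channel
a1.sources.r1.channels = sample-channel
a1.sinks.sample.channel = sample-channel
"""
problems = check_wiring(parse_properties(config), "a1")
print(problems)
```

An empty list means the wiring is consistent. A line written under a different agent name, such as flume1 instead of a1, would show up as a sink or source with an unknown channel.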

As a consumer

Use a Kafka source to read from a Kafka topic and write the data to a Flume sink.

We have already seen the Flume configuration for this case, including how to write to the HDFS sink. The diagram covers both the producer and consumer roles and shows how Kafka integrates with Flume to publish data to a Kafka topic and to write data to HDFS storage.

In conclusion

Integrating Kafka with Flume is a good practice for streaming high-velocity data. Flafka adds flexibility to the data pipeline and enables a distributed ingestion pipeline that, with careful tuning, can ingest more than 1 million events per second.
