How to Install and Configure Apache Kafka on Debian
Apache Kafka, commonly known as Kafka, is a free and open-source distributed event streaming platform. It is a powerful tool that acts as a message broker capable of handling large volumes of real-time data.
Therefore, if you are looking to implement a message broker or a pub/sub
system, Apache Kafka is a great tool with tons of features out of the box.
In this tutorial, we will show you how to get started with Kafka by installing and configuring the Kafka system on Debian or any Debian-based distribution.
NOTE: Although Kafka itself is heavily optimized for a wide range of use cases, the setup in this tutorial is not production-ready. Be sure to add your own security measures when adopting Kafka.
Requirements
Before installing Kafka, you will need the following:
- A Debian system or any Debian-based distribution.
- A root user or sudo permissions.
- A valid JDK installation.
You can learn how to install the Amazon Corretto JDK in the resource below:
https://www.geekbits.io/how-to-install-amazon-corretto-jdk-on-ubuntu/
Once you have the above requirements met, we can proceed.
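Since a working JDK is a hard requirement, it is worth confirming that Java is visible before going further. A quick sanity check (the exact version string will vary with your JDK):

```shell
# Print the installed Java version; any modern JDK output means Kafka can run
java -version

# The systemd unit files created later reference JAVA_HOME; report whether it is set
echo "${JAVA_HOME:-JAVA_HOME is not set}"
```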
Download the Kafka Archives
Let us start by downloading and extracting the Kafka binaries. Open your terminal and navigate into the directory where you wish to store Kafka.
In our example, we will use the Downloads folder of the debian user.
cd ~/Downloads
Next, use the wget command to download the Kafka archive.
wget https://dlcdn.apache.org/kafka/3.2.0/kafka_2.12-3.2.0.tgz
The command will use wget to download the archive and save it in the Downloads folder. You can check the resource below for the latest Kafka binary.
https://kafka.apache.org/downloads
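Before extracting, you may optionally verify that the download is intact. Apache publishes a SHA-512 digest for every release on the downloads page linked above; the sketch below assumes the 3.2.0 archive name used in this tutorial:

```shell
# Compute the SHA-512 digest of the downloaded archive and compare it
# by eye against the digest published on the Kafka downloads page
sha512sum kafka_2.12-3.2.0.tgz
```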
Next, extract the Kafka archive as:
tar -zxvf kafka_2.12-3.2.0.tgz
Replace the archive name in the above command with the name of the archive you downloaded.
Once extracted, we need to move the Kafka directory to a better location than the Downloads folder. In our example, we will use the /opt directory.
Run the command:
sudo mv kafka_2.12-3.2.0 /opt/kafka
The command will move the extracted Kafka directory into the /opt directory.
We can verify this by running the command:
ls -la /opt/kafka
This should return the directory listing for the kafka directory as:
total 72
drwxr-xr-x 7 debian debian 4096 May 3 15:56 .
drwxr-xr-x 3 root root 4096 Jul 15 13:59 ..
drwxr-xr-x 3 debian debian 4096 May 3 15:56 bin
drwxr-xr-x 3 debian debian 4096 May 3 15:56 config
drwxr-xr-x 2 debian debian 4096 Jul 15 13:56 libs
-rw-r--r-- 1 debian debian 14640 May 3 15:52 LICENSE
drwxr-xr-x 2 debian debian 4096 May 3 15:56 licenses
-rw-r--r-- 1 debian debian 28184 May 3 15:52 NOTICE
drwxr-xr-x 2 debian debian 4096 May 3 15:56 site-docs
Configuring the Kafka Server
Once we have the Apache Kafka directories and binaries ready, we can proceed and configure the server to run on our system.
Enable Kafka Topic Delete
The first step is to allow Kafka to delete any topic you specify. This feature is disabled by default for security reasons. To enable Kafka topic delete, edit the server configuration file as:
sudo nano /opt/kafka/config/server.properties
Navigate to the bottom of the file and add the entry shown below:
delete.topic.enable = true
Save and close the file.
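If you prefer to script the change instead of opening an editor, the same setting can be appended non-interactively. A minimal sketch that avoids adding a duplicate entry if the key is already configured:

```shell
CONF=/opt/kafka/config/server.properties

# Append delete.topic.enable only if it is not already set
if ! grep -q '^delete.topic.enable' "$CONF"; then
    echo 'delete.topic.enable = true' | sudo tee -a "$CONF" > /dev/null
fi

# Confirm the setting is present
grep 'delete.topic.enable' "$CONF"
```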
Create Systemd Unit Files and Start the Kafka Service
Although we can manage the Kafka server from the bin directory, it is best practice to create a service and manage it via systemd.
We will start by creating a unit file for each of the Zookeeper and Kafka services. If you navigate into the kafka/bin directory, you will see files such as:
-rwxr-xr-x 1 debian debian 1376 May 3 15:52 kafka-server-start.sh
-rwxr-xr-x 1 debian debian 1361 May 3 15:52 kafka-server-stop.sh
-rwxr-xr-x 1 debian debian 1393 May 3 15:52 zookeeper-server-start.sh
-rwxr-xr-x 1 debian debian 1366 May 3 15:52 zookeeper-server-stop.sh
-rwxr-xr-x 1 debian debian 1019 May 3 15:52 zookeeper-shell.sh
Kafka uses these zookeeper and kafka-server scripts to start, restart, and stop the Kafka services. Let us create systemd unit files that use these scripts to manage the server.
Run the command:
sudo touch /etc/systemd/zookeeper.service
The command above will create an empty unit file for the Zookeeper service. You can use a different filename if you wish; the service name will match the filename.
Next, edit the unit file:
sudo nano /etc/systemd/zookeeper.service
In the unit file, add the following entries.
[Unit]
Description=Apache Zookeeper Service
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
# replace the 'debian' user with the user you wish to run the kafka service
User=debian
# replace the value of JAVA_HOME with the location of the Java JDK
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Let us break down what the above unit file is doing.
In the [Unit] section, we define the metadata for the unit file and its relationship with other units. Here, we specify a description for the service and a link to the documentation resource. We also specify the services we need before starting the Zookeeper server; in our example, networking and the filesystem must be ready before running.
Next comes the [Service] section, which allows us to specify the configuration required for the service. Here, we define the path to the Java JDK, which Kafka requires, along with the scripts that systemd should use to start and stop the service.
NOTE: We also specify the configuration files we wish to use when starting the service.
If the service exits abnormally, we tell systemd to restart it, as specified by the Restart directive.
Now, save the changes and close the zookeeper.service file.
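Before moving on, you can sanity-check the unit file. The sketch below greps for the sections and keys the service needs and, where available, asks systemd-analyze (shipped with systemd) to do a deeper syntax check:

```shell
UNIT=/etc/systemd/zookeeper.service

# Make sure the sections and keys systemd expects are present;
# any line printed here indicates something is missing
for key in '\[Unit\]' '\[Service\]' '\[Install\]' '^ExecStart=' '^WantedBy='; do
    grep -q "$key" "$UNIT" || echo "missing: $key"
done

# systemd-analyze can also validate the unit syntax itself
systemd-analyze verify "$UNIT"
```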
We are not quite done yet. The next step is to create a Kafka service file.
Run the command:
sudo touch /etc/systemd/kafka.service
This will create a unit file for the Kafka server.
Edit the file with your favorite text editor:
sudo nano /etc/systemd/kafka.service
In the above file, add the following entries:
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target remote-fs.target
After=network.target remote-fs.target zookeeper.service
[Service]
Type=simple
# replace the user below
User=debian
# replace with the path to the Java JDK
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
You will notice that the format is similar to the zookeeper unit file. However, in this case, we tell systemd to use the kafka-server-start.sh and kafka-server-stop.sh scripts to start and stop the service.
Edit Socket Server Settings
The next step is to define the address on which the server will listen. Editing this value is optional unless you need the server to use a custom address or port.
Edit the server properties as:
sudo nano /opt/kafka/config/server.properties
Locate the entry below:
# listeners = PLAINTEXT://:9092
Uncomment the line above by removing the # sign. Next, edit the address you wish Kafka to use. An example is as shown:
listeners=PLAINTEXT://localhost:9092
In this case, we tell Kafka to listen on localhost on port 9092. Feel free to change this value as you see fit.
Save and close the file.
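To confirm the listener line is now active (uncommented), you can grep the properties file:

```shell
# An uncommented listeners line should be printed; no output means
# the line is still commented out
grep '^listeners=' /opt/kafka/config/server.properties
```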
Start the Systemd Services and Reload the Daemon
Once you have everything in place, it is time to start and enable the services.
Start by enabling the zookeeper and kafka services.
sudo systemctl enable /etc/systemd/zookeeper.service
sudo systemctl enable /etc/systemd/kafka.service
Be sure to specify the full path to the unit files.
You should see an output as shown:
sudo systemctl enable /etc/systemd/zookeeper.service
Created symlink /etc/systemd/system/multi-user.target.wants/zookeeper.service → /etc/systemd/zookeeper.service.
Created symlink /etc/systemd/system/zookeeper.service → /etc/systemd/zookeeper.service.
sudo systemctl enable /etc/systemd/kafka.service
Created symlink /etc/systemd/system/multi-user.target.wants/kafka.service → /etc/systemd/kafka.service.
Created symlink /etc/systemd/system/kafka.service → /etc/systemd/kafka.service.
Once you have enabled the services, run the commands below to start the zookeeper and kafka services.
sudo systemctl start zookeeper.service
sudo systemctl start kafka.service
The commands above should start the Zookeeper and Kafka services. You can verify by running the commands:
sudo systemctl status zookeeper.service
The command above should show the status of the Zookeeper service. An example output is as shown:
● zookeeper.service - Apache Zookeeper Service
Loaded: loaded (/etc/systemd/zookeeper.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-07-15 07:12:17 CDT; 6s ago
Docs: http://zookeeper.apache.org
Main PID: 2741 (java)
Tasks: 28 (limit: 2284)
Memory: 111.3M
CPU: 1.465s
CGroup: /system.slice/zookeeper.service
└─2741 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:In>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,009] INFO zookeeper.commitLogCount=500 (org.apache.zoo>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,015] INFO zookeeper.snapshot.compression.method = CHEC>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,015] INFO Snapshotting: 0x0 to /tmp/zookeeper/version->
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,019] INFO Snapshot loaded in 10 ms, highest zxid is 0x>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,019] INFO Snapshotting: 0x0 to /tmp/zookeeper/version->
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,019] INFO Snapshot taken in 0 ms (org.apache.zookeeper>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,067] INFO zookeeper.request_throttler.shutdownTimeout >
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,069] INFO PrepRequestProcessor (sid:0) started, reconf>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,081] INFO Using checkIntervalMs=60000 maxPerMinute=100>
Jul 15 07:12:19 debian11 zookeeper-server-start.sh[2741]: [2022-07-15 07:12:19,082] INFO ZooKeeper audit is disabled. (org.apache.zoo>
lines 1-21/21 (END)
From the output above, we can see that the Zookeeper service is running successfully.
To check the Kafka service, run:
sudo systemctl status kafka.service
Similarly, if the Kafka service is running, you should see an output as shown:
● kafka.service - Apache Kafka Server
Loaded: loaded (/etc/systemd/kafka.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-07-15 07:14:31 CDT; 5s ago
Docs: http://kafka.apache.org/documentation.html
Main PID: 3205 (java)
Tasks: 69 (limit: 2284)
Memory: 358.9M
CPU: 4.879s
CGroup: /system.slice/kafka.service
└─3205 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:f>
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,863] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,909] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Starting socket server acceptors and processors (kafka.network.SocketServer)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,948] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Started data-plane acceptor and processor(s) for endpoint : ListenerName(PLAINTEXT) (kafka.network.SocketServer)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,949] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Started socket server acceptors and processors (kafka.network.SocketServer)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,959] INFO Kafka version: 3.2.0 (org.apache.kafka.common.utils.AppInfoParser)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,959] INFO Kafka commitId: 38103ffaa962ef50 (org.apache.kafka.common.utils.AppInfoParser)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,959] INFO Kafka startTimeMs: 1657887274949 (org.apache.kafka.common.utils.AppInfoParser)
Jul 15 07:14:34 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:34,966] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
Jul 15 07:14:35 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:35,106] INFO [BrokerToControllerChannelManager broker=0 name=forwarding]: Recorded new controller, from now on will use broker localhost:9092 (id: 0 rack: null) (kafka.server.Broke>
Jul 15 07:14:35 debian11 kafka-server-start.sh[3205]: [2022-07-15 07:14:35,121] INFO [BrokerToControllerChannelManager broker=0 name=alterPartition]:
And that’s it. You have successfully installed and enabled the Apache Kafka Server on your Debian system.
Testing Kafka Producer/Consumer
Once you have Kafka installed, it is good to test whether you can publish and consume events.
Kafka Create Topic
Let’s start by creating a Kafka topic using the kafka-topics script. Run the command:
/opt/kafka/bin/kafka-topics.sh --create --topic GeekBits-Kafka --bootstrap-server localhost:9092
In the example above, we use the kafka-topics script to create a topic called GeekBits-Kafka. We should see an output as shown:
Created topic GeekBits-Kafka.
We can check the detailed information about the created topic by running the command:
/opt/kafka/bin/kafka-topics.sh --describe --topic GeekBits-Kafka --bootstrap-server localhost:9092
The command above should return detailed information about the GeekBits-Kafka topic, as shown:
Topic: GeekBits-Kafka TopicId: dLq5D9brRyCebPXVWpXdKw PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: GeekBits-Kafka Partition: 0 Leader: 0 Replicas: 0 Isr: 0
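You can also list every topic the broker knows about with the --list flag and filter for ours; the sketch below assumes the broker is still listening on localhost:9092:

```shell
# List all topics on the broker and confirm GeekBits-Kafka exists
/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092 \
    | grep 'GeekBits-Kafka'
```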
Kafka Write Events to Topic
Once we have a topic, let us see if we can communicate with the Kafka brokers by writing some events into the Kafka topic.
Run the command as shown:
$ /opt/kafka/bin/kafka-console-producer.sh --topic GeekBits-Kafka --bootstrap-server localhost:9092
>Hello Geeks from Apache Kafka
>We hope you are enjoying this tutorial
>Thanks for tuning in.
>
>
>You can end the producer client by pressing CTRL+C
The command above will launch the Kafka producer client, allowing you to write events to the specified topic. Once done writing your events, press CTRL + C.
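The producer can also be fed non-interactively, which is useful in scripts. A minimal sketch that pipes a single event into the console producer:

```shell
# Publish one event without opening an interactive prompt
echo "Hello from a script" | /opt/kafka/bin/kafka-console-producer.sh \
    --topic GeekBits-Kafka --bootstrap-server localhost:9092
```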
Kafka Read Events
Once we are done writing some events, we can read them using the consumer client. Open a new terminal session and run the command:
$ /opt/kafka/bin/kafka-console-consumer.sh --topic GeekBits-Kafka --from-beginning --bootstrap-server localhost:9092
The consumer client will start reading the events produced to the GeekBits-Kafka topic. We should start seeing the events we wrote earlier as shown:
Hello Geeks from Apache Kafka
We hope you are enjoying this tutorial
Thanks for tuning in.
You can end the producer client by pressing CTRL+C
You can test the producer by writing more messages as shown earlier. Once done consuming the events, close the consumer client by pressing CTRL + C.
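If you want the consumer to exit on its own instead of waiting for CTRL + C, the console consumer supports a --max-messages flag that reads a fixed number of events and then stops:

```shell
# Read the first three events from the topic, then exit automatically
/opt/kafka/bin/kafka-console-consumer.sh --topic GeekBits-Kafka \
    --from-beginning --max-messages 3 --bootstrap-server localhost:9092
```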
Conclusion
And huge congratulations to you! You have successfully finished this Apache Kafka beginners' guide. In this article, you learned how to download and set up your Kafka server, create and manage systemd services, create Kafka topics, write events to a topic, and consume them.
We hope you enjoyed this tutorial. If you did, leave us a comment below and share the tutorial.
If you face any errors with the Kafka installation and configuration, feel free to contact us and we will help you out!