Created on 01-07-2021 01:20 PM - edited on 01-10-2021 07:42 PM by subratadas
In this article, we walk through the steps required to connect a Spark Structured Streaming application to Kafka in CDP Data Hub. We use two Data Hubs, one created with the Data Engineering template and the other with the Streams Messaging template. Both Data Hubs were created in the same environment.
1. Obtain the FreeIPA certificate of your environment:
From the CDP Home Page, navigate to Management Console > Environments
Locate and select your environment from the list of available environments
Click Actions
Select Get FreeIPA Certificate from the drop-down menu. The FreeIPA certificate downloads.
2. Add the FreeIPA certificate to the truststore of the client.
The certificate must be added for every client that you want to connect to the Data Hub provisioned cluster. The exact steps for adding the certificate to the truststore depend on the platform and key management software used. For example, you can use the Java keytool command-line tool.
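As a minimal sketch of the keytool approach, the Python helper below assembles a keytool -importcert invocation and, by default, only prints it. The truststore path, alias, and certificate file name are placeholders chosen for illustration, not values from this article, and the helper names are hypothetical.

```python
import shlex
import subprocess


def keytool_import_command(truststore_path, alias, cert_path):
    """Build the keytool argument list that imports a certificate into a truststore."""
    return [
        "keytool", "-importcert",
        "-keystore", truststore_path,  # truststore file; created if it does not exist
        "-alias", alias,               # label for the certificate entry
        "-file", cert_path,            # the downloaded FreeIPA certificate
    ]


def import_certificate(truststore_path, alias, cert_path, dry_run=True):
    """Print the command (dry run) or execute it; keytool prompts for the password."""
    cmd = keytool_import_command(truststore_path, alias, cert_path)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder values (assumptions); substitute your own paths and alias.
    import_certificate("truststore.jks", "freeipa-ca", "freeipa.crt")
```

Calling import_certificate with dry_run=False runs keytool interactively, so the truststore password you enter at the prompt is the one the client later has to supply.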
3. Obtain CDP workload credentials:
A valid workload username and password must be provided to the client; otherwise, it cannot connect to the cluster. Credentials can be obtained from the Management Console.
From the CDP Home Page, navigate to Management Console > User Management
Locate and select the user account you want to use from the list of available accounts. (The user details page displays information about the user.)
Find the Workload Username entry and note down the username
Find the Workload Password entry and click Set Workload Password
In the dialog box that appears, enter a new workload password, confirm the password and note it down
Fill out the Environment text box
Click Set Workload Password and wait for the process to finish
Note: In the Spark application code, we specify the truststore location in the kafka.ssl.truststore.location option and the truststore password in the kafka.ssl.truststore.password option. The password provided here is the one that was set for the truststore when it was created.
Note: We have specified our workload username and password in the "kafka.sasl.jaas.config" option.
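The options described in the two notes above can be sketched in PySpark as follows. This is an illustration under stated assumptions, not the article's original code: the SASL_SSL protocol and PLAIN mechanism are assumed because the JAAS entry carries a username and password, and the helper names, broker address, topic, and paths are hypothetical.

```python
def jaas_config(username, password):
    """JAAS entry carrying the workload credentials (assumes SASL/PLAIN)."""
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )


def kafka_source_options(brokers, topic, truststore_path, truststore_password,
                         username, password):
    """Collect the Kafka source options for a Structured Streaming reader."""
    return {
        "kafka.bootstrap.servers": brokers,
        "subscribe": topic,
        "kafka.security.protocol": "SASL_SSL",   # assumption: TLS plus SASL
        "kafka.sasl.mechanism": "PLAIN",         # assumption: username/password auth
        "kafka.ssl.truststore.location": truststore_path,
        "kafka.ssl.truststore.password": truststore_password,
        "kafka.sasl.jaas.config": jaas_config(username, password),
    }


def read_kafka_stream(spark, options):
    """Attach the options to a Kafka source on an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


if __name__ == "__main__":
    # Placeholder values (assumptions); replace with your brokers, topic, and credentials.
    opts = kafka_source_options(
        "broker-host:9093", "my-topic",
        "/tmp/truststore.jks", "truststore-password",
        "workload-user", "workload-password",
    )
    for key in sorted(opts):
        if "password" not in key and "jaas" not in key:
            print(key, "=", opts[key])  # avoid echoing secrets
```

Keeping the options in a dictionary makes it easier to load the passwords from a secure source at runtime instead of hard-coding them in the application.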
5. Kinit as a user with permissions to the Kafka topic
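One way to script this step is a small pre-flight check that runs kinit only when no valid Kerberos ticket is present. The sketch below relies on klist -s, which exits with status 0 when the credential cache is readable and unexpired; the function names and the principal are hypothetical.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if `klist -s` reports a readable, unexpired credential cache."""
    return run(["klist", "-s"]).returncode == 0


def ensure_ticket(principal, run=subprocess.run):
    """Run kinit for `principal` only when no valid ticket exists.

    Returns True if kinit was invoked, False if a ticket was already present.
    """
    if has_valid_ticket(run):
        return False
    run(["kinit", principal], check=True)  # prompts for the password
    return True
```

For example, ensure_ticket("workload-user") could be called on the node before submitting the Spark job; the principal shown is a placeholder and must belong to a user with permissions to the Kafka topic.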
To view the output of the job, from the CDP Home Page, navigate to Data Hub Clusters > (drill down to the Data Engineering Data Hub) > Resource Manager > Applications > (drill down to the stdout logs for your job).