Spark not showing Kafka Data Properly

New Contributor

I'm trying to consume Kafka data using PySpark, but I'm having difficulty because the data is in HashMap type.

 

The question is: how can I convert this into a useful DataFrame that can be processed in PySpark?

 

This is the output and my actual code:

 

[Image: This is the output]
[Image: This is my actual code]

 

Any suggestions or steps?

 

1 ACCEPTED SOLUTION

Super Guru

Looking at the serialized data, it appears to use the Java binary serialization protocol. It seems to me that the producer is simply writing the HashMap Java object directly to Kafka, rather than using a proper serializer (Avro, JSON, String, etc.).

 

You should look into modifying your producer so that you can properly deserialize the data that you're reading from Kafka.
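
One way to confirm this diagnosis is to inspect the first bytes of a raw message value. The sketch below assumes you can get the value as Python bytes (for example by collecting a few rows of the binary `value` column); the function name `looks_java_serialized` is hypothetical, made up for illustration. It relies on the fact that a Java object serialization stream begins with the magic number 0xACED followed by the version 0x0005.

```python
import struct

# Java object serialization streams start with a 2-byte magic number
# followed by a 2-byte stream version.
JAVA_STREAM_MAGIC = 0xACED
JAVA_STREAM_VERSION = 0x0005


def looks_java_serialized(payload: bytes) -> bool:
    """Return True if the payload starts with the Java serialization header.

    Hypothetical helper for diagnosis only; it does not decode the object.
    """
    if len(payload) < 4:
        return False
    # Both header fields are big-endian unsigned 16-bit integers.
    magic, version = struct.unpack(">HH", payload[:4])
    return magic == JAVA_STREAM_MAGIC and version == JAVA_STREAM_VERSION


# Example payloads (made up for illustration).
java_like = b"\xac\xed\x00\x05sr\x00\x11java.util.HashMap"
json_like = b'{"sensor": "s1", "reading": 21.5}'

print(looks_java_serialized(java_like))
print(looks_java_serialized(json_like))
```

If the check comes back true for your messages, the cleanest fix is still on the producer side, as described above, rather than trying to decode Java-serialized objects from Python.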


3 REPLIES

Super Guru

You need to find out which serializer is being used to write data to Kafka and use the matching deserializer to read those messages.


Super Collaborator

Hi @victorescosta 

 

You need to check the producer code to see in which format the Kafka message is produced and which Serializer class is used. You need to use the same format/serializer when deserializing the data. For example, if you used Avro when writing the data, then you need to use Avro when deserializing it.

 

@araujo You are right. The customer needs to check their producer code and serializer class.
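
To illustrate the point about using the same format on both sides, here is a minimal sketch of a matching serializer/deserializer pair, assuming the producer were changed to write JSON. The function names and the sample record are hypothetical and only for illustration; the original producer code is not shown in this thread.

```python
import json


def serialize_value(record: dict) -> bytes:
    """Producer side: turn a dict into UTF-8 encoded JSON bytes."""
    return json.dumps(record).encode("utf-8")


def deserialize_value(payload: bytes) -> dict:
    """Consumer side: reverse of serialize_value, same format and encoding."""
    return json.loads(payload.decode("utf-8"))


# Made-up record standing in for the HashMap contents.
record = {"sensor": "s1", "reading": 21.5}

payload = serialize_value(record)       # what would be written to Kafka
restored = deserialize_value(payload)   # what the consumer gets back

print(restored == record)
```

On the PySpark side, the Kafka source exposes the message `value` as a binary column, so with JSON payloads a common approach is to cast `value` to a string and parse it with `from_json` and an explicit schema.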
