Created on 01-31-2022 05:41 AM - edited 01-31-2022 05:43 AM
I'm trying to consume Kafka data using PySpark, but I'm having difficulty because the value is a HashMap type.
The question is: how can I convert this into a usable DataFrame that I can work with in PySpark?
This is the output and my actual code:
Any suggestions or steps?
Created 02-08-2022 02:44 PM
Looking at the serialized data, that looks like the Java binary serialization protocol. It seems to me that the producer is simply writing the HashMap Java object directly to Kafka, rather than using a proper serializer (Avro, JSON, String, etc.).
You should look into modifying your producer so that it uses one of those serializers; then you'll be able to properly deserialize the data you're reading from Kafka.
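For illustration only, here's a minimal sketch of what a fixed producer could look like in Python with the kafka-python package, writing the payload as a JSON string instead of a raw Java HashMap. The broker address, topic name, and payload fields are all assumptions, not taken from the original post; if your producer is Java, the equivalent change is to serialize the map to a JSON string (or Avro record) before sending it.

```python
import json
from kafka import KafkaProducer  # kafka-python package

# Serialize each dict/map as a UTF-8 JSON string so any consumer can decode it
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumption: point this at your brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

payload = {"id": 1, "status": "ok"}      # hypothetical record
producer.send("events", value=payload)   # "events" is a hypothetical topic name
producer.flush()
```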
Created 02-06-2022 09:11 PM
You need to find out which serializer is being used to write data to Kafka and use the matching deserializer to read those messages.
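For example, if the messages turn out to be JSON strings, a rough PySpark sketch for the read side could look like the following; the brokers, topic, and schema below are assumptions for illustration and need to match what your producer actually writes.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-json-consumer").getOrCreate()

# Hypothetical schema: must match the fields the producer writes
schema = StructType([
    StructField("id", IntegerType()),
    StructField("status", StringType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
    .option("subscribe", "events")                         # hypothetical topic
    .load()
)

# Kafka hands the value over as binary; cast it to string, then parse the JSON
parsed = (
    raw.selectExpr("CAST(value AS STRING) AS value")
       .select(from_json(col("value"), schema).alias("data"))
       .select("data.*")
)
```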
Created 02-08-2022 04:07 AM
You need to check the producer code to see in which format the Kafka messages are produced and what kind of serializer class was used. You need to use the same format/serializer when deserializing the data. For example, if you used Avro while writing the data, then you need to use Avro while deserializing.
@araujo You are right. The customer needs to check their producer code and serializer class.
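To sketch the Avro case mentioned above (assuming Spark 3.x with the matching org.apache.spark:spark-avro package supplied, e.g. via --packages, and a made-up schema and topic), deserializing in PySpark could look roughly like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("kafka-avro-consumer").getOrCreate()

# The Avro schema must be the exact schema the producer used (hypothetical here)
avro_schema = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "status", "type": "string"}
  ]
}
"""

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
    .option("subscribe", "events")                         # hypothetical topic
    .load()
)

decoded = raw.select(from_avro(raw.value, avro_schema).alias("data")).select("data.*")
```

One caveat: if the producer uses Confluent Schema Registry serializers, the record value carries a magic-byte and schema-id prefix, so plain from_avro won't parse it as-is; you'd need to strip that prefix or use a registry-aware deserializer.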