Structured Spark Streaming (consumer side)

New Contributor

I have written a producer application which writes data to a Kafka topic (converting from strings to bytes).

Now I have written a Spark Structured Streaming consumer application which reads from that topic. I have set up the schema as well, but all fields come back as NULL, as shown below.

+-------------+--------------------+--------+---------+---------+--------------+----+---------+----------------+
| |SENT_TIME|LAST_OCCURANCE|NODE|NODE_TYPE|X733SPECIFICPROB|
+-------------+--------------------+--------+---------+---------+--------------+----+---------+----------------+
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
| null| null| null| null| null| null| null|
+-------------+--------------------+--------

Can anyone tell me where I might have made a mistake?

2 REPLIES

Re: Structured Spark Streaming (consumer side)

New Contributor

Which Spark library are you using? What happens if you don't map the schema and just infer it on read? I've run into this issue when I had null values in my raw data and when I mapped the schema to a JSON payload. If you have null values in aggregate columns, you can replace them with df.fillna(0) and then create another DF from the transformed object. I've also run into issues when specifying the schema while creating a DF, so I've mapped the schema afterwards with .withColumn("name", df["name"].cast(FloatType())). This seems to happen when it's difficult to parse Unicode into a particular data type with schemaless files like JSON.

Re: Structured Spark Streaming (consumer side)

New Contributor

Try dumping your Kafka topic as well to see if the producer is actually emitting data.
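
Once a few raw messages have been dumped, a quick check along these lines can tell you whether they are JSON at all and whether their keys match the schema. This is only a sketch in plain Python: `inspect_message` is a hypothetical helper, the field list is limited to the column names visible in the question, and the sample payloads are invented.

```python
import json

# Column names visible in the question's output; the rest of the schema is not shown.
EXPECTED_FIELDS = ["SENT_TIME", "LAST_OCCURANCE", "NODE", "NODE_TYPE", "X733SPECIFICPROB"]


def inspect_message(raw_value: bytes, expected_fields=EXPECTED_FIELDS) -> dict:
    """Report whether one dumped message is a JSON object and which fields differ."""
    text = raw_value.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: a JSON-based schema would yield nulls for every field.
        return {"is_json": False, "missing": list(expected_fields), "unexpected": []}
    if not isinstance(payload, dict):
        # Valid JSON, but not an object with named fields.
        return {"is_json": True, "missing": list(expected_fields), "unexpected": []}
    missing = [name for name in expected_fields if name not in payload]
    unexpected = [key for key in payload if key not in expected_fields]
    return {"is_json": True, "missing": missing, "unexpected": unexpected}


if __name__ == "__main__":
    # Key case matters: "node_type" does not match "NODE_TYPE".
    print(inspect_message(b'{"NODE": "n1", "node_type": "router"}'))
    print(inspect_message(b"n1,router"))
```

A message that reports `is_json: False`, or one with most fields listed as missing, points at the producer's output format or field naming rather than at Spark.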