Spark Structured Streaming: format("memory") is showing zero data

I'm trying to read and store messages from a Kafka topic using Spark Structured Streaming.

The records read are in df.

The code below shows zero records. If I replace the format with format("console"), I can see the records being printed on the console.

StreamingQuery initDF = df.writeStream()
        .outputMode("append")
        .format("memory")
        .queryName("initDF")
        .trigger(Trigger.ProcessingTime(1000))
        .start();

sparkSession.sql("select * from initDF").show();

initDF.awaitTermination();
1 REPLY

@Ramya Jayathirtha

Okay, the way it works is this:

In simple terms, the main thread of your code launches another thread in which your streaming query logic runs.

sparkSession.sql("select * from initDF").show() runs on the main thread, and it runs only once, right after start(), most likely before the streaming query has written any rows to the in-memory table.

After that, your main thread blocks on

initDF.awaitTermination()

so the table is never queried again.

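So update your code to:

StreamingQuery initDF = df.writeStream()
        .outputMode("append")
        .format("memory")
        .queryName("initDF")
        .trigger(Trigger.ProcessingTime(1000))
        .start();

// Thread.sleep throws InterruptedException, so the enclosing method must declare or handle it.
while (initDF.isActive()) {
    // Give the streaming query time to process a few triggers before each read.
    Thread.sleep(10000);
    sparkSession.sql("select * from initDF").show();
}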

Now the main thread keeps looping for as long as the query is active and re-queries the in-memory table every 10 seconds, so the rows show up once the stream has processed them.
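
If it helps to see the timing issue without a Spark cluster, here is a minimal plain-Java sketch of the same pattern. It is only an analogy, not Spark code: the class name MemorySinkPollingSketch, the table list, and the startWriter and run helpers are all made up for illustration. A background thread stands in for the streaming query, and a list stands in for the in-memory table.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class MemorySinkPollingSketch {

    // Hypothetical stand-in for the table that format("memory") would populate.
    static final List<String> table = new CopyOnWriteArrayList<>();

    // Starts a background writer (playing the role of the streaming query)
    // that appends one row per simulated trigger interval.
    static Thread startWriter(int batches, long triggerMillis) {
        Thread writer = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    Thread.sleep(triggerMillis); // wait for the next "trigger"
                    table.add("row-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        return writer;
    }

    // Returns {rows seen right after start, rows seen after polling until the writer stops}.
    static int[] run(int batches, long triggerMillis) {
        table.clear();
        Thread writer = startWriter(batches, triggerMillis);

        // Comparable to calling show() immediately after start():
        // the writer has usually not produced anything yet.
        int immediate = table.size();

        try {
            // Comparable to while (initDF.isActive()) { sleep; query; }
            while (writer.isAlive()) {
                Thread.sleep(triggerMillis);
                System.out.println("rows so far: " + table.size());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { immediate, table.size() };
    }

    public static void main(String[] args) {
        int[] counts = run(3, 200);
        System.out.println("immediately after start: " + counts[0]);
        System.out.println("after polling: " + counts[1]);
    }
}
```

The read taken right after the writer starts normally sees an empty table, while the polling loop sees the rows accumulate. That is the same reason a single show() before awaitTermination() prints nothing.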