Spark Structured Streaming: format("memory") is showing zero data
Labels: Apache Spark
Created 03-29-2018 03:49 PM
I'm trying to read and store messages from a Kafka topic using Spark Structured Streaming.
The records read are in df.
The code below shows zero records. If I replace the format with format("console"), I can see the records being printed on the console.
StreamingQuery initDF = df.writeStream()
    .outputMode("append")
    .format("memory")
    .queryName("initDF")
    .trigger(Trigger.ProcessingTime(1000))
    .start();
sparkSession.sql("select * from initDF").show();
initDF.awaitTermination();
Created 04-15-2018 07:54 AM
Okay, the way it works is this:
In simple terms, the main thread of your code launches another thread, and your streaming query logic runs in that other thread.
Meanwhile, your main thread is blocked by
initDF.awaitTermination().
sparkSession.sql("select * from initDF").show() => This code runs on the main thread, and it is executed only once, right after start(), most likely before the query has written anything to the in-memory table. That is why it shows zero records.
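So update your code to:
StreamingQuery initDF = df.writeStream()
    .outputMode("append")
    .format("memory")
    .queryName("initDF")
    .trigger(Trigger.ProcessingTime(1000))
    .start();
// Poll the in-memory table from the main thread for as long as the query runs.
while (initDF.isActive()) {
    Thread.sleep(10000); // throws InterruptedException: declare or handle it
    sparkSession.sql("select * from initDF").show();
}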
Now the main thread of your code goes through the loop over and over again, and it queries the table on every pass.
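If the threading part is hard to picture, here is a rough plain-Java analogy with no Spark involved. A background thread plays the role of the streaming query and a list plays the role of the in-memory table. The class name, the 200 ms delay and the row value are all made up for illustration; this is only a sketch of the timing issue, not how Spark implements the memory sink.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy: not Spark code, just the same timing pattern.
class MemorySinkAnalogy {

    // Returns {rows seen immediately after start, rows seen after the first batch}.
    static int[] run() throws InterruptedException {
        // Stands in for the in-memory table the query appends to.
        List<String> table = new CopyOnWriteArrayList<>();
        CountDownLatch firstBatch = new CountDownLatch(1);

        // Stands in for the streaming query running on its own thread.
        Thread query = new Thread(() -> {
            try {
                Thread.sleep(200);        // assumed delay before the first batch lands
                table.add("record-1");    // the batch is appended to the "table"
                firstBatch.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        query.setDaemon(true);
        query.start();

        // Like calling show() right after start(): the batch has not landed yet.
        int immediate = table.size();

        // Like querying again later: wait until the batch has been written.
        firstBatch.await(5, TimeUnit.SECONDS);
        int afterBatch = table.size();

        return new int[] {immediate, afterBatch};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run();
        System.out.println("rows right after start: " + counts[0]);
        System.out.println("rows after first batch: " + counts[1]);
    }
}
```

The first read happens before the background thread has written anything, which is the same situation as a single show() placed directly after start(). Reading again once data has arrived, as the polling loop does, is what makes the rows visible.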
