Refresh Dataframe in Spark real-time Streaming without stopping process

In my application I receive a stream of accounts from a Kafka queue (using Spark Streaming with Kafka), and I need to fetch attributes related to these accounts from S3. I'm planning to cache the resulting S3 DataFrame, because the S3 data will not be updated for at least a day for now, although that interval might shrink to 1 hour or 10 minutes in the near future. So the question is: how can I refresh the cached DataFrame periodically without stopping the process?
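To make the question concrete, here is a minimal sketch of the kind of time-based refresh I have in mind. It is plain Python so it runs without a cluster; `RefreshingCache`, `load_fn`, `release_fn`, and `clock` are hypothetical names I made up, and the Spark wiring in the trailing comment (Parquet format, `S3_PATH`, the `account_id` join key, use of `foreachBatch`) is an assumption, not something I have running.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshingCache(Generic[T]):
    """Holds a value and reloads it once it is older than ttl_seconds."""

    def __init__(
        self,
        load_fn: Callable[[], T],
        ttl_seconds: float,
        release_fn: Optional[Callable[[T], None]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_fn = load_fn          # builds a fresh value (e.g. reads S3)
        self._ttl = ttl_seconds          # maximum age before a reload
        self._release_fn = release_fn    # frees the stale value (e.g. unpersist)
        self._clock = clock              # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        """Return the cached value, reloading it first if it has expired."""
        with self._lock:
            now = self._clock()
            expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
            if expired:
                fresh = self._load_fn()
                stale = self._value
                self._value = fresh
                self._loaded_at = now
                # Release the old value only after the new one is in place.
                if stale is not None and self._release_fn is not None:
                    self._release_fn(stale)
            return self._value


# Hypothetical Spark wiring (assumed, not run here; S3_PATH is a placeholder):
#   attrs = RefreshingCache(
#       load_fn=lambda: spark.read.parquet(S3_PATH).cache(),
#       ttl_seconds=24 * 60 * 60,
#       release_fn=lambda df: df.unpersist(),
#   )
#   def process(batch_df, batch_id):
#       joined = batch_df.join(attrs.get(), "account_id")
#       ...
#   stream_df.writeStream.foreachBatch(process).start()
```

The idea is that `get()` is called on the driver once per micro-batch, so the reload happens between batches and the streaming job itself never stops. Shortening the refresh interval to 1 hour or 10 minutes would then only mean changing `ttl_seconds`.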