
Looking for pyspark kafka example using direct approach(no receivers) with kafka offset handling


Rising Star

I would like to store the Kafka offsets in the Ambari Postgres or Hive MySQL database and start consuming from the stored offsets in the next micro-batch.

3 REPLIES

Re: Looking for pyspark kafka example using direct approach(no receivers) with kafka offset handling

Mentor

That's going to create a lot of overhead, because every micro-batch will read from and write to the database. You might want to store the state in an in-memory cache, a NoSQL store, or maybe even ZooKeeper.

Re: Looking for pyspark kafka example using direct approach(no receivers) with kafka offset handling

Rising Star

Yes, I can use Phoenix and store the offsets in a NoSQL store. What I am actually looking for is example code in PySpark.
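For anyone landing on this thread, here is a minimal sketch of the pattern being asked about, not a tested production setup. It assumes Spark 1.x/2.x with the Kafka 0.8 direct-stream integration (`pyspark.streaming.kafka`, which was removed in Spark 3.0). The table name `kafka_offsets`, the helper names (`open_store`, `save_offsets`, `load_offsets`, `run_stream`) and the `handle_rdd` callback are hypothetical, and `sqlite3` is only a stand-in for Postgres, MySQL or Phoenix so the offset logic can be tried without a database server.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the offset table. sqlite3 is a stand-in for Postgres/MySQL/Phoenix."""
    # foreachRDD callbacks run on a different driver thread than the one that
    # opened the connection, so sqlite's same-thread check is disabled here.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kafka_offsets ("
        "topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn


def save_offsets(conn, ranges):
    """Upsert (topic, partition, until_offset) tuples in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO kafka_offsets VALUES (?, ?, ?)", ranges
        )


def load_offsets(conn, topic):
    """Return {partition: next_offset} for a topic (empty dict on first run)."""
    rows = conn.execute(
        "SELECT part, next_offset FROM kafka_offsets WHERE topic = ?", (topic,)
    )
    return dict(rows.fetchall())


def run_stream(conn, topic, brokers, handle_rdd, batch_seconds=10):
    """Start a direct stream that resumes from the stored offsets."""
    # Imported here so the offset helpers above work without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition

    ssc = StreamingContext(SparkContext(appName="direct-kafka-offsets"), batch_seconds)
    # With no stored offsets the dict is empty and Kafka's auto.offset.reset applies.
    from_offsets = {
        TopicAndPartition(topic, part): offset
        for part, offset in load_offsets(conn, topic).items()
    }
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers}, fromOffsets=from_offsets
    )

    def process(rdd):
        ranges = rdd.offsetRanges()  # read on the RDD that comes from the stream
        handle_rdd(rdd)              # do the work first, then record progress
        save_offsets(conn, [(r.topic, r.partition, r.untilOffset) for r in ranges])

    stream.foreachRDD(process)
    ssc.start()
    ssc.awaitTermination()
```

Because the offsets are written only after `handle_rdd` returns, a crash between the two steps replays the last batch on restart, which gives at-least-once processing. The write is one small transaction per micro-batch (one row per partition), so the database load grows with the partition count rather than the message volume.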


Re: Looking for pyspark kafka example using direct approach(no receivers) with kafka offset handling

Mentor