Store the Kafka offsets in the Ambari Postgres or Hive MySQL database, and start consuming from the stored offsets in the next micro-batch.
That's going to create a lot of overhead and produce a lot of reads and writes against the database. You might want to store the state in a memory cache, a NoSQL store, or maybe even ZooKeeper.
Yes, I can use Phoenix and store the offsets in a NoSQL store. I'm actually looking for example code in PySpark.
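Since no PySpark code appears in the thread, here is a minimal sketch of the pattern being discussed: after each micro-batch, persist the batch's offset ranges to an external store, and on restart read them back to seed `fromOffsets`. SQLite is used below purely as a stand-in for whatever store you choose (Phoenix/HBase, ZooKeeper, etc.); the table and column names are my assumptions. The commented Spark wiring follows the Spark 1.x/2.x `KafkaUtils.createDirectStream` DStream API, which is what this kind of manual offset management typically targets.

```python
import sqlite3

def init_store(conn):
    # One row per (consumer group, topic, partition).
    # Stand-in schema; with Phoenix you would CREATE TABLE via the Phoenix driver instead.
    conn.execute("""CREATE TABLE IF NOT EXISTS kafka_offsets (
                        group_id     TEXT,
                        topic        TEXT,
                        part         INTEGER,
                        until_offset INTEGER,
                        PRIMARY KEY (group_id, topic, part))""")

def save_offsets(conn, group_id, offset_ranges):
    # offset_ranges: iterable of (topic, partition, until_offset) tuples,
    # e.g. built from rdd.offsetRanges() inside foreachRDD.
    conn.executemany(
        "INSERT OR REPLACE INTO kafka_offsets VALUES (?, ?, ?, ?)",
        [(group_id, t, p, u) for (t, p, u) in offset_ranges])
    conn.commit()

def load_offsets(conn, group_id):
    # Returns {(topic, partition): offset} to seed fromOffsets on restart.
    rows = conn.execute(
        "SELECT topic, part, until_offset FROM kafka_offsets WHERE group_id = ?",
        (group_id,))
    return {(t, p): u for (t, p, u) in rows}

# Sketch of the Spark wiring (assumes spark-streaming-kafka on the classpath;
# "my-group", "my-topic", brokers and conn are placeholders):
#
# from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition
#
# stored = load_offsets(conn, "my-group")
# from_offsets = {TopicAndPartition(t, p): o for (t, p), o in stored.items()}
# stream = KafkaUtils.createDirectStream(
#     ssc, ["my-topic"], {"metadata.broker.list": brokers},
#     fromOffsets=from_offsets)
#
# def commit(rdd):
#     # offsetRanges() is only available on the direct stream's own RDDs,
#     # so capture it before any further transformation.
#     ranges = rdd.offsetRanges()
#     save_offsets(conn, "my-group",
#                  [(r.topic, r.partition, r.untilOffset) for r in ranges])
#
# stream.foreachRDD(commit)
```

The upserts keep only the latest offset per partition, so restarting the job resumes exactly where the last completed batch left off; swapping SQLite for Phoenix only changes the connection and DDL, not the pattern.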