
How do I retain transformed data in memory in PySpark if the source data is deleted?

Step 1: Read data from Cassandra.
Step 2: Read data from AWS S3.
Step 3: cache() the data from Step 1 and Step 2.
Step 4: Find the difference between Step 1 and Step 2, cache the difference data, and run count() on it.
Step 5: Write the difference data to an AWS S3 bucket.
Step 6: Write the Step 1 data to the Step 2 AWS S3 bucket (this is basically an overwrite step).
Step 7: Write the difference data (from Step 5) into the database. This step gets empty data: the same data was available in memory at Step 4, but after the overwrite in Step 6 it comes back empty.
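For context, here is a minimal PySpark sketch of the steps above. It assumes Parquet files on S3, the Spark Cassandra Connector, and a JDBC target database; the table names, S3 paths, and connection settings are placeholders, not the actual job.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("diff-pipeline").getOrCreate()

# Step 1: read from Cassandra (keyspace/table are placeholders)
cassandra_df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(table="my_table", keyspace="my_keyspace")
    .load()
)

# Step 2: read from S3 (path is a placeholder)
s3_df = spark.read.parquet("s3a://my-bucket/source/")

# Step 3: cache both inputs
cassandra_df.cache()
s3_df.cache()

# Step 4: difference between the two sources; count() materializes the cached partitions
diff_df = cassandra_df.subtract(s3_df).cache()
diff_df.count()

# Step 5: write the difference data to S3
diff_df.write.mode("overwrite").parquet("s3a://my-bucket/diff/")

# Step 6: overwrite the Step 2 S3 location with the Cassandra data
cassandra_df.write.mode("overwrite").parquet("s3a://my-bucket/source/")

# Step 7: write the difference data to the database; this is where the data comes back empty,
# since any partitions evicted from (or never fully held in) the cache are recomputed from
# the lineage, which re-reads the S3 path that was overwritten in Step 6
diff_df.write.format("jdbc").options(
    url="jdbc:postgresql://host:5432/db",  # placeholder connection details
    dbtable="diff_table",
    user="user",
    password="password",
).mode("append").save()
```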