How to set the Spark Structured Streaming checkpoint directory to a local Windows directory?

My OS is Windows 11 and my Apache Spark version is spark-3.1.3-bin-hadoop3.2.
I am trying to use Spark Structured Streaming with PySpark. Below is my simple Spark Structured Streaming code.

spark = SparkSession.builder.master("local[*]").appName(appName).getOrCreate()
spark.sparkContext.setCheckpointDir("/C:/tmp")

The same Spark code, without the spark.sparkContext.setCheckpointDir line, throws no errors on Ubuntu 22.04. However, the code above does not work on Windows 11. The exception is:

pyspark.sql.utils.IllegalArgumentException: Pathname /C:/tmp/67b1f386-1e71-4407-9713-fa749059191f from C:/tmp/67b1f386-1e71-4407-9713-fa749059191f is not a valid DFS filename.

I think the error means that the checkpoint directory is being created on a Hadoop file system, as it would be on Linux, and not on the Windows 11 local file system. My operating system is Windows, so the checkpoint directory should be a local Windows 11 directory. How can I configure the Apache Spark checkpoint to use a local Windows 11 directory? I also tested the URLs file:///C:/temp and hdfs://C:/temp, but the errors are still thrown.
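
For reference, the file:/// attempt can be sketched as below. This is only a minimal sketch of what I tried, not a confirmed fix. The helper name to_file_uri and the path C:/tmp/checkpoints are illustrative assumptions, and the commented-out part assumes a streaming DataFrame named df that is not shown in my code above. As far as I understand, Structured Streaming queries normally take their checkpoint path through the checkpointLocation option on writeStream, while sparkContext.setCheckpointDir configures RDD checkpointing.

```python
from pathlib import PureWindowsPath


def to_file_uri(windows_path: str) -> str:
    """Turn an absolute Windows path such as C:/tmp into a file:/// URI.

    to_file_uri is a hypothetical helper name used only for this sketch.
    PureWindowsPath applies Windows path rules on any operating system.
    """
    return PureWindowsPath(windows_path).as_uri()


# Illustrative path (an assumption, not the directory from the error above).
checkpoint_uri = to_file_uri("C:/tmp/checkpoints")
print(checkpoint_uri)

# Hypothetical usage, assuming a streaming DataFrame named df exists:
# query = (
#     df.writeStream
#       .format("console")
#       .option("checkpointLocation", checkpoint_uri)
#       .start()
# )
```

Passing an explicit file: scheme is meant to keep the path from being resolved against whatever default file system the Hadoop configuration points to.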

Any reply would be appreciated. Best regards
