08-14-2022
05:05 PM
My OS is Windows 11 and my Apache Spark version is spark-3.1.3-bin-hadoop3.2. I am trying to use Spark Structured Streaming with PySpark. Below is my simple Spark Structured Streaming code:

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName(appName).getOrCreate()
spark.sparkContext.setCheckpointDir("/C:/tmp")

The same code, without the spark.sparkContext.setCheckpointDir line, throws no errors on Ubuntu 22.04. However, the code above does not work on Windows 11. The exception is:

pyspark.sql.utils.IllegalArgumentException: Pathname /C:/tmp/67b1f386-1e71-4407-9713-fa749059191f from C:/tmp/67b1f386-1e71-4407-9713-fa749059191f is not a valid DFS filename.

I think the error means that the checkpoint directory is being created on a Hadoop file system, as it would be on Linux, and not on the Windows 11 local file system. My operating system is Windows, so the checkpoint directory should be a local Windows 11 directory.

How can I configure the Apache Spark checkpoint to use a local Windows 11 directory? I also tested the URLs file:///C:/temp and hdfs://C:/temp, but the errors are still thrown.

Any reply would be appreciated.

Best regards
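To compare the three checkpoint locations mentioned in the question, the small sketch below splits each one into scheme, authority, and path using only the Python standard library. The helper name `describe` is hypothetical, and `urlsplit` is generic URL parsing, so this only approximates how Hadoop interprets a path; it is not Spark's own logic and is not a confirmed fix.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def describe(location):
    """Split a location string into scheme, authority, and path.

    Hypothetical helper for illustration; it uses generic URL rules,
    not Hadoop's Path parsing.
    """
    parts = urlsplit(location)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
    }


# The three locations tried in the question.
candidates = ["/C:/tmp", "file:///C:/temp", "hdfs://C:/temp"]
summary = {location: describe(location) for location in candidates}

# A file URI built from a Windows path by the standard library.
local_uri = PureWindowsPath("C:/tmp").as_uri()

for location, parts in summary.items():
    print(location, parts)
print(local_uri)
```

Read this way, "/C:/tmp" carries no scheme at all, so it is left to whatever default file system is configured, while "hdfs://C:/temp" treats "C:" as the authority (host) part rather than a drive letter. Only the file: form names the local file system explicitly.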
Labels:
Apache Spark