I am trying to read data from Kafka and save it as a Parquet file on HDFS. My code is similar to the following (the difference is that I am writing in Java):
```scala
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("parquet")
  .option("path", outputPath)
  .option("checkpointLocation", "/tmp/sparkcheckpoint1/")
  .outputMode("append")
  .start()
  .awaitTermination()
```
However, it throws an "Uri without authority: hdfs:/data/_spark_metadata" exception, where "hdfs:///data" is the output path.
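The "without authority" part refers to the host:port component of the URI: in `hdfs:///data` the authority between `//` and the path is empty, which standard URI parsing reports as absent. A minimal sketch of the difference using `java.net.URI` (the NameNode address `namenode:8020` is a hypothetical example, not from the question):

```java
import java.net.URI;

public class AuthorityCheck {
    public static void main(String[] args) {
        // "hdfs:///data": the authority between "//" and "/data" is empty,
        // so java.net.URI reports it as null
        URI noAuthority = URI.create("hdfs:///data");
        System.out.println(noAuthority.getAuthority()); // null

        // Fully qualified URI with a (hypothetical) NameNode address
        URI withAuthority = URI.create("hdfs://namenode:8020/data");
        System.out.println(withAuthority.getAuthority()); // namenode:8020
    }
}
```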
When I change the code to use spark.read and df.write to write out the Parquet file once (batch instead of streaming), there is no exception, so I guess it is not related to my HDFS config.
Can anyone help me?
Simply removing hdfs:// from the path option worked. Keep the hdfs:// in the checkpoint location, though.
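In other words, pass a scheme-less path like `/data` to the sink so it resolves against the default filesystem. A hedged sketch of that normalization using `java.net.URI` (the helper name `stripSchemeIfNoAuthority` is mine, not a Spark or Hadoop API):

```java
import java.net.URI;

public class PathFix {
    /**
     * If a location has a scheme but no authority (e.g. "hdfs:///data"),
     * return just the path ("/data") so it resolves against the default
     * filesystem; otherwise return the string unchanged.
     */
    static String stripSchemeIfNoAuthority(String location) {
        URI uri = URI.create(location);
        if (uri.getScheme() != null && uri.getAuthority() == null) {
            return uri.getPath();
        }
        return location;
    }

    public static void main(String[] args) {
        System.out.println(stripSchemeIfNoAuthority("hdfs:///data"));        // /data
        System.out.println(stripSchemeIfNoAuthority("hdfs://nn:8020/data")); // hdfs://nn:8020/data
    }
}
```

The resulting `/data` would go into `.option("path", ...)`, while the checkpoint location keeps its full `hdfs://` URI as noted above.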