I am trying to read data from Kafka and save it as Parquet files on HDFS. My code is similar to the following; the only difference is that I am writing it in Java.
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("subscribe", "topic1")
.load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.writeStream
.format("parquet")
.option("path", outputPath)
.option("checkpointLocation", "/tmp/sparkcheckpoint1/")
.outputMode("append")
.start()
.awaitTermination()
However, it throws the exception "Uri without authority: hdfs:/data/_spark_metadata", where outputPath is "hdfs:///data".
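For reference, the "authority" in that message is the host:port part of a URI. A quick check with java.net.URI (just a sketch; the namenode host and port 8020 below are hypothetical placeholders, not my cluster's real address) shows that "hdfs:///data" has no authority, while a fully qualified HDFS path does:

```java
import java.net.URI;

/** Shows which HDFS-style URIs carry an authority (host:port) component. */
public class UriAuthorityCheck {

    /** Returns the authority part of the URI, or null if it has none. */
    static String authorityOf(String uri) {
        // URI.create throws an unchecked IllegalArgumentException on bad syntax.
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        // Triple slash: the scheme is present but the authority is empty.
        System.out.println(authorityOf("hdfs:///data"));
        // Fully qualified: "namenode:8020" is a hypothetical namenode address.
        System.out.println(authorityOf("hdfs://namenode:8020/data"));
    }
}
```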
When I change the code to use spark.read and df.write to write the Parquet file once as a batch job, there is no exception at all, so I guess the problem is not related to my HDFS configuration.
Can anyone help me?