
Uri without authority: hdfs:/data/_spark_metadata error when using Spark Streaming to write Parquet files on HDFS


Rising Star

I am trying to read data from Kafka and save it to Parquet files on HDFS. My code is similar to the following, except that I am writing it in Java.

val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("parquet")
  .option("path", outputPath)
  .option("checkpointLocation", "/tmp/sparkcheckpoint1/")
  .outputMode("append")
  .start()
  .awaitTermination()

However, it throws an "Uri without authority: hdfs:/data/_spark_metadata" exception, where "hdfs:///data" is the output path.

When I change the code to spark.read and df.write to write the Parquet file once as a batch job, there is no exception, so I guess it is not related to my HDFS config.
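For reference, the "authority" the error complains about is the host:port component of a URI. A URI like "hdfs:/data" has a scheme and a path but no "//authority" part, which is what Hadoop rejects. A small check illustrates this (the "namenode:8020" host:port is a placeholder, not a value from this thread):

```java
import java.net.URI;

public class AuthorityDemo {
    public static void main(String[] args) {
        // "hdfs:/data" has a scheme ("hdfs") and a path ("/data"),
        // but no "//host:port" section, so its authority is null.
        URI noAuthority = URI.create("hdfs:/data");
        System.out.println(noAuthority.getAuthority()); // prints "null"

        // A fully qualified URI carries the namenode as its authority.
        URI withAuthority = URI.create("hdfs://namenode:8020/data");
        System.out.println(withAuthority.getAuthority()); // prints "namenode:8020"
    }
}
```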

Can anyone help me?

1 REPLY

Re: Uri without authority: hdfs:/data/_spark_metadata error when using Spark Streaming to write Parquet files on HDFS

New Contributor

Simply removing "hdfs://" from the path option worked. Keep the "hdfs://" prefix in the checkpoint location, though.
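A sketch of what this fix looks like applied to the question's code, in Java since the asker mentioned writing in Java (the paths mirror the thread and `df` is the Dataset from the question; this is an untested sketch, not a verified configuration):

```java
// Sketch only: per the reply, the sink path drops the "hdfs://" scheme,
// while the checkpoint location keeps it.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream()
  .format("parquet")
  .option("path", "/data")                                       // no "hdfs://" here
  .option("checkpointLocation", "hdfs:///tmp/sparkcheckpoint1/") // scheme kept here
  .outputMode("append")
  .start()
  .awaitTermination();
```

The scheme-less path is still resolved against the cluster's default filesystem (fs.defaultFS), so the output lands on HDFS either way.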