Support Questions

darouwan · ‎03-08-2018

I am trying to read data from Kafka and save it as Parquet files on HDFS. My code is similar to the following; the only difference is that I am writing it in Java.

val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("parquet")
      .option("path", outputPath)
      .option("checkpointLocation", "/tmp/sparkcheckpoint1/")
      .outputMode("append")
      .start()
      .awaitTermination()

However, it threw a "Uri without authority: hdfs:/data/_spark_metadata" exception, where "hdfs:///data" is the output path.

When I change the code to use spark.read and df.write to write the Parquet file once as a batch job, there is no exception, so I guess it is not related to my HDFS config.

Can anyone help me?
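
As background on the wording of the error, here is a minimal sketch in Java (the language the question says the real job is written in) of what "authority" means in a URI. It uses only java.net.URI. The host name namenode and port 8020 are placeholders, not values from this thread, and UriAuthorityCheck is a hypothetical class name. It does not reproduce Spark's or Hadoop's internal path handling; it only shows that a URI such as hdfs:///data has a scheme but no authority component.

```java
import java.net.URI;

public class UriAuthorityCheck {
    // Returns true when the URI names a host (and optionally a port) after the scheme.
    static boolean hasAuthority(String uri) {
        return URI.create(uri).getAuthority() != null;
    }

    public static void main(String[] args) {
        // Output path from the question: scheme present, authority empty.
        System.out.println(hasAuthority("hdfs:///data"));
        // Path as it appears in the exception message.
        System.out.println(hasAuthority("hdfs:/data/_spark_metadata"));
        // Placeholder NameNode address (assumption, not from the thread).
        System.out.println(hasAuthority("hdfs://namenode:8020/data"));
    }
}
```

Under this reading, both the configured output path and the path in the exception lack a host and port, which is consistent with the message, although the thread does not confirm where exactly the check fails.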

jamin4 · ‎04-10-2019

Simply removing hdfs:// from the path option worked for me. Keep the hdfs:// in the checkpoint location, though.
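
A minimal sketch of that workaround in Java, assuming the output path is held in a string and the scheme should be dropped before it is passed to option("path", ...). OutputPathHelper and withoutScheme are hypothetical names introduced for illustration. Only java.net.URI is used; the Spark call appears in a comment because it depends on the job's own session and DataFrame.

```java
import java.net.URI;

public class OutputPathHelper {
    // Hypothetical helper: keeps only the path part of a URI-like string,
    // so "hdfs:///data" becomes "/data". A plain path is returned unchanged.
    static String withoutScheme(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        // Output path from the question, with the scheme removed.
        String outputPath = withoutScheme("hdfs:///data");
        System.out.println(outputPath);
        // In the streaming job this value would be passed as:
        //   .option("path", outputPath)
        // The checkpointLocation option is left as it is.
    }
}
```

A path without a scheme is resolved against the cluster's default file system, so this should still write to HDFS as long as the default file system in the Hadoop configuration points at the cluster.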

Cloudera Community

"Uri without authority: hdfs:/data/_spark_metadata" error when using Spark Streaming to write Parquet files to HDFS