
Spark Compaction


Hi,

I want to merge small Avro files into a single Avro file. The code I followed is compact.java, which is available on GitHub at https://github.com/KeithSSmith/spark-compaction

I am getting the error "Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame"

 

The commands I used are:

 

spark2-submit  \
--class org.cloudera.com.spark_compaction.Compact \
--master yarn \
./spark-compaction-0.0.1-SNAPSHOT.jar \
--input-path /tmp/d3-raw/landing/year=2017/month=4/ \
--output-path /tmp/d3-raw/landing/year=2017/month=4/output \
--input-compression none \
--input-serialization avro \
--output-compression none \
--output-serialization avro

 

 

spark2-submit --packages com.databricks:spark-avro_2.11:3.0.0 \
--class org.cloudera.com.spark_compaction.Compact \
--master yarn \
./spark-compaction-0.0.1-SNAPSHOT.jar \
--input-path /tmp/d3-raw/landing/year=2017/month=4/ \
--output-path /tmp/d3-raw/landing/year=2017/month=4/output \
--input-compression none \
--input-serialization avro \
--output-compression none \
--output-serialization avro

 

Can anyone help me resolve this issue?

 
