Write JSON as ORC to Hadoop with Spark-Java

New Contributor

I have a Hive table (STORED AS ORC) and JSON data.

I write the JSON to my ORC table as follows:

JavaRDD<String> jsonData = rdd.map(t -> t.value()); // Get the JSON string from each RDD record

Dataset<String> jsonSet = sparkSession.createDataset(JavaRDD.toRDD(jsonData), org.apache.spark.sql.Encoders.STRING());

Dataset<Row> df = sparkSession.read().json(jsonSet); // Infer a schema from the JSON strings

Dataset<Row> dfSelect = df.select(cols); // cols: Column[] holding the columns I need from the JSON

dfSelect.write().format("orc").mode("append").save(path); // Append ORC files under the table's path

This code runs without errors and inserts the data into my table, but the values do not end up in the correct columns. My ORC table has the columns colA to colE. When I select ColB, ColD and ColE from the JSON and insert them, the data is written into the first three columns instead of into colB, colD and colE:

colA      colB      colC      colD      colE
------------------------------------------------
valueB    valueD    valueE
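The symptom is consistent with the ORC data being resolved by column position rather than by column name. A minimal plain-Java simulation of that assumption (`PositionalReadDemo` and `readByPosition` are hypothetical names, not a Hive or ORC API) shows how three values written for colB, colD and colE would land in colA, colB and colC:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: a reader that resolves columns by POSITION (not by name)
// assigns the i-th value stored in the file to the i-th column of the table.
final class PositionalReadDemo {

    // Returns table column -> value; table columns beyond the file's width get null.
    static Map<String, String> readByPosition(List<String> tableColumns, List<String> fileValues) {
        Map<String, String> row = new LinkedHashMap<>(); // keeps table column order
        for (int i = 0; i < tableColumns.size(); i++) {
            String value = i < fileValues.size() ? fileValues.get(i) : null;
            row.put(tableColumns.get(i), value);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> table = List.of("colA", "colB", "colC", "colD", "colE");
        // The file written from dfSelect holds only the ColB, ColD, ColE values, in that order.
        List<String> fileValues = List.of("valueB", "valueD", "valueE");
        System.out.println(readByPosition(table, fileValues));
        // prints {colA=valueB, colB=valueD, colC=valueE, colD=null, colE=null}
    }
}
```

Under this assumption the column names in the file play no part, which matches the row shown in the table above.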

How can I solve this? Do I need to get the schema from the JSON dataset?

Thanks.

Re: Write JSON as ORC to Hadoop with Spark-Java

@Is Ta You should map the DataFrame you read from jsonSet to the same schema as the ORC table (same columns, in the same order, with the same types) before saving it.
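A minimal sketch of that mapping, assuming the target is a Hive table whose column names and types you can read through Spark: for every table column, in table order, take the matching JSON column (matched case-insensitively, since Hive stores column names in lower case) or a typed NULL when the JSON lacks it. `AlignToTable` and `alignedExpressions` are hypothetical names; the helper is plain Java and only builds Spark SQL expression strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds one SQL expression per table column, in table order,
// so the resulting DataFrame lines up with the table by position.
final class AlignToTable {

    /**
     * @param tableSchema table columns in table order -> SQL type name (e.g. "string")
     * @param dfColumns   column names present in the JSON DataFrame
     * @return expressions suitable for Dataset.selectExpr(String...)
     */
    static String[] alignedExpressions(LinkedHashMap<String, String> tableSchema, String[] dfColumns) {
        // Index the DataFrame's columns by lower-cased name for case-insensitive matching.
        Map<String, String> byLowerName = new LinkedHashMap<>();
        for (String c : dfColumns) {
            byLowerName.put(c.toLowerCase(Locale.ROOT), c);
        }
        List<String> exprs = new ArrayList<>();
        for (Map.Entry<String, String> col : tableSchema.entrySet()) {
            String name = col.getKey();
            String type = col.getValue();
            String source = byLowerName.get(name.toLowerCase(Locale.ROOT));
            if (source != null) {
                // Present in the JSON: cast to the table type and alias to the table name.
                exprs.add("CAST(`" + source + "` AS " + type + ") AS `" + name + "`");
            } else {
                // Missing from the JSON: a typed NULL keeps every later column in position.
                exprs.add("CAST(NULL AS " + type + ") AS `" + name + "`");
            }
        }
        return exprs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> table = new LinkedHashMap<>();
        for (String c : new String[] {"colA", "colB", "colC", "colD", "colE"}) {
            table.put(c, "string"); // assumption for the demo: every column is a string
        }
        for (String e : alignedExpressions(table, new String[] {"ColB", "ColD", "ColE"})) {
            System.out.println(e);
        }
    }
}
```

In Spark, the table side could come from `sparkSession.table("db.my_orc_table").schema()` (each `StructField` exposes `name()` and `dataType().catalogString()`) and the JSON side from `df.columns()`; the result can be passed to `df.selectExpr(exprs)` and written with `write().mode(SaveMode.Append).insertInto("db.my_orc_table")`. The table name `db.my_orc_table` is a placeholder. `insertInto` resolves columns by position rather than by name, which is why the expressions are emitted in table order with NULLs filling the gaps.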

Re: Write JSON as ORC to Hadoop with Spark-Java

New Contributor

How can I do it?

Re: Write JSON as ORC to Hadoop with Spark-Java

Super Guru

@Is Ta If you selected certain columns from the DataFrame before saving it as ORC, only those columns should show up in the ORC file. Can you run printSchema() on your JSON DataFrame after selecting the columns and post the output here?
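Besides `printSchema()`, which lists names and types, it can help to compare column order directly, because a position-based reader ignores names. A hypothetical plain-Java check (`ColumnOrderCheck` is an illustrative name); in Spark the two arrays could come from `sparkSession.table("db.my_orc_table").columns()` (placeholder table name) and `dfSelect.columns()`, both of which return `String[]`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports every position where the DataFrame's column
// differs (case-insensitively) from the table's column at the same position.
final class ColumnOrderCheck {

    static List<String> mismatches(String[] tableColumns, String[] dfColumns) {
        List<String> problems = new ArrayList<>();
        int n = Math.max(tableColumns.length, dfColumns.length);
        for (int i = 0; i < n; i++) {
            String expected = i < tableColumns.length ? tableColumns[i] : "(none)";
            String actual = i < dfColumns.length ? dfColumns[i] : "(missing)";
            if (!expected.equalsIgnoreCase(actual)) {
                problems.add("position " + i + ": table has " + expected + ", DataFrame has " + actual);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String[] table = {"colA", "colB", "colC", "colD", "colE"};
        String[] selected = {"ColB", "ColD", "ColE"};
        mismatches(table, selected).forEach(System.out::println);
    }
}
```

With the columns from the question, every one of the five positions is reported, even though each selected name exists in the table.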

Re: Write JSON as ORC to Hadoop with Spark-Java

New Contributor

That's exactly what I did: I selected only the columns whose names match my ORC table, and the data still isn't written into those columns.