
mapping data from spark into hive table


New Contributor

Spark receives a CSV file whose header lists the columns as A,C,D,B with the data under it. The next day the same data arrives with the columns ordered B,D,A,C, the day after that A,C,B,D, and so on, in random order. We need to load this data into a Hive table whose columns are A,B,C,D. Can anyone suggest how to write this script in Spark?

1 ACCEPTED SOLUTION


Re: mapping data from spark into hive table

Super Guru

@Satya G

Read the CSV file with header as described here:

https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader

Once you are able to read the CSV file with its header, use the .select method to put the columns in the order the Hive table expects:

#pyspark:

df = spark.read.option("header", "true").csv("<file>")   # read the CSV with its header
df1 = df.select("A", "B", "C", "D")                       # select the columns in a fixed order
df1.write.mode("<overwrite/append>").saveAsTable("<db_name>.<tab_name>")  # write to the Hive table
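
For completeness, here is a minimal end-to-end sketch of the same approach, assuming a hypothetical input path /data/daily.csv and target table default.abcd (replace both with your own). Because .select names the columns explicitly, it does not matter in which order they arrive each day:

from pyspark.sql import SparkSession

# Hive support is required for saveAsTable to write into the Hive metastore
spark = SparkSession.builder \
    .appName("daily-csv-to-hive") \
    .enableHiveSupport() \
    .getOrCreate()

# read the day's file; the header tells Spark whatever column order arrived today
df = spark.read.option("header", "true").csv("/data/daily.csv")

# selecting by name fixes the order to A,B,C,D regardless of the input order,
# then the rows are appended to the existing Hive table
df.select("A", "B", "C", "D") \
    .write.mode("append") \
    .saveAsTable("default.abcd")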

Re: mapping data from spark into hive table

New Contributor

@Shu Thank you.