
Load a DataFrame into a partitioned Hive table


 

I have an existing Hive table that is already partitioned and already contains data. During a nightly batch, I insert data into it from a DataFrame (created from an Avro file).
When inserting, do I need to partition the DataFrame by the same columns as the Hive table's partition columns,
or can I call insertInto on the table directly?

 

So far I have been doing it like this, which works fine: df.coalesce(4).write.insertInto(table)
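For context, here is a minimal sketch of that current approach. It relies on insertInto resolving columns by position rather than by name, and on the Hive table listing its partition columns last, so the DataFrame columns have to be in the same order as the table's. The helper names (order_for_insert_into, insert_into_partitioned) and the num_files parameter are hypothetical and only for illustration; the sketch only moves the partition columns to the end and assumes the remaining columns already match the table's order. Depending on the cluster setup, Hive dynamic partitioning may also need to be enabled.

```python
def order_for_insert_into(df_columns, partition_cols):
    """Return column names with the partition columns moved to the end.

    Assumption: the target Hive table lists its partition columns last,
    in the order given by partition_cols.
    """
    missing = [c for c in partition_cols if c not in df_columns]
    if missing:
        raise ValueError(f"DataFrame is missing partition columns: {missing}")
    # Keep the non-partition columns in their existing order.
    data_cols = [c for c in df_columns if c not in partition_cols]
    return data_cols + list(partition_cols)


def insert_into_partitioned(df, table, partition_cols, num_files=4):
    """Hypothetical helper: reorder columns, then insert by position.

    df is expected to be a Spark DataFrame; no partitionBy call is made,
    the partition values come from the trailing columns of each row.
    """
    ordered = order_for_insert_into(df.columns, partition_cols)
    df.select(*ordered).coalesce(num_files).write.insertInto(table)
```

With this, the nightly load would be something like insert_into_partitioned(df, table, ['country', 'year', 'month']), which is equivalent to the one-liner above plus an explicit column reordering.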

 

Now I am thinking about switching to this: df.write.partitionBy('country', 'year', 'month').insertInto(table)

 

Also, does it improve performance in the latter case?