
load a dataframe into partitioned Hive table





I have an existing Hive table that is already partitioned and already holds data. During a nightly batch, I insert data into it from a DataFrame (created from an Avro file).
When inserting, do I need to partition the DataFrame by the same columns as the Hive table's partition columns,
or can I call insertInto on the table directly?


So far I have been doing it like this, which works fine: df.coalesce(4).write.insertInto(table)


Now I am thinking about using: df.write.partitionBy('country', 'year', 'month').insertInto(table)


Also, does the latter approach improve performance?
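
For context on the two calls above: Spark's insertInto resolves columns by position rather than by name, and since Spark 2.0 it raises an AnalysisException when combined with partitionBy, because the table already defines its partition columns. What matters for a plain insertInto is therefore the column order of the DataFrame, with the partition columns last. Below is a minimal sketch of that reordering in plain Python. The helper name order_for_insert_into and the columns id and amount are hypothetical, and it assumes the non-partition columns are already in the table's order.

```python
def order_for_insert_into(df_columns, partition_columns):
    """Return df_columns reordered so the partition columns come last.

    Hypothetical helper: insertInto matches columns by position, and a
    partitioned Hive table lists its partition columns after the data columns.
    """
    missing = [c for c in partition_columns if c not in df_columns]
    if missing:
        raise ValueError(f"partition columns missing from DataFrame: {missing}")
    # Keep the data columns in their existing order (assumed to match the table).
    data_columns = [c for c in df_columns if c not in partition_columns]
    return data_columns + list(partition_columns)


# "id" and "amount" are made-up column names for illustration only.
ordered = order_for_insert_into(
    ["country", "id", "year", "amount", "month"],
    ["country", "year", "month"],
)
print(ordered)

# With a real DataFrame `df` and table name `table` (not executed here):
# df.select(*ordered).coalesce(4).write.insertInto(table)
```

In a real job, df.columns would supply the first argument. Writing into partitions that do not exist yet may also require Hive's dynamic partitioning settings to be enabled, depending on the cluster configuration.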
