11-05-2017 07:25 AM
I have an existing partitioned table with data, and during a nightly batch I insert data from a DataFrame (created from an Avro file).
When inserting, do I need to partition the DataFrame by the same columns as the Hive table's partition columns,
or can I call insertInto on the table directly?
So far I have been doing this, which works fine: df.coalesce(4).write.insertInto(table)
Now I am considering this instead: df.write.partitionBy('country', 'year', 'month').insertInto(table)
Also, does the latter case improve performance?
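For reference, the two variants being compared might look like the sketch below (a minimal, hedged example: the session setup, file path, and table name are placeholders, not from the question; it assumes a Hive-enabled SparkSession and a table already partitioned by country/year/month):

```python
from pyspark.sql import SparkSession

# Hive support is required for insertInto against a Hive-managed table.
spark = (SparkSession.builder
         .appName("nightly-batch-insert")   # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

# Reading Avro may require the spark-avro package, depending on Spark version.
df = spark.read.format("avro").load("/data/nightly.avro")  # hypothetical path

# Variant 1: rely on the table's existing partition definition.
# insertInto matches columns by POSITION, so the partition columns must be
# the last columns of df, in the same order as in the table definition.
df.coalesce(4).write.insertInto("db.events")  # "db.events" is a placeholder

# Variant 2: explicit partitionBy. Note that in many Spark versions this
# raises an AnalysisException, because insertInto() cannot be combined with
# partitionBy() -- the target table's partition spec is used instead.
# df.write.partitionBy("country", "year", "month").insertInto("db.events")
```

Since this needs a running Spark cluster with Hive support, it is a sketch of the call patterns rather than something runnable standalone.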