Inserting into existing partitioned Hive table using HiveContext in pyspark

New Contributor

I am trying to read a dataset from an existing non-partitioned Hive table and insert it into a partitioned Hive external table. How do I do that in PySpark SQL? Any help would be appreciated.

I am currently using the command below. The Hive external table has multiple partitions.

df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")

The Spark job runs successfully, but no data is written to the HDFS partitions of the Hive external table.


Master Guru

@Vignesh Asokan


1. Check the location of the table in Hive:

hive# desc formatted <hive-external-partitioned-table>;

Get the Location value from the output of the desc formatted statement and list it:

bash$ hdfs dfs -ls <hdfs-location>

Check whether any partition directories have been created.

2. In the pyspark shell, after executing the statement below:

df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")

the pyspark shell logs show where the partition directory is being created in HDFS:


18/01/13 17:47:52 INFO FileUtils: Creating directory if it doesn't exist: hdfs://******/apps/hive/warehouse/partition_table/daily=2017-12-23

As the pyspark shell log above shows, Spark created the directory /apps/hive/warehouse/partition_table/daily=2017-12-23.

My table is named partition_table and daily is its partition column, so Spark created the partition under the HDFS directory /apps/hive/warehouse/partition_table/.
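To make the directory layout concrete, here is a minimal sketch of how the expected partition directory can be derived from the table location and the partition values, so you know which path to list with hdfs dfs -ls. The helper name expected_partition_dir is hypothetical, and the sketch assumes plain values; it does not handle any escaping Hive may apply to special characters in partition values.

```python
import posixpath


def expected_partition_dir(table_location, partition_spec):
    """Build the column=value directory path under a table location.

    partition_spec is an ordered list of (column, value) pairs,
    outermost partition column first. (Hypothetical helper.)
    """
    parts = ["{}={}".format(col, val) for col, val in partition_spec]
    return posixpath.join(table_location, *parts)


# Location and partition value taken from the log line above.
path = expected_partition_dir(
    "/apps/hive/warehouse/partition_table",
    [("daily", "2017-12-23")],
)
print("hdfs dfs -ls " + path)
```

If the listed directory is empty or missing, compare it with the Location reported by desc formatted to confirm that Spark and Hive are pointing at the same path.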

If you are not able to figure out the issue, please share more details (pyspark shell logs, table location details, and the statements you are executing in the pyspark shell).
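For reference, one possible way to write the insert is sketched below. It is not necessarily what fixed the original problem. It rests on a few assumptions that should be checked against your Spark version: insertInto matches columns by position rather than by name, so the partition columns are moved to the end; the remaining source columns are assumed to already be in the same order as the target table's non-partition columns; and partitionBy is left out because the target table already defines its partitions (newer Spark versions reject partitionBy together with insertInto). The function names and table names are hypothetical.

```python
def partition_columns_last(columns, partition_cols):
    """Reorder column names so the partition columns come last."""
    data_cols = [c for c in columns if c not in partition_cols]
    return data_cols + list(partition_cols)


def insert_into_partitioned(ctx, source_table, target_table, partition_cols):
    """Copy source_table into an existing partitioned Hive table.

    ctx is assumed to be a HiveContext or a Hive-enabled SparkSession;
    both expose sql() and table().
    """
    # Hive settings commonly required for dynamic-partition inserts.
    ctx.sql("SET hive.exec.dynamic.partition=true")
    ctx.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    df = ctx.table(source_table)
    ordered = partition_columns_last(df.columns, partition_cols)

    # Pass overwrite=True explicitly: in some PySpark versions the
    # overwrite argument of insertInto takes precedence over mode().
    df.select(*ordered).write.insertInto(target_table, overwrite=True)


# Example call (hypothetical table names, run inside a pyspark shell):
# insert_into_partitioned(sqlContext, "db.source_table",
#                         "db.target_table", ["col1", "col2"])
```

After the job finishes, listing the table location again with hdfs dfs -ls should show one directory per col1/col2 combination.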

New Contributor


Thanks for the answer, it works.


Vignesh Asokan


Dear Sir, I want to store Python-calculated values, which are saved in df4, in a column named "per" of a Hive table.

df2 = pd.read_sql("SELECT * FROM ggg where clientcode='55'", conn)

This is how I am reading the data. How do I write the Python-calculated values in df4 to the "per" column of the Hive table?