<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Inserting df data into existing Hive table in a specific column in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/287957#M213349</link>
    <description>&lt;P&gt;Dear Sir, I want the Python-calculated values saved in df4 to be stored in the "per" column ("per" is the name of a column in the Hive table).&lt;/P&gt;&lt;PRE&gt;df2 = pd.read_sql("SELECT * FROM ggg where clientcode='55'", conn)&lt;/PRE&gt;&lt;P&gt;This is how I am reading the data; how do I write the df4 Python-calculated values into the "per" column of the Hive table?&lt;/P&gt;</description>
    <pubDate>Mon, 20 Jan 2020 12:37:44 GMT</pubDate>
    <dc:creator>Rak</dc:creator>
    <dc:date>2020-01-20T12:37:44Z</dc:date>
    <item>
      <title>Inserting into existing partitioned Hive table using HiveContext in pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187518#M149619</link>
      <description>&lt;P&gt;I am trying to read a dataset from an existing non-partitioned Hive table and insert it into a partitioned external Hive table. How do I do that in PySpark SQL? Any help would be appreciated.&lt;/P&gt;&lt;P&gt;I am currently using the command below. The external Hive table has multiple partitions.&lt;/P&gt;&lt;PRE&gt;df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")&lt;/PRE&gt;&lt;P&gt;The Spark job runs successfully, but no data is written to the HDFS partitions of the external Hive table.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 04:51:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187518#M149619</guid>
      <dc:creator>vignesh_asokan2</dc:creator>
      <dc:date>2018-01-14T04:51:20Z</dc:date>
    </item>
    <item>
      <title>Re: Inserting into existing partitioned Hive table using HiveContext in pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187519#M149620</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/54186/vigneshasokan26.html" nodeid="54186"&gt;@Vignesh Asokan&lt;/A&gt; &lt;/P&gt;&lt;P&gt;1.Do &lt;/P&gt;&lt;PRE&gt;hive# desc formatted &amp;lt;hive-external-partitioned-table&amp;gt;;&lt;/PRE&gt;&lt;P&gt;get the Location details from desc formatted statement and do&lt;/P&gt;&lt;PRE&gt;bash$ hdfs dfs -ls &amp;lt;hdfs-location&amp;gt;&lt;/PRE&gt;&lt;P&gt;Check is there any partitions created (or) not.&lt;/P&gt;&lt;P&gt;2.In pyspark shell after executing below statement &lt;/P&gt;&lt;PRE&gt;df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")&lt;/PRE&gt;&lt;P&gt;in &lt;STRONG&gt;pyspark shell logs shows where the partition directory is creating in HDFS location &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Example:-&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;18/01/13 17:47:52 INFO FileUtils: Creating directory if it doesn't exist: hdfs://******/apps/hive/warehouse/partition_table/daily=2017-12-23&lt;/PRE&gt;
As you can see in the pyspark shell logs above, Spark reported creating the directory &lt;STRONG&gt;/apps/hive/warehouse/partition_table/daily=2017-12-23&lt;/STRONG&gt;.
&lt;P&gt;My table name is partition_table and daily is the partition column, so Spark created the &lt;STRONG&gt;partition&lt;/STRONG&gt; under the &lt;STRONG&gt;/apps/hive/warehouse/partition_table/&lt;/STRONG&gt; directory in HDFS.&lt;/P&gt;&lt;P&gt;If you are not able to figure out the issue, share more details (pyspark shell logs, table location details, and the statements you are executing in the pyspark shell).&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 07:02:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187519#M149620</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-01-14T07:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: Inserting into existing partitioned Hive table using HiveContext in pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187520#M149621</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for the answer, it works.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Vignesh Asokan&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 17:30:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/187520#M149621</guid>
      <dc:creator>vignesh_asokan2</dc:creator>
      <dc:date>2018-01-14T17:30:41Z</dc:date>
    </item>
    <item>
      <title>Re: Inserting df data into existing Hive table in a specific column</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/287957#M213349</link>
      <description>&lt;P&gt;Dear Sir, I want the Python-calculated values saved in df4 to be stored in the "per" column ("per" is the name of a column in the Hive table).&lt;/P&gt;&lt;PRE&gt;df2 = pd.read_sql("SELECT * FROM ggg where clientcode='55'", conn)&lt;/PRE&gt;&lt;P&gt;This is how I am reading the data; how do I write the df4 Python-calculated values into the "per" column of the Hive table?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2020 12:37:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Inserting-into-exiting-partitioned-Hive-table-using/m-p/287957#M213349</guid>
      <dc:creator>Rak</dc:creator>
      <dc:date>2020-01-20T12:37:44Z</dc:date>
    </item>
  </channel>
</rss>

