Created 01-13-2018 08:51 PM
I am trying to read a data set from an existing non-partitioned Hive table and insert it into a partitioned Hive external table. How do I do that in PySpark SQL? Any help would be appreciated.
I am currently using the command below. The Hive external table has multiple partitions.
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")
The Spark job runs successfully, but no data is written to the HDFS partitions of the Hive external table.
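For context, a minimal sketch of the full flow I am attempting; the source table name is only a placeholder for my actual table.
# read the existing non-partitioned Hive table into a dataframe (placeholder table name)
df = spark.table("source_db.non_partitioned_table")
# then write it into the partitioned external table with the statement shown above
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")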
Created 01-13-2018 11:02 PM
1. Do
hive# desc formatted <hive-external-partitioned-table>;
Get the Location value from the desc formatted output, then run
bash$ hdfs dfs -ls <hdfs-location>
and check whether any partitions have been created or not.
2. In the pyspark shell, after executing the statement below
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")
the pyspark shell logs show where the partition directory is being created in HDFS.
Example:-
18/01/13 17:47:52 INFO FileUtils: Creating directory if it doesn't exist: hdfs://******/apps/hive/warehouse/partition_table/daily=2017-12-23
As you can see in the pyspark shell log above, it shows the directory /apps/hive/warehouse/partition_table/daily=2017-12-23 being created. My table name is partition_table, daily is the partition column, and Spark has created the partition under the HDFS directory /apps/hive/warehouse/partition_table/.
If you are not able to figure out the issue, share more details (pyspark shell logs, table location details, and the statements that you are executing in the pyspark shell).
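Also, below is a rough sketch of how I usually write into a partitioned Hive table from the pyspark shell. The database/table names and the dynamic-partition settings are only assumptions for the example; note that, depending on the Spark version, insertInto() may not be allowed together with partitionBy() and instead relies on the target table's own partition columns, which must come last in the dataframe.
# allow dynamic partition inserts (assumed settings, adjust for your cluster)
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
# example dataframe; the partition columns (col1, col2) must be the last columns,
# in the same order as they are declared in the target table
df = spark.table("default.source_table").select("data_col", "col1", "col2")
# insertInto uses the target table's existing partitioning, so partitionBy is not needed here
df.write.mode("overwrite").insertInto("default.partition_table")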
Created 01-14-2018 09:30 AM
Hi
Thanks for the answer, it works.
Thanks
Vignesh Asokan
Created on 01-20-2020 03:55 AM - edited 01-20-2020 04:37 AM
Dear Sir, I want to store Python-calculated values, which are saved in df4, into the column named "per" of a Hive table ("per" is the name of a column in the Hive table).
df2 = pd.read_sql("SELECT * FROM ggg where clientcode='55'", conn)
This is how I am reading the data. How do I write the df4 Python-calculated values into the "per" column of the Hive table?
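One approach I am considering, sketched below, is to build the final result (including the calculated "per" values) as a pandas dataframe, convert it to a Spark dataframe, and write it into a Hive table. The df4 layout, the clientcode column, and the target table name are assumptions here; Hive does not update a single column in place, so the usual pattern is to write full rows.
import pandas as pd
from pyspark.sql import SparkSession
# Spark session with Hive support (in the pyspark shell the "spark" object already exists)
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# df4 stands in for the pandas dataframe of calculated values;
# the clientcode/per columns are assumptions about its layout
df4 = pd.DataFrame({"clientcode": ["55"], "per": [12.5]})
# convert the pandas dataframe to a Spark dataframe
sdf = spark.createDataFrame(df4)
# write full rows (including the calculated "per" column) into a Hive table (assumed name)
sdf.write.mode("append").saveAsTable("ggg_with_per")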