Created 01-13-2018 08:51 PM
I am trying to read a data set from an existing non-partitioned Hive table and insert it into a partitioned Hive external table. How do I do that in PySpark SQL? Any help would be appreciated.
I am currently using the command below. The Hive external table has multiple partitions.
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")
The Spark job runs successfully, but no data is written to the HDFS partitions of the Hive external table.
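For context, a minimal sketch of the full flow I am attempting; the source table name is only a placeholder for my actual table.
# read the existing non-partitioned Hive table into a dataframe (placeholder table name)
df = spark.table("source_db.non_partitioned_table")
# then write it into the partitioned external table with the statement shown above
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")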
Created 01-13-2018 11:02 PM
1. Do
hive# desc formatted <hive-external-partitioned-table>;
Get the Location value from the desc formatted output, then run
bash$ hdfs dfs -ls <hdfs-location>
and check whether any partitions have been created or not.
2. In the pyspark shell, after executing the statement below
df.write.mode("overwrite").partitionBy("col1","col2").insertInto("Hive external Partitioned Table")
the pyspark shell logs show where the partition directory is being created in HDFS.
Example:-
18/01/13 17:47:52 INFO FileUtils: Creating directory if it doesn't exist: hdfs://******/apps/hive/warehouse/partition_table/daily=2017-12-23
As you can see in the pyspark shell log above, it shows the directory /apps/hive/warehouse/partition_table/daily=2017-12-23 being created. My table name is partition_table, daily is the partition column, and Spark has created the partition under the HDFS directory /apps/hive/warehouse/partition_table/.
If you are not able to figure out the issue, share more details (pyspark shell logs, table location details, and the statements that you are executing in the pyspark shell).
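Also, below is a rough sketch of how I usually write into a partitioned Hive table from the pyspark shell. The database/table names and the dynamic-partition settings are only assumptions for the example; note that, depending on the Spark version, insertInto() may not be allowed together with partitionBy() and instead relies on the target table's own partition columns, which must come last in the dataframe.
# allow dynamic partition inserts (assumed settings, adjust for your cluster)
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
# example dataframe; the partition columns (col1, col2) must be the last columns,
# in the same order as they are declared in the target table
df = spark.table("default.source_table").select("data_col", "col1", "col2")
# insertInto uses the target table's existing partitioning, so partitionBy is not needed here
df.write.mode("overwrite").insertInto("default.partition_table")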
Created 01-14-2018 09:30 AM
Hi
Thanks for the answer, it works.
Thanks
Vignesh Asokan
Created on 01-20-2020 03:55 AM - edited 01-20-2020 04:37 AM
Dear Sir, I want to store Python-calculated values, which are saved in df4, into the column named "per" of a Hive table ("per" is the name of a column in the Hive table).
df2 = pd.read_sql("SELECT * FROM ggg where clientcode='55'", conn)
This is how I am reading the data. How do I write the df4 Python-calculated values into the "per" column of the Hive table?
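One approach I am considering, sketched below, is to build the final result (including the calculated "per" values) as a pandas dataframe, convert it to a Spark dataframe, and write it into a Hive table. The df4 layout, the clientcode column, and the target table name are assumptions here; Hive does not update a single column in place, so the usual pattern is to write full rows.
import pandas as pd
from pyspark.sql import SparkSession
# Spark session with Hive support (in the pyspark shell the "spark" object already exists)
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# df4 stands in for the pandas dataframe of calculated values;
# the clientcode/per columns are assumptions about its layout
df4 = pd.DataFrame({"clientcode": ["55"], "per": [12.5]})
# convert the pandas dataframe to a Spark dataframe
sdf = spark.createDataFrame(df4)
# write full rows (including the calculated "per" column) into a Hive table (assumed name)
sdf.write.mode("append").saveAsTable("ggg_with_per")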