Support Questions

Find answers, ask questions, and share your expertise

hdfs path for mulitple external hive table

avatar

i have the below script to delete the hdfs path for all the tables from the database . i have retrieved all the tables from the database and write the tables list in the file .after that i am reading one by one and finding the path and deleting them .

in the for loop we are opening again hive for each table it is not good to for the around 2000 tables .I need some help to write the code to get all paths in one hive connection instead of multiple times . can you please let me know if anyone can help me ..

Create table list for a schema (script parameter)

HIVE_SCHEMA=fin_ps_prdstg_tbls echo Processing Hive schema $HIVE_SCHEMA... tablelist=tables_$HIVE_SCHEMA

rm -r $tablelist hive -e " set hive.cli.print.header=false; use $HIVE_SCHEMA; show tables like '*stg';" 1> $tablelist

number of tables

tableNum_limit=$(cat $tablelist | wc -l) echo "number of tables in the file $tableNum_limit "

For each table do:

for table in $(cat $tablelist|sort|head -n "$tableNum_limit") do

echo Processing table $table ...

#Call DESCRIBE
out=$(hive client -S -e "use $HIVE_SCHEMA; DESCRIBE EXTENDED $table")

#Get location for example
table_location=$(echo "${out}" | egrep -o 'location:[^,]+' | sed 's/location://')
echo Table location: $table_location
#removing the path
#hdfs dfs -rm -r -skipTrash $table_location #need to remove this comment
done






2 REPLIES 2

avatar
Contributor

Did you try getting the path from metadata database. We can get the path of all hive tables with single query. Let me know if you already tried that.

-Shashi

avatar

Sorry we don't have access for that .Can you please share that it will be useful for my future. Now i written different approach to get all hdfs paths and trying to delete them all in one shot .is there anyway to do that.


hive -e " set hive.cli.print.header=false; use $HIVE_SCHEMA;show tables like '*stg';" 1> $tablelist

tableNum_limit=$(cat $tablelist | wc -l)

echo "number of tables in the file $tableNum_limit "

#sed -i e"s/.*/SHOW CREATE TABLE $HIVE_SCHEMA.&;\" $tablelist

sed -i -e "s/.*/DESCRIBE FORMATTED $HIVE_SCHEMA.&;/" $tablelist

#create the file with all describe formatted output

hive -f $tablelist > $loctbl

#remove all the line except the hdfs path


sed -i -e '/^Location/!d;s/Location:\s\+//' $loctbl

#sed -Eni '{N; /^LOCATION\s+/p ; D}' $loctbl

It will retrieve all the hdfs paths into file for the given database and I am trying to delete them is there any short way to delete all in stead of using hdfs dfs -rm -r -skipTrash $table_location each path .