i have the below script to delete the hdfs path for all the tables from the database . i have retrieved all the tables from the database and write the tables list in the file .after that i am reading one by one and finding the path and deleting them .
in the for loop we are opening again hive for each table it is not good to for the around 2000 tables .I need some help to write the code to get all paths in one hive connection instead of multiple times . can you please let me know if anyone can help me ..
Create table list for a schema (script parameter)
HIVE_SCHEMA=fin_ps_prdstg_tbls echo Processing Hive schema $HIVE_SCHEMA... tablelist=tables_$HIVE_SCHEMA
rm -r $tablelist hive -e " set hive.cli.print.header=false; use $HIVE_SCHEMA; show tables like '*stg';" 1> $tablelist
number of tables
tableNum_limit=$(cat $tablelist | wc -l) echo "number of tables in the file $tableNum_limit "
For each table do:
for table in $(cat $tablelist|sort|head -n "$tableNum_limit") do
echo Processing table $table ...
#Call DESCRIBE
out=$(hive client -S -e "use $HIVE_SCHEMA; DESCRIBE EXTENDED $table")
#Get location for example
table_location=$(echo "${out}" | egrep -o 'location:[^,]+' | sed 's/location://')
echo Table location: $table_location
#removing the path
#hdfs dfs -rm -r -skipTrash $table_location #need to remove this comment
done