Created 11-26-2018 11:48 AM
Our application has multiple jobs that use the INSERT OVERWRITE route, and that is filling up the trash directory (under the home directory, by default) with deleted records. Since quotas are enforced, this is causing us to exceed the threshold size. Is there any option to skip the trash in Hive on a per-session basis, or any option to change the trash directory location per session?
Created 11-26-2018 12:19 PM
When you perform an INSERT OVERWRITE into a table, the old data is moved to the trash and kept there for some duration.
To avoid the data being moved to the trash, and to free up the space immediately, set auto.purge=true on the table:
TBLPROPERTIES ("auto.purge"="true")
The property defaults to "false", which keeps the normal trash behavior.
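A minimal sketch of how the property can be applied, assuming a managed table (the table name `sales` and its columns are illustrative only):

```sql
-- Create a managed table whose old data bypasses the trash
-- on DROP TABLE / TRUNCATE / INSERT OVERWRITE.
CREATE TABLE sales (id INT, amount DOUBLE)
TBLPROPERTIES ("auto.purge"="true");

-- Or enable it on an existing table:
ALTER TABLE sales SET TBLPROPERTIES ("auto.purge"="true");
```

Note that auto.purge is honored for managed tables; for external tables Hive does not manage the data files the same way, which may explain cases where the property appears to have no effect.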
Thanks.
Hope it helps!
Created 11-26-2018 04:01 PM
Much appreciated.
Created 11-28-2018 08:08 PM
@pradeep kammella : If you found this answer helpful, please take a moment to log in and click the "accept" link on the answer. Thanks!!!
Created 06-27-2020 12:39 PM
Hi, I am facing the same issue. The table holds very large data, and during INSERT OVERWRITE the old files are being placed in my user directory, /user/anjali/.Trash, causing the Hive action in Oozie to fail after a 1.5-hour run. Please help. The table is external, and even after I changed it to an internal (managed) table, auto.purge=true is not working.
Created 06-29-2020 12:45 AM
@AnjaliRocks , As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question.
Regards,
Vidya Sargur
Created 06-29-2020 01:02 AM
@VidyaSargur , I had started a new thread for the issue and received no solution. So I was digging through old posts.
My thread :- https://community.cloudera.com/t5/Support-Questions/Trash-space-issue-during-insert-overwrite-in-Hiv...
This issue is causing trouble in my org, and I am unable to solve it.
Created 06-29-2020 01:16 AM
@AnjaliRocks , I see that our expert @paras has responded to your thread. Can you please check if his response is helpful? Please feel free to @ mention him for further queries.
Regards,
Vidya Sargur
Created 06-29-2020 02:03 AM
Replied on the new thread
Created 06-29-2020 09:12 AM
@paras , Thanks a lot for your reply. The solution you provided was for the Spark Oozie action.
I was able to solve that part two days ago using the same configuration, --conf spark.hadoop.dfs.user.home.dir.prefix=/tmp.
That was during the ingestion part of the flow, so my Sqoop and Spark jobs now redirect any .Trash to my tmp directory, which has enough quota. Now I am facing this issue with the Hive action, where I am not sure of a configuration equivalent to --conf spark.hadoop.dfs.user.home.dir.prefix=/appldigi/tmp or
-Dyarn.app.mapreduce.am.staging-dir=/tmp.
Can you please guide me on this? I am unable to solve it.
I am trying to execute a HiveQL INSERT OVERWRITE script. I have already tried the auto.purge=true option, which is not working.
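One stopgap, if no per-session Hive setting works out, is to empty the trash directory explicitly before or after the job from within the Hive script itself (Hive supports `dfs` commands in a HiveQL script). This is a sketch, not a fix for the root cause, and the path below is the one from the post above:

```sql
-- Permanently delete the trash contents, bypassing the trash itself.
-- WARNING: files removed with -skipTrash are unrecoverable.
dfs -rm -r -skipTrash /user/anjali/.Trash;
```

Running this as the first statement of the Oozie Hive action would reclaim the quota before the INSERT OVERWRITE starts; whether that keeps the job under quota depends on how much the overwrite itself moves to the trash during the run.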