
How to skip trash when using INSERT OVERWRITE in Hive


Our application has multiple jobs that use the INSERT OVERWRITE route, and that is filling up the trash directory (under the home directory, by default) with deleted records. Since quotas are enforced, this causes the threshold size to be exceeded. Is there any option to skip trash in Hive on a per-session basis, or any option to change the trash directory location per session?

11 Replies

Expert Contributor

Hi @pradeep kammella,

When you perform INSERT OVERWRITE into a table, the old data is moved to trash for some time.

To avoid moving the old data into trash and to free up the space immediately, set auto.purge=true on the table:

TBLPROPERTIES ("auto.purge"="true") (set it back to "false" to restore the default behavior).
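A minimal sketch of how this could be applied, assuming a reasonably recent Hive release (auto.purge handling for INSERT OVERWRITE needs Hive 2.3 or later) and hypothetical table names:

-- Hypothetical database/table names; adjust to your own table.
-- With auto.purge=true, data replaced by INSERT OVERWRITE (and removed by
-- TRUNCATE or DROP) is deleted immediately instead of moving to .Trash.
ALTER TABLE my_db.my_table SET TBLPROPERTIES ("auto.purge"="true");

-- The same property can also be set at creation time:
CREATE TABLE my_db.my_new_table (id INT, name STRING)
TBLPROPERTIES ("auto.purge"="true");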

Thanks.

Hope it helps!


Much appreciated.

Expert Contributor

@pradeep kammella: If you found this answer helpful, please take a moment to log in and click the "Accept" link on the answer. Thanks!

Explorer

Hi, I am facing the same issue. The table holds a very large amount of data, and while doing INSERT OVERWRITE the old files are being placed in my user directory, /user/anjali/.Trash, causing the Hive action in Oozie to fail after a 1.5-hour run. Please help. The table is external, and even after I changed it to an internal (managed) table, auto.purge=true is not working.

Community Manager

@AnjaliRocks, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment, which could help others give you a more accurate answer to your question.



Regards,

Vidya Sargur,
Community Manager



Explorer

@VidyaSargur, I had started a new thread for this issue, but no solution was received, so I was digging through old posts.

 

My thread: https://community.cloudera.com/t5/Support-Questions/Trash-space-issue-during-insert-overwrite-in-Hiv...

 

This issue is causing trouble in my organization, and I have been unable to solve it.

Community Manager

@AnjaliRocks , I see that our expert @paras has responded to your thread. Can you please check if his response is helpful? Please feel free to @ mention him for further queries. 



Regards,

Vidya Sargur,
Community Manager



Master Collaborator

Replied on the new thread

Explorer

@paras, thanks a lot for your reply. The solution you provided was for the Spark Oozie action.

I was able to solve that part two days ago using the same configuration, --conf spark.hadoop.dfs.user.home.dir.prefix=/tmp.

That was during the ingestion part of the flow, so my Sqoop and Spark jobs now redirect any .Trash data to my tmp directory, which has enough quota. Now I am facing this issue with the Hive action, where I am not sure of a configuration equivalent to --conf spark.hadoop.dfs.user.home.dir.prefix=/appldigi/tmp or -Dyarn.app.mapreduce.am.staging-dir=/tmp.

Can you please guide me on this? I am unable to solve it.

I am trying to execute a HiveQL script that does an INSERT OVERWRITE. I have already tried the auto.purge=true option, but it is not working.
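For reference, a minimal sketch of the Spark-side workaround described above, with an illustrative path and script name (the equivalent setting for the Oozie Hive action is still the open question here):

# Point the HDFS client's home-directory prefix at /tmp so that files
# deleted by the job land in /tmp/<user>/.Trash instead of the
# quota-limited /user/<user>/.Trash (path and script name are illustrative).
spark-submit \
  --conf spark.hadoop.dfs.user.home.dir.prefix=/tmp \
  ingestion_job.py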