
how to skip trash when using INSERT OVERWRITE in hive


New Contributor

The application has multiple jobs that go through the INSERT OVERWRITE route, and that is filling up the trash directory (under the home directory, by default) with the deleted records. Since quotas are enforced, this causes the threshold size to be exceeded. Is there any option to skip the trash in Hive on a per-session basis, or any option to change the trash directory location per session?

11 Replies

Re: how to skip trash when using INSERT OVERWRITE in hive

Expert Contributor

Hi @pradeep kammella,

When you perform an INSERT OVERWRITE into a table, the old data is moved to the HDFS trash and retained there for the configured interval.

To keep the replaced data out of the trash and free the space immediately, set auto.purge=true on the table:

TBLPROPERTIES ("auto.purge"="true")

(Setting it to "false" restores the default move-to-trash behavior.)
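A minimal sketch of how the property is typically applied; the table name sales and its columns are placeholders:

-- Enable trash skipping when creating the table
CREATE TABLE sales (id INT, amount DOUBLE)
TBLPROPERTIES ("auto.purge"="true");

-- Or enable it on an existing table before the next INSERT OVERWRITE
ALTER TABLE sales SET TBLPROPERTIES ("auto.purge"="true");

-- Verify that the property took effect
SHOW TBLPROPERTIES sales;

Note that auto.purge is generally honored only for managed tables, and support for it on INSERT OVERWRITE depends on the Hive version.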

Thanks.

Hope it helps!


Re: how to skip trash when using INSERT OVERWRITE in hive

New Contributor

Much appreciated.


Re: how to skip trash when using INSERT OVERWRITE in hive

Expert Contributor

@pradeep kammella : If you found this answer helpful, please take a moment to log in and click the "Accept" link on the answer. Thanks!


Re: how to skip trash when using INSERT OVERWRITE in hive

Explorer

Hi, I am facing the same issue. The table holds very large data, and while doing an INSERT OVERWRITE the replaced files are being placed in my user directory, /user/anjali/.Trash, causing the Hive action in Oozie to fail after a 1.5-hour run. Please help. The table is external, and even after I changed it to an internal (managed) table, auto.purge=true is not working.
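Since auto.purge is generally honored only for managed tables, a hedged sketch of the checks worth running in this situation; the table name trash_demo is a placeholder, and the 'EXTERNAL'='FALSE' value is case-sensitive on some Hive versions:

-- Convert the external table to a managed one
ALTER TABLE trash_demo SET TBLPROPERTIES ('EXTERNAL'='FALSE');

-- Then enable trash skipping
ALTER TABLE trash_demo SET TBLPROPERTIES ('auto.purge'='true');

-- Confirm both properties before re-running the INSERT OVERWRITE
SHOW TBLPROPERTIES trash_demo;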

Re: how to skip trash when using INSERT OVERWRITE in hive

Community Manager

@AnjaliRocks , as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. That would also be an opportunity to provide details specific to your environment, which could help others give you a more accurate answer to your question.


Vidya Sargur, Community Manager



Re: how to skip trash when using INSERT OVERWRITE in hive

Explorer

@VidyaSargur , I had started a new thread for the issue, and no solution was received, so I was digging through old posts.

My thread: https://community.cloudera.com/t5/Support-Questions/Trash-space-issue-during-insert-overwrite-in-Hiv...

This issue is causing trouble in my org, and I am unable to solve it.


Re: how to skip trash when using INSERT OVERWRITE in hive

Community Manager

@AnjaliRocks , I see that our expert @paras has responded to your thread. Could you please check whether his response is helpful? Please feel free to @ mention him for further queries.


Vidya Sargur, Community Manager



Re: how to skip trash when using INSERT OVERWRITE in hive

Expert Contributor

Replied on the new thread


Re: how to skip trash when using INSERT OVERWRITE in hive

Explorer

@paras , Thanks a lot for your reply. The solution you provided was for the Spark Oozie action.

I was able to solve that part two days ago using the same configuration, --conf spark.hadoop.dfs.user.home.dir.prefix=/tmp.

That was in the ingestion part of the flow, so my Sqoop and Spark jobs now redirect any .Trash to my tmp directory, which has enough quota. Now I am facing the issue with the Hive action, where I am not aware of a configuration equivalent to --conf spark.hadoop.dfs.user.home.dir.prefix=/appldigi/tmp or -Dyarn.app.mapreduce.am.staging-dir=/tmp.

Can you please guide me on this? I am unable to solve it.

I am trying to execute a HiveQL script that does an INSERT OVERWRITE. I have already tried the auto.purge=true option, which is not working.
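A speculative sketch of session-level overrides one might try inside the Hive script; whether Hive honors these per session depends on the version and on hive.conf.restricted.list, so treat it as an assumption to verify rather than a confirmed fix (target_table and staging_table are placeholders):

-- Assumption: overriding the prefix under which per-user home
-- directories (and therefore .Trash) are resolved, mirroring the
-- spark.hadoop.dfs.user.home.dir.prefix workaround above
SET dfs.user.home.dir.prefix=/tmp;

-- Assumption: a trash interval of 0 makes the overwrite delete the old
-- files outright instead of moving them to .Trash
SET fs.trash.interval=0;

INSERT OVERWRITE TABLE target_table
SELECT * FROM staging_table;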
