Member since 06-09-2015 | 8 Posts | 0 Kudos Received | 0 Solutions
03-20-2016
07:45 AM
Thanks, I ended up just removing these as they are orphaned data sets from failed sessions.
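For anyone doing the same cleanup, here is a minimal sketch of a dry run, not the exact commands used in this thread. It reads candidate directories from stdin and only prints one `hadoop fs -rm -r -skipTrash` command per path so they can be reviewed first. The function name `emit_rm_commands` and the prefix check are illustrative assumptions. `-skipTrash` bypasses the HDFS trash, so the space is freed immediately but the delete cannot be undone.

```shell
#!/bin/sh
# Print (do not execute) a removal command for each path read from stdin.
# $1 is a required prefix, e.g. /tmp/hive/admin; anything outside it is skipped.
# Assumption: paths contain no single quotes or newlines.
emit_rm_commands() {
  _prefix="$1"
  while IFS= read -r _path; do
    case "$_path" in
      "$_prefix"/?*)
        printf "hadoop fs -rm -r -skipTrash '%s'\n" "$_path" ;;
      *)
        printf 'skipping unexpected path: %s\n' "$_path" >&2 ;;
    esac
  done
}

# Example (review the printed commands before running any of them):
#   printf '%s\n' /tmp/hive/admin/8c933b36-60e5-412b-8039-408f2eb75005 \
#     | emit_rm_commands /tmp/hive/admin
```

Check that no running Hive session still owns a directory before removing it; staging data from an active query lives in the same tree.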
01-12-2016
04:49 PM
Recently I noticed that Cloudera Manager is showing more data in HDFS storage than I believed I was using, so I investigated from the command line, starting with:

[hdfs@cdhcan01 ~]$ hadoop fs -du -h /

The /tmp directory is several hundred GB (over a TB with replication), so I dug deeper:

[hdfs@cdhcan01 ~]$ hadoop fs -du -h /tmp/

Most of this space is taken up by the /tmp/hive/ subdirectory, so I looked into that:

[hdfs@cdhcan01 ~]$ hadoop fs -du -h /tmp/hive/

The output shows far more storage for 2 of the users than for everyone else:

351.8 G 1.0 T /tmp/hive/admin
0 0 /tmp/hive/anonymous
195.7 G 587.1 G /tmp/hive/cdh-oozie
0 0 /tmp/hive/csalas
0 0 /tmp/hive/hive
0 0 /tmp/hive/jculley
0 0 /tmp/hive/jfogarty
0 0 /tmp/hive/jjohnbosco
0 0 /tmp/hive/jkarmelek
0 0 /tmp/hive/jmasloski
0 0 /tmp/hive/pscott

The cdh-oozie user runs many HiveServer2 actions through Oozie, so it makes sense to me that it uses a lot of storage. It's a lot, but believable for Hive. The admin user, however, is the surprise and also the biggest hog. I kept digging into the /tmp/hive/admin/ subdirectories and found what look like sessions from six months ago. Below is where this finally led me (there are 638 items, but I show only the first 2); it looks to me like pieces of an old Hive query:

[hdfs@cdhcan01 ~]$ hadoop fs -ls /tmp/hive/admin/8c933b36-60e5-412b-8039-408f2eb75005/hive_2015-06-22_17-33-05_894_2084771740530219258-4/-mr-10000/.hive-staging_hive_2015-06-22_17-33-05_894_2084771740530219258-4/-ext-10001
Found 638 items
-rw-r--r-- 3 admin supergroup 441903776 2015-06-22 17:56 /tmp/hive/admin/8c933b36-60e5-412b-8039-408f2eb75005/hive_2015-06-22_17-33-05_894_2084771740530219258-4/-mr-10000/.hive-staging_hive_2015-06-22_17-33-05_894_2084771740530219258-4/-ext-10001/000000_0
-rw-r--r-- 3 admin supergroup 448117217 2015-06-22 17:55 /tmp/hive/admin/8c933b36-60e5-412b-8039-408f2eb75005/hive_2015-06-22_17-33-05_894_2084771740530219258-4/-mr-10000/.hive-staging_hive_2015-06-22_17-33-05_894_2084771740530219258-4/-ext-10001/000001_0

I'd like to clean up this /tmp/hive/admin/ directory, but I'm not sure how it is getting populated. Why wouldn't HDFS or Hive have cleaned this up on its own, especially when it looks clean for other users? Can someone point me in the right direction for figuring out whether I can go ahead and start deleting these items to free up space? Finally, what generally populates the /tmp/hive/ subdirectories, and when do they get cleaned out? Thanks for any help or insight into this!
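The drill-down above can be partly automated. This is a rough sketch only: `rank_du` and `older_than` are hypothetical helper names, and both assume the column layouts shown in the listings above. That means three columns (size, size with replication, path) from `hadoop fs -du` without `-h`, and eight columns from `hadoop fs -ls`. Paths containing spaces would break the parsing.

```shell
#!/bin/sh
# Rank raw-byte `hadoop fs -du` output by size, largest first, skipping empty entries.
# Assumed input columns: size, size-with-replication, path.
rank_du() {
  sort -k1,1nr |
    LC_ALL=C awk '$1 > 0 { printf "%.1f GiB\t%s\n", $1 / (1024 * 1024 * 1024), $3 }'
}

# Print paths from `hadoop fs -ls` output last modified before a cutoff date (YYYY-MM-DD).
# Assumed input columns: perms, replication, owner, group, size, date, time, path.
# The "Found N items" header has fewer than 8 fields and is ignored.
older_than() {
  awk -v cutoff="$1" 'NF >= 8 && ($6 "") < (cutoff "") { print $8 }'
}

# On a cluster (not executed here):
#   hadoop fs -du /tmp/hive | rank_du
#   hadoop fs -ls /tmp/hive/admin | older_than 2016-01-01
```

Dates in ISO `YYYY-MM-DD` form sort correctly as plain strings, which is why `older_than` can compare them without any date arithmetic.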
11-23-2015
12:59 PM
I'm running into something similar. I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK. Can you please share what you scripted to do this: "when one partition is always less than 800MB I set the block size for this table to 1GB", as you mention in your post?