Member since 03-01-2017 · 5 Posts · 1 Kudos Received · 0 Solutions
04-12-2017 08:41 AM
Hi @Bala Vignesh N V I've finally solved the problem by using the blocksize parameter in the HTTP request. By setting the blocksize to a lower value, the system no longer overloads. I guess the issue was that the system created temporary 64 MB blocks containing only 5 MB of data each, so after a while the non-DFS space was exhausted and no more temporary blocks could be created. I hope that's clear enough.
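For reference, the request looks roughly like this; the host/port, file path, part name and the 8 MB value below are just placeholders for illustration, not my real ones:
# step 1: ask the namenode where to write, passing a smaller block size (8 MB = 8388608 bytes)
curl -i -X PUT "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/myfile?op=CREATE&overwrite=true&blocksize=8388608"
# step 2: send the first part to the datanode URL returned in the Location header of step 1
curl -i -X PUT -T part0 "<Location header from step 1>"
The following parts then go through op=APPEND (a POST) with the same two-step redirect.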
04-07-2017 09:51 AM
@Bala Vignesh N V Thank you for your explanation. You seem to know HDFS pretty well, so I'll take this opportunity to ask you something else (but related). I'm trying to write files to HDFS part by part using the WebHDFS REST API. When I use a small part size (~5 MB), I can see the remaining disk space decrease in line with my upload. However, the non-DFS space is also consumed while uploading, and much faster; because of that, the non-DFS reaches 0% and the upload stops. After the upload, the non-DFS goes back up and reaches 18.7 GB again...
Here are some figures: file to upload: 2.2 GB / remaining: 9.9 GB / non-DFS used: 18.7 GB. Surprisingly, the non-DFS used reaches 0 GB while I upload a 2.2 GB file. It doesn't decrease as much when I use a larger part size (~50 MB). Is it a cache problem? I tried setting "buffersize" in my request (matching the part size), but it doesn't seem to change anything.
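In case it matters, each part is currently sent roughly like this (host/port, path and the 5 MB value are placeholders, not my exact request):
# append one ~5 MB part; buffersize is in bytes (5 MB = 5242880)
curl -i -X POST "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/myfile?op=APPEND&buffersize=5242880"
# then re-send the same part to the datanode URL returned in the Location header
curl -i -X POST -T part1 "<Location header from the previous request>"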
04-07-2017 08:58 AM
Hi @Bala Vignesh N V That's strange: when I started my virtual machine today, the disk space had been reclaimed and I got 10 GB back.
I guess it reached the trash interval, which was set to 360 minutes. However, I thought emptying the trash didn't go through this configuration. When running your command I get:
[root@sandbox ~]# du -hsx * | sort -rh | head -10
368K blueprint.json
12K jce_policy-8.zip
8.0K install.log
4.0K sandbox.info
4.0K install.log.syslog
4.0K hdp
4.0K build.out
4.0K anaconda-ks.cfg
0 start_hbase.sh
0 start_ambari.sh
[root@sandbox ~]#
So I guess the non-DFS used is just reserved space.
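Coming back to the trash interval: for what it's worth, I believe it can be checked directly with the command below (assuming fs.trash.interval is the property behind it, which is my understanding):
hdfs getconf -confKey fs.trash.interval   # should print 360 here, since the interval is set to 360 minutes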
04-06-2017 02:27 PM
1 Kudo
Hi, I'm running the sandbox on a VirtualBox virtual machine; it is a single-node cluster with a replication factor of 1. After deleting files from the Hadoop file system and removing them from the trash, I don't get the disk space back, even after waiting for a while.
I tried to use:
[hdfs@sandbox ~]$ hadoop fs -expunge
[hdfs@sandbox ~]$
When I run hdfs dfsadmin -report, I get:
[hdfs@sandbox ~]$ hdfs dfsadmin -report
Configured Capacity: 45103345664 (42.01 GB)
Present Capacity: 25068261376 (23.35 GB)
DFS Remaining: 2002014208 (1.86 GB)
DFS Used: 23066247168 (21.48 GB)
DFS Used%: 92.01%
Under replicated blocks: 70
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 172.17.0.2:50010 (sandbox.hortonworks.com)
Hostname: sandbox.hortonworks.com
Decommission Status : Normal
Configured Capacity: 45103345664 (42.01 GB)
DFS Used: 23066247168 (21.48 GB)
Non DFS Used: 20035084288 (18.66 GB)
DFS Remaining: 2002014208 (1.86 GB)
DFS Used%: 51.14%
DFS Remaining%: 4.44%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 4
Last contact: Thu Apr 06 13:36:57 UTC 2017
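(If I read this report correctly, the numbers are at least consistent with each other: Configured Capacity 42.01 GB minus Non DFS Used 18.66 GB gives the Present Capacity of 23.35 GB, and Present Capacity minus DFS Used 21.48 GB leaves the 1.86 GB DFS Remaining. The 92.01% in the summary seems to be DFS Used over Present Capacity, while the 51.14% per datanode is DFS Used over Configured Capacity.)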
As you can see, it says I'm using 21.48 GB. However, when I execute this other command, I get a total of only ~11.4 GB:
[hdfs@sandbox ~]$ hdfs dfs -du -h /
0 /app-logs
181.2 M /apps
0 /ats
9.5 G /demo
869.1 M /hdp
0 /mapred
0 /mr-history
269.2 M /ranger
6.0 K /spark-history
24.9 K /spark2-history
8.2 K /tmp
656.4 M /user
[hdfs@sandbox ~]$
The disk usage is the same as before the deletion. I found a topic about the same issue; however, I don't have any snapshots:
[hdfs@sandbox ~]$ hdfs lsSnapshottableDir
[hdfs@sandbox ~]$
How can I reclaim this disk space?
Labels: Apache Hadoop