Member since: 03-01-2017
Posts: 5
Kudos Received: 1
Solutions: 0
04-12-2017 12:51 PM
Hi, I've been trying to change the configuration of the HDFS service in Ambari (advanced settings).
The parameter I've been trying to change is "fs.trash.interval", from 360 to 60. However, the problem doesn't seem to be there: other default parameters are not set properly even though I haven't modified them.
Since those are default parameters, I don't know whether I should modify them, as the trash parameter is not essential. Please have a look at the screenshots below. Thank you in advance for your help.
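(As a quick sanity check, and assuming the client configs have been redeployed and HDFS restarted after the change, the value the clients actually see can be read back from the command line; the sandbox prompt below is just illustrative.)
[hdfs@sandbox ~]$ hdfs getconf -confKey fs.trash.interval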
Labels:
- Apache Ambari
- Apache Hadoop
04-12-2017 08:41 AM
Hi @Bala Vignesh N V I've finally solved the problem by using the blocksize parameter in the HTTP request. By setting the block size to a lower value, the system doesn't overload. I guess it was because the system temporarily created 64 MB blocks holding only 5 MB of data each; after a while the non-DFS space was saturated and could not hold any more temporary blocks. I hope I'm clear enough.
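For reference, this is roughly what the request looks like with curl against WebHDFS (a sketch only: the NameNode host/port, target path, user name and the 32 MB block size are placeholders, and blocksize is given in bytes). WebHDFS answers the first call with a 307 redirect to a DataNode, and the data is then sent to that Location:
# step 1: ask the NameNode where to write, with a smaller block size
curl -i -X PUT "http://sandbox.hortonworks.com:50070/webhdfs/v1/tmp/part-000?op=CREATE&overwrite=true&blocksize=33554432&user.name=hdfs"
# step 2: send the part to the DataNode URL returned in the Location header
curl -i -X PUT -T part-000 "<Location-from-step-1>"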
04-07-2017 09:51 AM
@Bala Vignesh N V Thank you for your explanation. You seem to know HDFS pretty well, so I'll take this opportunity to ask you something else (but related). I'm trying to write files to HDFS part by part using the WebHDFS REST API. When I define a small part size (~5 MB), I can see the remaining disk space decrease in line with my upload. However, the non-DFS used is also consumed while uploading, and much faster. Because of that, the non-DFS reaches 0% and the upload stops. After the upload, the non-DFS used goes back up and reaches 18.7 GB again... Here are some numbers: file to upload: 2.2 GB / remaining: 9.9 GB / non-DFS used: 18.7 GB. Surprisingly, the non-DFS used reaches 0 GB while I upload a 2.2 GB file. It doesn't decrease as much when I define a larger part size (~50 MB). Is it a cache problem? I tried to use the "buffersize" parameter in my request (corresponding to the part size), but it doesn't seem to change anything.
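For context, the part-by-part upload follows the usual two-step WebHDFS pattern (a sketch; the host/port, paths, user name and the 5 MB buffersize are placeholders): the first part is written with op=CREATE, and each following part is sent with op=APPEND:
# first part: create the file (the NameNode replies with a 307 redirect)
curl -i -X PUT "http://sandbox.hortonworks.com:50070/webhdfs/v1/tmp/upload.bin?op=CREATE&overwrite=true&user.name=hdfs"
curl -i -X PUT -T part-000 "<Location-from-previous-call>"
# following parts: append them one by one
curl -i -X POST "http://sandbox.hortonworks.com:50070/webhdfs/v1/tmp/upload.bin?op=APPEND&buffersize=5242880&user.name=hdfs"
curl -i -X POST -T part-001 "<Location-from-previous-call>"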
04-07-2017 08:58 AM
Hi @Bala Vignesh N V That's strange: when I started my virtual machine today, the disk space had been reclaimed and I got 10 GB back.
I guess it reached the trash interval, which was set to 360 minutes. However, I thought emptying the bin didn't rely on this configuration. When running your command I get:
[root@sandbox ~]# du -hsx * | sort -rh | head -10
368K blueprint.json
12K jce_policy-8.zip
8.0K install.log
4.0K sandbox.info
4.0K install.log.syslog
4.0K hdp
4.0K build.out
4.0K anaconda-ks.cfg
0 start_hbase.sh
0 start_ambari.sh
[root@sandbox ~]#
So I guess the non-DFS used is just reserved space.
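(If that guess is right, the space a DataNode sets aside for non-DFS use is governed by dfs.datanode.du.reserved; as a sketch, the value the client configuration holds for it can be read back with:)
[root@sandbox ~]# hdfs getconf -confKey dfs.datanode.du.reserved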
04-06-2017 02:27 PM
1 Kudo
Hi, I'm running the sandbox on a VirtualBox virtual machine; it's a single-node cluster with a replication factor of 1. After deleting files in the Hadoop file system and removing them from the trash, I don't get the disk space back, even after waiting for a while.
I tried to use:
[hdfs@sandbox ~]$ hadoop fs -expunge
[hdfs@sandbox ~]$
When I use hdfs dfsadmin -report, I get:
[hdfs@sandbox ~]$ hdfs dfsadmin -report
Configured Capacity: 45103345664 (42.01 GB)
Present Capacity: 25068261376 (23.35 GB)
DFS Remaining: 2002014208 (1.86 GB)
DFS Used: 23066247168 (21.48 GB)
DFS Used%: 92.01%
Under replicated blocks: 70
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 172.17.0.2:50010 (sandbox.hortonworks.com)
Hostname: sandbox.hortonworks.com
Decommission Status : Normal
Configured Capacity: 45103345664 (42.01 GB)
DFS Used: 23066247168 (21.48 GB)
Non DFS Used: 20035084288 (18.66 GB)
DFS Remaining: 2002014208 (1.86 GB)
DFS Used%: 51.14%
DFS Remaining%: 4.44%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 4
Last contact: Thu Apr 06 13:36:57 UTC 2017
As you can see, it says that I use 21.48 GB. However, when I execute this other command I get a total of ~11.4 GB:
[hdfs@sandbox ~]$ hdfs dfs -du -h /
0 /app-logs
181.2 M /apps
0 /ats
9.5 G /demo
869.1 M /hdp
0 /mapred
0 /mr-history
269.2 M /ranger
6.0 K /spark-history
24.9 K /spark2-history
8.2 K /tmp
656.4 M /user
[hdfs@sandbox ~]$
The disk usage is the same as before the deletion. I found a topic about the same issue; however, I don't have any snapshots:
[hdfs@sandbox ~]$ hdfs lsSnapshottableDir
[hdfs@sandbox ~]$
How can I reclaim this disk space?
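(For completeness: deleted files normally sit as checkpoints under /user/<username>/.Trash until the trash interval expires, so a check like the one below, with the paths assumed for the sandbox, should show whether anything is still parked there; -skipTrash bypasses the trash entirely for future deletions.)
# check whether anything is still held in trash checkpoints
[hdfs@sandbox ~]$ hdfs dfs -du -h '/user/*/.Trash'
# bypass the trash on future deletions (the example path is hypothetical)
[hdfs@sandbox ~]$ hdfs dfs -rm -r -skipTrash /demo/old-data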
Labels:
- Apache Hadoop