
HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload


New Contributor

Hi, while investigating the Under Replicated Blocks metric in Ambari, I noticed that any file uploaded through the Ambari Files View gets a replication factor of 3, even though dfs.replication is 1 in /etc/hadoop/conf/hdfs-site.xml. Just in case, I also added mapreduce.client.submit.file.replication = 1 to the mapred configuration.

If I upload the file from the HDFS command line instead (hdfs dfs -put), the replication factor is 1.

Am I missing something? Can you help?
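
For anyone reproducing this, here is a minimal sketch for checking which replication factor the command-line client resolves from its own configuration. It relies on hdfs getconf -confKey, which reads the client-side configuration; the function name effective_replication is made up for illustration.

```shell
# Print the dfs.replication value the local HDFS client resolves from its
# configuration (for example /etc/hadoop/conf/hdfs-site.xml).
# effective_replication is a hypothetical helper name, not part of Hadoop.
effective_replication() {
  if command -v hdfs >/dev/null 2>&1; then
    hdfs getconf -confKey dfs.replication
  else
    echo "hdfs client not found on PATH" >&2
    return 1
  fi
}

# Example call on a node with the HDFS client installed:
#   effective_replication
```

If this prints 1 while files uploaded through the Files View still show 3, the upload path is probably not using the client configuration.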


Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

Super Mentor

@Luis Marques

How are you confirming that a file uploaded via the Files View ends up with a dfs.replication value of 3?

Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

New Contributor

Hi @Jay SenSharma, in two different ways: hdfs dfs -ls /dir shows "3" in the replication column, and hdfs fsck / confirms that files uploaded through the Ambari Files View want 3 replicas but have only one (as expected, since the sandbox is a single node).
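
The first check above can be sketched as a small helper. It assumes the usual hdfs dfs -ls column layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces; list_replication is a name made up here.

```shell
# Read `hdfs dfs -ls` output on stdin and print "<replication> <path>"
# for every regular file. Directory lines (starting with "d") show "-" in
# the replication column and are skipped, as is the "Found N items" header.
# list_replication is a hypothetical helper name.
list_replication() {
  awk '$1 ~ /^-/ { print $2, $NF }'
}

# Example calls on a node with the HDFS client installed:
#   hdfs dfs -ls /tmp | list_replication
#   hdfs dfs -stat %r /tmp/TestFile.txt   # replication of a single file
```

A file uploaded through the Files View would then show up with 3 in the first column, and one uploaded with hdfs dfs -put with 1.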

Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

Super Mentor

@Luis Marques

I am able to reproduce the behavior that you mentioned. Looks strange.

You will most likely see many 'Under replicated' blocks in that case. As a temporary fix, you can explicitly set the replication factor to 1 as follows:

# hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' | sort -u > /tmp/all_under_replicated_files

# while IFS= read -r hdfsfile; do echo "Fixing $hdfsfile and setting replication 1:"; hadoop fs -setrep 1 "$hdfsfile"; done < /tmp/all_under_replicated_files


Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

New Contributor

Thanks @Jay SenSharma.

It does look strange, because nothing in the configs points to a replication factor of 3. I may have to dig into how the Ambari Files View works under the hood. Yes, it is an ugly workaround, but it works :-)

The real concern, though, is the "real clusters" we are now working with on HDInsight; I need to see whether the problem persists there. I will let you know next week.

Thanks

Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

Super Mentor

@Luis Marques

It looks like the Files View falls back to the default value of "dfs.replication", which is 3. That would explain why I do not see this property being read directly anywhere in the Files View code. I will research this further.

Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

New Contributor

@Jay SenSharma

Thanks for your quick confirmation. Is this a bug in the HDP 2.5 Sandbox only, or is it present in HDP 2.5 in general? I've checked the Ambari source code for the Files View, but can't find any real evidence that a specific dfs.replication property is read.

Many thanks for your help.

Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

Super Mentor

@Luis Marques

This issue seems to be already addressed in Ambari 2.5, as many changes have been made to the Files View functionality.

I built Ambari 2.5 and tested the same scenario. The NameNode log shows that only one replica of the file is allocated when "dfs.replication" is set to "1" and a file is uploaded via the Files View, as follows:

# grep 'TestFile.txt' hadoop-hdfs-namenode-node2.localdomain.log

2017-03-20 10:21:43,606 INFO  hdfs.StateChange (FSNamesystem.java:logAllocatedBlock(3830)) - BLOCK* allocate blk_1073741964_1140, replicas=172.17.100.4:50010 for /tmp/TestFile.txt

- On the other hand, in Ambari 2.4.2 I see that the Files View uses the default value of "dfs.replication" (which is 3), hence we see replication to 3 nodes, as follows:

# grep 'TestFile.txt' hadoop-hdfs-namenode-erie1.example.com.log

2017-03-20 10:56:58,157 INFO  hdfs.StateChange (FSNamesystem.java:logAllocatedBlock(3692)) - BLOCK* allocate blk_1073771745_31238, replicas=172.26.70.151:50010, 172.26.70.153:50010, 172.26.70.152:50010 for /tmp/TestFile.txt
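
To compare the two log excerpts above without counting addresses by eye, here is a small sketch that counts the DataNodes listed after replicas= in an allocate line. It assumes the log format shown in this thread; other Hadoop versions may log this differently. count_replicas is a made-up name.

```shell
# Read NameNode log lines on stdin and, for each "BLOCK* allocate" line,
# print how many DataNode addresses follow "replicas=".
# Assumes the format "... replicas=<addr>[, <addr>...] for /<path>".
# count_replicas is a hypothetical helper name.
count_replicas() {
  sed -n 's|.*replicas=\(.*\) for /.*|\1|p' | awk -F', *' '{ print NF }'
}

# Example call against a NameNode log file:
#   grep 'TestFile.txt' hadoop-hdfs-namenode-node2.localdomain.log | count_replicas
```

Run against the two lines quoted above, this should report 1 for the Ambari 2.5 upload and 3 for the Ambari 2.4.2 upload.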


Re: HDP 2.5 Sandbox replication factor 3 in Ambari Files View upload

New Contributor

I just checked with version 2.6, and apparently the Ambari Files View always passes the blocksize and replication parameters when sending WebHDFS REST API requests. Unless you specify your own values, it uses defaults stored in some .json configuration files in the Ambari installation directory. These Ambari defaults happen to be exactly the same as the Hadoop defaults (replication factor 3, block size 128 MB), which can be misleading when diagnosing this issue.


If you want your own values, you can provide them in the Files Settings / View Configs pane. Below is my config: 2 MB block size, replication factor 2.

[Screenshot: 64767-settings.png]
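
To see the effect of the replication parameter outside the Files View, here is a rough sketch of a WebHDFS CREATE request with an explicit replication factor. The replication and overwrite query parameters are part of the documented WebHDFS CREATE operation; the host name, port 50070 (the Hadoop 2.x NameNode HTTP default) and the helper name webhdfs_create_url are assumptions for illustration, and this is not necessarily the exact request the Files View sends.

```shell
# Build a WebHDFS CREATE URL with an explicit replication factor.
#   $1 = NameNode host:port, $2 = absolute HDFS path, $3 = replication factor
# webhdfs_create_url is a hypothetical helper name.
webhdfs_create_url() {
  printf 'http://%s/webhdfs/v1%s?op=CREATE&replication=%s&overwrite=false\n' "$1" "$2" "$3"
}

# Example two-step upload (requires a reachable cluster, so shown as comments;
# a user.name or other authentication parameter may also be needed):
#   curl -i -X PUT "$(webhdfs_create_url namenode.example.com:50070 /tmp/TestFile.txt 1)"
#   # The NameNode answers with a 307 redirect; send the data to its Location URL:
#   curl -i -X PUT -T TestFile.txt "<Location URL from the previous response>"
```

Uploading the same file once with replication=1 and once with replication=3, then checking hdfs dfs -ls, should make it clear that the value sent in the request wins over dfs.replication in hdfs-site.xml.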