Member since: 07-30-2019
Posts: 111
Kudos Received: 181
Solutions: 35
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 1854 | 02-07-2018 07:12 PM
 | 1301 | 10-27-2017 06:16 PM
 | 1778 | 10-13-2017 10:30 PM
 | 3355 | 10-12-2017 10:09 PM
 | 698 | 06-29-2017 10:19 PM
12-01-2020
02:52 PM
The DataNodes should run the same software version as the NameNode.
08-01-2019
10:28 AM
"I'm assuming you mean just to store the metadata of the changed snapshot, which isn't significant given the actual size of data held (in reference to my example above)." Correct. However, the metadata is tracked in NameNode memory, which is a precious resource. The overhead can be significant in a large cluster with many files and millions of deltas.
08-01-2019
09:36 AM
The snapshot will not occupy any storage space on disk or NameNode heap immediately after it is created. However, any subsequent changes inside the snapshottable directory will need to be tracked as deltas, and that can result in both higher disk space and NameNode heap usage. E.g. if a file is deleted after taking a snapshot, its blocks cannot be reclaimed because the file is still accessible through the snapshot path. The hadoop fs -du shell command supports a -x option that calculates directory space usage excluding snapshots. The delta between the output with and without the -x option tells you how much disk space is being consumed by snapshots.
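For example, something like this (the /data path below is a placeholder for your snapshottable directory):

# Space usage including files retained only by snapshots
hadoop fs -du -s /data
# Space usage excluding snapshots (-x)
hadoop fs -du -s -x /data
# The difference between the two numbers is the space held only by snapshots.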
02-27-2018
10:20 PM
2 Kudos
Building Apache Tez with Apache Hadoop 2.8.0 or later fails due to the client/server jar separation in Hadoop [1]. The build fails with the following error:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30] cannot find symbol
  symbol:   class DistributedFileSystem
  location: package org.apache.hadoop.hdfs
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50] cannot find symbol
  symbol:   class DistributedFileSystem
  location: class org.apache.tez.client.TestTezClientUtils
[ERROR] /src/tez/ecosystem/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42] cannot access org.apache.hadoop.hdfs.DistributedFileSystem

To get Tez to compile successfully, use the new hadoop28 profile introduced by TEZ-3690 [2]. E.g. here is how you compile Tez against Apache Hadoop 3.0.0:

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Phadoop28 -Dhadoop.version=3.0.0

References:
1. HDFS-6200: Create a separate jar for hdfs-client
2. TEZ-3690: Tez on hadoop 3 build failed due to hdfs client/server jar separation
02-07-2018
07:12 PM
1 Kudo
There is no such thing as a "passive" NameNode. Are you asking about the HA or non-HA configuration? In an HA configuration there is an Active NameNode that serves user requests, and a Standby NameNode that generates periodic checkpoints and can take over the Active role if the previously active NameNode dies or becomes unresponsive. In a non-HA configuration there is a Primary NameNode that serves user requests, and a Secondary NameNode that generates periodic checkpoints. A Secondary NameNode can never become the Primary. The terminology is unfortunately confusing.
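As an aside, on an HA cluster you can check which NameNode currently holds the Active role with hdfs haadmin; nn1 and nn2 below are placeholders for the IDs defined in dfs.ha.namenodes.<nameservice>:

# Report the HA state (active/standby) of each configured NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2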
10-27-2017
06:16 PM
Try clearing up some snapshots. You probably have a ton of deleted files retained for snapshots.
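A minimal sketch of how to find and remove snapshots you no longer need (the /data path and snapshot name are placeholders):

# List snapshottable directories visible to the current user
hdfs lsSnapshottableDir
# Inspect the snapshots under a snapshottable directory
hdfs dfs -ls /data/.snapshot
# Delete a snapshot that is no longer needed
hdfs dfs -deleteSnapshot /data s20171001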
10-27-2017
06:07 PM
Did you enable security using the Ambari Kerberos wizard? That usually takes care of these settings for you.
10-27-2017
04:58 PM
A few things to check for:
Are you starting the DataNode process as root?
Have you set HADOOP_SECURE_DN_USER and JSVC_HOME?
Since you are using a privileged port number (<1024), ensure you have not set dfs.data.transfer.protection.
The Apache Hadoop documentation on secure DataNode setup is good: https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode
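For reference, a minimal hadoop-env.sh sketch for a jsvc-based secure DataNode; the user name and JSVC_HOME path are placeholders for your environment:

# hadoop-env.sh on each DataNode (illustrative values)
export HADOOP_SECURE_DN_USER=hdfs         # unprivileged user the DataNode drops to after binding ports
export JSVC_HOME=/usr/lib/bigtop-utils    # placeholder: directory containing the jsvc binary
# dfs.datanode.address and dfs.datanode.http.address in hdfs-site.xml must then
# use privileged ports (<1024), e.g. 0.0.0.0:1004 and 0.0.0.0:1006.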
10-26-2017
07:12 PM
It is likely the process has not hit an allocation failure yet so GC has not kicked in. This is perfectly normal. If you want the heap usage to be lower then you can reduce the heap allocation. Alternatively you can trigger GC quicker by adding something like -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly to your heap options. However it's probably best to just follow our suggested heap configuration and let the Java runtime do the rest. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/configuring-namenode-heap-size.html
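If you do decide to trigger GC earlier, a hedged sketch of where those flags would go (Advanced hadoop-env via Ambari); the -Xmx value here is illustrative and should come from the sizing guide above:

# hadoop-env.sh (illustrative; append to your existing NameNode options)
export HADOOP_NAMENODE_OPTS="-Xmx4g -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  ${HADOOP_NAMENODE_OPTS}"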
10-26-2017
07:04 PM
What is your HDP version?
10-16-2017
08:33 PM
Hi @Sedat Kestepe, HDD reliability is measured as AFR (annualized failure rate) - the probability a hard disk will fail in a given year. AFR varies with manufacturer, model number and operating conditions. Here is one publicly available report about disk AFRs: https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/ There is one obvious bad batch in that report with 33% AFR. For the rest it varies from 0 - 3%. Your hardware vendor should be able to provide you with the expected AFR. If your observed failure rate is higher than expected you may have a bad batch of hardware and should check with your vendor. Burn-in testing can help weed out bad hardware early on.
10-13-2017
10:30 PM
@Dr. Jason Breitweg, it will not be deleted automatically. There may be block files under that directory that you need. If the cluster has any important data, I'd recommend running 'hdfs fsck' to ensure there are no missing/corrupt blocks before you delete /var/hadoop/hdfs/data/current/BP-*. Even then, I'd first move the directory to a different location, restart the DataNodes, and rerun fsck to ensure you don't cause data loss.
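A minimal sketch of that check-then-move sequence (the quarantine destination is a placeholder):

# 1. Confirm the namespace is healthy before touching anything
hdfs fsck / | tail -n 20     # look for "The filesystem under path '/' is HEALTHY"
# 2. Move (do not delete) the old block pool directory, then restart the DataNode
mkdir -p /var/hadoop/hdfs/quarantine     # placeholder destination
mv /var/hadoop/hdfs/data/current/BP-* /var/hadoop/hdfs/quarantine/
# 3. Re-run fsck; only remove the moved directory once no blocks are reported missing or corrupt
hdfs fsck / -list-corruptfileblocks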
10-12-2017
10:09 PM
1 Kudo
@Bharath N, you don't need to purge edit log files. HDFS deletes them automatically when they are no longer needed. I recommend not touching any edit log files, since accidentally deleting the wrong file could lead to data loss. For your NameNode start problem, I can't say anything without more data. You may want to engage Hortonworks support if you have a support contract. Otherwise you can post the error message/exception here and we may be able to point you in the right direction.
06-30-2017
12:05 AM
Hi @steve coyle, have you upgraded your Linux kernel recently? If so, it is likely this issue: https://issues.apache.org/jira/browse/HDFS-12029 You can either roll back the kernel version, upgrade to a newer kernel that fixes this regression (assuming your vendor has one), or apply the workaround mentioned in the description of that Jira.
06-30-2017
12:02 AM
It would be helpful if you attach the complete hdfs-site.xml from both clusters (you can anonymize hostnames and IP addresses).
06-29-2017
10:19 PM
1 Kudo
Hi @Rohit Masur, if you want to set up a vagrant box pre-installed with Apache Hadoop 3.0, that should be fine. Apache Hadoop 3.0 is a fast-moving target right now (still in Alpha), though, so it may be better to focus on good documentation. If the existing installation guide is wrong, please do call that out on the hadoop-user mailing list. Even better, file an Apache Hadoop Jira and post a patch to fix the documentation. Feel free to tag me on the Apache Jira if you need any help submitting the patch.
06-22-2017
07:46 PM
7 Kudos
The HDFS NameNode ensures that each block is sufficiently replicated. When it detects the loss of a DataNode, it instructs remaining nodes to maintain adequate replication by creating additional block replicas.
For each lost replica, the NameNode picks a (source, destination) pair where the source is an available DataNode with another replica of the block and the destination is the target for the new replica. The re-replication work can be massively parallelized in large clusters since the replica distribution is randomized.
In this article, we estimate a lower bound for the recovery time.

Simplifying Assumptions
The maximum IO bandwidth of each disk is 100 MB/s (reads + writes). This is true for the vast majority of clusters that use spinning disks.
The aggregate IO capacity of the cluster is limited by disk and not the network. This is not always true but helps us establish lower bounds without discussing network topologies.
Block replicas are uniformly distributed across the cluster and disk usage is uniform. True if the HDFS balancer was run recently.
Theoretical Lower Bound
Let's assume the cluster has n nodes. Each node has p disks, and the usage of each disk is c terabytes. The data usage of each node is thus (p ⋅ c) TB.
The amount of data transfer needed for recovery is twice the capacity of the lost DataNode, as each replica must be read once from a source disk and written once to the target disk.
Data transfer during recovery = 2 ⋅ (Node Capacity)
= (2 ⋅ p ⋅ c) TB
= (2 ⋅ p ⋅ c ⋅ 1,000,000) MB
The re-replication rate is limited by the available aggregate IO bandwidth in the cluster:

Cluster aggregate IO bandwidth = (Disk IO bandwidth) ⋅ (Number of disks)
= (100 ⋅ n ⋅ p) MB/s

Thus:

Minimum Recovery Time = (Data transfer during recovery) / (Cluster aggregate IO bandwidth)
= (2 ⋅ p ⋅ c ⋅ 1,000,000) / (100 ⋅ n ⋅ p)
= (20,000 ⋅ c/n) seconds
where:
c = Mean usage of each disk in TB.
n = Number of DataNodes in the cluster.

This is the absolute best case with no other load, no network bandwidth limits, and a perfectly efficient scheduler.
E.g. in a 100-node cluster where each disk holds 4 TB of data, recovery from the loss of a DataNode takes at least (20,000 ⋅ 4) / 100 = 800 seconds, or approximately 13 minutes.
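A quick sanity check of the theoretical bound in bash, using only the numbers from this example:

# Theoretical lower bound: 20,000 * c / n seconds
c=4     # mean usage of each disk, in TB
n=100   # number of DataNodes
echo "Theoretical minimum recovery time: $(( 20000 * c / n )) seconds"
# prints 800 seconds (~13 minutes)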
Clearly, the cluster size bounds the recovery time. Disk capacities being equal, a 1000-node cluster can recover 10x faster than a 100-node cluster.

A More Practical Lower Bound
The theoretical lower bound assumes that block re-replications can be scheduled instantaneously across the cluster. It also assumes that all cluster IO capacity is available for re-replication, whereas in practice application reads and writes also consume IO capacity.

To throttle re-replication traffic, the NameNode schedules 2 outbound replication streams per DataNode per heartbeat interval. This throttle allows DataNodes to remain responsive to applications and can be adjusted via the configuration setting dfs.namenode.replication.max-streams. Let's call this m and the heartbeat interval h.
Also, let's assume the mean block size in the cluster is b MB. Then:

Re-replication Rate = (Blocks replicated cluster-wide per heartbeat interval) / (Heartbeat interval)
= (n ⋅ m / h) blocks/s

The total number of blocks to be re-replicated is the capacity of the lost node divided by the mean block size:

Number of Blocks Lost = (p ⋅ c) TB / b MB
= (p ⋅ c ⋅ 1,000,000 / b)
Thus:
Recovery Time = (Number of Blocks Lost) / (Re-replication Rate)
= (p ⋅ c ⋅ 1,000,000) / (b ⋅ n ⋅ m/h)
= (p ⋅ c ⋅ h ⋅ 1,000,000) / (b ⋅ n ⋅ m) seconds.
where:
p = Number of disks per node.
c = Mean usage of each disk in TB.
h = Heartbeat interval (default = 3 seconds).
b = Mean block size in MB.
n = Number of DataNodes in the cluster.
m = dfs.namenode.replication.max-streams (default = 2)
Simplifying by plugging in the defaults for h and m, we get
Minimum Recovery Time (seconds) = (p ⋅ c ⋅ 1,500,000) / (b ⋅ n)
E.g. in the same cluster, assuming the mean block size is 128 MB and each node has 8 disks, the practical lower bound on recovery time is 3,750 seconds, or roughly an hour.
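The same sanity check for the practical bound, again using only the values assumed in this article:

# Practical lower bound with defaults h=3, m=2: p * c * 1,500,000 / (b * n) seconds
p=8     # disks per node
c=4     # mean usage of each disk, in TB
b=128   # mean block size, in MB
n=100   # number of DataNodes
echo "Practical minimum recovery time: $(( p * c * 1500000 / (b * n) )) seconds"
# prints 3750 seconds (~1 hour)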
Reducing the Recovery Time

The recovery time can be reduced by:
Increasing dfs.namenode.replication.max-streams. However, setting this value too high can affect cluster performance. Note that increasing this value beyond 4 must be evaluated carefully, and it also requires raising the safeguard upper limit via dfs.namenode.replication.max-streams-hard-limit (see the configuration sketch after this list).
Using more nodes with smaller disks. Total cluster capacity remaining the same, a cluster with more nodes and smaller disks will recover faster.
Avoiding predominantly small blocks.
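A hedged hdfs-site.xml sketch of what raising the throttle might look like; the values are illustrative, not recommendations:

<!-- hdfs-site.xml (illustrative values; evaluate the impact before raising) -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>4</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>8</value>
</property>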
06-21-2017
09:58 PM
Thanks for the heads up @Namit Maheshwari. I don't have a better solution in mind than what @Mark Davis already described.
06-12-2017
05:12 PM
1 Kudo
@Laurent Edel this answer is incorrect. Please consider editing it to mention decommissioning. Otherwise someone may assume it's OK to just remove nodes if they have rack awareness.
06-12-2017
05:06 PM
2 Kudos
Don't just remove the DataNodes. Even with rack awareness, removing >2 nodes from different racks will lead to data loss. Instead, you should decommission them first as described here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_administration/content/ref-a179736c-eb7c-4dda-b3b4-6f3a778bd8c8.1.html You may know this already, but I want to make it clear for others who read this discussion in the future.
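A rough sketch of the decommissioning flow from that link, assuming dfs.hosts.exclude on the NameNode already points at an exclude file (the hostname and path below are placeholders):

# 1. Add the DataNodes to be removed to the exclude file referenced by dfs.hosts.exclude
echo "datanode-07.example.com" >> /etc/hadoop/conf/dfs.exclude
# 2. Ask the NameNode to re-read the include/exclude lists and start decommissioning
hdfs dfsadmin -refreshNodes
# 3. Wait until the nodes report "Decommissioned" before shutting them down
hdfs dfsadmin -report | grep "Decommission Status"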
05-22-2017
08:39 PM
Sorry, I missed the notification of your reply. That is also a good question. I have not yet come across a customer setup where HDFS audit logging is disabled and Ranger audit logging is on. I'd recommend tagging someone from Ranger to make sure.
05-08-2017
02:56 PM
@Ward Bekker we don't recommend disabling HDFS audit logging. It's hard to debug many HDFS issues without the audit log. Just curious, why would you like to disable it?
04-26-2017
05:25 PM
Hi @suresh krish, the Kerberos Principals section from the Apache Hadoop docs should answer some of your questions. I found the first few chapters of the book Hadoop Security to be a readable introduction to this complex topic.
04-25-2017
09:24 PM
1 Kudo
Hi @Mark Heydenrych, it is likely that your DataNodes are not configured with sufficient Java heap. Even though there is free RAM on the machine, the Java runtime will not use memory beyond its configured maximum heap size, which is specified via the -Xmx command-line option. You may be seeing this on only a few DataNodes because they wound up with more blocks. This setting can be changed via the HADOOP_DATANODE_OPTS environment variable in Advanced hadoop-env.sh via Ambari.

I recommend starting by doubling the heap allocation for the DataNode and also adding the following options if not present already: -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:ConcGCThreads=8 -XX:+UseConcMarkSweepGC -XX:PermSize=128m -XX:MaxPermSize=256m Also, the new generation heap allocation (configured via -XX:MaxNewSize=) should be set to 1/8th of the total process heap size.

Another recommendation is to run the HDFS balancer to redistribute block replicas across the cluster. https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
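To make that concrete, a hedged sketch of the resulting hadoop-env.sh entry; the 8 GB heap and 1 GB new size are illustrative only and should be sized for your block count:

# Advanced hadoop-env.sh via Ambari (illustrative sizes, not recommendations)
export HADOOP_DATANODE_OPTS="-Xmx8g -XX:MaxNewSize=1g \
  -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=8 \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:PermSize=128m -XX:MaxPermSize=256m ${HADOOP_DATANODE_OPTS}"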
04-25-2017
09:16 PM
1 Kudo
Nobody. Periodic checkpointing is suspended in an HA setup when the Standby NameNode is down.
04-24-2017
08:13 PM
1 Kudo
Hi @Michael Häusler, this may be caused by HDFS-9958. I see that HDFS-9958 is not fixed in HDP 2.4.2 but it was fixed in HDP 2.4.3. If you can see this consistently I'd recommend upgrading to check whether that fixes the problem. If you have a support contract we can provide you with a hotfix release.
04-23-2017
08:43 PM
Check your rack setting for the DataNode. If you don't see the problem, you can post the output of the following command and someone may be able to point out the error:

hdfs dfsadmin -report
04-20-2017
05:17 PM
Hi @Raaj M, your fs.defaultFS should point to a nameservice. Since your nameservice is `ha-cluster`, fs.defaultFS should be:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ha-cluster</value>
</property>

After fixing this, try stopping all services and reformatting your ZK node as described here: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper
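For reference, the ZooKeeper HA state initialization described in that link comes down to running the following on one of the NameNode hosts while the services are stopped:

# Initialize (or reformat) the HA state znode in ZooKeeper
hdfs zkfc -formatZK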
04-18-2017
02:46 PM
Hi @Sedat Kestepe, take a look at rack awareness. https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html Here's how you can configure racks using Ambari https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Users_Guide/content/ch03s11.html HDFS will avoid placing all block replicas in the same rack to avoid data loss in case of a rack failure. You may be able to use this to achieve what you want.
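In case it helps, a minimal sketch of manual (non-Ambari) rack configuration: a topology script referenced from core-site.xml via net.topology.script.file.name. The script path and subnet-to-rack mapping below are placeholders:

#!/bin/bash
# /etc/hadoop/conf/topology.sh (placeholder path); maps each host/IP argument to a rack
while [ $# -gt 0 ]; do
  case "$1" in
    10.0.1.*) echo -n "/rack1 " ;;       # placeholder mapping
    10.0.2.*) echo -n "/rack2 " ;;
    *)        echo -n "/default-rack " ;;
  esac
  shift
done
echo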
04-12-2017
03:12 PM
@Sami Ahmad if you have a support contract I recommend you reach out to our support team. We have directory-level data protection that can be optionally enabled. However a determined privileged user can wipe user data.