Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1016 | 06-04-2025 11:36 PM |
| | 1569 | 03-23-2025 05:23 AM |
| | 787 | 03-17-2025 10:18 AM |
| | 2842 | 03-05-2025 01:34 PM |
| | 1865 | 03-03-2025 01:09 PM |
10-27-2019
03:40 AM
1 Kudo
@mike_bronson7 You can certainly use hdfs fsck / -delete, but remember that the deleted files will be put in the trash!
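For example, a minimal sketch of that cleanup (the trash path is the usual per-user default, and whether fsck-deleted files land there depends on your trash configuration, so review the corrupt file list before deleting anything):
$ hdfs fsck / -list-corruptfileblocks             # review which files are affected first
$ hdfs fsck / -delete                             # then delete the corrupt files
$ hdfs dfs -ls /user/$(whoami)/.Trash/Current     # check the per-user trash, per the note above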
10-27-2019
03:11 AM
@erkansirin78 Can you share the steps you executed? Have a look at this spark-shell.
10-27-2019
01:36 AM
@mike_bronson7 Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating the missing copies to fulfil the replication factor), but in your case the cluster-wide replication factor is 3 while the target is 10. The output suggests you have 5 data nodes while 10 replicas are requested, which leads to the under-replication alert! According to the output you have 2 distinct problems:
(a) Under-replicated blocks: "Target Replicas is 10 but found 5 live replica(s)" [last 2 lines]
(b) Corrupt blocks
with 2 different solutions.
Solution 1: under-replicated blocks
You could force the 2 blocks to align with the cluster-wide replication factor by adjusting them with -setrep:
$ hdfs dfs -setrep -w 3 [File_name]
Validate: you should now see 3 after the file permissions, before the user:group, like below:
$ hdfs dfs -ls [File_name]
-rw-r--r-- 3 analyst hdfs 1068028 2019-10-27 12:30 /flighdata/airports.dat
Then wait for the re-replication to happen, or run the snippets below sequentially:
$ hdfs fsck / | grep 'Under replicated'
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
Solution 2: corrupt files
$ hdfs fsck / | egrep -v '^\.+$' | grep -i corrupt
...............Example output............................
/user/analyst/test9: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1055741378
/user/analyst/data1: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1056741378
/user/analyst/data2: MISSING 3 blocks of total size 338192920 B.
Status: CORRUPT
CORRUPT FILES: 9
CORRUPT BLOCKS: 18
Corrupt blocks: 18
The filesystem under path '/' is CORRUPT
Locate the corrupted blocks:
$ hdfs fsck / | egrep -v '^\.+$' | grep -i "corrupt blockpool" | awk '{print $1}' | sort | uniq | sed -e 's/://g' > corrupted.flst
Get the location of each file listed in the corrupted.flst output above:
$ hdfs fsck /user/analyst/xxxx -locations -blocks -files
Remove the corrupted files:
$ hdfs dfs -rm /path/to/corrupt_filename
Or skip the trash to permanently delete:
$ hdfs dfs -rm -skipTrash /path/to/corrupt_filename
You should give the cluster some time to rebalance in the case of under-replicated files.
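If there are many corrupt files, a minimal sketch (my own addition, not part of the official steps) that loops over the corrupted.flst list generated above and permanently removes each path; only run it once you are sure the blocks cannot be recovered, because -skipTrash bypasses the trash:
$ for f in `cat corrupted.flst`; do echo "Removing $f" ; hdfs dfs -rm -skipTrash "$f"; done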
10-26-2019
03:22 PM
1 Kudo
@mike_bronson7 Under-replicated blocks: there are a couple of potential sources of the problem that trigger this alert! HDP versions earlier than HDP 3.x all use the standard default replication factor of 3, for reasons you know well: the ability to rebuild the data in any case, as opposed to the new erasure coding policies in Hadoop 3.0. Secondly, the cluster will rebalance itself if you give it time 🙂 Having said that, the first question is how many data nodes were set up in this new cluster, and did you enable rack awareness? This alert usually means that some files are "asking" for a specific number of target replicas that are not present, or are not able to get their replicas. So the question is: how do I know which files are asking for a number of replicas that is not available? The first option is to use hdfs fsck:
$ hdfs fsck / -storagepolicies
****** **************output *********************
Connecting to namenode via http://xxx.com:50070/fsck?ugi=hdfs&storagepolicies=1&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.94 for path / at Sat Oct 26 23:03:24 CEST 2019
/user/zeppelin/notebook/2EC24FF9U/note.json: Under replicated BP-2067995211-192.168.0.101-1537740712051:blk_1073751507_10767. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
******
Change the replication:
$ hdfs dfs -setrep -w 1 /user/zeppelin/notebook/2EC24FF9U/note.json
Replication 1 set: /user/zeppelin/notebook/2EC24FF9U/note.json
Waiting for /user/zeppelin/notebook/2EC24FF9U/note.json ... done
You also need to check dfs.replication in hdfs-site.xml; the default is configured to be 3. Note that if you upload files through Ambari, the file actually gets a replication factor of 3. HTH
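As a quick sanity check, a minimal sketch (both are standard HDFS commands, nothing here is specific to your cluster) to confirm the effective default replication factor and list every path that fsck reports as under-replicated:
$ hdfs getconf -confKey dfs.replication
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' | sort -u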
10-26-2019
09:19 AM
@Atena-Dev-Team Any updates on this thread?
10-26-2019
03:38 AM
@JLo_Hernandez HDP 3 uses HBase as the backend for the timeline service (ATSv2). You can either use an external HBase or have a system HBase running on YARN (the default). With a server crash, your ATSv2 HBase application data could be corrupted, and this can impact performance because of timeouts from the ATS. To fix that, follow these steps:
1. Check for any errors in the YARN logs, usually in /var/log/hadoop-yarn/yarn/, for anything clear to spot, for instance not enough YARN memory (and then fix it if relevant). See these 2 files:
hadoop-yarn-timelineserver-<host_name>.out
hadoop-yarn-timelineserver-<host_name>.log
2. Clean up the HDFS ATS data as described in "Remove ats-hbase before switching between clusters"; note there are different steps for secure [Kerberized] and unsecure clusters.
3. Clean up the ZooKeeper ATS data (the example here is for insecure clusters; you will probably have another znode for Kerberized clusters):
zookeeper-client rmr /atsv2-hbase-unsecure
Or do it interactively. Log on to ZooKeeper (here I am on an unsecured HDP 3.1 single-node cluster):
# /usr/hdp/3.1.0.0-78/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
Go for the ATSv2 entry:
[zk: localhost:2181(CONNECTED) 1] ls /atsv2-hbase-unsecure
[rs, splitWAL, backup-masters, table-lock, draining, master-maintenance, table]
Delete the entry:
[zk: localhost:2181(CONNECTED) 2] rmr /atsv2-hbase-unsecure
4. Restart *all* YARN services, then restart the Ambari server.
You lose your ATS history with the above HDFS and ZooKeeper cleaning steps (i.e. job names, timing, logs...), but your actual data is perfectly safe; nothing else will be lost. Please revert!!
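For step 1, a minimal sketch (the log directory and file name pattern are just the defaults mentioned above; adjust them to your host) to pull the most recent errors out of the timeline server logs:
$ grep -iE 'error|exception' /var/log/hadoop-yarn/yarn/hadoop-yarn-timelineserver-*.log | tail -n 50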
10-25-2019
02:03 PM
@Anuj Here are the official steps from Ambari.org; read through and follow them, then look at my steps for checking the ZooKeeper entries.
Step-by-step guide (using Ambari):
1. Set AMS to maintenance mode.
2. Stop AMS from Ambari.
3. Identify the following from the AMS Configs screen:
- 'Metrics Service operation mode' (embedded or distributed)
- hbase.rootdir
- hbase.zookeeper.property.dataDir
4. AMS data is stored in the 'hbase.rootdir' identified above. Back up and remove the AMS data:
- If the Metrics Service operation mode is 'embedded', the data is stored in OS files. Use regular OS commands to back up and remove the files in hbase.rootdir.
- If it is 'distributed', the data is stored in HDFS. Use 'hdfs dfs' commands to back up and remove the files in hbase.rootdir.
5. Remove the AMS ZooKeeper data by backing up and removing the contents of 'hbase.tmp.dir'/zookeeper.
6. Remove any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.
7. Restart AMS using Ambari.
I take the above a step further by locating the ZooKeeper executable, usually in /usr/hdp/{hdp_version}/zookeeper/bin/.
Log into ZooKeeper:
[zookeeper@osaka bin]$ ./zkCli.sh
List the root; the structure should include ambari-metrics-cluster and look like below:
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
Now check the entries under ambari-metrics-cluster; you should find something like below:
ls /ambari-metrics-cluster/INSTANCES/
FQDN_12001
Delete the entry that corresponds to your cluster:
[zk: localhost:2181(CONNECTED) 25] rmr /ambari-metrics-cluster/INSTANCES/FQDN_12001
Restart AMS; this should recreate a new entry in ZooKeeper.
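To illustrate step 4 in embedded mode, a minimal sketch (the hbase.rootdir path below is only a common default and the backup location is arbitrary; confirm the real value in the AMS Configs screen before touching anything):
$ tar -czf /tmp/ams-hbase-backup-$(date +%F).tar.gz /var/lib/ambari-metrics-collector/hbase    # back up the embedded AMS HBase data
$ rm -rf /var/lib/ambari-metrics-collector/hbase/*                                             # then remove it so AMS starts fresh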
10-25-2019
02:57 AM
@Anuj Is this the first time you are starting the service? If not, what happened in between? Was there a change in your configuration? Please revert.
10-25-2019
12:24 AM
@jepe_desu Good to know it worked out for you! Which solution was that? It would be good if you could elaborate so other members can use it as a quick win (that will also give you points), or just mark the post you referenced as a solution so that Cloudera community members can use a filter to find a quick solution 🙂 Giving back to the community, happy hadooping 🙂
10-24-2019
12:56 PM
@Atena-Dev-Team For sure, when you Kerberize your cluster you are hardening security access to all components: Hive, HBase, Kafka, etc. The problem you are encountering is related to Ranger, because authorization has been toggled to Ranger after Kerberization. Can you check your Hive config? Like the example below, mine shows that Hive authorization is now delegated to Ranger, so you will need to use Ranger to grant access to Hive databases and tables. Can you check whether the Hive Ranger plugin has been enabled? If that's the case, then your authorization will have to go through Ranger. Happy hadooping!
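As a quick check from the Hive side, a minimal sketch (the JDBC URL and realm are placeholders, and the exact authorizer class can differ by version; with the Ranger Hive plugin enabled you would typically see the Ranger authorizer factory listed as the manager):
$ beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default;principal=hive/_HOST@YOUR_REALM"
0: jdbc:hive2://...> SET hive.security.authorization.enabled;
0: jdbc:hive2://...> SET hive.security.authorization.manager;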