Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 547 | 06-04-2025 11:36 PM |
| | 1092 | 03-23-2025 05:23 AM |
| | 560 | 03-17-2025 10:18 AM |
| | 2099 | 03-05-2025 01:34 PM |
| | 1316 | 03-03-2025 01:09 PM |
10-26-2019
03:22 PM
1 Kudo
@mike_bronson7 Under-replicated blocks: there are a couple of potential sources for this alert. HDP versions earlier than 3.x all use the standard default replication factor of 3, for the reason you know well: the ability to rebuild the data in whatever case, as opposed to the new erasure coding policies in Hadoop 3.0. Secondly, the cluster will rebalance itself if you give it time 🙂

Having said that, the first questions are: how many DataNodes were set up in this new cluster, and did you enable rack awareness? This alert usually means that some files are "asking" for a target number of replicas that are not present or cannot be satisfied.

So, how do you know which files are asking for a number of replicas that are not available? The first option is `hdfs fsck`:

```
$ hdfs fsck / -storagepolicies
Connecting to namenode via http://xxx.com:50070/fsck?ugi=hdfs&storagepolicies=1&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.94 for path / at Sat Oct 26 23:03:24 CEST 2019
/user/zeppelin/notebook/2EC24FF9U/note.json:  Under replicated BP-2067995211-192.168.0.101-1537740712051:blk_1073751507_10767. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
```

Then change the replication factor for the affected file:

```
$ hdfs dfs -setrep -w 1 /user/zeppelin/notebook/2EC24FF9U/note.json
Replication 1 set: /user/zeppelin/notebook/2EC24FF9U/note.json
Waiting for /user/zeppelin/notebook/2EC24FF9U/note.json ... done
```

You also need to check `dfs.replication` in hdfs-site.xml; the default is 3. Note that it turns out that if you upload files through Ambari, the file actually gets a replication factor of 3.

HTH
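If many files are affected, fixing them one at a time gets tedious. Here is a minimal sketch (the helper function and the sample line are my own illustration, not part of the original post) that extracts the affected paths from `hdfs fsck` output so each one can then be fed to `hdfs dfs -setrep`:

```shell
#!/bin/sh
# Hypothetical helper: pull the file paths out of fsck's
# "Under replicated" lines. fsck prints one line per affected block,
# starting with the path followed by a colon.
list_under_replicated() {
  grep 'Under replicated' | awk -F':' '{print $1}'
}

# Sample fsck line (illustrative, taken from the output above):
sample='/user/zeppelin/notebook/2EC24FF9U/note.json:  Under replicated BP-2067995211-192.168.0.101-1537740712051:blk_1073751507_10767. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).'

# Prints: /user/zeppelin/notebook/2EC24FF9U/note.json
echo "$sample" | list_under_replicated

# Each printed path could then be fixed with, for example:
#   hdfs fsck / | list_under_replicated | sort -u | \
#     while read -r path; do hdfs dfs -setrep -w 1 "$path"; done
```

The `sort -u` matters because fsck emits one line per under-replicated block, so a file with several bad blocks appears more than once.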
10-26-2019
09:19 AM
@Atena-Dev-Team Any updates on this thread
10-26-2019
03:38 AM
@JLo_Hernandez HDP3 uses HBase as a backend for the timeline service. You can either use an external HBase or have a system HBase running on YARN (the default). With a server crash, your ATSv2 HBase application data could be corrupted, and this can impact performance because of timeouts from the ATS. To fix that, follow these steps:

1. Check for any errors in the YARN logs, usually in /var/log/hadoop-yarn/yarn/, for anything clear to spot, for instance not enough YARN memory (and then fix it if relevant). See these 2 files:
   - hadoop-yarn-timelineserver-<host_name>.out
   - hadoop-yarn-timelineserver-<host_name>.log
2. Clean up the HDFS ATS data as described in "Remove ats-hbase before switching between clusters". Note there are different steps for secure (Kerberized) and unsecure clusters.
3. Clean up the ZooKeeper ATS data (the example here is for unsecure clusters; you will probably have another znode for Kerberized clusters). Log on to ZooKeeper; here I am on an unsecure HDP 3.1 single-node cluster:

```
# /usr/hdp/3.1.0.0-78/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
```

Go for the ATSv2 entry:

```
[zk: localhost:2181(CONNECTED) 1] ls /atsv2-hbase-unsecure
[rs, splitWAL, backup-masters, table-lock, draining, master-maintenance, table]
```

Delete the entry:

```
[zk: localhost:2181(CONNECTED) 2] rmr /atsv2-hbase-unsecure
```

4. Restart *all* YARN services, then restart the Ambari server.

You lose your ATS history with the above HDFS and ZooKeeper cleanup steps (i.e. job names, timing, logs, etc.), but your actual data is perfectly safe; nothing else will be lost.

Please revert!
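The ZooKeeper cleanup in step 3 can be sketched as a small reviewable script. The znode name and the zkCli.sh path are assumptions based on HDP 3.1 defaults (adjust both to your cluster), and `DRY_RUN=1` (the default here) only prints the command instead of executing it, so you can inspect it before doing the destructive step for real:

```shell
#!/bin/sh
# Dry-run wrapper around the ATSv2 znode removal. Assumptions:
# - znode name for an unsecure cluster (Kerberized clusters use a
#   different znode; check `ls /` in zkCli.sh first)
# - zkCli.sh path for the HDP 3.1.0.0-78 stack layout
DRY_RUN="${DRY_RUN:-1}"
ZNODE="/atsv2-hbase-unsecure"
ZK_CLI="/usr/hdp/3.1.0.0-78/zookeeper/bin/zkCli.sh"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"     # print instead of execute
  else
    "$@"
  fi
}

run "$ZK_CLI" rmr "$ZNODE"
```

Set `DRY_RUN=0` only once the printed command matches what you verified interactively in zkCli.sh.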
10-25-2019
02:03 PM
@Anuj Here are the official steps from ambari.org; read through and follow them, then look at my additional steps for checking the ZooKeeper entries.

Step-by-step guide using Ambari:

1. Set AMS to maintenance mode.
2. Stop AMS from Ambari.
3. Identify the following from the AMS Configs screen:
   - 'Metrics Service operation mode' (embedded or distributed)
   - hbase.rootdir
   - hbase.zookeeper.property.dataDir
4. AMS data is stored in the 'hbase.rootdir' identified above. Back up and remove the AMS data:
   - If the Metrics Service operation mode is 'embedded', the data is stored in OS files. Use regular OS commands to back up and remove the files in hbase.rootdir.
   - If it is 'distributed', the data is stored in HDFS. Use `hdfs dfs` commands to back up and remove the files in hbase.rootdir.
5. Remove the AMS ZooKeeper data by backing up and removing the contents of <hbase.tmp.dir>/zookeeper.
6. Remove any Phoenix spool files from the <hbase.tmp.dir>/phoenix-spool folder.
7. Restart AMS using Ambari.

I take the above a step further by locating the ZooKeeper executable, usually in /usr/hdp/{hdp_version}/zookeeper/bin/, and logging into ZooKeeper:

```
[zookeeper@osaka bin]$ ./zkCli.sh
```

List the root znodes; you should see ambari-metrics-cluster, like below:

```
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
```

Now check the entries under ambari-metrics-cluster; you should find something like below:

```
[zk: localhost:2181(CONNECTED) 1] ls /ambari-metrics-cluster/INSTANCES/
FQDN_12001
```

Delete the entry that corresponds to your cluster:

```
[zk: localhost:2181(CONNECTED) 25] rmr /ambari-metrics-cluster/INSTANCES/FQDN_12001
```

Restart AMS; this should recreate a new entry in ZooKeeper.
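Step 4 above branches on the operation mode, which is easy to get wrong. Here is a minimal sketch of that branch; the mode value, the hbase.rootdir path (a common embedded-mode default, but confirm it on your AMS Configs screen), and the backup location are all placeholders, and `DRY_RUN=1` (the default) only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch of AMS data backup/removal by operation mode.
# All three values below must come from your own AMS Configs screen.
DRY_RUN="${DRY_RUN:-1}"
MODE="embedded"                                           # or "distributed"
HBASE_ROOTDIR="/var/lib/ambari-metrics-collector/hbase"   # assumed embedded default
BACKUP_DIR="/tmp/ams-backup"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

if [ "$MODE" = "embedded" ]; then
  # embedded mode: plain OS files, regular OS commands
  run cp -r "$HBASE_ROOTDIR" "$BACKUP_DIR"
  run rm -rf "$HBASE_ROOTDIR"
else
  # distributed mode: data lives in HDFS
  run hdfs dfs -cp "$HBASE_ROOTDIR" "$BACKUP_DIR"
  run hdfs dfs -rm -r "$HBASE_ROOTDIR"
fi
```

Only flip `DRY_RUN=0` after AMS is stopped (steps 1 and 2) and the printed paths match your configs.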
10-25-2019
02:57 AM
@Anuj Is this the first time you are starting the service? If not, what happened in between? Was there a change in your configuration? Please revert.
10-25-2019
12:24 AM
@jepe_desu Good to know it worked out for you! Which solution was that? It would be good if you could elaborate, so other members can use it as a quick win (and that will also give you points), or just mark the post you referenced as a solution so Cloudera community members can use a filter to get a quick solution 🙂 Giving back to the community. Happy hadooping 🙂
10-24-2019
12:56 PM
@Atena-Dev-Team For sure, when you Kerberize your cluster you are hardening security access to all components: Hive, HBase, Kafka, etc. The problem you are encountering is related to Ranger, because authorization has been toggled to Ranger after Kerberization. Can you check your Hive config? Like the one below, mine shows that Hive authorization is now delegated to Ranger, so you will need to use Ranger to grant access to Hive databases and tables. Can you check whether the Hive Ranger plugin has been enabled? If that's the case, then your authorization will have to go through Ranger. Happy hadooping!
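One quick way to confirm the delegation from the command line (a sketch of my own, not from the original post) is to look for Ranger's Hive authorizer factory class in hive-site.xml. The sample XML below is illustrative; on a real node you would grep your actual /etc/hive/conf/hive-site.xml:

```shell
#!/bin/sh
# Write an illustrative hive-site.xml fragment showing Ranger-delegated
# authorization, then check for the Ranger authorizer factory class.
SAMPLE="/tmp/hive-site-sample.xml"
cat > "$SAMPLE" <<'EOF'
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>
EOF

# If this class is the configured authorization manager, Hive access
# control is enforced by Ranger policies, not by plain SQL grants.
if grep -q 'RangerHiveAuthorizerFactory' "$SAMPLE"; then
  echo "Hive authorization is delegated to Ranger"
fi
```

If the grep matches on your cluster, create the database/table policies in the Ranger admin UI rather than expecting direct grants to work.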
10-23-2019
08:27 AM
@soumya Good to hear that! Can you share what solution worked for you? This way, others who encounter the same problem can quickly resolve it. That's what we call community contribution. Happy hadooping!
10-22-2019
12:59 PM
@Axe Just for testing purposes, can you change the firewall rules to allow all traffic from your IP?
10-22-2019
10:32 AM
@Axe As reiterated by @ssulav, your problem stems from a network access problem reaching this server: http://ip-172-31-24-21.us-west-1.compute.internal. Resolve that and revert.