Member since: 01-19-2017
Posts: 3681
Kudos Received: 633
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1617 | 06-04-2025 11:36 PM |
| | 2075 | 03-23-2025 05:23 AM |
| | 986 | 03-17-2025 10:18 AM |
| | 3758 | 03-05-2025 01:34 PM |
| | 2586 | 03-03-2025 01:09 PM |
10-30-2019
12:24 PM
1 Kudo
@moubaba I know it's a daunting task. Unfortunately I couldn't locate the link to the article by Jordan Moore, so let me try to elaborate. Simple advice: always run a crontab job to dump all your databases, or at least the Ambari DB, nightly!

Firstly, Ambari doesn't control any data residing on the datanodes, so you should be safe there. Stop all ambari-agents in the cluster (you may even uninstall them) so that all the Hadoop components keep running "in the dark". Install and set up a new Ambari server and add a cluster, but do not register any hosts [very important]. If you uninstalled the ambari-agents on the nodes, please re-install them, and ensure the ambari-server and the agents are the same version.

Do the below if you removed Ambari and the agents:

```
# yum repolist
```

On the ambari-server:

```
# yum install ambari-server
# yum install ambari-agent
```

On the other hosts:

```
# yum install ambari-agent
```

Manually reconfigure the ambari-agents [ambari-agent.ini] to point at the new Ambari server address, and start them. Add the hosts in the Ambari server UI, selecting the "manual registration" option; the hosts register successfully since the agents are already running. A point to note: install and configure an ambari-agent on the Ambari server itself, pointing to itself!

After this, you get the option of installing clients and servers. Now, you could try to "reinstall" what is already there, but you might want to deselect all the servers in the datanode column. In theory, it will try to perform the OS package installation, see that the service already exists, and not error out. If it does error, restart the install process but deselect everything. At that point it should continue, and now you have Ambari back up and running with all the hosts monitored, just with no processes to configure.

To add the services back, you need to use the Ambari REST API to add back the respective Services, Components, and Host Components that you have running on the cluster. If you can't remember all the Services, Components, and Host Components, go to each host and do a process check to see what's running.

Variables:

```
export AMBARI_USER=admin
export AMBARI_PASSWD=admin
export CLUSTER_NAME=<New_Cluster_name>
export AMBARI_SERVER=<Your_new_ambari_server_FQDN>
export AMBARI_PORT=8080
```

List all services related to Hive. In the below example I am listing all Hive components:

```
curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X GET "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/HIVE"
```

Adding back Pig:

```
curl -k -u $AMBARI_USER:$AMBARI_PASSWD -H "X-Requested-By:ambari" -i -X POST -d '{"RequestInfo":{"context":"Install PIG"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/PIG/components/PIG"
```

This is the time to get your hands dirty 🙂 This isn't production, so it should train you for a real situation. Happy hadooping!!
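As a minimal sketch of the nightly dump advice above, assuming Ambari runs on its default embedded PostgreSQL with a database and user both named ambari, and that /backup/ambari exists (adjust the engine, names, and paths to your environment):

```
# Crontab entry (add via crontab -e): dump the Ambari DB every night at 01:00.
# DB name, user, and paths are assumptions; note that % must be escaped in crontab.
0 1 * * * pg_dump -U ambari ambari > /backup/ambari/ambari-$(date +\%F).sql 2>> /var/log/ambari-backup.log
```

With such a dump on hand, recovery becomes a database reload rather than the REST rebuild described above.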
10-29-2019
11:46 AM
1 Kudo
@mike_bronson7 Your plan is doable, and that's the way many companies have deployed their Kafka production clusters if they intend ONLY to use Kafka. But you could take it a step further by enabling HA and reliability and orchestrating all of that with Kubernetes and PVCs; it's a great idea. Running Kafka as a microservice on Kubernetes has become the norm and the path of least resistance.

It is very difficult to allocate physical machines with local disks for Kafka, and companies running on VMs have found that deploying Kafka outside of Kubernetes causes significant organizational headaches. Running Kafka on Kubernetes gets your environment allocated faster, and you can spend your time doing productive work rather than firefighting. Kafka management becomes much easier on Kubernetes: scaling up by adding new brokers is a single command or a single line in a configuration file, and it is easier to perform configuration changes, upgrades, and restarts on all brokers and all clusters.

Kafka is a stateful service, and this does make the Kubernetes configuration more complex than it is for stateless microservices. The biggest challenge is configuring storage and network, and you'll want to make sure both subsystems deliver consistent low latency. That's where PVCs [Persistent Volume Claims] and shared storage come in. The beauty is that Kafka runs as pods: you can configure a fixed number that MUST be running at any time and scale when needed with a single kubectl or Helm command. That is elasticity at play!!

Kafka also poses a challenge most stateful services don't: brokers are not interchangeable, and clients need to communicate directly with the broker that holds the lead replica of each partition they produce to or consume from. You can't place all brokers behind a single load balancer address; you must devise a way to route messages to a specific broker. For good reading, see the "Recommendations for Deploying Apache Kafka on Kubernetes" paper.

Happy hadooping
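To make the "single command" elasticity concrete, here is a minimal sketch; the Bitnami chart name, release name, and values are assumptions for illustration, not something prescribed here:

```
# Install a 3-broker Kafka cluster backed by PVCs (chart and values assumed)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-kafka bitnami/kafka \
  --set replicaCount=3 \
  --set persistence.enabled=true \
  --set persistence.size=100Gi

# Scale out to 5 brokers later, keeping all other values
helm upgrade my-kafka bitnami/kafka --set replicaCount=5 --reuse-values

# Or scale the underlying StatefulSet directly (assumes it is named my-kafka)
kubectl scale statefulset my-kafka --replicas=5
```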
10-28-2019
12:32 PM
@MIkeL The best technical reference before you embark on deploying your cluster is to check the compatibility of the different moving parts of the HDP/Cloudera binaries against the operating system of your choice. The first source of truth: please filter all the possible valid options using the Cloudera/Hortonworks Support Matrix tool. Hortonworks and Cloudera run exhaustive tests on a particular operating system before certifying it as production-ready, and as of the above, RHEL/CentOS 7.7 is not yet certified, so I highly doubt RHEL/CentOS 8 is certified; that explains the Python errors you are encountering. HTH
10-28-2019
11:55 AM
@Mnju Ranger and Sentry don't offer data quality; they offer a centralized security framework to manage fine-grained access control and policies across the cluster. Security administrators use them to easily manage policies for access to files, folders, databases, tables, or columns. These policies can be set for individual users or groups and then enforced consistently across the cluster.

The latest version of Ranger that ships with CDP (now available for AWS and later this year for Azure) manages access and authorization to the resources below using Ranger plugins:

- HDFS
- Hive
- Ozone
- Atlas
- NiFi-Registry
- Storm
- HBase
- Knox
- Kafka
- YARN
- NiFi
- Solr

Sentry is a granular, role-based authorization module for Hadoop that provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. It works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data) and allows you to define authorization rules to validate a user's or application's access requests for Hadoop resources.

Both are security tools built for Hadoop that are usually combined with Kerberos, KMS, and TLS to provide a robust security framework. Data quality is a broad subject of discussion, but to my knowledge no tool manages data quality unless you are talking about catalog tools like Alation or Waterline. Atlas is a tool that provides metadata management, data lineage, and governance capabilities to build a catalog of data assets, with classification and governance across these assets. HTH
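For a flavor of how such policies are driven programmatically, here is a minimal sketch against Ranger's public REST API; the Ranger Admin host, credentials, the service name cm_hdfs, the user, and the path are all assumptions:

```
# Create an HDFS policy granting read/execute on /flightdata to user "analyst"
curl -u admin:admin -H "Content-Type: application/json" -X POST \
  "http://ranger-host:6080/service/public/v2/api/policy" \
  -d '{
        "service": "cm_hdfs",
        "name": "analyst-read-flightdata",
        "isEnabled": true,
        "resources": { "path": { "values": ["/flightdata"], "isRecursive": true } },
        "policyItems": [ { "accesses": [ { "type": "read" }, { "type": "execute" } ],
                           "users": ["analyst"] } ]
      }'
```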
10-27-2019
03:40 AM
1 Kudo
@mike_bronson7 Surely, you can use `hdfs fsck / -delete`, but remember the files will be put in the trash!!!
10-27-2019
01:36 AM
@mike_bronson7 Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating missing copies to fulfill the replication factor), but in your case the cluster-wide replication factor is 3 while the target is 10. This suggests you have only 5 datanodes while some files are asking for 10 replicas, leading to the under-replication alert! According to the output you have 2 distinct problems, each with its own solution:

(a) Under-replicated blocks: Target Replicas is 10 but found 5 live replica(s) [last 2 lines]
(b) Corrupt blocks

Solution 1: under-replicated blocks

You can force the blocks to align with the cluster-wide replication factor by adjusting them with -setrep:

```
$ hdfs dfs -setrep -w 3 [File_name]
```

Validate: you should now see 3 after the file permissions, before the user:group, like below:

```
$ hdfs dfs -ls [File_name]
-rw-r--r--   3 analyst hdfs   1068028 2019-10-27 12:30 /flighdata/airports.dat
```

Then wait for the removal of the excess replicas to happen, or run the below snippets sequentially:

```
$ hdfs fsck / | grep 'Under replicated'
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
```

Solution 2: corrupt files

```
$ hdfs fsck / | egrep -v '^\.+$' | grep -i corrupt
...............Example output............................
/user/analyst/test9: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1055741378
/user/analyst/data1: CORRUPT blockpool BP-762603225-192.168.1.2-1480061879099 block blk_1056741378
/user/analyst/data2: MISSING 3 blocks of total size 338192920 B.Status: CORRUPT
 CORRUPT FILES: 9
 CORRUPT BLOCKS: 18
 Corrupt blocks: 18
The filesystem under path '/' is CORRUPT
```

Locate the corrupted blocks:

```
$ hdfs fsck / | egrep -v '^\.+$' | grep -i "corrupt blockpool" | awk '{print $1}' | sort | uniq | sed -e 's/://g' > corrupted.flst
```

Get the location of each file listed in corrupted.flst above:

```
$ hdfs fsck /user/analyst/xxxx -locations -blocks -files
```

Remove the corrupted files:

```
$ hdfs dfs -rm /path/to/corrupt_filename
```

Skip the trash to permanently delete:

```
$ hdfs dfs -rm -skipTrash /path/to/corrupt_filename
```

You should give the cluster some time to rebalance in the case of under-replicated files.
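To watch the cluster converge after these fixes, the NameNode summary counters are handy; this is just one way to check:

```
# "Under replicated blocks" in the dfsadmin report should trend toward 0
$ hdfs dfsadmin -report | grep -i 'under replicated'

# Or poll fsck until no under-replicated blocks remain (simple sketch, tune the interval)
$ while hdfs fsck / | grep -q 'Under replicated'; do sleep 60; done; echo "fully replicated"
```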
10-26-2019
03:22 PM
1 Kudo
@mike_bronson7 Under-replicated blocks: there are a couple of potential sources of the problem that trigger this alert! HDP versions earlier than HDP 3.x all use the standard default replication factor of 3, for a reason you know well: the ability to rebuild the data in whatever case, as opposed to the new erasure coding policies in Hadoop 3.0. Secondly, the cluster will rebalance itself if you give it time 🙂

Having said that, the first question is: how many datanodes were set up in this new cluster, and did you enable rack awareness? This alert usually means that some files are "asking" for a specific number of target replicas that are not present or cannot be obtained. So the question is: how do you know which files are asking for a number of replicas that are not available? The first option is to use hdfs fsck:

```
$ hdfs fsck / -storagepolicies
****** **************output *********************
Connecting to namenode via http://xxx.com:50070/fsck?ugi=hdfs&storagepolicies=1&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.94 for path / at Sat Oct 26 23:03:24 CEST 2019
/user/zeppelin/notebook/2EC24FF9U/note.json: Under replicated BP-2067995211-192.168.0.101-1537740712051:blk_1073751507_10767. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
******
```

Change the replication:

```
$ hdfs dfs -setrep -w 1 /user/zeppelin/notebook/2EC24FF9U/note.json
Replication 1 set: /user/zeppelin/notebook/2EC24FF9U/note.json
Waiting for /user/zeppelin/notebook/2EC24FF9U/note.json ... done
```

You also need to check dfs.replication in hdfs-site.xml; the default is 3. Note that if you upload files through Ambari, the file actually gets a replication factor of 3. HTH
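As a quick confirmation of the default the client actually resolves, hdfs getconf reads it straight from the effective configuration (nothing cluster-specific assumed here):

```
# Print the effective default replication factor
$ hdfs getconf -confKey dfs.replication
3
```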
10-25-2019
02:03 PM
@Anuj Here are the official steps from ambari.org; read through and follow them, then look at my steps for checking the ZooKeeper entries.

Step-by-step guide using Ambari:

1. Set AMS to maintenance mode.
2. Stop AMS from Ambari.
3. Identify the following from the AMS Configs screen:
   - 'Metrics Service operation mode' (embedded or distributed)
   - hbase.rootdir
   - hbase.zookeeper.property.dataDir
4. AMS data is stored in the 'hbase.rootdir' identified above. Back up and remove the AMS data:
   - If the Metrics Service operation mode is 'embedded', the data is stored in OS files. Use regular OS commands to back up and remove the files in hbase.rootdir.
   - If it is 'distributed', the data is stored in HDFS. Use 'hdfs dfs' commands to back up and remove the files in hbase.rootdir.
5. Remove the AMS ZooKeeper data by backing up and removing the contents of 'hbase.tmp.dir'/zookeeper.
6. Remove any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.
7. Restart AMS using Ambari.

I take the above a step further by locating the zookeeper executable, usually in /usr/hdp/{hdp_version}/zookeeper/bin/. Log into zookeeper:

```
[zookeeper@osaka bin]$ ./zkCli.sh
```

List the root leaf structure; you should see ambari-metrics-cluster, like below:

```
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
```

Now check the entries under ambari-metrics-cluster; you should find something like below:

```
[zk: localhost:2181(CONNECTED) 1] ls /ambari-metrics-cluster/INSTANCES/
FQDN_12001
```

Delete the entry that corresponds to your cluster:

```
[zk: localhost:2181(CONNECTED) 25] rmr /ambari-metrics-cluster/INSTANCES/FQDN_12001
```

Restart AMS; this should recreate a new entry in ZooKeeper.
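For step 4 above in embedded mode, here is a minimal backup sketch; the path below is a commonly seen default for hbase.rootdir, but verify yours in the AMS Configs screen first:

```
# Back up, then clear, the embedded AMS HBase data (path is an assumption)
$ tar -czf /tmp/ams-hbase-backup-$(date +%F).tar.gz /var/lib/ambari-metrics-collector/hbase
$ rm -rf /var/lib/ambari-metrics-collector/hbase/*

# Distributed-mode equivalent in HDFS (rootdir is an assumption)
$ hdfs dfs -cp /ams/hbase /ams/hbase-backup
$ hdfs dfs -rm -r -skipTrash /ams/hbase/*
```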
10-25-2019
02:57 AM
@Anuj Is this the first time you are starting the service? If not, what happened in between? Was there a change in your configuration? Please revert
10-23-2019
08:27 AM
@soumya Good to hear! Can you share what solution worked for you? That way, others who encounter the same problem can quickly resolve it. That's what we call community contribution. Happy hadooping