Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1016 | 06-04-2025 11:36 PM |
|  | 1569 | 03-23-2025 05:23 AM |
|  | 785 | 03-17-2025 10:18 AM |
|  | 2834 | 03-05-2025 01:34 PM |
|  | 1864 | 03-03-2025 01:09 PM |
10-30-2019
01:32 PM
@mike_bronson7 Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production poses some challenges, including container management, scheduling, network configuration, security, and performance. By default, containerized applications have no resource constraints and can use as much of a given resource as the host's kernel scheduler allows; each container's access to the host machine's CPU cycles is also unlimited. It is important not to allow a running container to consume too much of the host machine's memory. As you are aware, Kafka needs ZooKeeper, so you have to architect your Kafka deployment well, but once you master it, it's a piece of cake and it brings a lot of advantages like upgrades, scaling out, etc. As I reiterated, a good move is to get your hands dirty 🙂
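To illustrate the point about resource limits, here is a minimal sketch of starting a broker container with explicit memory and CPU caps; the image name and the limit values are placeholder assumptions, so adjust them to your own Kafka image and host sizing:

# cap the container at 4 GB of RAM and 2 CPUs (illustrative values)
docker run -d --name kafka-broker --memory=4g --memory-swap=4g --cpus=2 your-kafka-image:latest

Without flags such as --memory and --cpus, Docker lets the container consume whatever the host scheduler will give it, which is exactly the risk described above.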
10-30-2019
12:24 PM
1 Kudo
@moubaba I know it's a daunting task. Unfortunately I couldn't link the article by Jordan Moore, so let me try to elaborate. Simple advice: always run a crontab job to dump all your databases, or at least the Ambari DB, nightly!

Firstly, Ambari doesn't control any data residing on the data nodes, so you should be safe there.

1. Stop all ambari-agents in the cluster (maybe even uninstall them) so that all the Hadoop components remain running "in the dark".
2. Install and set up a new Ambari server, add a cluster, but do not register any hosts [very important].
3. If you uninstalled the ambari-agents on the nodes, reinstall them and make sure the ambari-server and the agents are the same version. If you removed Ambari and the agents, do the following:
# yum repolist
On the Ambari server:
# yum install ambari-server
# yum install ambari-agent
On the other hosts:
# yum install ambari-agent
4. Manually reconfigure the ambari-agents [ambari-agent.ini] to point at the new Ambari server address, and start them.
5. Add the hosts in the Ambari server UI, selecting the "manual registration" option; the hosts register successfully since the agents are already running. A point to note: install and configure an ambari-agent on the Ambari server itself, pointing to itself!
6. After this, you get the option of installing clients and servers. You could try to "reinstall" what is already there, but you might want to deselect all the servers in the data node column. In theory, it will try to perform the OS package installation, report that the service already exists, and not error out. If it does error, restart the install process but deselect everything. At that point it should continue, and you now have Ambari back up and running with all the hosts monitored, just with no processes to configure.
7. To add the services back, use the Ambari REST API to add the respective Services, Components, and Host Components that you have running on the cluster. If you can't remember all the Services, Components, and Host Components, go to each host and do a process check to see what's running.

Variables:
export AMBARI_USER=admin
export AMBARI_PASSWD=admin
export CLUSTER_NAME=<New_Cluster_name>
export AMBARI_SERVER=<Your_new_ambari_server_FQDN>
export AMBARI_PORT=8080

List all services related to Hive (in the example below I am listing all Hive components):
curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X GET "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/HIVE"

Adding back Pig:
curl -k -u $AMBARI_USER:$AMBARI_PASSWD -H "X-Requested-By: ambari" -i -X POST -d '{"RequestInfo":{"context":"Install PIG"}, "Body":{"HostRoles":{"state":"INSTALLED"}}}' "http://$AMBARI_SERVER:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services/PIG/components/PIG"

This is the time to get your hands dirty 🙂 This isn't production, so it should train you for a real situation. Happy hadooping!!
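The nightly Ambari DB dump mentioned at the top could look something like the crontab entry below; this sketch assumes a PostgreSQL-backed Ambari with a database and user both named "ambari" and a /var/backups target directory, so adjust to your own setup:

# run at 01:00 every night; dump the ambari database to a dated file
0 1 * * * pg_dump -U ambari ambari > /var/backups/ambari_db_$(date +\%F).sql

If your Ambari uses MySQL/MariaDB instead, the equivalent would be mysqldump rather than pg_dump.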
10-29-2019
10:19 PM
@Elephanta Okay, now with that information I get a better understanding and picture. By default, HDP 2.6 has a replication factor of 3, so HDFS is looking to place the other 2 copies on data nodes different from the existing one; unless you create new files with a replication factor of 1, you will continue to get the under-replicated block errors 🙂 but now that you know, it's manageable.

Maybe next time you delete files in HDFS, use the -skipTrash option:
hdfs dfs -rm -skipTrash /path/to/hdfs/file/to/remove/permanently
or empty the existing .Trash.

Options: Change the replication factor of an existing file:
hdfs dfs -setrep -w 1 /user/hdfs/file.txt
Or change the replication factor of a directory:
hdfs dfs -setrep -R 1 /user/hdfs/your_dir

Changing the replication factor for a directory will only affect the existing files; new files under the directory will be created with the default replication factor from dfs.replication in hdfs-site.xml. Maybe in your case that is what you should change to 1 for your Dev environment, as it takes effect cluster-wide. Happy hadooping
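If you want to confirm how many blocks are still under-replicated before and after changing the replication factor, a quick check (run as the hdfs user) might look like the commands below; the grep patterns are assumptions about the exact wording of the output, so scan the full report if they match nothing:

$ hdfs dfsadmin -report | grep -i "under replicated"
$ hdfs fsck / | grep -i "under-replicated"

The first gives a cluster-wide count, the second comes from the filesystem check summary.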
10-29-2019
01:03 PM
1 Kudo
@Elephanta Any updates? Why the high number of block deletes? Are those the under-replicated blocks? There are many parameters to evaluate, like the number of data nodes, the replication factor, and rack awareness! I would be happy to get your feedback.
10-29-2019
11:46 AM
1 Kudo
@mike_bronson7 Your plans are doable, and that's the way many companies have deployed their Kafka production clusters if they intend ONLY to use Kafka. But you could take it a step further by enabling HA and reliability and orchestrating all of that with Kubernetes and PVCs; it's a great idea. Running Kafka as microservices on Kubernetes has become the norm and the path of least resistance. It is very difficult to allocate physical machines with local disks for Kafka, and companies running on VMs have found that deploying Kafka outside of Kubernetes causes significant organizational headache. Running Kafka on Kubernetes gets your environment allocated faster, and you can use your time to do productive work rather than firefighting.

Kafka management becomes much easier on Kubernetes: it becomes easier to scale up, adding new brokers is a single command or a single line in a configuration file, and it is easier to perform configuration changes, upgrades, and restarts on all brokers and all clusters. Kafka is a stateful service, and this does make the Kubernetes configuration more complex than it is for stateless microservices. The biggest challenge is configuring storage and network, and you'll want to make sure both subsystems deliver consistent low latency; that's where PVCs [Persistent Volume Claims] and shared storage come in. The beauty is that Kafka runs as pods, you can configure a fixed number that MUST be running at any time, and you can scale when needed with a single kubectl or Helm command (see the sketch below); that's elasticity at play!!

Kafka also poses a challenge most stateful services don't: brokers are not interchangeable, and clients need to communicate directly with the broker that holds the lead replica of each partition they produce to or consume from. You can't place all brokers behind a single load balancer address; you must devise a way to route messages to a specific broker. A good read is the "Recommendations for Deploying Apache Kafka on Kubernetes" paper. Happy hadooping
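As a minimal sketch of the single-command scaling mentioned above, assuming the brokers run as a StatefulSet named "kafka" (the name is an assumption; it depends on your manifests or chart):

# grow the broker set from 3 to 5 pods; Kubernetes creates the new pods and binds their PVCs
kubectl scale statefulset kafka --replicas=5

# or, if the cluster was installed with Helm, an upgrade along these lines
helm upgrade my-kafka <chart-name> --set replicaCount=5

The release name, chart, and value key in the Helm line are placeholders for whatever your chart actually uses.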
10-28-2019
10:21 PM
1 Kudo
@Elephanta The NameNode start is usually faster than what you are experiencing; from the logs I am seeing what may be the root cause. During startup the NameNode reads the fsimage, which is the last good cluster image, usually a combination of the fsimage and the latest edits log. You will need to combine the current fsimage and the edits log in the steps below.

The NameNode is in safe mode. When the NameNode is in this state, it's a safety feature that disables any change to the NameNode metadata; note that all the cluster-wide locations, states, permissions, ownership, etc. are stored in the metadata held by the NameNode.

Do the following as the root user while the cluster is hanging during startup, switching to the hdfs user:

# su - hdfs
Get the current state:
$ hdfs dfsadmin -safemode get
That will confirm that the NameNode is in safe mode. Force the creation of a point-in-time HDFS image by saving the namespace, which creates a new fsimage:
$ hdfs dfsadmin -saveNamespace
Then leave safe mode:
$ hdfs dfsadmin -safemode leave
And confirm:
$ hdfs dfsadmin -safemode get
This time the output should report that safe mode is OFF, and the next startup should be much faster.

You might also need to tune the memory allocated to the NameNode; it seems the number of files to manage has increased, hence the need to reconfigure the memory. Have a look at "Configuring NameNode Heap Size" for guidance on estimating the memory required for the NameNode.
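As a rough sketch of where that heap is set on an HDP-style install (the values are purely illustrative assumptions; size them from your file and block counts per the heap-size guide, and on an Ambari-managed cluster change this through Ambari rather than editing the file by hand):

# in hadoop-env.sh, raise the NameNode heap, e.g. to 8 GB
export HADOOP_NAMENODE_OPTS="-Xms8g -Xmx8g ${HADOOP_NAMENODE_OPTS}"

Keeping -Xms equal to -Xmx avoids heap-resizing pauses on a busy NameNode.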
10-28-2019
10:04 PM
@erkansirin78 That's exactly the output I was getting on my single-node cluster (not the Sandbox), but I didn't know exactly what you wanted. When you start getting errors, you can ping me!
10-28-2019
12:32 PM
@MIkeL The best technical reference before you embark on deploying your cluster is to check the compatibility of the different moving parts of the HDP/Cloudera binaries against the operating system of your choice. The first source of truth is the Cloudera/Hortonworks support matrix tool; please use it to filter all the possible valid options. Hortonworks and Cloudera run exhaustive tests on a particular operating system before certifying it as production-ready, and as of now RHEL/CentOS 7.7 is not yet certified, so I highly doubt RHEL/CentOS 8 is certified; that explains the Python errors you are encountering. HTH
10-28-2019
11:55 AM
@Mnju Ranger and Sentry don't offer data quality; they provide a centralized security framework to manage fine-grained access control and policies across the cluster. Security administrators use Ranger to easily manage policies for access to files, folders, databases, tables, or columns. These policies can be set for individual users or groups and then enforced consistently across the cluster. The latest version of Ranger, which ships with CDP (now available for AWS and later this year for Azure), manages access and authorization to the following resources through Ranger plugins: HDFS, Hive, Ozone, Atlas, NiFi Registry, Storm, HBase, Knox, Kafka, YARN, NiFi, and Solr.

Sentry is a granular, role-based authorization module for Hadoop that provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. It works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data) and allows you to define authorization rules to validate a user or application's access requests for Hadoop resources.

Both are security tools built for Hadoop that are usually combined with Kerberos, KMS, and TLS to provide a robust security framework. Data quality is a broad subject of discussion, but to my knowledge no tool in the stack manages data quality, unless you are talking about catalog tools like Alation or Waterline. Atlas is a tool that provides metadata management, data lineage, and governance capabilities to build a catalog of data assets, with classification and governance across those assets. HTH
10-27-2019
06:34 AM
@erkansirin78 Extract from spark.apache.org: "In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work. You can set which master the context connects to using the --master argument, and you can add JARs to the classpath by passing a comma-separated list to the --jars argument." I am not a Spark expert, but I'm trying to understand.
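To illustrate the quoted options, a launch of the shell might look like the line below; the master URL and the jar path are illustrative assumptions:

# connect the pre-created sc to a specific master and ship an extra jar
spark-shell --master yarn --jars /path/to/extra-lib.jar

Inside the shell you then use the existing sc variable rather than constructing a new SparkContext.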