About Shelton

Shelton · ‎06-08-2018

Anpan K, I can understand the confusion that's brewing in your mind. In a kerberized production cluster, you'd usually have an Edge node, Master nodes and Slave nodes. I will not go into the description and placement of every single component, but the distribution below gives you a picture.

Note: a worker node usually MUST run at least the 2 slave processes, DataNode and NodeManager; all the client software goes on the Edge node; and the Master nodes hold the other components, notably NameNode, RM, ZooKeeper and their HA instances.

Master (3x): NameNode, YARN (RM), ZooKeeper, HS2, ..., HBase Master
Slave (worker nodes, n): DataNode, NodeManager, RegionServer
Edge node: Knox, ZK client, HDFS client, MR client, ..., YARN client

Below, all the client software is installed on the Knox Gateway host, and the Hadoop services represent the Master and Slave nodes. The Knox gateway should sit on the Edge node and should be the only access point to the cluster, as illustrated above. HTH

Shelton · ‎06-08-2018

@Michael Bronson AFAIK that should work, as Ambari manages all the cluster config, but as usual, always validate after setting global params 🙂

Shelton · ‎06-07-2018

@Michael Bronson That looks good. Caution: the system property must be set on all servers and clients, otherwise problems will arise. This is really a sanity check.

export JAVA_OPTS="-Djute.maxbuffer=11000000"

If you set it only on ZooKeeper, you will run into issues as soon as one of your clients (e.g. Solr) wants to do something with that file. Thus, you need to set jute.maxbuffer for your clients as well.
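
To illustrate why the value has to match on both sides, here is a minimal Python sketch. The 1048575-byte figure (0xfffff) is ZooKeeper's default for jute.maxbuffer; the helper names `fits_in_znode` and `effective_limit` and the 1024-byte overhead allowance are assumptions made up for this example and are not part of ZooKeeper or any client library.

```python
# Hypothetical helpers for illustration only; not a ZooKeeper API.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF  # 1048575 bytes, ZooKeeper's default limit


def effective_limit(server_limit: int, client_limit: int) -> int:
    """Server and client each enforce their own limit, so the smaller one wins."""
    return min(server_limit, client_limit)


def fits_in_znode(payload: bytes,
                  max_buffer: int = DEFAULT_JUTE_MAXBUFFER,
                  overhead: int = 1024) -> bool:
    """Return True if the payload plus an assumed request overhead fits the limit."""
    return len(payload) + overhead <= max_buffer


if __name__ == "__main__":
    # Server raised to 11000000, client left at the default.
    limit = effective_limit(11000000, DEFAULT_JUTE_MAXBUFFER)
    print(limit, fits_in_znode(b"x" * 2_000_000, max_buffer=limit))
```

With the server raised and the client left at its default, the effective limit stays at the default, which is the situation described above for a client such as Solr.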

Shelton · ‎06-07-2018

@Michael Bronson ZooKeeper is not designed as a large data store to hold very large data values. The 1MB limit is a default config option and can be overridden. It is NOT advised to do so, but increasing the size a little will probably not damage your system. It all depends on your unique access patterns, and these changes should be made with care and at your own risk!

The parameter to change (with CAUTION, as reiterated above) is:

-Djute.maxbuffer=<bytes>

Please revert. HTH

Shelton · ‎06-06-2018

@Samant Thakur When the Hadoop framework creates a new block, it places the first replica on the local node, the second on a node in a different rack, and the third on a different node in the same rack as the second. During re-replication, if the number of existing replicas is one, the second is placed on a different rack. When the number of existing replicas is two and both are in the same rack, the third is placed on a different rack.

The main purposes of rack awareness are to:

- Improve data reliability and data availability.
- Improve cluster performance.
- Prevent data loss if an entire rack fails.
- Make better use of network bandwidth.
- Keep the bulk flow in-rack when possible.

If your production cluster and this problematic cluster have the same Ambari/HDP version, then you can't call it a bug but rather a client-specific problem. I would still insist that you enable rack awareness and monitor for 24 hours to see the change in the alerts. Have you tried running the cluster balancing utility?

$ hdfs balancer

HTH
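
The placement rule for a new block can be sketched in a few lines of Python. This is a simplified model, assuming the default policy of first replica on the writer's node and the other two on two different nodes of one remote rack; the function name `place_replicas`, the rack names and the node names are hypothetical, and the real HDFS policy also weighs load, free space and other factors that are left out here.

```python
import random
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, List[str]], writer: str,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick three nodes for a new block.

    topology maps a rack name to the nodes in that rack. At least one rack
    other than the writer's must contain two or more nodes.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own node.
    first = writer

    # Replica 2: a node on a rack different from the writer's.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != rack_of[first] and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second = rng.choice(topology[remote])

    # Replica 3: a different node on the same rack as replica 2.
    third = rng.choice([node for node in topology[remote] if node != second])
    return [first, second, third]


if __name__ == "__main__":
    racks = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}
    print(place_replicas(racks, "dn1"))
```

In this model the block survives the loss of either rack, and the write pipeline crosses the rack boundary only once.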

Shelton · ‎06-06-2018

@karthik nedunchezhiyan This can happen with JournalNodes. When one of the nodes is lagging behind the others (e.g. because its local disk is slower or there was a network blip), it receives edits after they've been committed to a majority. It can tell this because the committed txid included in the request info is higher than the highest txid in the actual batch to be written. In this case, we know that this batch has already been fsynced to a quorum of nodes, so we can skip the fsync() on the laggy node, helping it to catch back up.

The Active NameNode writes edits to the URI configured in dfs.namenode.shared.edits.dir, which is an address shared by the JournalNodes and provides the shared edits storage. It is ONLY written to by the Active NameNode, and it is read by the Standby NameNode to stay up-to-date with all the file system changes the Active NameNode makes. Though you must specify several JournalNode addresses, you should only configure one of these URIs.

QuorumJournalManager is responsible for syncing the missing transactions. On a JournalNode, the missing transactions are recovered by the TransferFsImage class from another JournalNode that is up to date, in this case JournalNodes 1 and 3.

To see this in action:

Start a 3-node QJM cluster.
Run strace -efdatasync,write -f <pid of one JN>
Write some txns to the NN; strace will show a lot of fdatasync and write calls.
kill -STOP that JN for 10-15 seconds.
kill -CONT that JN.

You will see a bunch of write() calls with no fdatasync calls while it is still catching up. After it has caught up, it starts syncing again.
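
The "skip fsync while catching up" decision described above can be modelled with a small Python sketch. It follows the wording of this post (committed txid strictly higher than the last txid in the batch); the class `EditBatch`, its field names and `should_fsync` are hypothetical names for illustration and are not taken from the Hadoop source.

```python
from dataclasses import dataclass


@dataclass
class EditBatch:
    first_txid: int      # txid of the first edit in the batch
    num_txns: int        # number of edits in the batch
    committed_txid: int  # highest txid the writer reports as committed to a quorum

    @property
    def last_txid(self) -> int:
        """Highest txid contained in this batch."""
        return self.first_txid + self.num_txns - 1


def should_fsync(batch: EditBatch) -> bool:
    """Return False when the node is lagging and the batch is already durable elsewhere."""
    is_lagging = batch.committed_txid > batch.last_txid
    return not is_lagging


if __name__ == "__main__":
    # A node that is behind: the quorum has already committed up to txid 200.
    print(should_fsync(EditBatch(first_txid=101, num_txns=10, committed_txid=200)))
    # A node that is keeping up: nothing in this batch is committed yet.
    print(should_fsync(EditBatch(first_txid=101, num_txns=10, committed_txid=100)))
```

The first call models the stopped-and-resumed JournalNode in the strace experiment (writes without fdatasync); the second models a node that is keeping up.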

Shelton · ‎06-06-2018

@Kant T Have you tried changing the webapps directory user:group to zeppelin:hadoop and then preventing Ambari from changing it back by using

# chattr -R +i /usr/hdp/2.6.3.x/zeppelin/webapps

Then restart Zeppelin and log in again.

Shelton · ‎06-06-2018

@Praveen Atmakuri I want to make sure you are aware that the document in the link above moves your on-premise MySQL to MySQL in Azure. Azure Database for MySQL is a Microsoft cloud-based service built on the MySQL Community Edition database engine.

Advantages

- Built-in high availability at no additional cost.
- Predictable performance, using inclusive pay-as-you-go pricing.
- Scale as needed within seconds.
- Secured to protect sensitive data at rest and in motion.
- Automatic backups and point-in-time restore for up to 35 days.
- Enterprise-grade security and compliance.

Difference between Azure MSSQL and Azure MySQL DB

- MySQL is open source while MSSQL is licensed (commercial).
- MySQL supports more programming languages than MSSQL.
- MySQL supports several platforms (Windows, Linux and Mac OS) while MSSQL runs exclusively on Windows, though Microsoft recently announced MSSQL will be available on Linux.
- MySQL supports a number of storage engines. With MySQL you even have the option to use a plug-in storage engine.
- MySQL does not allow users to kill or cancel a query while it is running, but SQL Server programmers can truncate a database query during execution without killing the entire process.

These are just a few of the distinct differences between the 2 RDBMSs.

If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.

Shelton · ‎06-06-2018

@Sriram You won't have DNS issues, and please don't hesitate to ask if you encounter any problem. We have a great forum and very savvy members. If you found this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.

Shelton · ‎06-06-2018

@Sriram Extract from the Hortonworks documentation: "All hosts in your system must be configured for DNS and Reverse DNS." If you are unable to configure DNS and Reverse DNS, you must edit the hosts file on every host in your cluster to contain the address of each of your hosts and to set the Fully Qualified Domain Name hostname of each of those hosts. So that clearly answers your worries 🙂 HTH
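
A quick way to sanity-check the two options is sketched below in Python, using only the standard library. The function names `check_forward_reverse` and `render_hosts_file`, the example hostnames and the IP addresses are assumptions made up for this illustration; the "IP FQDN shortname" line layout is the common hosts file convention, not something prescribed by the quoted documentation.

```python
import socket
from typing import Callable, Dict


def _reverse_lookup(ip: str) -> str:
    """Default reverse lookup: the primary hostname registered for an IP."""
    return socket.gethostbyaddr(ip)[0]


def check_forward_reverse(hostname: str,
                          forward: Callable[[str], str] = socket.gethostbyname,
                          reverse: Callable[[str], str] = _reverse_lookup) -> bool:
    """True if hostname resolves to an IP and that IP resolves back to hostname."""
    try:
        ip = forward(hostname)
        return reverse(ip).lower() == hostname.lower()
    except OSError:  # covers socket.gaierror and socket.herror
        return False


def render_hosts_file(hosts: Dict[str, str]) -> str:
    """Build hosts file lines ('IP FQDN shortname') from an FQDN -> IP mapping."""
    lines = []
    for fqdn, ip in sorted(hosts.items()):
        short = fqdn.split(".")[0]
        lines.append(f"{ip} {fqdn} {short}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cluster hosts; replace with your own FQDNs and addresses.
    cluster = {"node1.example.com": "10.0.0.11", "node2.example.com": "10.0.0.12"}
    print(render_hosts_file(cluster))
    for name in cluster:
        print(name, check_forward_reverse(name))
```

Running the check on every host, for every host, shows whether DNS and Reverse DNS agree; where they do not, the rendered lines are what would go into the hosts file on each machine.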

Last Visited	‎12-11-2025 11:50 PM

Member Since	‎01-19-2017 04:35 AM
Posts	3,679
Kudos received	627

Cloudera Community

Re: Apache nifi memory consumption in kubernetes

Re: Nifi toolkit command for GitLabFlowRegistry

Re: Not able to delete the NiFi existing flow usin...

Re: Securing Nifi with SSL and using OIDC provider...

Re: External zookeeper and nifi cluster connection...

Re: Know and Edgenode configuration

Re: how to increase the ZNode size in ambari clus...

Re: how to increase the ZNode size in ambari clus...

Re: how to increase the ZNode size in ambari clus...

Re: Data Nodes displaying incorrect block report

Re: How journalnode keeps itself upto date?

Re: zeppelin Web page - HTTP ERROR: 503

Re: Migration from mysql to Azure mssql

Re: FQDN is not set for hosts.. Any issues in prod...

Re: FQDN is not set for hosts.. Any issues in prod...