Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 923 | 06-04-2025 11:36 PM |
| | 1525 | 03-23-2025 05:23 AM |
| | 756 | 03-17-2025 10:18 AM |
| | 2710 | 03-05-2025 01:34 PM |
| | 1801 | 03-03-2025 01:09 PM |
10-24-2018
11:49 AM
1 Kudo
@hardik desai Typically, with a replication factor of 3, your faulty DataNode can be removed with no impact, as two other copies of each block should be available. Just to understand your setup, how many DataNodes are we talking about here? Running the rebalancer will do the job, but with 2 TB it will take a while depending on your data center bandwidth.

Rebalancing HDFS: HDFS provides a "balancer" utility to help balance the blocks across DataNodes in the cluster. To initiate a balancing process, follow these steps:
1. In Ambari Web, browse to Services > HDFS > Summary.
2. Click Service Actions, and then click Rebalance HDFS.
3. Enter the Balance Threshold value as a percentage of disk capacity.
4. Click Start.

It's recommended to run the balancer during times when the cluster load is low, otherwise you'll notice high NameNode RPC load while the balancer is executing. Here is a document that can help you tune the HDFS Balancer.
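If you prefer the command line over Ambari, a minimal sketch (run as the hdfs service user; the 10% threshold below is just an example and is also the default):

```
# Run the balancer from any cluster node as the hdfs user.
# -threshold is the allowed deviation of each DataNode's utilization
# from the cluster average, in percent.
su - hdfs -c "hdfs balancer -threshold 10"
```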
10-23-2018
12:02 PM
@Sherrine Green Thompson What's your HDP version? Can you check and share the logs on the NameNode host in /var/log/hadoop/hdfs?
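For example (the exact file name includes the NameNode's hostname, so the wildcard below is illustrative):

```
# Grab the most recent NameNode log entries on the NameNode host
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
```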
10-22-2018
08:04 AM
@Sherrine Green Thompson Looks like you have memory issues: -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" means the NameNode process is killed whenever it runs out of heap. Check the NameNode Java heap size, can you adjust that and revert!
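In Ambari this is HDFS > Configs > NameNode Java heap size; under the hood it ends up in hadoop-env.sh roughly like the sketch below (the 4 GB value is purely illustrative, size it to your block and file count):

```
# Illustrative hadoop-env.sh fragment: raise the NameNode heap.
# Setting Xms and Xmx to the same value avoids heap-resize pauses.
export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m ${HADOOP_NAMENODE_OPTS}"
```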
10-21-2018
02:18 PM
@Robert Levas @Alex Goron How about the functionality for managing a remote cluster? Personally I haven't implemented it, but you can register and manage a remote cluster!
10-19-2018
08:15 AM
@Alex Coast I think the Oracle SQL syntax expects a semicolon at the end, e.g. SELECT * FROM my_db.my_table; Can you try that and revert?
10-17-2018
11:16 PM
@Harry Li It's paramount that you first go through this checklist to prepare your new host to join the cluster. This avoids some frustrating errors, especially ending up with non-working nodes in the cluster. This document with screenshots for adding a DataNode to an existing HDP cluster looks good; the process is straightforward provided you followed the prerequisites. HTH
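As a rough illustration, these are the kinds of host-prep commands the checklist typically covers on RHEL/CentOS (illustrative only; follow the official checklist for your OS and HDP version):

```
# Typical prerequisites before adding a host via Ambari (RHEL/CentOS example)
systemctl disable firewalld && systemctl stop firewalld   # or open the required ports instead
setenforce 0                                              # set SELinux to permissive for the install
systemctl enable ntpd && systemctl start ntpd             # keep clocks in sync across the cluster
hostname -f                                               # verify the FQDN resolves consistently
```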
10-13-2018
07:06 AM
@Amit Mishra I can understand your doubt. That's purely a marketing stunt. When Hortonworks packages a new major release like HDP 3.0, it contains underlying components with different versions, like:
- Apache Accumulo 1.7.0
- Apache Atlas 1.0.0
- Apache Calcite 1.2.0
- Apache DataFu 1.3.0
- Apache Hadoop 3.1.0
- Apache HBase 2.0.1
- Apache Hive 3.0.0
- Apache Kafka 1.0.1
- Apache Knox 1.0.0
- Apache Livy 0.5
- Apache Oozie 4.3.1
- Apache Phoenix 5.0.0
- Apache Pig 0.16.0
- Apache Ranger 1.1.0
- Apache Spark 2.3.1
- Apache Sqoop 1.4.7
- Apache Storm 1.2.1
- Apache TEZ 0.9.1
- Apache Zeppelin 0.8.0
- Apache ZooKeeper 3.4.6

As you can see above, HDFS is at version 3.1.0, so the HDP version will never match its component versions. For example, the latest version at the Apache Software Foundation is Hadoop 3.1.1; before Hortonworks includes that version of Hadoop in its next major release (i.e. an HDP 3.2), it has to go through rigorous compatibility tests with the other components. See attached screenshot. HTH
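If you want to see exactly what is bundled on your own cluster, a quick sanity check from any node (output format is illustrative):

```
# Shows the Apache version plus the HDP build suffix, e.g. "Hadoop 3.1.0.<hdp-build>"
hadoop version
# Lists the HDP stack versions installed on this host
hdp-select versions
```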
10-10-2018
05:02 PM
@Jeremy Jean-Jean I can see you have the -execute option, this is ONLY used when you want to run the pre-upgrade tool command in Ambari instead of on the Beeline command line which is the recommended method. The -execute option automatically executes the generated commands interactively, but if you what to run the scripts in beeline you need the command with the -location option where the scripts to be run in beeline will be generated. So simply said if you don't want to generate and run the scripts in beeline use the -execute option else use -location and re-run in beeline the generated scripts 🙂 HTH
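Purely as a sketch of the two modes (the jar name and classpath below are assumptions; take the exact invocation from the HDP upgrade docs for your release):

```
# Assumed invocation of the Hive pre-upgrade tool; jar and classpath are illustrative.
# Mode 1: let the tool execute the generated commands itself
java -cp hive-pre-upgrade.jar:/usr/hdp/current/hive-client/lib/* \
  org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -execute

# Mode 2: only generate the scripts, then run them yourself in Beeline
java -cp hive-pre-upgrade.jar:/usr/hdp/current/hive-client/lib/* \
  org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -location /tmp/pre_upgrade
```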
10-10-2018
02:50 PM
@Jeremy Jean-Jean It's usually hive! Can you access the old 2.6.5 and invoke the Beeline CLI as the hive user?
# su - hive
$ beeline
If you get the prompt, then that confirms the user hive!
10-08-2018
09:27 PM
1 Kudo
@Vinu HDFS "Data at Rest" Encryption: Hadoop provides several ways to encrypt stored data:
- Volume encryption
- Application-level encryption
- HDFS data-at-rest encryption

The last approach uses specially designated HDFS directories known as "encryption zones": an encryption zone is simply a special HDFS directory within which all data is encrypted upon write and decrypted upon read. You can have multiple encryption zones; with this configuration, you can use encrypted databases or tables with different encryption keys. To read data from read-only encrypted tables, users must have access to a temporary directory that is encrypted at least as strongly as the table. HDFS encryption provides good performance, and existing Hadoop applications run transparently on encrypted data. For cloud data access, server-side encryption slightly slows down performance when reading data from S3, both in reading data during query execution and in scanning the files prior to the actual scheduling of work. You can run two Hadoop performance tests, TestDFSIO and TeraSort, to measure performance in different encryption zones: TestDFSIO is more storage I/O- and throughput-focused, while TeraSort is representative of a workload that is not only I/O- but also CPU-intensive. Both of these tests use the Hadoop Distributed File System (HDFS). I ran these tests to compare encrypted data in different configurations, but it all also depends on your hardware, e.g. using Xeon E5-2699 v3 instead of Xeon E5-2697 v2 processors results in a significant performance increase in the test scenarios. Reference: Data at rest encryption
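To give a feel for how an encryption zone is set up once a KMS is in place, a minimal sketch (the key name and path are made up for illustration):

```
# Create an encryption key in the configured KMS (key name is illustrative)
hadoop key create mykey
# Create the directory and turn it into an encryption zone (run as the hdfs superuser)
hdfs dfs -mkdir /data/secure_zone
hdfs crypto -createZone -keyName mykey -path /data/secure_zone
# Verify the zone is registered
hdfs crypto -listZones
```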