Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 923 | 06-04-2025 11:36 PM |
| | 1525 | 03-23-2025 05:23 AM |
| | 756 | 03-17-2025 10:18 AM |
| | 2710 | 03-05-2025 01:34 PM |
| | 1801 | 03-03-2025 01:09 PM |
10-24-2018
11:49 AM
1 Kudo
@hardik desai Typically, with a replication factor of 3, your faulty DataNode can be removed with no impact, as two other copies of each block should be available. Just to understand your setup, how many DataNodes are we talking about here? Running the rebalancer will do the job, but with 2 TB it will take a while depending on your data center bandwidth.

Rebalancing HDFS: HDFS provides a "balancer" utility to help balance the blocks across DataNodes in the cluster. To initiate a balancing process, follow these steps:
1. In Ambari Web, browse to Services > HDFS > Summary.
2. Click Service Actions, and then click Rebalance HDFS.
3. Enter the Balance Threshold value as a percentage of disk capacity.
4. Click Start.

It's recommended to run the balancer during times when the cluster load is low, otherwise you'll notice high NameNode RPC load while the balancer is executing. Here is a document that can help you tune the HDFS Balancer.
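If you prefer the command line over Ambari, a minimal sketch (run as the hdfs service user; the 10% threshold below is just an example and is also the default):

```
# Run the balancer from any cluster node as the hdfs user.
# -threshold is the allowed deviation of each DataNode's utilization
# from the cluster average, in percent.
su - hdfs -c "hdfs balancer -threshold 10"
```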
10-23-2018
12:02 PM
@Sherrine Green Thompson What's your HDP version? Can you check and share the logs on the NameNode host in /var/log/hadoop/hdfs?
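For example (the exact file name includes the NameNode's hostname, so the wildcard below is illustrative):

```
# Grab the most recent NameNode log entries on the NameNode host
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
```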
10-22-2018
08:04 AM
@Sherrine Green Thompson Looks like you have memory issues: -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" means the NameNode process is killed whenever it runs out of heap. Check the NameNode Java heap size, can you adjust that and revert!
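In Ambari this is HDFS > Configs > NameNode Java heap size; under the hood it ends up in hadoop-env.sh roughly like the sketch below (the 4 GB value is purely illustrative, size it to your block and file count):

```
# Illustrative hadoop-env.sh fragment: raise the NameNode heap.
# Setting Xms and Xmx to the same value avoids heap-resize pauses.
export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m ${HADOOP_NAMENODE_OPTS}"
```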
10-21-2018
02:18 PM
@Robert Levas @Alex Goron How about the functionality for managing a remote cluster? Personally I haven't implemented it, but you can register and manage a remote cluster!
10-19-2018
08:15 AM
@Alex Coast I think the Oracle SQL syntax expects a semicolon at the end, e.g. SELECT * FROM my_db.my_table; Can you try that and revert?
10-17-2018
11:16 PM
@Harry Li It's paramount that you first go through this checklist to prepare your new host to join the cluster. This avoids some frustrating errors, especially ending up with non-working nodes in the cluster. This document with screenshots for adding a DataNode to an existing HDP cluster looks good; the process is straightforward provided you followed the prerequisites. HTH
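As a rough illustration, these are the kinds of host-prep commands the checklist typically covers on RHEL/CentOS (illustrative only; follow the official checklist for your OS and HDP version):

```
# Typical prerequisites before adding a host via Ambari (RHEL/CentOS example)
systemctl disable firewalld && systemctl stop firewalld   # or open the required ports instead
setenforce 0                                              # set SELinux to permissive for the install
systemctl enable ntpd && systemctl start ntpd             # keep clocks in sync across the cluster
hostname -f                                               # verify the FQDN resolves consistently
```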
10-13-2018
07:06 AM
@Amit Mishra I can understand your doubt. That's purely a marketing stunt. When Hortonworks packages a new major release like HDP 3.0, it contains underlying components with different versions, like:
- Apache Accumulo 1.7.0
- Apache Atlas 1.0.0
- Apache Calcite 1.2.0
- Apache DataFu 1.3.0
- Apache Hadoop 3.1.0
- Apache HBase 2.0.1
- Apache Hive 3.0.0
- Apache Kafka 1.0.1
- Apache Knox 1.0.0
- Apache Livy 0.5
- Apache Oozie 4.3.1
- Apache Phoenix 5.0.0
- Apache Pig 0.16.0
- Apache Ranger 1.1.0
- Apache Spark 2.3.1
- Apache Sqoop 1.4.7
- Apache Storm 1.2.1
- Apache TEZ 0.9.1
- Apache Zeppelin 0.8.0
- Apache ZooKeeper 3.4.6

As you can see above, HDFS is at version 3.1.0, so the HDP version will never match its component versions. For example, the latest version at the Apache Software Foundation is Hadoop 3.1.1; before Hortonworks includes that version of Hadoop in its next major release (i.e. an HDP 3.2), it has to go through rigorous compatibility tests with the other components. See attached screenshot. HTH
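If you want to see exactly what is bundled on your own cluster, a quick sanity check from any node (output format is illustrative):

```
# Shows the Apache version plus the HDP build suffix, e.g. "Hadoop 3.1.0.<hdp-build>"
hadoop version
# Lists the HDP stack versions installed on this host
hdp-select versions
```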
10-10-2018
05:02 PM
@Jeremy Jean-Jean I can see you have the -execute option, this is ONLY used when you want to run the pre-upgrade tool command in Ambari instead of on the Beeline command line which is the recommended method. The -execute option automatically executes the generated commands interactively, but if you what to run the scripts in beeline you need the command with the -location option where the scripts to be run in beeline will be generated. So simply said if you don't want to generate and run the scripts in beeline use the -execute option else use -location and re-run in beeline the generated scripts 🙂 HTH
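Purely as a sketch of the two modes (the jar name and classpath below are assumptions; take the exact invocation from the HDP upgrade docs for your release):

```
# Assumed invocation of the Hive pre-upgrade tool; jar and classpath are illustrative.
# Mode 1: let the tool execute the generated commands itself
java -cp hive-pre-upgrade.jar:/usr/hdp/current/hive-client/lib/* \
  org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -execute

# Mode 2: only generate the scripts, then run them yourself in Beeline
java -cp hive-pre-upgrade.jar:/usr/hdp/current/hive-client/lib/* \
  org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -location /tmp/pre_upgrade
```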
10-10-2018
02:50 PM
@Jeremy Jean-Jean It's usually hive! Can you access the old 2.6.5 and invoke the Beeline CLI as the hive user?
# su - hive
$ beeline
If you get the prompt, then that confirms the user hive!
10-08-2018
09:27 PM
1 Kudo
@Vinu HDFS "Data at Rest" Encryption: Hadoop provides several ways to encrypt stored data:
- Volume encryption
- Application-level encryption
- HDFS data-at-rest encryption

The last approach uses specially designated HDFS directories known as "encryption zones": an encryption zone is simply a special HDFS directory within which all data is encrypted upon write and decrypted upon read. You can have multiple encryption zones; with this configuration, you can use encrypted databases or tables with different encryption keys. To read data from read-only encrypted tables, users must have access to a temporary directory that is encrypted at least as strongly as the table. HDFS encryption provides good performance, and existing Hadoop applications run transparently on encrypted data. For cloud data access, server-side encryption slightly slows down performance when reading data from S3, both in reading data during query execution and in scanning the files prior to the actual scheduling of work. You can run two Hadoop performance tests, TestDFSIO and TeraSort, to measure performance in different encryption zones: TestDFSIO is more storage I/O- and throughput-focused, while TeraSort is representative of a workload that is not only I/O- but also CPU-intensive. Both of these tests use the Hadoop Distributed File System (HDFS). I ran these tests to compare encrypted data in different configurations, but it all also depends on your hardware, e.g. using Xeon E5-2699 v3 instead of Xeon E5-2697 v2 processors results in a significant performance increase in the test scenarios. Reference: Data at rest encryption
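To give a feel for how an encryption zone is set up once a KMS is in place, a minimal sketch (the key name and path are made up for illustration):

```
# Create an encryption key in the configured KMS (key name is illustrative)
hadoop key create mykey
# Create the directory and turn it into an encryption zone (run as the hdfs superuser)
hdfs dfs -mkdir /data/secure_zone
hdfs crypto -createZone -keyName mykey -path /data/secure_zone
# Verify the zone is registered
hdfs crypto -listZones
```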