Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 951 | 06-04-2025 11:36 PM |
| | 1552 | 03-23-2025 05:23 AM |
| | 776 | 03-17-2025 10:18 AM |
| | 2787 | 03-05-2025 01:34 PM |
| | 1838 | 03-03-2025 01:09 PM |
03-21-2019
10:02 PM
@Lorenc Hysi Looks like your back-end database could be the cause.

#################################################
# Create the Schema Registry and SAM metastore databases
#################################################
mysql -u root -p{root_password}
CREATE DATABASE registry;
CREATE DATABASE streamline;
CREATE USER 'registry'@'%' IDENTIFIED BY 'registry';
CREATE USER 'streamline'@'%' IDENTIFIED BY 'streamline';
GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION;
commit;
exit;

#################################################
# Test the connections
#################################################
mysql -u streamline -pstreamline
mysql -u registry -pregistry

Both connections above should succeed. It also doesn't cost you anything to rerun the command below to redeploy the JDBC driver jar:

ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
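If you prefer a non-interactive sanity check after the grants, a minimal sketch is below (database and user names follow the setup above; adjust them if yours differ):

#################################################
# Verify the metastore databases are reachable
#################################################
mysql -u registry -pregistry -e "SHOW DATABASES LIKE 'registry';"
mysql -u streamline -pstreamline -e "SHOW DATABASES LIKE 'streamline';"
# Each command should print the matching database name; an access-denied
# error means the user or grant step did not take effect.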
03-10-2019
08:36 AM
@hoda moradi Any updates?
03-04-2019
02:20 PM
1 Kudo
@harish Prerequisite (MUST DO): If you want to copy a Hive table across to another REALM, you need to set up cross-realm trust between the two MIT KDCs. This enables the destination REALM user to obtain a valid Kerberos ticket to run operations on the source cluster. Having said that, you should not forget to revise your Ranger policies to reflect the new REALM's access privileges if the Ranger plugin is enabled on the source cluster, which I assume is the case since you leverage Ranger authorization. There is an HCC document that can help you set up the REALM trust.

PROCEDURE
Follow the steps below to migrate a Hive database from one cluster to another:
1. Install Hive on the new cluster and make sure both the source and destination clusters are identical.
2. Transfer the data present in the Hive warehouse directory (/user/hive/warehouse) to the new Hadoop cluster:
hadoop distcp <src> <dst>
3. Take a backup of the Hive Metastore:
mysqldump hive > /tmp/mydir/backup_hive.sql
4. Install MySQL on the new Hadoop cluster.
5. Open the Hive MySQL Metastore dump file and replace the source NameNode hostname with the destination hostname:
hdfs://ip-address-old-namenode:port ---> hdfs://ip-address-new-namenode:port
6. Restore the edited MySQL dump into the MySQL of the new Hadoop cluster:
mysql hive < /tmp/mydir/backup_hive.sql
7. Configure Hive as normal and perform the Hive schema upgrade if needed.

Impact
Hive metadata contains the information about the database objects, and the contents are stored in the Hadoop Distributed File System (HDFS). Metadata contains the HDFS URI and other details. Therefore, if you migrate Hive from one cluster to another, you have to point the metadata to the HDFS of the new cluster. If you don't, it will keep pointing to the HDFS of the older cluster and the migration will fail. In case of any failure, initialize the Hive Metastore of the destination cluster and resume the migration following the correct steps:
/bin/schematool -initSchema -dbType mysql

On CDH
If you are on Cloudera, you can proceed using the Backup and Disaster Recovery procedure.

HTH
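As a rough end-to-end sketch of steps 2-6 (the hostnames, ports and paths below are placeholders, not values from this thread; substitute your own):

# 2. Copy the warehouse data to the destination cluster
hadoop distcp hdfs://old-nn:8020/user/hive/warehouse hdfs://new-nn:8020/user/hive/warehouse

# 3. Dump the Hive Metastore on the source cluster
mysqldump hive > /tmp/mydir/backup_hive.sql

# 5. Rewrite the NameNode URI inside the dump (sed is one way to do this)
sed -i 's|hdfs://old-nn:8020|hdfs://new-nn:8020|g' /tmp/mydir/backup_hive.sql

# 6. Restore the edited dump into MySQL on the destination cluster
mysql hive < /tmp/mydir/backup_hive.sql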
02-24-2019
06:07 PM
@Balaji Vemula After changing dfs.permissions.superusergroup in hdfs-site.xml, a NameNode restart is required for the change to take effect. If this cluster uses NameNode HA with QuorumJournalManager, then both NameNodes need to be restarted.

Run "hdfs groups <username>", where <username> is the user that you have added to the group that you want to be the HDFS supergroup. This command prints that user's group memberships as perceived by the NameNode. If the list does not show your configured supergroup, then there is some kind of misconfiguration.

From the root CLI, switch to user balu:
# su - balu
Now at the prompt you should be able to run hdfs commands as user balu:
$ hdfs dfsadmin -printTopology
That should work on Cloudera as well.
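A minimal sketch of that verification, assuming the supergroup was set to a group named hdfsadmins (a placeholder name, substitute your own) and balu was added to it:

# Check the group membership as the NameNode sees it
hdfs groups balu
# expected output (placeholder): balu : balu hdfsadmins

# dfsadmin subcommands such as -printTopology require superuser privilege,
# so a successful run confirms the supergroup change took effect
su - balu -c "hdfs dfsadmin -printTopology"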
02-24-2019
02:12 AM
@Shraddha Singh Any updates? Did my response resolve the issue? If so, please accept it so the thread is marked as closed.
02-13-2019
09:13 PM
@Sampath Kumar Cheers
02-12-2019
06:57 PM
@Madhura Mhatre Can you share the output of the rebalancer? Have you tried reducing the threshold gradually from 25% to 20%, 15%, then 10% to see if you are gaining some space?

Depending on the purpose of the cluster, you shouldn't attempt the following on production. You can remove the directory through the Ambari property dfs.datanode.data.dir and do a rolling restart of the DataNodes. Make sure you don't have any missing or under-replicated blocks before restarting the next DataNode. Then umount /data1, format it and remount it, as in the sketch below.
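A hedged sketch of that check and the /data1 cycle (the device name and filesystem type are placeholders, not from this thread):

# Confirm the cluster is healthy before touching the next DataNode
hdfs fsck / | egrep -i 'missing|under-replicated|corrupt'

# On the DataNode being reworked (placeholders: /dev/sdb1, ext4)
umount /data1
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /data1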
02-12-2019
03:14 PM
@Michael Bronson Just create the home directory as follows:
# su - hdfs
$ hdfs dfs -mkdir /user/slider
$ hdfs dfs -chown slider:hdfs /user/slider
That should be enough. Good luck!
02-12-2019
07:02 AM
@Priyanka This is a closed thread (2017). Can you open a new one and copy-paste this content into it?
02-12-2019
03:09 AM
@Madhura Mhatre The HDFS balancer doesn't run in the background; it has to be run manually.

The threshold parameter is a float number between 0 and 100. Starting from the average cluster utilization, the balancer process tries to converge every DataNode's usage into the range [average - threshold, average + threshold]. The default threshold is 10%. For example, if the current cluster utilization is 50%, then:
- Higher (average + threshold): 60%
- Lower (average - threshold): 40%
and the higher-usage DataNodes will start to move data to the lower-usage nodes.

TIP: If you haven't balanced your cluster for a long time, you should start with a higher threshold like 25, and then converge to a smaller target threshold like 10. The smaller your threshold, the more balanced your DataNodes will be. For a very small threshold, e.g. 2, the cluster may not be able to reach the balanced state if other clients concurrently write and delete data in the cluster. So in your scenario, I would advise a threshold of 25:

$ hdfs balancer -threshold 25

First, the balancer picks DataNodes with current usage above the higher threshold and tries to find blocks on them that can be copied to nodes with current usage below the lower threshold. Second, the balancer selects over-utilized nodes to move blocks to nodes with utilization below average. Third, the balancer picks nodes with utilization above average to move data to under-utilized nodes.

In addition to that selection process, the balancer can also pick a proxy node if the source and the destination are not located in the same rack (i.e. a DataNode storing a replica of the block and located in the same rack as the destination). Yes, the balancer is rack-aware and will generate very little rack-to-rack noise.

A sketch of the progressive-threshold approach follows below. HTH
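A minimal sketch of the progressive approach from the TIP (the threshold sequence is only an example; let each pass finish before starting the next):

# Run as the hdfs superuser
su - hdfs
# then, in the hdfs user's shell, tighten the threshold step by step
for t in 25 20 15 10; do
    hdfs balancer -threshold "$t"
done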