Member since: 09-29-2015 | Posts: 186 | Kudos Received: 63 | Solutions: 12
09-20-2021
04:19 PM
To demonstrate what is happening here, see these steps and output.

Create a database at a custom location:

0: jdbc:hive2://hostname.cloudera.co> create database grandtour location 'hdfs://hostname.cloudera.com:8020/bdr-test/grandtour.db';
<TRUNCATED>
INFO : Executing command(queryId=hive_20210818180811_d96dd7f8-2713-440f-8e9f-
8eebd2954d05): create database grandtour location 'hdfs://hostname.cloudera.com:8020/bdr-test/grandtour.db'
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20210818180811_d96dd7f8-2713-
440f-8e9f-8eebd2954d05); Time taken: 0.031 seconds
INFO : OK
No rows affected (0.083 seconds)

Create a table and insert a record:

0: jdbc:hive2://hostname.cloudera.co> use grandtour;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818180835_f365f042-e2a9-
4f53-a9ed-317d21dcfc07); Time taken: 0.008 seconds
INFO : OK
No rows affected (0.065 seconds)
0: jdbc:hive2://hostname.cloudera.co> create table madagascar (name string);
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818180858_79bb34bf-b703-
43c9-a720-4756c22cb661); Time taken: 0.053 seconds
INFO : OK
No rows affected (0.117 seconds)
0: jdbc:hive2://hostname.cloudera.co> insert into madagascar values('james may');
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818181038_83ea0501-5d24-
4c57-bdde-2e214e7abb9c); Time taken: 19.26 seconds
INFO : OK
1 row affected (19.47 seconds)

Table contents:

0: jdbc:hive2://hostname.cloudera.co> select * from madagascar;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818181134_346ee6b8-0c2e-
4eae-9ee9-e88d94aa6b3e); Time taken: 0.001 seconds
INFO : OK
+------------------+
| madagascar.name |
+------------------+
| james may |
+------------------+
1 row selected (0.421 seconds)
HDFS listing in CDH 6.x:

[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:08 /bdr-test/grandtour.db
[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test/grandtour.db
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:08 /bdr-test/grandtour.db/madagascar
[hdfs@c441-node4 ~]$ hdfs dfs -ls /bdr-test/grandtour.db/madagascar
Found 1 items
-rwxrwxrwx 3 hive supergroup 10 2021-08-18 18:10 /bdr-test/grandtour.db/madagascar/000000_0
[hdfs@c441-node4 ~]$ hdfs dfs -cat /bdr-test/grandtour.db/madagascar/000000_0
james may

Run a Hive BDR replication job to the CDP cluster. After BDR, on the destination cluster:

0: jdbc:hive2://hostname.cloudera.co> show databases;
<TRUNCATED>
INFO : Completed executing command(queryId=hive_20210818185431_6c2ad328-78c4-
454d-bead-3aa7baae907e); Time taken: 0.007 seconds
INFO : OK
+---------------------+
| database_name |
+---------------------+
| default |
| grandtour |
| information_schema |
| sys |
+---------------------+
4 rows selected (0.044 seconds)

The describe output shows a different location: it should be under /bdr-test, but it shows the default warehouse location.
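To see exactly what the metastore recorded for the database, you can query the backing metastore database directly; a sketch assuming a MySQL-backed metastore whose schema is named 'hive' (both the backend and the schema name are assumptions):

# DBS and DB_LOCATION_URI are standard Hive metastore schema names; the DB name and credentials are hypothetical
mysql -u hive -p -e "SELECT NAME, DB_LOCATION_URI FROM hive.DBS WHERE NAME='grandtour';"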
Even though describe shows a wrong location, the table is at the correct location on HDFS. The listing looks as follows:

[root@c241-node3 ~]# hdfs dfs -ls /
Found 7 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test
drwxrwxrwx - hbase hbase 0 2021-08-18 18:29 /hbase
drwxrwxrwx - hdfs supergroup 0 2021-08-18 01:14 /ranger
drwxrwxrwx - solr solr 0 2021-08-18 01:14 /solr-infra
drwxrwxrwx - hdfs supergroup 0 2021-08-18 18:25 /tmp
drwxrwxrwx - hdfs supergroup 0 2021-08-18 18:25 /user
drwxrwxrwx - hdfs supergroup 0 2021-08-18 01:14 /warehouse
[root@c241-node3 ~]# hdfs dfs -ls /bdr-test
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test/grandtour.db
[root@c241-node3 ~]# hdfs dfs -ls /bdr-test/grandtour.db
Found 1 items
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test/grandtour.db/madagascar
[root@c241-node3 ~]# hdfs dfs -ls -R /bdr-test/grandtour.db
drwxrwxrwx - hive supergroup 0 2021-08-18 18:40 /bdr-test/grandtour.db/madagascar
-rwxrwxrwx 3 hive supergroup 10 2021-08-18 18:10 /bdr-test/grandtour.db/madagascar/000000_0
[root@c241-node3 ~]# hdfs dfs -cat /bdr-test/grandtour.db/madagascar/000000_0
james may

The reason the HDFS listing gets created this way is the table location: the create_database call from the client comes in with "locationUri:/warehouse/tablespace/managed/hive/grandtour.db", while the create_table call comes in with "location:/bdr-test/grandtour.db/madagascar". You can verify this in the Hive metastore log. Note the locationUri in these messages.

For the database:

2021-08-18 18:40:47,421 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-7-thread-196]: 203: source:172.xx.xx.xx create_database: Database(name:grandtour, description:null, locationUri:/warehouse/tablespace/managed/hive/grandtour.db, parameters:{}, ownerName:hive, ownerType:USER, catalogName:hive, createTime:1629310091)

For the table:

2021-08-18 18:40:47,563 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-7-thread-198]: 205: source:172.xx.xx.xx create_table_req: Table(tableName:madagascar, dbName:grandtour, owner:hive, createTime:1629310138, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null)], location:/bdr-test/grandtour.db/madagascar, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{external.table.purge=true, numRows=1, rawDataSize=9, transient_lastDdlTime=1629310257, numFilesErasureCoded=0, totalSize=10, EXTERNAL=TRUE, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER)

Now, how to fix the database location? This issue has been resolved in Cloudera Manager 7.4.4. For clusters that have already replicated with the wrong database location, a manual correction is sketched below.
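A possible manual fix on the destination, as a sketch only: it assumes a Hive version where ALTER DATABASE ... SET LOCATION is available (Hive 2.4+/3.x) and reuses the hostname from the example above. Note that this only changes where tables created afterwards are placed; existing tables keep their own locations.

0: jdbc:hive2://hostname.cloudera.co> alter database grandtour set location 'hdfs://hostname.cloudera.com:8020/bdr-test/grandtour.db';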
09-20-2021
11:58 AM
1 Kudo
In this example, I am importing encryption keys from an HDP 3.1.5 cluster to an HDP 2.6.5 cluster.

Create the key "testkey" in Ranger KMS on the HDP 3.1.5 cluster by following the steps in List and Create Keys, and note the current master key in HDP 3.1.5 (shown in the Ranger KMS UI).

Create an encryption zone with "testkey":

[hdfs@c241-node3 ~]$ hdfs crypto -createZone -keyName testkey -path /testEncryptionZone
Added encryption zone /testEncryptionZone

List to confirm the zone and keys:

[hdfs@c241-node3 ~]$ hdfs crypto -listZones
/testEncryptionZone testkey

Export the keys:
1. Log in to the KMS host.
2. Export JAVA_HOME.
3. cd /usr/hdp/current/ranger-kms
4. Run ./exportKeysToJCEKS.sh $filename

The output will look as follows:

[root@c241-node3 ranger-kms]# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre
[root@c241-node3 ranger-kms]# ./exportKeysToJCEKS.sh /tmp/hdp315keys.keystore
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
Keys from Ranger KMS Database has been successfully exported into /tmp/hdp315keys.keystore

On the HDP 2.6.5 cluster, where we need to import the keys, do the following:
1. Log in to the KMS host.
2. Add org.apache.hadoop.crypto.key.**; to the property jceks.key.serialFilter. This needs to be changed in the following file, on the KMS host only: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/lib/security/java.security
After the change, the entry in the file should look like this:

jceks.key.serialFilter = java.lang.Enum;java.security.KeyRep;\
java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;org.apache.hadoop.crypto.key.**;!*

3. Export JAVA_HOME, RANGER_KMS_HOME, RANGER_KMS_CONF, and SQL_CONNECTOR_JAR.
4. cd /usr/hdp/current/ranger-kms/
5. Run ./importJCEKSKeys.sh $filename JCEKS

The output looks like this:

[root@c441-node3 ranger-kms]# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre
[root@c441-node3 ranger-kms]# export RANGER_KMS_HOME=/usr/hdp/2.6.5.0-292/ranger-kms
[root@c441-node3 ranger-kms]# export RANGER_KMS_CONF=/etc/ranger/kms/conf
[root@c441-node3 ranger-kms]# export SQL_CONNECTOR_JAR=/var/lib/ambari-agent/tmp/mysql-connector-java.jar
[root@c441-node3 security]# cd /usr/hdp/current/ranger-kms/
[root@c441-node3 ranger-kms]# ./importJCEKSKeys.sh /tmp/hdp315keys.keystore JCEKS
Enter Password for the keystore FILE :
Enter Password for the KEY(s) stored in the keystore:
2021-08-12 23:58:06,729 ERROR RangerKMSDB - DB Flavor could not be determined
Keys from /tmp/hdp315keys.keystore has been successfully imported into RangerDB.

To confirm that the encryption keys are imported, check the ranger_keystore table in the Ranger KMS database of the HDP 2.6.5 cluster for an entry for "testkey". Also check that the master key in HDP 2.6.5 is untouched; it is still the one that its own Ranger KMS created.
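A sketch of that check from the shell, assuming a MySQL backend and the default 'rangerkms' database and user names; those names and the kms_alias column can differ by Ranger version, so treat them as assumptions:

# hypothetical DB/user names; ranger_keystore holds the imported key aliases
mysql -u rangerkms -p -e "SELECT kms_alias FROM rangerkms.ranger_keystore;"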
Now create an encryption zone in HDP 2.6.5 with the imported key:

[hdfs@c441-node3 ~]$ hdfs dfs -mkdir /testEncryptionZone-265
[hdfs@c441-node3 ~]$ hdfs crypto -createZone -keyName testkey -path /testEncryptionZone-265
Added encryption zone /testEncryptionZone-265

Confirm the zone and keys:

[hdfs@c441-node3 ~]$ hdfs crypto -listZones
/testEncryptionZone-265 testkey

Now for the distcp: the paths need /.reserved/raw prefixed to the encryption zone path, and the -px option must be used. Command:

hadoop distcp -px /.reserved/raw/$encryptionZonePath/filename hdfs://destination/.reserved/raw/$encryptionZonePath/filename

Check Configuring Apache HDFS Encryption to read about these options. The following is the output of distcp. It is truncated, but shows the copied file. Note that skipCRC is false.

[hdfs@c241-node3 ~]$ hadoop distcp -px /.reserved/raw/testEncryptionZone/text.txt hdfs://172.25.37.10:8020/.reserved/raw/testEncryptionZone-265/
ERROR: Tools helper /usr/hdp/3.1.5.0-152/hadoop/libexec/tools/hadoop-distcp.sh was
not found.
21/08/13 01:52:58 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false,
fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true,
numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize',
preserveStatus=[XATTR], atomicWorkPath=null, logPath=null, sourceFileListing=null,
sourcePaths=[/.reserved/raw/testEncryptionZone/text.txt],
targetPath=hdfs://172.25.37.10:8020/.reserved/raw/testEncryptionZone-265,
filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false,
directWrite=false}, sourcePaths=[/.reserved/raw/testEncryptionZone/text.txt],
targetPathExists=true, preserveRawXattrs=false
<TRUNCATED>
21/08/13 01:52:59 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt
= 0
21/08/13 01:52:59 INFO tools.SimpleCopyListing: Build file listing completed.
21/08/13 01:52:59 INFO tools.DistCp: Number of paths in the copy list: 1
21/08/13 01:52:59 INFO tools.DistCp: Number of paths in the copy list: 1
<TRUNCATED>
DistCp Counters
Bandwidth in Btyes=21
Bytes Copied=21
Bytes Expected=21
Files Copied=1

Another question that came up: what happens to old keys when I import a new key? The imported key just gets added to the existing keys.
03-23-2021
10:29 AM
@zampJeri This /tmp is on the OS file system, not HDFS. Hive wants to create the _resources files there and is unable to. Does the user have permissions on /tmp/hive?
03-22-2021
06:23 PM
The 'hive.notification.sequence.lock.max.retries' parameter sets the number of retries for acquiring a lock to get the next notification ID for entries in the 'NOTIFICATION_LOG' table. The error that you are seeing does seem to be related to this. Could you add more context: when are you seeing this? What job are you running? Can you share the full stack trace?
05-11-2020
05:52 PM
1 Kudo
To set up the health_percent of LLAP, do the following:
1. On the HiveServer2 Interactive server nodes, edit /usr/hdp/<hdp-version>/hive/scripts/llap/yarn/package.py (example: /usr/hdp/3.1.0.224-3/hive/scripts/llap/yarn/package.py). The --health-percent option defaults to 80; change it to the desired number.
2. Move the following to a temporary backup location:
a. /usr/hdp/<hdp-version>/hive/scripts/llap/yarn/package.pyc
b. /usr/hdp/<hdp-version>/hive/scripts/llap/yarn/package.pyo
3. Restart Hive.
A shell sketch of these steps follows below.
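A rough shell sketch of these steps; the literal flag text inside package.py can differ between HDP builds, so the sed pattern and the example value of 90 are assumptions to verify against the actual file first:

# confirm the literal flag text before editing (assumed here to be "--health-percent=80")
grep -n 'health-percent' /usr/hdp/3.1.0.224-3/hive/scripts/llap/yarn/package.py
cp /usr/hdp/3.1.0.224-3/hive/scripts/llap/yarn/package.py /tmp/package.py.bak
sed -i 's/--health-percent=80/--health-percent=90/' /usr/hdp/3.1.0.224-3/hive/scripts/llap/yarn/package.py
# move the compiled copies aside so they get regenerated from the edited script
mv /usr/hdp/3.1.0.224-3/hive/scripts/llap/yarn/package.py[co] /tmp/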
11-19-2018
10:11 PM
1 Kudo
This article just gives an example of how 'grant'/'revoke' works when the Hive plugin is enabled with Ranger in CDP.
A user who is an 'admin' in Ranger can manage access to Hive tables via 'grant'/'revoke' operations.
In the Ranger UI, under Settings > Users and Groups > Users, note that user 'hive' is in the role 'Admin'.
In Beeline, log in as user 'hive' and run the grant command to give select privileges on a table:
0: jdbc:hive2://a.b.c.co> grant select on table mix to user mugdha;
INFO : Compiling command(queryId=hive_20211021024819_c3de84a7-a312-4a1f-9a8d-8b328cced054): grant select on table mix to user mugdha
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20211021024819_c3de84a7-a312-4a1f-9a8d-8b328cced054); Time taken: 0.022 seconds
INFO : Executing command(queryId=hive_20211021024819_c3de84a7-a312-4a1f-9a8d-8b328cced054): grant select on table mix to user mugdha
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20211021024819_c3de84a7-a312-4a1f-9a8d-8b328cced054); Time taken: 0.548 seconds
INFO : OK
No rows affected (0.634 seconds)
In Ranger, a new policy is created by that command; it is visible in the Ranger UI under the Hive service policies.
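If the UI is not handy, the new policy can also be spotted through the Ranger public REST API; the host, credentials, and the Hive service name cm_hive below are assumptions for illustration:

# hypothetical Ranger host, credentials, and Hive service name
curl -s -u admin:admin "http://ranger-host:6080/service/public/v2/api/service/cm_hive/policy" | grep -i mugdha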
Similarly, on a 'revoke' run, user 'mugdha' will be removed from the policy:
0: jdbc:hive2://a.b.c.co> revoke select on table mix from user mugdha;
INFO : Compiling command(queryId=hive_20211021025423_cdf81a8a-df0d-4c40-9509-f4325d3ba112): revoke select on table mix from user mugdha
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20211021025423_cdf81a8a-df0d-4c40-9509-f4325d3ba112); Time taken: 0.032 seconds
INFO : Executing command(queryId=hive_20211021025423_cdf81a8a-df0d-4c40-9509-f4325d3ba112): revoke select on table mix from user mugdha
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20211021025423_cdf81a8a-df0d-4c40-9509-f4325d3ba112); Time taken: 0.274 seconds
INFO : OK
No rows affected (0.323 seconds)
This works the same way in HDP; see Provide User Access to Hive Database Tables from the Command Line.
06-27-2018
06:49 PM
5 Kudos
Step-by-step instructions to set up ACLs on a queue. For adding/removing queues, see: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-views/content/ch_using_yarn_queue_manager_view.html

Setting up queue ACLs:

1. Enable the YARN ACL:
a. In YARN -> Configs -> Advanced -> Resource Manager, set yarn.acl.enable to true and Save.
b. Restart the YARN service.

2. Restrict the access on the "root" queue first. Child queues inherit the access configuration from the root queue; if this is not done, all users will be able to submit jobs to the child queues. On the YARN Queue Manager view instance configuration page:
a. Click on the "root" queue.
b. Under "Access Control and Status" -> Submit Applications -> choose Custom. Leave this blank.
c. Now click on the child queue.
d. Under "Access Control and Status" -> Submit Applications -> choose Custom -> in Users/Groups, enter the username.
e. Save and refresh the queue.

3. Notice that in the capacity-scheduler config (YARN -> Configs -> Advanced), two properties are changed:
a. yarn.scheduler.capacity.root.acl_submit_applications=
Note: This value is not blank in the config; there is a space at the end. If this property is removed from the config, acl_submit_applications resets to * for the root queue. If the parent queue uses the "*" (asterisk) value (or is not specified) to allow access to all users and groups, its child queues cannot restrict access.
b. yarn.scheduler.capacity.root.test.acl_submit_applications=hive

Confirming that the ACL is set: log in to a Linux terminal as the hive user and run one of:

hadoop queue -showacls (this command is deprecated, but works)
mapred queue -showacls (alternative command)

An illustrative output sketch appears at the end of this post. We can do the same for Administer Queue: restrict the access on the "root" queue first, then under "Access Control and Status" -> Administer Queue -> choose Custom -> in Users/Groups, enter the username/group name. Running mapred queue -showacls again will then show the access of each user (root, hive, yarn).
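An illustrative sketch of the showacls output, using the queue names from this example; the exact layout varies by Hadoop version, so treat this as an approximation rather than captured output:

[hive@node1 ~]$ mapred queue -showacls
Queue acls for user :  hive

Queue  Operations
=====================
root
test  SUBMIT_APPLICATIONS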
06-30-2017
11:49 PM
PROBLEM: Click Alerts, and then Actions > Manage Alert Groups -> a custom alert group. Then click the + sign on the right side, pick any alert definition, and press OK. Click Save, and you will see a 500 (Server Error) on the alert group screen. In ambari-server.log there is this error:

WARN [qtp-ambari-client-510524] ServletHandler:563 - /api/v1/clusters/<cluster-name>/alert_groups/155
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
at org.eclipse.persistence.indirection.IndirectSet$1.next(IndirectSet.java:471)
at org.apache.ambari.server.orm.entities.AlertGroupEntity.setAlertTargets(AlertGroupEntity.java:313)
at org.apache.ambari.server.controller.internal.AlertGroupResourceProvider.updateAlertGroups(AlertGroupResourceProvider.java:344)
at org.apache.ambari.server.controller.internal.AlertGroupResourceProvider.access$100(AlertGroupResourceProvider.java:60)
at org.apache.ambari.server.controller.internal.AlertGroupResourceProvider$2.invoke(AlertGroupResourceProvider.java:187)
at org.apache.ambari.server.controller.internal.AlertGroupResourceProvider$2.invoke(AlertGroupResourceProvider.java:184)
at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:450)
at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:331)
ROOT CAUSE: https://issues.apache.org/jira/browse/AMBARI-19259

RESOLUTION: Upgrade Ambari to 2.5.
06-30-2017
11:48 PM
Consider the example:

Total input paths = 7
Input size for job = 510K

1) We are using a custom InputFormat that extends 'org.apache.hadoop.mapred.FileInputFormat' and has 'isSplitable' returning false.
Expected: 7 splits [as FileInputFormat doesn't split files smaller than the blockSize (128 MB), there should be one split per file]
Actual: 4 splits

2) The default value for 'hive.input.format' is CombineHiveInputFormat. After setting 'set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;', there are 7 splits, as expected.

From the above two points, it looks like Hive uses 'CombineHiveInputFormat' on top of the custom InputFormat to determine the number of splits.

How the splits were calculated: when using CombineInputFormat, data locality plays a role in deciding the number of mappers. To find where the blocks of those files live, we can use the command:
hadoop fsck /<file-path> -files -blocks -locations

1. On a.a.a.a:
/user/user1/hive/split/file1_0000
[/default-rack/a.a.a.a:1019, /default-rack/e.e.e.e:1019]
/user/user1/hive/split/file1_0002
[/default-rack/a.a.a.a:1019, /default-rack/e.e.e.e:1019]
2. On b.b.b.b:
/user/user1/hive/split/file1_0003
[/default-rack/b.b.b.b:1019, /default-rack/a.a.a.a:1019]
/user/user1/hive/split/file1_0005
[/default-rack/b.b.b.b:1019, /default-rack/a.a.a.a:1019]
/user/user1/hive/split/file1_0006
[/default-rack/b.b.b.b:1019, /default-rack/e.e.e.e:1019]
3. On c.c.c.c:
/user/user1/hive/split/file1_0001
[/default-rack/c.c.c.c:1019, /default-rack/a.a.a.a:1019]
4. On d.d.d.d:
/user/user1/hive/split/file1_0004
[/default-rack/d.d.d.d:1019, /default-rack/a.a.a.a:1019]
Hive is picking up blocks from these 4 DataNodes, and files on one DataNode are combined into one task.

If a maxSplitSize is specified, then blocks on the same node are combined to form a single split, and blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits. If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behavior in Hadoop: each block is a locally processed split.
Ref: https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html

The reason the first block location was picked for each block while combining is that any Hadoop client uses the first block location and considers the next one only if reading from the first fails. The NameNode usually returns the block locations of a block sorted by the distance between the client and the location. The NameNode returns all block locations, but CombineHiveInputFormat / the Hadoop client / the MapReduce program uses the first one.
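To experiment with how aggressively blocks are combined, the standard CombineFileInputFormat split-size knobs can be set in the Hive session; the values below are illustrative only, not recommendations:

-- cap a combined split at 128 MB; the per-node/per-rack minimums steer node-local vs. rack-local grouping
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapreduce.input.fileinputformat.split.maxsize=134217728;
set mapreduce.input.fileinputformat.split.minsize.per.node=67108864;
set mapreduce.input.fileinputformat.split.minsize.per.rack=67108864;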
06-30-2017
11:46 PM
PROBLEM: After moving the ZooKeeper servers and setting them correctly in the YARN configs, the ResourceManagers come up but stay in standby state. Even after removing the rmstore znode, none of the nodes transitions to active.

ROOT CAUSE: ZooKeeper data is stored in the znode /yarn-leader-election, which is used for RM leader election. This znode has stale data about the leader.

RESOLUTION:
1. Log in to zkCli.
2. rmr /yarn-leader-election
3. Restart the ResourceManagers.
A shell sketch follows below.
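A minimal shell sketch of the resolution, assuming an HDP-style client install path and an unsecured ensemble reachable at zk-host:2181 (both assumptions):

# open a ZooKeeper shell against the ensemble (adjust the path and address for your cluster)
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk-host:2181
# inside the zkCli shell: remove the stale election znode ('deleteall' on ZooKeeper 3.5+)
rmr /yarn-leader-election
quit
# then restart both ResourceManagers, e.g. from Ambari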