Member since: 06-16-2016
Posts: 43
Kudos Received: 22
Solutions: 0
03-08-2018
09:05 PM
1 Kudo
A Hive database can contain both transactional and non-transactional tables. To quickly check whether a table is ACID-enabled, run the following command:
# hive -e "describe extended <Database>.<tablename>;" | grep "transactional=true"
If the command prints a line containing the string you grep for, the table is transactional.
Example:
# hive -e "describe extended default.hello_acid;" | grep "transactional=true"
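The check can also be wrapped in a small shell script (a minimal sketch, assuming the hive CLI is on the PATH; the script name and arguments are illustrative only):
#!/bin/bash
# Usage: ./is_acid.sh <database> <table>
# Prints whether the given Hive table carries the transactional=true property.
DB="$1"
TABLE="$2"
if hive -e "describe extended ${DB}.${TABLE};" 2>/dev/null | grep -q "transactional=true"; then
  echo "${DB}.${TABLE} is a transactional (ACID) table"
else
  echo "${DB}.${TABLE} is a non-transactional table"
fi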
01-10-2018
08:14 PM
1 Kudo
In large clusters where multiple services share a single ZooKeeper quorum, each service keeps its state as znodes. The znode count is therefore directly proportional to the number of services deployed and to the activity on the cluster.
If LLAP applications are deployed on such a cluster, Slider must be enabled (by setting the property "hadoop.registry.rm.enabled"). This introduces the overhead of registry (znode) scans for all application containers that are created and destroyed on a regular basis. When the property is set in core-site.xml or yarn-site.xml, the YARN ResourceManager behaves as follows:
1. On startup: create the initial root paths /, /services and /users. On a secure cluster, access is restricted to the system accounts.
2. When a user submits a job: create the user path under /users.
3. When a container is completed: delete from the registry all service records with a yarn:persistence field of value container and a yarn:id field whose value matches the ID of the completed container.
4. When an application attempt is completed: remove all service records with yarn:persistence set to application-attempt and yarn:id set to the application attempt ID.
5. When an application finishes: remove all service records with yarn:persistence set to application and yarn:id set to the application ID.
(Ref: YARN registry scan behavior.)
As a result, the registry scan runs across all znodes, not just the rmstore znode. In other words, even if there are only a few thousand (<10k) applications under /rmstore (/rmstore-secure), the scan starts from the root level (/). If the znode count under the root exceeds the ~10k limit, the registry scans cause connectivity issues between ZooKeeper and the ResourceManager, leading to timeouts, RM failover and general RM instability. This is addressed in the Apache JIRA below.
ROOT CAUSE: https://issues.apache.org/jira/browse/YARN-6136
RESOLUTION: A change in the ZooKeeper scan behavior, tracked in the JIRA above.
WORKAROUND:
1. If LLAP (Slider) is not used: disable hadoop.registry.rm.enabled.
2. If LLAP (Slider) is used:
i) If only LLAP uses Slider and no other service shares the same ZooKeeper cluster, the only way to reduce the ZooKeeper load is to lower yarn.resourcemanager.state-store.max-completed-applications to 3k.
ii) If other services use the ZooKeeper quorum, please reach out to HWX support.
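To see how close a quorum member is to the threshold above, the total znode count can be read with ZooKeeper's mntr four-letter command (a minimal sketch, assuming nc is available and <zk-host> is one of the quorum members listening on the default client port 2181):
# Report the total number of znodes held by this quorum member.
echo mntr | nc <zk-host> 2181 | grep zk_znode_count
A count far above ~10k under the root is exactly the situation in which the registry scan described above starts to affect RM-to-ZooKeeper connectivity.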
12-18-2017
09:21 PM
Labels:
- Apache Ambari
- Apache Ranger
09-14-2017
09:28 PM
In large clusters, where the node count runs to a few hundred, the master services tend to be busy. One such master service is the NameNode. Some of the critical activities the NameNode performs include:
1. Serving client requests, which includes verifying permissions and authorization checks for HDFS resources.
2. Processing continuous block reports from all the DataNodes.
3. Writing the service and audit logs.
to name a few. When rogue applications try to access many HDFS resources, or when a data ingestion job loads high data volumes, the NameNode becomes very busy. On clusters like these, the NameNode FSImage tends to grow to many GB, so operations such as checkpointing consume considerable bandwidth between the two NameNodes. A high volume of edit syncs, combined with logging, can then cause high disk utilization, which can lead to NameNode instability. Hence, it is recommended to use dedicated disks for the service logs and the edit logs, and to monitor the IO on those disks using `iostat`, as sketched below.
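A minimal iostat sketch (assuming the sysstat package is installed; sdb and sdc are placeholders for the dedicated log and edits disks):
# Extended per-device statistics, refreshed every 5 seconds, 3 samples.
# Watch %util and await on the disks hosting the NameNode edits directory and the service logs.
iostat -x 5 3
# Optionally restrict the output to the disks of interest:
iostat -x 5 3 | egrep "Device|sdb|sdc"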
08-08-2017
12:49 AM
3 Kudos
This article compares how long it takes to recover accidentally deleted data in HDFS under two scenarios:
1. When trash is enabled.
2. When snapshot is enabled.
Data Recovery from trash:
When data is deleted from HDFS, the NameNode metadata is updated to remove the file from its source folder, but the blocks on the DataNodes are not immediately deleted. Instead, the file is moved, along with its original directory path, into the user's .Trash folder, from where the deleted data can be recovered (a search sketch follows the example below).
Example: 1. Existing data in HDFS. #hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:47 /tmp/test1.txt
2. Deleted data in HDFS. #hadoop fs -rm /tmp/test1.txt
17/08/07 23:52:13 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vnn/tmp/test1.txt' to trash at: hdfs://vnn/user/hdfs/.Trash/Current/tmp/test1.txt
3. Recovering a deleted data #hadoop fs -cp /user/hdfs/.Trash/Current/tmp/test1.txt /tmp/
#hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:57 /tmp/test1.txt
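If the original path of the deleted file is not known, the current trash checkpoint can be searched by name first (a minimal sketch, assuming the delete was performed as the hdfs user, so the file lands under /user/hdfs/.Trash):
# Locate the deleted file inside the current trash checkpoint.
hadoop fs -ls -R /user/hdfs/.Trash/Current | grep "test1.txt"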
Data recovery from snapshots: Snapshots are read-only, point-in-time copies of the HDFS file system. A directory must be made snapshottable before it can be used to recover from accidental data loss. 1. Enabling snapshots. #hdfs dfsadmin -allowSnapshot /tmp/snapshotdir
Allowing snaphot on /tmp/snapshotdir succeeded
2. Create snapshot for a directory. #hdfs dfs -createSnapshot /tmp/snapshotdir
Created snapshot /tmp/snapshotdir/.snapshot/s20170807-180139.568
3. Contents of the snapshottable folder. #hdfs dfs -ls /tmp/snapshotdir/
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/tenGB.csv
#hdfs dfs -ls /tmp/snapshotdir/.snapshot/s20170807-180139.568
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/.snapshot/s20170807-180139.568/tenGB.csv
4. Deleting and recovering lost data. #hadoop fs -rm /tmp/snapshotdir/oneGB.csv
17/08/07 19:37:46 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vinodnn/tmp/snapshotdir/oneGB.csv' to trash at: hdfs://vinodnn/user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492
#hadoop fs -cp /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv /tmp/snapshotdir/
In the methods above, the hadoop copy command, "hadoop fs -cp <source> <dest>", is used to recover the data. However, the time taken by the "cp" operation grows with the size of the lost data. One optimization is to use the move command, "hadoop fs -mv <source> <destination>", in place of the copy, since a move within the same file system is a metadata-only rename and therefore fares better than a copy. Because snapshot folders are read-only, the only operation they support is "copy" (not move). The following metrics compare the performance of "copy" versus "move" for a one GB and a ten GB data file.
Time to recover a file using copy (cp) operations: [screenshot: screen-shot-2017-08-07-at-60552-pm.png]
Time to recover a file using move (mv) operations: [screenshot: screen-shot-2017-08-07-at-60602-pm.png]
Hence, recovering data from trash combined with a move operation is, in many cases, the more efficient way to handle accidental data loss. NOTE: Recovery from trash is only possible if the trash interval (fs.trash.interval) is configured to give Hadoop admins enough time to detect the data loss and act on it. If not, snapshots are recommended for eventual recovery.
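For reference, the timings above can be reproduced with the shell's built-in time command (a minimal sketch; the paths mirror the example above):
# Recovery via copy from the read-only snapshot (cp is the only option here).
time hadoop fs -cp /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv /tmp/snapshotdir/
# Recovery via move from trash (a metadata-only rename, no block data is rewritten).
time hadoop fs -mv /user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492 /tmp/snapshotdir/oneGB.csv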
07-18-2017
06:54 PM
I had to remove the following properties as well:
oozie.service.ELService.ext.functions.workflow
oozie.service.ELService.ext.functions.coord-sla-create
On the whole, I had to make sure there were no oozie.extensions or ELService extension functions left in oozie-site.xml. Thank you @Kuldeep Kulkarni and @Orlando Teixeira.
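For anyone hitting the same issue, the leftover entries can be spotted with a quick grep against the effective configuration (a minimal sketch; /etc/oozie/conf is the usual HDP location and may differ on your cluster):
# List any remaining ELService extension or oozie.extensions entries.
grep -E "oozie.service.ELService.ext.functions|oozie.extensions" /etc/oozie/conf/oozie-site.xml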
05-26-2017
10:50 PM
1 Kudo
I am trying to:
1. Enable the HDFS Ranger plugin.
2. Add an additional HDFS service in Ranger.
I am following the documentation at https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5+-+User+Guide, which suggests adding the service ("Add Service" under "Service Manager", using the "+" next to HDFS). I am entering only the basic fields needed to bring up the service and get a successful "Test Connection". These are the values I am entering:
Service Name : ranger1_hadoop
Username : admin
Password : admin
Namenode URL : hdfs://<hostname -f>:8020
Authentication Type : Simple
Test connection fails with the error below:
Connection Failed.
Unable to retrieve any files using given parameters, You can still save the repository and start creating policies, but you would not be able to use autocomplete for resource names. Check ranger_admin.log for more info.
Observations:
1. There is no file named "ranger_admin.log" on my Ranger hosts, even though the message above refers to it. Is this expected?
2. In xa_portal.log, I see the following stack trace:
2017-05-26 22:38:29,578 [timed-executor-pool-0] INFO apache.ranger.services.hdfs.client.HdfsClient (HdfsClient.java:208) - ===> HdfsClient.testConnection()
2017-05-26 22:38:29,579 [timed-executor-pool-0] ERROR org.apache.ranger.plugin.util.PasswordUtils (PasswordUtils.java:127) - Unable to decrypt password due to error
javax.crypto.IllegalBlockSizeException: Input length must be multiple of 8 when decrypting with padded cipher
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
at com.sun.crypto.provider.PBES1Core.doFinal(PBES1Core.java:416)
at com.sun.crypto.provider.PBEWithMD5AndDESCipher.engineDoFinal(PBEWithMD5AndDESCipher.java:316)
at javax.crypto.Cipher.doFinal(Cipher.java:2165)
at org.apache.ranger.plugin.util.PasswordUtils.decryptPassword(PasswordUtils.java:112)
at org.apache.ranger.plugin.client.BaseClient.login(BaseClient.java:113)
at org.apache.ranger.plugin.client.BaseClient.<init>(BaseClient.java:59)
at org.apache.ranger.services.hdfs.client.HdfsClient.<init>(HdfsClient.java:52)
at org.apache.ranger.services.hdfs.client.HdfsClient.connectionTest(HdfsClient.java:221)
at org.apache.ranger.services.hdfs.client.HdfsResourceMgr.connectionTest(HdfsResourceMgr.java:47)
at org.apache.ranger.services.hdfs.RangerServiceHdfs.validateConfig(RangerServiceHdfs.java:58)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:560)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:547)
at org.apache.ranger.biz.ServiceMgr$TimedCallable.call(ServiceMgr.java:508)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-05-26 22:38:29,580 [timed-executor-pool-0] ERROR apache.ranger.services.hdfs.client.HdfsResourceMgr (HdfsResourceMgr.java:49) - <== HdfsResourceMgr.testConnection Error: Unable to login to Hadoop environment [ranger1_hadoop]
org.apache.ranger.plugin.client.HadoopException: Unable to login to Hadoop environment [ranger1_hadoop]
at org.apache.ranger.plugin.client.BaseClient.login(BaseClient.java:136)
at org.apache.ranger.plugin.client.BaseClient.<init>(BaseClient.java:59)
at org.apache.ranger.services.hdfs.client.HdfsClient.<init>(HdfsClient.java:52)
at org.apache.ranger.services.hdfs.client.HdfsClient.connectionTest(HdfsClient.java:221)
at org.apache.ranger.services.hdfs.client.HdfsResourceMgr.connectionTest(HdfsResourceMgr.java:47)
at org.apache.ranger.services.hdfs.RangerServiceHdfs.validateConfig(RangerServiceHdfs.java:58)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:560)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:547)
at org.apache.ranger.biz.ServiceMgr$TimedCallable.call(ServiceMgr.java:508)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to decrypt password due to error
at org.apache.ranger.plugin.util.PasswordUtils.decryptPassword(PasswordUtils.java:128)
at org.apache.ranger.plugin.client.BaseClient.login(BaseClient.java:113)
... 12 more
Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 8 when decrypting with padded cipher
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
at com.sun.crypto.provider.PBES1Core.doFinal(PBES1Core.java:416)
at com.sun.crypto.provider.PBEWithMD5AndDESCipher.engineDoFinal(PBEWithMD5AndDESCipher.java:316)
at javax.crypto.Cipher.doFinal(Cipher.java:2165)
at org.apache.ranger.plugin.util.PasswordUtils.decryptPassword(PasswordUtils.java:112)
... 13 more
2017-05-26 22:38:29,580 [timed-executor-pool-0] ERROR org.apache.ranger.services.hdfs.RangerServiceHdfs (RangerServiceHdfs.java:60) - <== RangerServiceHdfs.validateConfig Error: Unable to login to Hadoop environment [ranger1_hadoop]
org.apache.ranger.plugin.client.HadoopException: Unable to login to Hadoop environment [ranger1_hadoop]
at org.apache.ranger.plugin.client.BaseClient.login(BaseClient.java:136)
at org.apache.ranger.plugin.client.BaseClient.<init>(BaseClient.java:59)
at org.apache.ranger.services.hdfs.client.HdfsClient.<init>(HdfsClient.java:52)
at org.apache.ranger.services.hdfs.client.HdfsClient.connectionTest(HdfsClient.java:221)
at org.apache.ranger.services.hdfs.client.HdfsResourceMgr.connectionTest(HdfsResourceMgr.java:47)
at org.apache.ranger.services.hdfs.RangerServiceHdfs.validateConfig(RangerServiceHdfs.java:58)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:560)
at org.apache.ranger.biz.ServiceMgr$ValidateCallable.actualCall(ServiceMgr.java:547)
at org.apache.ranger.biz.ServiceMgr$TimedCallable.call(ServiceMgr.java:508)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to decrypt password due to error
at org.apache.ranger.plugin.util.PasswordUtils.decryptPassword(PasswordUtils.java:128)
at org.apache.ranger.plugin.client.BaseClient.login(BaseClient.java:113)
... 12 more
Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 8 when decrypting with padded cipher
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
at com.sun.crypto.provider.PBES1Core.doFinal(PBES1Core.java:416)
at com.sun.crypto.provider.PBEWithMD5AndDESCipher.engineDoFinal(PBEWithMD5AndDESCipher.java:316)
at javax.crypto.Cipher.doFinal(Cipher.java:2165)
at org.apache.ranger.plugin.util.PasswordUtils.decryptPassword(PasswordUtils.java:112)
... 13 more
2017-05-26 22:38:29,580 [timed-executor-pool-0] ERROR org.apache.ranger.biz.ServiceMgr$TimedCallable (ServiceMgr.java:510) - TimedCallable.call: Error:org.apache.ranger.plugin.client.HadoopException: Unable to login to Hadoop environment [ranger1_hadoop]
2017-05-26 22:38:29,580 [http-bio-6080-exec-7] ERROR org.apache.ranger.biz.ServiceMgr (ServiceMgr.java:188) - ==> ServiceMgr.validateConfig Error:org.apache.ranger.plugin.client.HadoopException: org.apache.ranger.plugin.client.HadoopException: Unable to login to Hadoop environment [ranger1_hadoop]
After enabling the HDFS plugin in the HDFS service section of Ambari, a service named "Ranger_hadoop" is created in the Ranger UI. However, I am not able to add another HDFS service.
Labels:
- Apache Ranger
05-26-2017
12:58 AM
1 Kudo
ISSUE: While configuring NFS mounts to access HDFS as part of the local file system, access is typically controlled with NFS proxy users, as shown below:
<property>
<name>hadoop.proxyuser.nfsserver.groups</name>
<value>nfs-users1,nfs-users2</value>
<description>
The 'nfsserver' user is allowed to proxy all members of the
'nfs-users1' and 'nfs-users2' groups. Set this to '*' to allow
nfsserver user to proxy any group.
</description>
</property>
<property>
<name>hadoop.proxyuser.nfsserver.hosts</name>
<value>nfs-client-host1.com</value>
<description>
This is the host where the nfs gateway is running. Set this to
'*' to allow requests from any hosts to be proxied.
</description>
</property>
However, a user who has access to the NFS server can still access (view) the HDFS file system even if they are not covered by "hadoop.proxyuser.nfsserver.groups" and "hadoop.proxyuser.nfsserver.hosts". This may be a security flaw in certain scenarios.
ROOT CAUSE: This is controlled by the property "nfs.exports.allowed.hosts", which determines which hosts are allowed to access HDFS through the gateway.
RESOLUTION: Make sure the desired hosts and permissions are assigned. The property can be defined as below:
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
NOTE: An NFS gateway restart may be needed if the property is altered.
Links: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Allow_mounts_from_unprivileged_clients
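Once the property is in place and the gateway has been restarted, the export and the mount can be verified from a client host (a minimal sketch; <nfs-gateway-host> and /hdfs_mount are placeholders, and the mount options follow the NFS gateway documentation linked above):
# List the exports published by the HDFS NFS gateway.
showmount -e <nfs-gateway-host>
# Mount the export (as root) and browse HDFS as part of the local filesystem.
mkdir -p /hdfs_mount
mount -t nfs -o vers=3,proto=tcp,nolock <nfs-gateway-host>:/ /hdfs_mount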
05-22-2017
06:52 PM
1 Kudo
Apart from checking topologies in the Storm Web UI, we can also list the active topologies from one of the cluster nodes with the following command:
/usr/hdp/<HDP-version>/storm/bin/storm list
If there are no topologies running, the output is:
No topologies running.
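The same command can be wrapped in a quick check, for example to verify that a particular topology is still active (a minimal sketch; the topology name argument and the messages are illustrative, and <HDP-version> is a placeholder as in the command above):
#!/bin/bash
# Exit with status 0 if the named topology appears in `storm list`.
TOPOLOGY_NAME="$1"
if /usr/hdp/<HDP-version>/storm/bin/storm list | grep -qw "$TOPOLOGY_NAME"; then
  echo "Topology $TOPOLOGY_NAME is running."
else
  echo "Topology $TOPOLOGY_NAME is not running."
  exit 1
fi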
04-07-2017
10:54 PM
1 Kudo
ISSUE: Java Heap Space issue in Hive MR engine
While working on a sample data set in Hive, a query such as "select count(*)" fails with the error below.
Starting Job = job_1491603076412_0001, Tracking URL = http://krishna3.openstacklocal:8088/proxy/application_1491603076412_0001/
Kill Command = /usr/hdp/2.4.2.0-258/hadoop/bin/hadoop job -kill job_1491603076412_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-04-07 22:18:09,736 Stage-1 map = 0%, reduce = 0%
2017-04-07 22:18:46,065 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1491603076412_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1491603076412_0001_m_000000 (and more) from job job_1491603076412_0001
Task with the most failures(4):
-----
Task ID:
task_1491603076412_0001_m_000000
URL:
http://krishna3.openstacklocal:8088/taskdetails.jsp?jobid=job_1491603076412_0001&tipid=task_1491603076412_0001_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Checking the corresponding application logs, we observe:
2017-04-07 22:25:40,828 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:986)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:442)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
ROOT CAUSE: Insufficient heap space for the map tasks in the MR engine (mapreduce.map.memory.mb too low).
RESOLUTION: Increasing mapreduce.map.memory.mb from 1.2 GB to 1.75 GB, raising mapreduce.task.io.sort.mb to 1003 and mapreduce.map.java.opts to -Xmx1433m, and restarting the necessary services resolved the problem. (NOTE: The mapreduce.task.io.sort.mb and mapreduce.map.java.opts values were recommended by Ambari.)
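To validate the fix before (or instead of) changing the values cluster-wide in Ambari, the same settings can be passed for a single Hive invocation (a minimal sketch; 1792 MB is roughly the 1.75 GB noted above, and <database>.<table> is a placeholder):
# Re-run the failing query with the increased map memory applied to this invocation only.
hive --hiveconf mapreduce.map.memory.mb=1792 \
     --hiveconf mapreduce.map.java.opts=-Xmx1433m \
     --hiveconf mapreduce.task.io.sort.mb=1003 \
     -e "select count(*) from <database>.<table>;"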