Member since: 01-12-2016
Posts: 14
Kudos Received: 5
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1236 | 09-07-2016 08:16 AM
01-31-2018
04:03 PM
The temporary solution is to fake the hostname of the failing NameNode in /etc/resolv.conf (or equivalent) and make it point to the IP of the healthy NameNode, until your NameNode is back up.
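What I mean is something along these lines; this is only a sketch that uses /etc/hosts as the "(or equivalent)" part, and the hostname and IP are placeholders rather than real cluster values:

# Hypothetical hostname/IP: temporarily map the failed NameNode's hostname
# to the healthy NameNode's IP on the client that keeps insisting on it.
echo "10.10.10.112  nn1.example.com nn1" | sudo tee -a /etc/hosts
# Once the real NameNode is back, remove the override again.
sudo sed -i '/nn1\.example\.com/d' /etc/hosts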
09-05-2017
10:05 AM
In HDP-2.6.1.0-129 (Ranger 0.7.0), I'm seeing (apparently new) Read and Write permissions (see screenshot), but I cannot find their counterpart in the documentation (https://cwiki.apache.org/confluence/display/RANGER/Hive+Commands+to+Ranger+Permission+Mapping). What are these permissions for, and will they replace the Select and Update permissions? Thanks
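For reference, one way to see how these access types are defined (assuming the standard Ranger public v2 REST API; the host and admin credentials below are placeholders) is to dump the Hive service definition and look at its accessTypes list:

# Placeholders: ranger-admin-host and admin:admin.
# The accessTypes array in the returned JSON lists every permission the
# Hive service definition declares (select, update, ..., read, write).
curl -s -u admin:admin \
  "http://ranger-admin-host:6080/service/public/v2/api/servicedef/name/hive" \
  | python -m json.tool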
Labels: Apache Ranger
07-15-2017
07:18 PM
I got this problem once, and it was because the Hive client component was not installed on the server I was running the hive command from.
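A quick way to check this on the host in question (a sketch; hdp-select component names can differ between HDP versions):

# Is the Hive client on this host at all?
which hive || echo "no hive client found on this host"
# On HDP, hdp-select shows which version (if any) is selected for the client.
hdp-select status hive-client

If it is missing, adding the "Hive Client" component to that host through Ambari (Hosts > host > +Add) is usually the cleanest fix.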
05-12-2017
11:02 AM
I'm using HDP 2.5.3. Thanks, I didn't pay attention to the version in the JIRA (https://issues.apache.org/jira/browse/RANGER-698). I'll upgrade then 🙂
05-12-2017
06:18 AM
https://cwiki.apache.org/confluence/display/RANGER/Support+for+%24username+variable describes an interesting feature to automagically create dynamic rules for users, for instance a personal database for every user. The question is: how can we set this up in the Ranger admin UI or through the API? I got the following answer when calling the API: {"statusCode":1,"msgDesc":"{USER}: user does not exist. policy='test {USER}' service='INT1_hive'"}
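For completeness, here is roughly what I am trying to create, written as a REST call. This is only a sketch: Ranger 0.6+ is needed for the {USER} macro, the admin host and credentials are placeholders, the INT1_hive service name comes from the error above, and the db_{USER} database pattern is just an example:

# Sketch of a "personal database per user" policy via the Ranger REST API.
# Requires a Ranger version that supports the {USER} macro (0.6+).
curl -s -u admin:admin -X POST \
  -H "Content-Type: application/json" \
  "http://ranger-admin-host:6080/service/public/v2/api/policy" \
  -d '{
        "service": "INT1_hive",
        "name": "personal databases",
        "resources": {
          "database": { "values": ["db_{USER}"], "isExcludes": false, "isRecursive": false },
          "table":    { "values": ["*"] },
          "column":   { "values": ["*"] }
        },
        "policyItems": [{
          "users": ["{USER}"],
          "accesses": [{ "type": "all", "isAllowed": true }]
        }]
      }'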
Labels: Apache Ranger
09-07-2016
08:16 AM
2 Kudos
Replying to my own question 🙂 For some reason, hdp-select was partially installed with version 2.4.2.0. Running "yum upgrade hdp-select" on all the nodes fixed the version mismatch.
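For reference, a rough sketch of checking and fixing this across nodes (the node names are placeholders, and an ssh loop is only one way to do it):

# Hypothetical node names; repeat on every node of the cluster.
for host in node1.example.com node2.example.com node3.example.com; do
  ssh "$host" 'hdp-select versions; sudo yum -y upgrade hdp-select'
done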
09-07-2016
07:59 AM
I'm doing a fresh install of Ambari 2.4.0.1 and HDP 2.5.0.0. Ambari is working fine: I can go through the wizard, select the stack (2.5.0.0), do a manual registration of my hosts, assign the components (master/slave) and start the deployment. The "Install, Test and Start" phase fails with the following stacktrace:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 174, in <module>
DataNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 49, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 567, in install_packages
retry_count=agent_stack_retry_count)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 83, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 91, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_4_2_0_258' returned 1. Error: Nothing to do
It fails installing hadoop_2_4_2_0_258 while I selected the HDP 2.5.0.0 stack ?!?! Looking at the yum repo set up during the manual registration, it makes sense that version 2.4.2.0 is not found, since the repo properly points to 2.5:
$ cat /etc/yum.repos.d/HDP.repo
[HDP-2.5]
name=HDP-2.5
baseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0
path=/
enabled=1
Why is the wrong version being installed on the nodes? Thanks
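For what it's worth, a couple of plain yum commands can cross-check the mismatch from any node (nothing HDP-specific is assumed here):

# Which HDP repos are enabled on this node?
yum repolist enabled | grep -i HDP
# Which hadoop package versions does the enabled repo actually provide?
# With only the HDP-2.5 repo enabled, no hadoop_2_4_* package shows up,
# which matches the "Nothing to do" failure above.
yum list available --showduplicates 'hadoop_*'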
01-13-2016
02:48 PM
@Artem Ervits (I'm new to the community forum, so I'm not sure whether you receive a notification for every answer)
01-13-2016
12:43 PM
Well, Hive and everything else works fine in this degraded HA mode; the only component having problems is Spark. The active NameNode log (I just stripped out the BlockStateChange noise) is given below. There is no stacktrace or anything else validating the hypothesis of a problem in the HA setup. I reproduced the problem on several environments as follows:
1. Set up a fresh cluster with NameNode HA; spark-shell --master yarn-master works fine.
2. Figure out which NameNode is referred to as nn1 in hdfs-site.xml.
3. Shut down the NameNode referred to as nn1; spark-shell --master yarn-master now fails, waiting indefinitely on nn1:8020.
If you shut down the second NameNode in the list of dfs.ha.namenodes.<nameservice> instead, Spark keeps working fine. This makes me think that the Spark HDFS client, as surprising as that may be, is not falling back to the other NameNode. After turning on more logging, it looks like by default it tries 500 times before falling back to the other NameNode. There are some tickets discussing whether dfs.client.retry.policy.enabled should be turned on or off (https://issues.apache.org/jira/browse/AMBARI-11192 and https://issues.apache.org/jira/browse/AMBARI-12138, as well as https://issues.apache.org/jira/browse/HADOOP-12078), but in the case of Spark this does not seem to have any impact... My reading of the logs tells me that, for some reason, java.net.NoRouteToHostException is not considered a failure and the client keeps retrying the same NameNode, while what I expect is for it to try the next one in the list (see also the verification sketch after the logs below)...
16/01/13 08:30:21 DEBUG RetryUtils: RETRY 0) policy=MultipleLinearRandomRetry, exception=java.net.NoRouteToHostException: No route to host
16/01/13 08:30:23 INFO Client: Retrying connect to server: daplab-rt-11.fri.lan/10.10.10.111:8020. Already tried 0 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[500x2000ms], TryOnceThenFail]
16/01/13 08:30:24 DEBUG RetryUtils: RETRY 1) policy=MultipleLinearRandomRetry, exception=java.net.NoRouteToHostException: No route to host
16/01/13 08:30:25 INFO Client: Retrying connect to server: daplab-rt-11.fri.lan/10.10.10.111:8020. Already tried 1 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[500x2000ms], TryOnceThenFail]
Active Namenode logs:
2016-01-13 07:30:42,090 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888940_148127{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-913cec22-84a9-435b-9941-3839a5870b6f:NORMAL:10.10.10.15:50010|RBW], ReplicaUC[[DISK]DS-c982ff94-8e74-4d03-a74c-4b81b48840b5:NORMAL:10.10.10.25:50010|RBW], ReplicaUC[[DISK]DS-34bc38b0-7eec-4521-b756-bfcaaf528f42:NORMAL:10.10.10.23:50010|RBW]]} for /shared/zefix/company/2014/08/18/company_2014-08-18.avro
2016-01-13 07:30:56,522 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36246 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30797 SyncTimes(ms): 311448 238317
2016-01-13 07:32:54,882 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36253 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30804 SyncTimes(ms): 311526 238382
2016-01-13 07:33:54,901 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36259 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30810 SyncTimes(ms): 311593 238438
2016-01-13 07:34:34,561 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888941_148128{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-0112cfbf-aaca-4294-ae0d-2456935db0b6:NORMAL:10.10.10.26:50010|RBW], ReplicaUC[[DISK]DS-8170fc01-c5dc-4221-9ea2-1f4a3b343b49:NORMAL:10.10.10.32:50010|RBW], ReplicaUC[[DISK]DS-55ab7112-fc8a-428c-a948-cb6ee4efaefe:NORMAL:10.10.10.167:50010|RBW]]} for /user/daplab_prod/.sparkStaging/application_1452430297721_0098/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
2016-01-13 07:34:35,894 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888942_148129{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-c739e4e5-1545-40b9-a6ef-28989c6cc8eb:NORMAL:10.10.10.32:50010|RBW], ReplicaUC[[DISK]DS-25feb56a-94b7-4520-92dc-c46939414f9a:NORMAL:10.10.10.15:50010|RBW], ReplicaUC[[DISK]DS-31fb3172-c085-43ef-8ccd-f38e72492ad7:NORMAL:10.10.10.23:50010|RBW]]} for /user/daplab_prod/.sparkStaging/application_1452430297721_0098/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
2016-01-13 07:34:36,910 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /user/daplab_prod/.sparkStaging/application_1452430297721_0098/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar is closed by DFSClient_NONMAPREDUCE_186954515_1
2016-01-13 07:34:40,078 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888943_148130{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-0daa5887-d09e-4e96-888f-757e9c413581:NORMAL:10.10.10.23:50010|RBW], ReplicaUC[[DISK]DS-d3cb3d03-df7d-4211-8b9f-fe95b0e5954f:NORMAL:10.10.10.30:50010|RBW], ReplicaUC[[DISK]DS-09829b26-21f2-4c8b-9936-d69d9a0469a9:NORMAL:10.10.10.32:50010|RBW]]} for /user/daplab_prod/.sparkStaging/application_1452430297721_0098/__spark_conf__8474002130224342241.zip
2016-01-13 07:34:40,118 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /user/daplab_prod/.sparkStaging/application_1452430297721_0098/__spark_conf__8474002130224342241.zip is closed by DFSClient_NONMAPREDUCE_186954515_1
2016-01-13 07:34:54,937 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36286 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30831 SyncTimes(ms): 311815 238646
2016-01-13 07:35:54,963 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36292 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30837 SyncTimes(ms): 311862 238700
2016-01-13 07:36:55,010 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36298 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30843 SyncTimes(ms): 311908 238750
2016-01-13 07:37:22,743 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /shared/zefix/company/2014/08/18/company_2014-08-18.avro is closed by DFSClient_NONMAPREDUCE_-1471796874_1
2016-01-13 07:37:54,939 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888944_148131{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-36be5f99-94df-4539-b8e4-c6fcd07cca53:NORMAL:10.10.10.14:50010|RBW], ReplicaUC[[DISK]DS-49d7ed47-bbaf-477c-956e-54cc5dd5d165:NORMAL:10.10.10.24:50010|RBW], ReplicaUC[[DISK]DS-b0436f14-fe85-492d-999e-3f2fe44dd638:NORMAL:10.10.10.32:50010|RBW]]} for /shared/zefix/company/2014/02/14/company_2014-02-14.avro
2016-01-13 07:37:55,010 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36311 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30854 SyncTimes(ms): 312082 238850
2016-01-13 07:38:04,840 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888945_148132{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ea1ff4e9-53e5-4e8e-8c02-4a6a5993247a:NORMAL:10.10.10.31:50010|RBW], ReplicaUC[[DISK]DS-7c91c6c5-a0b5-46a6-a963-097150f29878:NORMAL:10.10.10.30:50010|RBW], ReplicaUC[[DISK]DS-d4e2030b-c29d-4e0c-a3b6-3e8b795d3b2e:NORMAL:10.10.10.13:50010|RBW]]} for /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-31.fri.lan_45454.tmp
2016-01-13 07:38:04,882 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-31.fri.lan_45454.tmp is closed by DFSClient_NONMAPREDUCE_-829750558_12726
2016-01-13 07:38:06,441 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888946_148133{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-d4e2030b-c29d-4e0c-a3b6-3e8b795d3b2e:NORMAL:10.10.10.13:50010|RBW], ReplicaUC[[DISK]DS-8ed05f08-aa49-43e8-b1ac-ca60d5cd6196:NORMAL:10.10.10.23:50010|RBW], ReplicaUC[[DISK]DS-dd4bdbf0-dc93-4e42-8009-dc13fe580385:NORMAL:10.10.10.31:50010|RBW]]} for /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-13.fri.lan_45454.tmp
2016-01-13 07:38:06,490 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-13.fri.lan_45454.tmp is closed by DFSClient_NONMAPREDUCE_1188492760_4267
2016-01-13 07:38:06,827 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888947_148134{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-4c7faccb-83ab-4dbd-a763-f35690bd3767:NORMAL:10.10.10.32:50010|RBW], ReplicaUC[[DISK]DS-0b3b0051-e971-44a4-914c-14ef84528e05:NORMAL:10.10.10.30:50010|RBW], ReplicaUC[[DISK]DS-c26f498e-77dc-4d02-8abc-b1ebc9d25946:NORMAL:10.10.10.23:50010|RBW]]} for /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-32.fri.lan_45454.tmp
2016-01-13 07:38:06,865 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /app-logs/daplab_prod/logs/application_1452430297721_0098/daplab-wn-32.fri.lan_45454.tmp is closed by DFSClient_NONMAPREDUCE_-1454565051_8117
2016-01-13 07:38:55,024 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36336 Total time for transactions(ms): 790 Number of transactions batched in Syncs: 568 Number of syncs: 30873 SyncTimes(ms): 312254 239031
2016-01-13 07:39:55,046 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36342 Total time for transactions(ms): 791 Number of transactions batched in Syncs: 568 Number of syncs: 30879 SyncTimes(ms): 312313 239091
2016-01-13 07:40:55,061 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36349 Total time for transactions(ms): 792 Number of transactions batched in Syncs: 568 Number of syncs: 30886 SyncTimes(ms): 312389 239147
2016-01-13 07:41:55,078 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36356 Total time for transactions(ms): 792 Number of transactions batched in Syncs: 568 Number of syncs: 30893 SyncTimes(ms): 312468 239203
2016-01-13 07:42:55,092 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36362 Total time for transactions(ms): 792 Number of transactions batched in Syncs: 568 Number of syncs: 30899 SyncTimes(ms): 312544 239259
2016-01-13 07:43:56,485 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36370 Total time for transactions(ms): 792 Number of transactions batched in Syncs: 568 Number of syncs: 30907 SyncTimes(ms): 312629 239343
2016-01-13 07:44:56,565 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36376 Total time for transactions(ms): 794 Number of transactions batched in Syncs: 568 Number of syncs: 30913 SyncTimes(ms): 312690 239390
2016-01-13 07:45:42,857 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /shared/zefix/company/2014/02/14/company_2014-02-14.avro is closed by DFSClient_NONMAPREDUCE_-1471796874_1
2016-01-13 07:45:56,612 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36384 Total time for transactions(ms): 794 Number of transactions batched in Syncs: 568 Number of syncs: 30921 SyncTimes(ms): 312771 239458
2016-01-13 07:46:05,715 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073888948_148135{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-58bd0cf4-a027-4e4c-aa86-fbf49d8ea6f6:NORMAL:10.10.10.24:50010|RBW], ReplicaUC[[DISK]DS-f93cd04c-c780-44fe-91b7-9fbb214c9724:NORMAL:10.10.10.32:50010|RBW], ReplicaUC[[DISK]DS-5809f6ca-32bb-44d1-8367-44ce5a9b6fff:NORMAL:10.10.10.33:50010|RBW]]} for /shared/zefix/company/2014/09/04/company_2014-09-04.avro
2016-01-13 07:47:54,928 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36394 Total time for transactions(ms): 794 Number of transactions batched in Syncs: 568 Number of syncs: 30929 SyncTimes(ms): 312853 239532
2016-01-13 07:48:54,949 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36401 Total time for transactions(ms): 794 Number of transactions batched in Syncs: 568 Number of syncs: 30936 SyncTimes(ms): 312950 239591
2016-01-13 07:49:54,979 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 36407 Total time for transactions(ms): 794 Number of transactions batched in Syncs: 568 Number of syncs: 30942 SyncTimes(ms): 313009 239649
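As mentioned above, here is a small sketch of the client-side checks I would use to confirm that failover is configured at all on the node running spark-shell (the nameservice name "mycluster" and the NameNode IDs nn1/nn2 are placeholders; use the values from hdfs-site.xml):

# Which failover proxy provider is the HDFS client configured with?
# (NameNode HA normally relies on ConfiguredFailoverProxyProvider.)
hdfs getconf -confKey dfs.client.failover.proxy.provider.mycluster
# Which NameNodes does the client know about, and which one is active?
hdfs getconf -confKey dfs.ha.namenodes.mycluster
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2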