Member since: 10-01-2015
Posts: 52
Kudos Received: 25
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2283 | 09-29-2016 02:09 PM
 | 885 | 09-28-2016 12:32 AM
 | 3916 | 08-30-2016 09:56 PM
03-22-2016
09:02 PM
1 Kudo
Today the client is using a couple of staging/FTP servers, but they want to know whether there are other practices; all the data is in HDFS.
03-22-2016
05:37 PM
1 Kudo
What are the best practices for copying data between two clusters located in different datacenters on different LANs? The scope is to limit loops.
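Not from this thread, but for context: the usual starting point for copying HDFS data between clusters is DistCp. A minimal sketch, assuming placeholder NameNode hosts and paths:

```
# Sketch only: the nn1/nn2 hostnames and the paths are placeholders.
# DistCp launches a MapReduce job that copies the data in parallel;
# -update skips files already present at the target, -p preserves attributes.
hadoop distcp -update -p \
  hdfs://nn1.dc1.example.com:8020/data/source \
  hdfs://nn2.dc2.example.com:8020/data/target
```

When the two clusters run different Hadoop versions, a webhdfs:// URI is commonly used for the source instead of hdfs://.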
Labels:
- Apache Hadoop
01-27-2016
04:24 PM
@Gerd Koenig I had the same doubts, thanks for confirming. Can you share something on putting NMs into different config groups at your leisure?
01-27-2016
04:23 PM
1 Kudo
@Artem Ervits checked all of those and it does not seem to be the issue.
01-27-2016
05:04 AM
1 Kudo
One of our clients has asked us to move the log location prefix to a different mount point for all service logs; for example, the HDFS log prefix was moved from /var/log/hadoop to /hdp/logs/hadoop via API calls. Everything restarted smoothly, however only one NM out of 5 is coming up, and a manual restart only works on that first NM. All the other NMs throw the same error, below:
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r ef0582ca14b8177a3cbb6376807545272677d730; compiled by 'jenkins' on 2015-12-16T03:01Z
STARTUP_MSG: java = 1.7.0_67
************************************************************/
2016-01-26 15:01:25,155 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - registered UNIX signal handlers for [TERM, HUP, INT]
2016-01-26 15:01:26,283 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:initStorage(927)) - Using state database at /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2016-01-26 15:01:26,313 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
2016-01-26 15:01:26,316 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
2016-01-26 15:01:26,317 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(540)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
2016-01-26 15:01:26,319 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at bvluxhdpdn05.conocophillips.net/158.139.121.115
************************************************************/
As we can see, it is not complaining that the LOCK file is absent but that it is unavailable: whichever NM starts first acquires the LOCK (remember this is a single shared mount point, not a local file system). If I change the log location back to a local file system, for example /tmp/yarnlogs, it works smoothly, since every NM gets access to its own LOCK file on the local file system wherever it is installed. Has anyone faced this issue, and can you please suggest a fix? Thanks
Mayank
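Editor's sketch, not confirmed in this thread: on HDP the NodeManager recovery store path typically follows the YARN log directory prefix, which would explain why all five NMs now point at the same recovery-state directory on the shared mount. One possible fix is to pin yarn.nodemanager.recovery.dir to a node-local path in yarn-site.xml, for example:

```
<!-- Assumed example: the value below is illustrative; any node-local directory works. -->
<!-- Each NodeManager then keeps its own LevelDB state (and LOCK file) locally,
     so the NMs no longer contend for a single LOCK on the shared mount. -->
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>
```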
Labels:
- Apache Hadoop
- Apache YARN
01-26-2016
04:55 PM
A few weeks back we were working with a customer and configured Solr for Ranger. The customer decided to skip Solr until it ships as a GA feature with Ambari, so after uninstalling Solr and removing its properties the customer upgraded Ambari and the HDP stack. Now the ambari-server logs are flooded with the error below; a solution/fix would be helpful.
ERROR [qtp-ambari-client-2103] ClusterImpl:2145 - Config inconsistency exists: unknown configType=solr-env
Thanks
Mayank
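As a hedged sketch (not from this post): before cleaning anything up, the lingering config type can be confirmed through the Ambari REST API. The host, cluster name, and credentials below are placeholders:

```
# Placeholders: admin:admin, ambari-host, MyCluster.
# Lists the config types the cluster still references; a solr-env entry here
# would line up with the "unknown configType=solr-env" error in the ambari-server log.
curl -u admin:admin \
  "http://ambari-host:8080/api/v1/clusters/MyCluster?fields=Clusters/desired_configs"
```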
Labels:
- Apache Ambari
01-25-2016
04:37 PM
1 Kudo
@stevel thanks for the explanation and the note on the upcoming Kerby. We had the same error message you have already documented, however with a different cause (maybe an unsupported cache type). The error was "No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)". We commented out "default_ccache_name = KEYRING:persistent:%{uid}" and that fixed it. Thanks
Mayank
01-22-2016
05:31 PM
Thanks @Ali Bajwa, posted as an article.
01-22-2016
05:28 PM
3 Kudos
Hi Everyone, I came across a Kerberos cache issue and wanted to share it and possibly get more ideas. I understand a few of us had issues with this in the past and hope this article might help. One of our clients has RedHat IDM (the supported version of FreeIPA), and when sssd is installed along with krb5 by IDM, the default cache setting is 'KEYRING' rather than 'FILE'. You will still be able to get tickets, but you will see GSSException errors on the cluster. The KEYRING persistent cache setting works with many applications, however not with a Hadoop cluster. I'm sure the HDP engineering team must be looking into this for a solution, since KEYRING is the future for the Kerberos cache, with features such as keylist. To solve the errors you can comment out the "default_ccache_name = KEYRING....." line in krb5.conf or change it to "default_ccache_name = FILE:/tmp/krb5cc_%{uid}". Log out and log in again, destroying the previous tickets, and you should see something like "Ticket cache: FILE:/tmp/krb5cc_" in your klist output. If you still see KEYRING persistent, kill all running sessions of the affected user and restart the SSSD service. Thanks Mayank
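For reference, a minimal sketch of the krb5.conf change described above; only the cache setting is shown, and the rest of the file is assumed unchanged:

```
# /etc/krb5.conf, [libdefaults] section
# Either comment out the KEYRING cache:
#   default_ccache_name = KEYRING:persistent:%{uid}
# or switch to a file-based cache as described in the post:
default_ccache_name = FILE:/tmp/krb5cc_%{uid}
```

After the change, destroy the old tickets (kdestroy), kinit again, and klist should report a FILE: ticket cache rather than KEYRING.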
01-22-2016
04:48 PM
6 Kudos
Hi Everyone, I came across a Kerberos cache issue and wanted to share it and possibly get more ideas. One of our clients has RedHat IDM (the supported version of FreeIPA), and when sssd is installed along with krb5 by IDM, the default cache setting is 'KEYRING' rather than 'FILE'. You will still be able to get tickets, but you will see GSSException errors on the cluster. The KEYRING persistent cache setting works with many applications, however not with a Hadoop cluster. I'm sure the HDP engineering team must be looking into this for a solution, since KEYRING is the future for the Kerberos cache, with features such as keylist. To solve the errors you can comment out the "default_ccache_name = KEYRING....." line in krb5.conf or change it to "default_ccache_name = FILE:/tmp/krb5cc_%{uid}". Log out and log in again, destroying the previous tickets, and you should see something like "Ticket cache: FILE:/tmp/krb5cc_" in your klist output. If you still see KEYRING persistent, kill all running sessions of the affected user and restart the SSSD service. Thanks
Labels:
- Apache Hadoop
- Kerberos