Member since: 09-28-2018
Posts: 17
Kudos Received: 1
Solutions: 0
03-06-2019
07:56 AM
The NodeManager service stopped on all data nodes of an HA cluster. The services were restarted and came up successfully, but we could not find when the service stopped from the logs (yarn-yarn-nodemanager-<datanode FQDN>.out or yarn-yarn-nodemanager-<datanode FQDN>.log). Is there any way to find when the NodeManager stopped on the nodes? Thanks, Sajesh
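Not part of the original post, but a hedged starting point: when the NodeManager logs themselves show nothing, the OS-level logs often do. A minimal sketch, assuming default HDP log and PID locations (paths may differ on your nodes):

# Look for shutdown markers in the NodeManager log and .out files
grep -iE "SHUTDOWN_MSG|shutting down|signal" /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log* \
    /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.out*

# Check whether the kernel OOM killer took the JVM down
dmesg -T | grep -iE "killed process|out of memory"

# The PID file's modification time roughly marks the last start/stop
ls -l /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid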
01-16-2019
07:14 AM
We have enabled preemption in our Hadoop cluster, and containers are getting preempted successfully when queues are over-utilized. The problem is that jobs are not restarting when the cluster becomes free; we need to kill them and then start them again manually. Is there any setting that helps restart jobs whose containers got preempted? Hadoop version: HDP-2.5.3.0
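A hedged note, not from the thread: preempted containers are normally re-requested by the job's ApplicationMaster, so jobs should resume on their own once capacity frees up. If the whole AM is being preempted and the job dies, the AM retry limits are worth checking. A sketch using Ambari's bundled configs.sh helper, assuming it exists at the usual path in your install; CLUSTER_NAME and the retry counts are placeholders:

# Allow the AM itself to be retried if its container is lost
/var/lib/ambari-server/resources/scripts/configs.sh set localhost CLUSTER_NAME \
    yarn-site "yarn.resourcemanager.am.max-attempts" "4"

# For MapReduce jobs, the per-job AM retry count (must not exceed the RM limit)
/var/lib/ambari-server/resources/scripts/configs.sh set localhost CLUSTER_NAME \
    mapred-site "mapreduce.am.max-attempts" "4"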
01-09-2019
08:52 AM
We have noticed the Ambari Infra Solr service failing regularly after this. The service runs on Masternode2. Below are the errors in solr.log.
2019-01-08 06:30:28,057 [coreContainerWorkExecutor-2-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr] ERROR [ ] org.apache.solr.core.CoreContainer$2 (CoreContainer.java:500) - Error waiting for SolrCore to be created
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [audit_logs_shard0_replica1]
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:496)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to create core [audit_logs_shard0_replica1]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:827)
at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
... 5 more
Caused by: org.apache.solr.common.SolrException: Could not load conf for core audit_logs_shard0_replica1: Can't load schema managed-schema: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
... 8 more
Caused by: org.apache.solr.common.SolrException: Can't load schema managed-schema: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:577)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:159)
at org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:104)
at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:173)
at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:47)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:70)
at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
... 9 more
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.schema.IndexSchema.loadFields(IndexSchema.java:642)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:473)
... 16 more
2019-01-08 06:30:28,072 [recoveryExecutor-16-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr x:ranger_audits_shard1_replica1 s:shard1 c:ranger_audits r:core_node1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.update.UpdateLog$LogReplayer (UpdateLog.java:1308) - Starting log replay tlog{file=/data/ambari_infra_solr/data/ranger_audits_shard1_replica1/data/tlog/tlog.0000000000001289032 refcount=2} active=false starting pos=0
2019-01-08 06:30:34,982 [commitScheduler-22-thread-1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.core.SolrCore (SolrCore.java:1795) - [ranger_audits_shard1_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2019-01-08 07:52:30,043 [recoveryExecutor-16-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr x:ranger_audits_shard1_replica1 s:shard1 c:ranger_audits r:core_node1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.update.UpdateLog$LogReplayer (UpdateLog.java:1298) - Log replay finished. recoveryInfo=RecoveryInfo{adds=1 deletes=0 deleteByQuery=0 errors=0 positionOfStart=0}
01-08-2019
11:29 AM
The Ambari Infra Solr instance service is failing regularly in our Hadoop HA cluster. The service runs successfully for some time after starting, but fails again later. Below are the solr logs.
2019-01-08 06:30:28,057 [coreContainerWorkExecutor-2-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr] ERROR [ ] org.apache.solr.core.CoreContainer$2 (CoreContainer.java:500) - Error waiting for SolrCore to be created
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [audit_logs_shard0_replica1]
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:496)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to create core [audit_logs_shard0_replica1]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:827)
at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
... 5 more
Caused by: org.apache.solr.common.SolrException: Could not load conf for core audit_logs_shard0_replica1: Can't load schema managed-schema: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
... 8 more
Caused by: org.apache.solr.common.SolrException: Can't load schema managed-schema: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:577)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:159)
at org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:104)
at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:173)
at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:47)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:70)
at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
... 9 more
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]] and [[[2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed{type=boolean,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast}]]]
at org.apache.solr.schema.IndexSchema.loadFields(IndexSchema.java:642)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:473)
... 16 more
2019-01-08 06:30:28,072 [recoveryExecutor-16-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr x:ranger_audits_shard1_replica1 s:shard1 c:ranger_audits r:core_node1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.update.UpdateLog$LogReplayer (UpdateLog.java:1308) - Starting log replay tlog{file=/data/ambari_infra_solr/data/ranger_audits_shard1_replica1/data/tlog/tlog.0000000000001289032 refcount=2} active=false starting pos=0
2019-01-08 06:30:34,982 [commitScheduler-22-thread-1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.core.SolrCore (SolrCore.java:1795) - [ranger_audits_shard1_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2019-01-08 07:52:30,043 [recoveryExecutor-16-thread-1-processing-n:prdhdpmn2.na.ad.example.com:8886_solr x:ranger_audits_shard1_replica1 s:shard1 c:ranger_audits r:core_node1] WARN [c:ranger_audits s:shard1 r:core_node1 x:ranger_audits_shard1_replica1] org.apache.solr.update.UpdateLog$LogReplayer (UpdateLog.java:1298) - Log replay finished. recoveryInfo=RecoveryInfo{adds=1 deletes=0 deleteByQuery=0 errors=0 positionOfStart=0}
The Infra Solr service runs on Masternode 2. Looking for your help to fix this issue. Also, what is the impact if the Infra Solr instance service is stopped? We don't see any issues with running jobs.
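Not an official fix, but the error points at a corrupted managed-schema for the audit_logs config set in ZooKeeper: a log line was accidentally written into it as a field definition. (On impact: Infra Solr being down should not affect running jobs; Ranger audit search and Log Search simply stop indexing until it is back.) One hedged approach is to pull the schema out of ZooKeeper, delete the bogus field entry, and put it back; the zkcli.sh path and the /infra-solr chroot below are assumptions based on a typical Ambari Infra install:

ZKCLI=/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh
ZK=prdhdpmn2.na.ad.example.com:2181/infra-solr

# Download the managed-schema for the audit_logs config set
$ZKCLI -zkhost $ZK -cmd getfile /configs/audit_logs/managed-schema /tmp/managed-schema

# Edit /tmp/managed-schema and remove the duplicate <field> whose name is the
# '2018-07-03 23:00:00,684 INFO FSNamesystem.audit : allowed' log line, then re-upload
$ZKCLI -zkhost $ZK -cmd putfile /configs/audit_logs/managed-schema /tmp/managed-schema

# Restart Ambari Infra Solr afterwards so the core reloads the schema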
12-19-2018
01:54 PM
We did not find any logs about a connectivity issue between the NameNode and JournalNode. Everything was fine after restarting the NameNode services. Where do I find the JournalNode heap usage during this time period?
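A hedged pointer, not from the thread: if Ambari Metrics was running, historical JVM heap for the JournalNode can be pulled from the AMS timeline API; otherwise only live numbers are available via jstat. The host names, the 6188 port, and the journalnode appId are assumptions based on a default AMS setup:

# Historical heap from Ambari Metrics (start/end are epoch millis for your window)
curl -s "http://AMS_HOST:6188/ws/v1/timeline/metrics?metricNames=jvm.JvmMetrics.MemHeapUsedM&appId=journalnode&hostname=JN_HOST&startTime=1545000000000&endTime=1545100000000"

# Live heap of the running JournalNode JVM, sampled every 5 seconds
jstat -gc $(pgrep -f JournalNode) 5s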
12-19-2018
12:09 PM
We are running a Hadoop cluster in an HA environment. The NameNode service failed on both the active and standby servers. As per the logs, it failed because the QJM waited for a quorum response and timed out. Logs from the active node:
2018-12-17 20:57:01,587 WARN client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19014 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2018-12-17 20:57:02,574 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.106.8.78:8485, 10.106.8.145:8485, 10.106.8.161:8485], stream=QuorumOutputStream starting at txid 62507384))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:654)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4018)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2018-12-17 20:57:02,574 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 62507384
2018-12-17 20:57:02,587 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(501)) - ==> JVMShutdownHook.run()
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(502)) - JVMShutdownHook: Signalling async audit cleanup to start.
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(477)) - RangerAsyncAuditCleanup: Starting cleanup
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(506)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.
2018-12-17 20:57:02,593 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2018-12-17 20:57:02,593 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=838857, localBatchBuffer.size()=0
2018-12-17 20:57:02,605 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=837857, localBatchBuffer.size(
2018-12-17 20:57:12,571 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=12859, localBatchBuffer.size()=0
2018-12-17 20:57:12,583 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=11859, localBatchBuffer.size()=0
2018-12-17 20:57:12,592 WARN util.ShutdownHookManager (ShutdownHookManager.java:run(70)) - ShutdownHook 'JVMShutdownHook' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(514)) - JVMShutdownHook: Interrupted while waiting for completion of Async executor!
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
at org.apache.ranger.audit.provider.AuditProviderFactory$JVMShutdownHook.run(AuditProviderFactory.java:507)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(516)) - JVMShutdownHook: Interrupting ranger async audit cleanup thread
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(518)) - <== JVMShutdownHook.run()
2018-12-17 20:57:12,592 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(106)) - Stop called. name=hdfs.async
2018-12-17 20:57:12,593 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(110)) - Interrupting consumerThread. name=hdfs.async, consumer=hdfs.async.multi_dest
2018-12-17 20:57:12,593 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(481)) - RangerAsyncAuditCleanup: Done cleanup
2018-12-17 20:57:12,593 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(470)) - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
2018-12-17 20:57:12,593 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(155)) - Caught exception in consumer thread. Shutdown might be in progress
2018-12-17 20:57:12,596 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(171)) - Exiting polling loop. name=hdfs.async
2018-12-17 20:57:12,596 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(175)) - Calling to stop consumer. name=hdfs.async, consumer.name=hdfs.async.multi_dest
2018-12-17 20:57:12,596 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
Logs from the standby node:
2018-12-17 20:55:17,191 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for yarn/prdhdpmn1.example.com@example.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2018-12-17 20:55:17,742 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch.solr, interval=01:00.001 minutes, events=3, succcessCount=3, totalEvents=41545284, totalSuccessCount=35647876, totalFailedCount=5897408
2018-12-17 20:55:17,743 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:55:17,780 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=01:00.184 minutes, events=1000, deferredCount=1000, totalEvents=11155910, totalSuccessCount=58910, totalDeferredCount=11097000
2018-12-17 20:55:17,780 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:createConfiguration(310)) - Returning HDFS Filesystem Config: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml
2018-12-17 20:55:17,794 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:getLogFileStream(271)) - Checking whether log file exists. hdfPath=hdfs://prdhdpmn1.example.com:8020/ranger/audit/hdfs/20181217/hdfs_ranger_audit_prdhdpmn2.example.com.log, UGI=nn/prdhdpmn2.example.com@example.COM (auth:KERBEROS)
2018-12-17 20:55:17,803 WARN retry.RetryInvocationHandler (RetryInvocationHandler.java:handleException(217)) - Exception while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1979)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1345)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3967)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1130)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy33.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
at sun.reflect.GeneratedMethodAccessor271.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1443)
at org.apache.ranger.audit.destination.HDFSAuditDestination.getLogFileStream(HDFSAuditDestination.java:273)
at org.apache.ranger.audit.destination.HDFSAuditDestination.access$000(HDFSAuditDestination.java:44)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:159)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:156)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.ranger.audit.destination.HDFSAuditDestination.logJSON(HDFSAuditDestination.java:170)
at org.apache.ranger.audit.queue.AuditFileSpool.sendEvent(AuditFileSpool.java:880)
at org.apache.ranger.audit.queue.AuditFileSpool.runLogAudit(AuditFileSpool.java:819)
at org.apache.ranger.audit.queue.AuditFileSpool.run(AuditFileSpool.java:758)
at java.lang.Thread.run(Thread.java:745)
2018-12-17 20:55:17,803 ERROR provider.BaseAuditHandler (BaseAuditHandler.java:logError(329)) - Error writing to log file.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1979)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1345)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3967)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1130)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy33.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
at sun.reflect.GeneratedMethodAccessor271.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1443)
at org.apache.ranger.audit.destination.HDFSAuditDestination.getLogFileStream(HDFSAuditDestination.java:273)
at org.apache.ranger.audit.destination.HDFSAuditDestination.access$000(HDFSAuditDestination.java:44)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:159)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:156)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.ranger.audit.destination.HDFSAuditDestination.logJSON(HDFSAuditDestination.java:170)
at org.apache.ranger.audit.queue.AuditFileSpool.sendEvent(AuditFileSpool.java:880)
at org.apache.ranger.audit.queue.AuditFileSpool.runLogAudit(AuditFileSpool.java:819)
at org.apache.ranger.audit.queue.AuditFileSpool.run(AuditFileSpool.java:758)
at java.lang.Thread.run(Thread.java:745)
2018-12-17 20:55:17,803 ERROR queue.AuditFileSpool (AuditFileSpool.java:logError(710)) - Error sending logs to consumer. provider=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2018-12-17 20:55:17,804 INFO queue.AuditFileSpool (AuditFileSpool.java:runLogAudit(770)) - Destination is down. sleeping for 30000 milli seconds. indexQueue=2, queueName=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2018-12-17 20:55:20,744 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:55:23,744 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:26,779 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:29,779 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:30,179 ERROR client.RangerAdminRESTClient (RangerAdminRESTClient.java:getServicePoliciesIfUpdated(124)) - Error getting policies. secureMode=true, user=nn/prdhdpmn2.example.com@example.COM (auth:KERBEROS), response={"httpStatusCode":401,"statusCode":0}, serviceName=ABC_hadoop
2018-12-17 20:56:30,179 ERROR util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(255)) - PolicyRefresher(serviceName=ABC_hadoop): failed to refresh policies. Will continue to use last known version of policies (222)
java.lang.Exception: HTTP 401
at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:126)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:232)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicy(PolicyRefresher.java:188)
at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:158)
2018-12-17 20:56:32,780 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:35,780 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:38,780 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:38,901 INFO ipc.Server (Server.java:saslProcess(1538)) - Auth successful for jhs/prdhdpmn1.example.com@example.COM (auth:KERBEROS)
2018-12-17 20:56:38,902 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for jhs/prdhdpmn1.example.com@example.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2018-12-17 20:56:41,780 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:42,500 INFO ipc.Server (Server.java:saslProcess(1538)) - Auth successful for ambari-qa-ABC@example.COM (auth:KERBEROS)
2018-12-17 20:56:42,501 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for ambari-qa-ABC@example.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2018-12-17 20:56:42,574 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(716)) - Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 2 5
2018-12-17 20:56:43,687 INFO ipc.Server (Server.java:saslProcess(1538)) - Auth successful for user1@example.COM (auth:KERBEROS)
2018-12-17 20:56:43,687 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for user1@example.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2018-12-17 20:56:44,781 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:47,781 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1588)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2018-12-17 20:56:48,574 INFO client.QuorumJournalManager (QuorumCall.java:waitFor(136)) - Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2018-12-17 20:56:49,576 INFO client.QuorumJournalManager (QuorumCall.java:waitFor(136)) - Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2018-12-17 20:57:00,587 WARN client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 18013 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2018-12-17 20:57:01,587 WARN client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19014 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2018-12-17 20:57:02,574 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<JN1_IP>:8485, <JN2_IP>:8485, <JN2_IP>:8485], stream=QuorumOutputStream starting at txid 62507384))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:654)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4018)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2018-12-17 20:57:02,574 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 62507384
2018-12-17 20:57:02,587 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(501)) - ==> JVMShutdownHook.run()
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(502)) - JVMShutdownHook: Signalling async audit cleanup to start.
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(477)) - RangerAsyncAuditCleanup: Starting cleanup
2018-12-17 20:57:02,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(506)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.
2018-12-17 20:57:02,593 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2018-12-17 20:57:02,593 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=838857, localBatchBuffer.size()=0
2018-12-17 20:57:02,605 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=837857, localBatchBuffer.size()=0
2018-12-17 20:57:12,583 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(332)) - Queue is not empty. Will retry. queue.size)=11859, localBatchBuffer.size()=0
2018-12-17 20:57:12,592 WARN util.ShutdownHookManager (ShutdownHookManager.java:run(70)) - ShutdownHook 'JVMShutdownHook' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(514)) - JVMShutdownHook: Interrupted while waiting for completion of Async executor!
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
at org.apache.ranger.audit.provider.AuditProviderFactory$JVMShutdownHook.run(AuditProviderFactory.java:507)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(516)) - JVMShutdownHook: Interrupting ranger async audit cleanup thread
2018-12-17 20:57:12,592 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(518)) - <== JVMShutdownHook.run()
2018-12-17 20:57:12,592 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(106)) - Stop called. name=hdfs.async
2018-12-17 20:57:12,593 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(110)) - Interrupting consumerThread. name=hdfs.async, consumer=hdfs.async.multi_dest
2018-12-17 20:57:12,593 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(481)) - RangerAsyncAuditCleanup: Done cleanup
2018-12-17 20:57:12,593 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(470)) - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
2018-12-17 20:57:12,593 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(155)) - Caught exception in consumer thread. Shutdown might be in progress
2018-12-17 20:57:12,596 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(171)) - Exiting polling loop. name=hdfs.async
2018-12-17 20:57:12,596 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(175)) - Calling to stop consumer. name=hdfs.async, consumer.name=hdfs.async.multi_dest
2018-12-17 20:57:12,596 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
There were no errors in the JournalNode logs. We could not identify why the QJM failed to get a response from the JournalNodes, which caused this failure. Is there any way to find the root cause of this failure?
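A hedged checklist, not a definitive diagnosis: sendEdits timeouts with quiet JournalNode logs are usually a long NameNode GC pause, JournalNode disk latency, or a network stall. The grep targets below assume default HDP log locations, and raising the QJM timeout only buys headroom; it does not fix the underlying pause:

# Long GC/host pauses show up as JvmPauseMonitor warnings in the NameNode log
grep -i "Detected pause" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log*

# On each JournalNode, slow edit syncs are logged with their duration (if present)
grep -i "Sync of transaction range" /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-*.log*

# As a stopgap, the QJM write timeout can be raised in hdfs-site (default 20000)
# dfs.qjournal.write-txns.timeout.ms=60000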
12-11-2018
06:26 AM
I followed the above steps. After logging in to the new user account I can go to the Encryption tab. But when I select my service name from the "Select service" option it says "User:<user> not allowed to do 'GET_KEYS'", and I cannot see any of my keys listed.
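A hedged note: the 'GET_KEYS' denial means the new user has no Ranger KMS policy granting key access; the keyadmin login needs to add the user to a policy on the KMS repository with Get Keys (and usually Get Metadata) permission. Once granted, the hadoop CLI can confirm it; the KMS host and the default 9292 port below are placeholders:

# Run as the new user after the policy is in place; lists the keys visible to that user
hadoop key list -provider kms://http@KMS_HOST:9292/kms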
12-10-2018
07:40 AM
1 Kudo
We are managing Ranger KMS using the keyadmin user. I can add users and assign the admin role from the Ranger admin console, but could not find a user management option after logging in to the keyadmin user profile. How can I create new users and add them as keyadmin for managing keys? Thanks, Sajesh
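A hedged sketch, not confirmed by the thread: in Ranger, users are created under the admin login (or synced from LDAP) and then switched to the KeyAdmin role; the keyadmin login itself only manages keys and KMS policies, which is why it has no user management tab. The REST call below assumes Ranger's xusers endpoint and the ROLE_KEY_ADMIN role name; all credentials are placeholders:

# Create a user with the KeyAdmin role via the Ranger admin API
curl -u admin:PASSWORD -H "Content-Type: application/json" -X POST \
    http://RANGER_HOST:6080/service/xusers/secure/users \
    -d '{"name":"newkeyadmin","password":"ChangeIt123","firstName":"New","userRoleList":["ROLE_KEY_ADMIN"]}'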
12-10-2018
07:11 AM
Hi @PRAVIN BHAGADE, I have dropped the collection and restarted Log Search. The issue is fixed now. Thanks for your help.
12-06-2018
03:00 PM
We are using Ambari version 2.4.0.1 to manage a Hadoop HA cluster and noticed Ambari Infra Solr was stopped for a few days. After successfully starting the service again, the errors below are reported continuously in solr.log.
[qtp1769597131-103] ERROR [c:hadoop_logs s:shard0 r:core_node2 x:hadoop_logs_shard0_replica1] org.apache.solr.common.SolrException (SolrException.java:148) - org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: hadoop_logs slice: shard2
at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:626)
at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:612)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:367)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:315)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:671)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.DocExpirationUpdateProcessorFactory$TTLUpdateProcessor.processAdd(DocExpirationUpdateProcessorFactory.java:347)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:274)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:239)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:157)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:94)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
How do we fix this issue?
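A hedged first step, not a guaranteed fix: "No registered leader" usually means the shard's leader election is stuck in ZooKeeper after the outage. Checking the cluster state and forcing an election are both plain Collections API calls; the host below is a placeholder, and FORCELEADER should be a last resort after a restart of the Infra Solr instances:

# See which shards of the collection currently have no leader
curl -s "http://SOLR_HOST:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=hadoop_logs&wt=json"

# Force a leader election on the stuck shard (use cautiously)
curl -s "http://SOLR_HOST:8886/solr/admin/collections?action=FORCELEADER&collection=hadoop_logs&shard=shard2"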
12-01-2018
07:19 AM
I tried configuring authentication.ldap.secondaryUrl and ran ldap sync --all. Ambari Server 'sync-ldap' completed successfully, but I could not find any users/groups from the secondary domain, and there were no logs about the sync (failed/success/errors). Is there any way we can validate the LDAP sync?
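Two hedged ways to validate a sync (credentials and host names are placeholders): the Ambari REST API lists every user Ambari currently knows about, and sync-ldap can be pointed at one known account from the secondary domain so a lookup failure surfaces immediately on the console:

# List all users currently known to Ambari
curl -s -u admin:PASSWORD http://AMBARI_HOST:8080/api/v1/users | grep user_name

# Sync one known user from the secondary domain; users.txt holds one username per line
ambari-server sync-ldap --users users.txt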
12-01-2018
07:08 AM
We are using Ambari 2.4.0.1, and it looks like db-purge-history is not available in this version.
11-27-2018
02:56 PM
We have Ambari LDAP configured with one of the subdomains, and all users from that domain are available in Ambari after the LDAP sync. Now users from another subdomain need access to the Ambari console. How do we enable LDAP with other subdomains? Below is the current LDAP configuration in /etc/ambari-server/conf/ambari.properties:
------------------------
ambari.ldap.isConfigured=true
authentication.ldap.baseDn=DC=sub1,DC=ad,DC=abc,DC=com
authentication.ldap.bindAnonymously=false
authentication.ldap.dnAttribute=dn
authentication.ldap.groupMembershipAttr=member
authentication.ldap.groupNamingAttr=cn
authentication.ldap.groupObjectClass=group
authentication.ldap.managerDn=Hadoop-AD-Admin-devl@sub1.ad.abc.com
authentication.ldap.managerPassword=${alias=ambari.ldap.manager.password}
authentication.ldap.primaryUrl=sub1.ad.abc.com:389
authentication.ldap.referral=follow
authentication.ldap.useSSL=false
authentication.ldap.userObjectClass=user
authentication.ldap.usernameAttribute=sAMAccountName
client.security=ldap
------------------------
All users from "sub1.ad.abc.com" are available in Ambari. We need access for users from "sub2.ad.abc.com" as well.
Ambari server version: 2.4.0.1
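A hedged approach, standard for multi-domain Active Directory but not confirmed for this cluster: point Ambari at the forest root over the Global Catalog port (3268) so one baseDn covers both subdomains, then re-run the sync. The hostname below is an assumption mirroring the properties above:

# In /etc/ambari-server/conf/ambari.properties, then restart and resync:
# authentication.ldap.primaryUrl=ad.abc.com:3268
# authentication.ldap.baseDn=DC=ad,DC=abc,DC=com

ambari-server restart
ambari-server sync-ldap --all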
11-22-2018
06:54 AM
We are running Ambari Server with MySQL to manage a Hadoop HA cluster. The issue is that /var/ file system utilization keeps increasing, and we have noticed there are too many mysqld-bin.XXXX files in the /var/lib/mysql/ directory, each about 1.1 GB, with the number of files growing day by day. Since it is clustered, the /var/lib/mysql/ size is the same on both the active and standby nodes, causing a storage issue on both. What do these files contain? Can we purge them? Please suggest how we should address this issue.
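For what it's worth (hedged; verify the binlogs are not feeding replication before deleting anything): mysqld-bin.XXXX files are MySQL binary logs, which record every write for replication and point-in-time recovery. They can be purged from the mysql client and capped going forward with expire_logs_days:

# Purge binary logs older than a week (run as the MySQL root user)
mysql -u root -p -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);"

# Cap retention permanently in /etc/my.cnf under [mysqld], then restart mysqld:
# expire_logs_days = 7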
11-05-2018
11:37 AM
We are running a Hadoop HA cluster on AWS EC2 instances with 17 data nodes (all instances are m4.4xlarge, including the name nodes). All the DNs are configured with 16 TB (EBS st1) volumes for HDFS. Now we are running out of HDFS storage and looking to extend it. Since 16 TB is the maximum size for an st1 EBS volume, we cannot extend the existing volumes. We are trying to add an additional 16 TB volume to a few data nodes and update "DataNode directories" in Ambari with the new volume path. Will this approach cause any performance issues in the cluster? Is there anything else to consider with this approach?
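A hedged note on the uneven-disk concern: HDFS on Hadoop 2.7 (HDP 2.5) has no intra-node disk rebalancer, so the new volume fills while the old one stays full unless the volume-choosing policy prefers free space. A sketch of the settings involved; the mount point is a placeholder:

# Mount the new EBS volume, then append it to the DataNode directories list in Ambari
mkdir -p /grid/1/hadoop/hdfs/data
# dfs.datanode.data.dir=/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data

# Prefer volumes with more free space for new block writes (hdfs-site)
# dfs.datanode.fsdataset.volume.choosing.policy=org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy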
10-31-2018
11:05 AM
Will an ambari-server service restart impact any other Hadoop services currently running? I have a Hadoop HA cluster managed by Ambari 2.4.0.1.