02-09-2021
04:35 AM
Hi @AmirMirza,

Thank you very much for your response. If I compare with the /atsv2-hbase-secure info on another cluster, I see:

[zk: xxx.xxx.es:2181,yyy.yyy.es:2181,zzz.zzz.es:2181(CONNECTED) 10] ls /atsv2-hbase-secure
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, acl, switch, running, tokenauth, draining, namespace, hbaseid, table]
[zk: xxx.xxx.es:2181,yyy.yyy.es:2181,zzz.zzz.es:2181(CONNECTED) 11] getAcl /atsv2-hbase-secure
'sasl,'yarn : cdrwa
'world,'anyone : r
'sasl,'yarn-ats-hbase : cdrwa

While on the failing server:

[zk: aaa.aaa.es:2181,bbb.bbb.es:2181,ccc.ccc.es:2181(CONNECTED) 0] ls /atsv2-hbase-secure
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, switch, running, tokenauth, draining, hbaseid, table]
[zk: aaa.aaa.es:2181,bbb.bbb.es:2181,ccc.ccc.es:2181(CONNECTED) 1] getAcl /atsv2-hbase-secure
'sasl,'yarn : cdrwa
'world,'anyone : r
'sasl,'yarn-ats-hbase : cdrwa

Several znodes are missing: meta-region-server, acl, namespace.

Do you think we can recreate them manually? I'm not used to working with ZooKeeper and I don't know how to proceed. And why are they not created automatically in the first place?

Best regards,
Carles
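(Edit, in case someone else wonders about recreating znodes by hand: in principle they can be created from zkCli.sh with something like the sketch below, copying the ACLs from the healthy cluster. Note that this needs a Kerberos ticket for a principal the parent ACL allows, and that meta-region-server in particular is normally published by the HBase master itself once it assigns hbase:meta, so creating it manually with a dummy value cannot fix region location. Treat this as an illustration, not a verified fix.)

$ /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server xxx.xxx.es:2181
[zk: xxx.xxx.es:2181(CONNECTED) 0] create /atsv2-hbase-secure/namespace ""
[zk: xxx.xxx.es:2181(CONNECTED) 1] setAcl /atsv2-hbase-secure/namespace sasl:yarn:cdrwa,sasl:yarn-ats-hbase:cdrwa,world:anyone:r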
02-03-2021
02:33 AM
Hello @smdas,

Thank you again for your quick response. Doing as you indicated, TimelineService v2.0 stays in "Starting..." mode for a very long time. The errors are:

2021-02-03 10:59:30,683 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=15, retries=36, started=128767 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 10:59:50,837 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=16, retries=36, started=148921 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 11:00:10,843 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=17, retries=36, started=168927 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 11:00:30,915 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=18, retries=36, started=188999 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 11:00:51,112 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=19, retries=36, started=209196 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 11:01:11,147 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=20, retries=36, started=229231 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null
2021-02-03 11:01:31,214 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=21, retries=36, started=249298 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at null

On the logs side, the new /var/log/hadoop-yarn/embedded-yarn-ats-hbase directory shows these messages:

2021-02-03 11:08:04,944 INFO [master/ambarisrv02:17000.splitLogManager..Chore.1] master.SplitLogManager: total=1, unassigned=1, tasks={/atsv2-hbase-secure/splitWAL/WALs%2Fhnode34.pic.es%2C17020%2C1606168410836-splitting%2Fhnode34.pic.es%252C17020%252C1606168410836.1611699600885=last_update = 1612346240946 last_version = 33 cur_worker_name = null status = in_progress incarnation = 2 resubmits = 1 batch = installed = 1 done = 0 error = 0}
2021-02-03 11:08:04,947 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task /atsv2-hbase-secure/splitWAL/RESCAN0000007157 entered state=DONE ambarisrv02.pic.es,17000,1612346230681
2021-02-03 11:08:05,092 INFO [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=16, retries=22, started=148876 ms ago, cancelled=false, msg=org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server, details=row 'prod.timelineservice.app_flow' on table 'hbase:meta' at null
2021-02-03 11:08:05,138 WARN [master/ambarisrv02:17000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2021-02-03 11:08:05,946 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task /atsv2-hbase-secure/splitWAL/RESCAN0000007158 entered state=DONE ambarisrv02.pic.es,17000,1612346230681
2021-02-03 11:08:06,138 WARN [master/ambarisrv02:17000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2021-02-03 11:08:06,946 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task /atsv2-hbase-secure/splitWAL/RESCAN0000007159 entered state=DONE ambarisrv02.pic.es,17000,1612346230681
2021-02-03 11:08:07,139 WARN [master/ambarisrv02:17000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2021-02-03 11:08:07,946 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task /atsv2-hbase-secure/splitWAL/RESCAN0000007160 entered state=DONE ambarisrv02.pic.es,17000,1612346230681
2021-02-03 11:08:08,139 WARN [master/ambarisrv02:17000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2021-02-03 11:08:08,243 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-1(pid=3087), run time 10mins, 50.255sec
2021-02-03 11:08:08,243 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-2(pid=3089), run time 10mins, 50.255sec
2021-02-03 11:08:08,946 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task /atsv2-hbase-secure/splitWAL/RESCAN0000007161 entered state=DONE ambarisrv02.pic.es,17000,1612346230681
2021-02-03 11:08:09,140 WARN [master/ambarisrv02:17000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
And the same messages appear in the timelinereader logs:

2021-02-03 11:09:11,446 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2021-02-03 11:09:11,448 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
    ... 17 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:164)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:321)
    ... 1 more

I also had to kill the timelinereader start process that was hung on that server. After restoring the old configuration (launching HBase as a system service), the timelinereader starts, but of course the problem persists in the logs.

So, in conclusion: after removing the /atsv2-hbase-secure configuration from ZooKeeper, the YARN timelinereader is not able to create the meta-region-server znode...

Thank you again. Cheers,
Carles
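(A detail that might be worth checking, in case it helps: the "No servers available; cannot place 1 unassigned regions" warnings suggest no RegionServer ever registered with the embedded master, and as far as I understand the master only publishes meta-region-server once it can assign hbase:meta somewhere. The registered RegionServers can be listed in zkCli:)

[zk: aaa.aaa.es:2181(CONNECTED) 0] ls /atsv2-hbase-secure/rs

An empty list there would explain both the unassigned regions and the missing meta-region-server znode.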
02-02-2021
10:29 PM
Hello @smdas,

Thank you for your response. I can confirm that the "is_hbase_system_service_launch" option is checked. Regarding the permission issues, I'm going to check, although nothing has changed in that respect in our configuration. I'll continue investigating.

Thank you again. Cheers,
Carles
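(For the record, this is roughly the permission check we planned. The embedded ats-hbase keeps its data under an HDFS root dir, /atsv2/hbase in our layout; adjust the path to whatever hbase.rootdir in the embedded hbase-site.xml points to. Everything there should be owned by the yarn-ats user:)

$ hdfs dfs -ls /atsv2
$ hdfs dfs -ls -R /atsv2/hbase | head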
02-01-2021
04:29 AM
Hello all,

We have had issues with yarn-ats several times, but most of them were solved by simply destroying the app and restarting YARN from Ambari. Last time, however, we could not recover the yarn-ats app with this method. TimelineService v2.0 had problems connecting to a particular node:

2021-01-27 13:21:36,901 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8236 ms ago, cancelled=false, msg=Call to node.es/XX.XX.XX.XX:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: node.es/XX.XX.XX.XX:17020, details=row 'prod.timelineservice.entity,hive!yarn-cluster!HIVE-b41f83ed-5b82-4098-b912-c36feca9049e!����\П!���������!YARN_CONTAINER!����\��!container_e87_1611702208565_0022_01_000001,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node.es,17020,1611702280697, seqNum=-1

We destroyed the app, recreated it, and so on, and ran into the same issue time after time. We checked the documentation and this Community Forum and proceeded as indicated here:

* Remove yarn-ats and clean ZooKeeper:
https://community.cloudera.com/t5/Support-Questions/ATS-hbase-does-not-seem-to-start/m-p/235162
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/remove_ats_hbase_before_switching_between_clusters.html

However, the problem persists, although the errors are now different. The annoying part is that yarn-ats seems unable to create /atsv2-hbase-secure/meta-region-server, while the other znodes are created in ZooKeeper:

2021-02-01 12:57:11,446 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2021-02-01 12:57:11,448 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
    ... 17 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:164)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:321)
    ... 1 more

* Furthermore, changing zookeeper.znode.parent as suggested in another thread does not help either:
https://community.cloudera.com/t5/Support-Questions/HDP3-0-timeline-service-V2-reader-cannot-create-zookeeper/td-p/220873/page/2

The new parent znode is created, but not meta-region-server...

[zk: XXXX.es:2181,XXXX.es:2181,XXXX.es:2181(CONNECTED) 0] ls /atsv2-hbase-secure
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, switch, running, tokenauth, draining, hbaseid, table]
[zk: XXXX:2181,XXXX.es:2181,XXXX.es:2181(CONNECTED) 1] ls /atsv2-hbase-secure-new
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, master, switch, running, tokenauth, draining, hbaseid, table]

I have seen many other threads about problems with ATSv2, but this new issue seems impossible for us to solve. Why is meta-region-server not found under zookeeper.znode.parent? Does anyone have any idea?

We are running HDP-3.1.4 and Ambari 2.7.4.0. Thank you very much in advance.

Cheers,
Carles
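(In case it is useful, the parent znode that the embedded ats-hbase actually uses can be checked in its hbase-site.xml; the conf path below matches our HDP 3.1.4 layout, next to the yarn_hbase_master_jaas.conf file, and may differ elsewhere:)

$ grep -A1 'zookeeper.znode.parent' /usr/hdp/3.1.4.0-315/hadoop/conf/embedded-yarn-ats-hbase/hbase-site.xml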
11-01-2020
10:26 PM
Hi all,

I'm not sure whether this issue is considered solved, but in case it helps, here is how we fixed it. We hit the same error after removing several nodes from our kerberized cluster (Ambari 2.7.4 and HDP 3.1.4):

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
ats-hbase Failed : HTTP error code : 500

Following this thread, we carefully checked the YARN configuration to ensure that all the variables were correctly scaled for the available nodes. After that, we destroyed the yarn app:

$ yarn app -destroy ats-hbase
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:14 INFO client.ApiServiceClient: Successfully destroyed service ats-hbase

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
Service ats-hbase not found

Then we restarted the whole YARN service from Ambari. Now everything is running fine:

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
{"name":"ats-hbase","id":"application_1604297264331_0001","artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"lifetime":-1,"components":[{"name":"master","dependencies":[],"artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"resource":{"cpus":1,"memory":"4096","additional":{}},"state":"STABLE","configuration":{"properties":{"yarn.service.container-failure.retry.max":"10","yarn.service.framework.path":"/hdp/apps/3.1.4.0-315/yarn/rm2/service-dep.tar.gz"},"env":{"HBASE_LOG_PREFIX":"hbase-$HBASE_IDENT_STRING-master-$HOSTNAME","HBASE_LOGFILE":"$HBASE_LOG_PREFIX.log","HBASE_MASTER_OPTS":"-Xms3276m -Xmx3276m -Djava.security.auth.login.config=/usr/hdp/3.1.4.0-315/hadoop/conf/embedded-yarn-ats-hbase/yarn_hbase_master_jaas.conf", [...]
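(As an extra sanity check, not specific to ats-hbase: after the restart, the relaunched service should also show up in the regular application list.)

$ yarn application -list -appStates RUNNING | grep ats-hbase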
09-28-2020
01:36 AM
Hello again,

Finally, we decided to remove the audit_logs collection from Solr and recreate it:

curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=DELETE&name=audit_logs"
curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CREATE&name=audit_logs&collection.configName=audit_logs&autoAddReplicas=false&nrtReplicas=2&pullReplicas=0&replicationFactor=2&maxShardsPerNode=4&numShards=2"

Doing this, it seems we solved the error with the document containing the bad input string:

java.lang.NumberFormatException: For input string: "t rue"

However, there is still this ERROR:

ERROR [ ] org.apache.solr.update.processor.DocExpirationUpdateProcessorFactory$DeleteExpiredDocsRunnable (DocExpirationUpdateProcessorFactory.java:431) - Runtime error in periodic deletion of expired docs: null

But I understand this only means that Solr has nothing "expired" to remove, right?

Thank you very much. Cheers,
Carles
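(Edit: to verify that the recreated collection is healthy, the cluster status can be queried through the same Collections API:)

curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CLUSTERSTATUS&collection=audit_logs"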
09-23-2020
09:37 AM
Hello all,
We are seeing the following error in our infra-solr server logs (Hortonworks HDP-3.1.4.0, infra-solr 0.1.0):
2020-09-23T16:33:00,258 [qtp2039810346-189141] ERROR [c:audit_logs s:shard2 r:core_node10 x:audit_logs_shard2_replica_n9] org.apache.solr.common.SolrException (SolrException.java:148) - org.apache.solr.common.SolrException: ERROR: [doc=697665cf-bc6b-4ada-a94e-e14ede1ba18c] Error adding field 'result'='t rue' msg=For input string: "t rue"
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:218)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:102)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:967)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:341)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:288)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:235)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:1001)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1222)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:693)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DocExpirationUpdateProcessorFactory$TTLUpdateProcessor.processAdd(DocExpirationUpdateProcessorFactory.java:346)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:327)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:280)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:333)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:235)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:298)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:191)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:126)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:123)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "t rue"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.solr.schema.IntPointField.createField(IntPointField.java:149)
    at org.apache.solr.schema.PointField.createFields(PointField.java:250)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:169)
    ... 82 more

2020-09-23T16:33:07,335 [autoExpireDocs-30-thread-1] ERROR [ ] org.apache.solr.update.processor.DocExpirationUpdateProcessorFactory$DeleteExpiredDocsRunnable (DocExpirationUpdateProcessorFactory.java:431) - Runtime error in periodic deletion of expired docs: null
java.lang.NullPointerException: null
Anyway, apparently everything is working fine.
We found instructions on how to delete a document from Solr, but doc=697665cf-bc6b-4ada-a94e-e14ede1ba18c is not actually present in audit_logs; as I understand it, the error happens before the document is added.
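(For reference, the delete-by-id request we found looks something like this, our reconstruction using the same negotiate authentication as our other Solr calls, although in our case there is nothing matching to delete:)

curl --negotiate -u : "http://$(hostname -f):8886/solr/audit_logs/update?commit=true" -H "Content-Type: text/xml" -d "<delete><id>697665cf-bc6b-4ada-a94e-e14ede1ba18c</id></delete>"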
Do you know if there is a way to kill and remove this request?
Please let me know if you need further information.
Thank you in advance.
Carles