Member since: 12-14-2015
Posts: 12
Kudos Received: 11
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7317 | 03-02-2016 04:01 PM
 | 8728 | 02-26-2016 01:22 PM
05-30-2017
06:45 PM
Hi Matt, I tried exactly what you suggested while I was waiting for your reply. I was able to access the UI without the data flows running, and I looked at the System Diagnostics you mentioned earlier. It showed "4 times" with the data flows stopped and still shows "8 times" now that I have started them. Our cluster has 2 nodes, each with 16 cores. The "max timer driven thread count" is set to 64 and the "max event driven thread count" is set to 12. I've been monitoring "top"; CPU usage right now (busy hour) is about 700%. The good news is that after I started NiFi with everything stopped and manually restarted the data flows, the problem I had this morning has not recurred. Heartbeats are being generated at reasonable intervals - about 7 seconds. What happened this morning is still a mystery to me, but I am happy it's working now. Thank you so much for all the help!!! Xi
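For anyone else watching load the same way, a minimal sketch of per-thread monitoring against the NiFi JVM (the pgrep pattern is an assumption; any other way of finding the NiFi PID, such as jps -l, works just as well):

NIFI_PID=$(pgrep -f org.apache.nifi.NiFi)   # assumption: adjust the pattern or confirm the PID with ps/jps
top -H -p "$NIFI_PID"                       # per-thread CPU; busy GC threads stand out here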
05-30-2017
04:13 PM
Hi Matt, We are using jdk1.8.0_31 and nifi.version=1.0.0.2.0.1.0-12. The following are the first few lines of the jstat output:

[root@be-bi-nifi-441 conf]# /usr/java/default/bin/jstat -gcutil 3248 250 1000
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  36.84  33.25  95.26  89.80   5545 1731.464     4    3.563 1735.027
  0.00 100.00  77.89  33.25  95.26  89.80   5545 1731.464     4    3.563 1735.027
  0.00 100.00  93.68  33.25  95.26  89.80   5546 1731.464     4    3.563 1735.027
  0.00 100.00  93.68  33.25  95.26  89.80   5546 1731.464     4    3.563 1735.027
  0.00 100.00  26.32  33.90  95.26  89.80   5546 1731.930     4    3.563 1735.492
  0.00 100.00  64.21  33.90  95.26  89.80   5546 1731.930     4    3.563 1735.492
  0.00 100.00  93.68  33.90  95.26  89.80   5547 1731.930     4    3.563 1735.492
  0.00 100.00  93.68  33.90  95.26  89.80   5547 1731.930     4    3.563 1735.492
Looks like NiFi is busy with GC, just as you suspected, but I do not understand why. Can you please give me some advice on how to debug this without UI access? Thank you very much! Xi
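For reference, a minimal sketch of what can be gathered with only standard JDK tools when the UI is unreachable (PID 3248 is taken from the jstat command above; the bootstrap.conf path is an assumption for an HDF install):

/usr/java/default/bin/jstat -gccause 3248 1000              # which GC events keep firing, sampled every second
/usr/java/default/bin/jmap -histo:live 3248 | head -30      # top classes on the live heap (note: forces a full GC)
/usr/java/default/bin/jstack 3248 > /tmp/nifi-threads.txt   # thread dump, to see what processor threads are doing
grep Xmx /usr/hdf/current/nifi/conf/bootstrap.conf          # path is an assumption; the NiFi heap size lives in bootstrap.conf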
05-30-2017
03:15 PM
Hi Matt, Thank you so much for getting back to me so quickly. I cannot access the NiFi UI because the nodes connect and then disconnect from the cluster so quickly. I do see the following entry in the nifi node log about every minute:

2017-05-30 11:01:16,777 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5af47414 checkpointed with 4 Records and 0 Swap Files in 10 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 114

Is this normal or an indication of a problem? The cluster was fine yesterday when I checked, and nothing has changed - I am the only person who can make changes, so I know that for sure. Thanks again! Xi
05-30-2017
02:46 PM
Hi, We have a 2-node NiFi production cluster running the HDF-2.0.1.0 release. It has worked fine for over a year. This morning both nodes keep cycling through connecting, connected, then disconnected from the cluster due to lack of heartbeat. nifi.cluster.protocol.heartbeat.interval in nifi.properties is the default 5 sec. In the nifi node log I do not see heartbeats being created every 5 seconds - in my working dev cluster they are created roughly every 5 seconds. In this production cluster the heartbeats are created only every 1 or 2 minutes:

2017-05-30 10:30:07,838 INFO [Clustering Tasks Thread-1] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-30 10:30:07,653 and sent to be-bi-nifi-441.soleocommunications.com:8085 at 2017-05-30 10:30:07,838; send took 184 millis
2017-05-30 10:31:14,986 INFO [Clustering Tasks Thread-1] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-30 10:31:14,515 and sent to be-bi-nifi-441.soleocommunications.com:8085 at 2017-05-30 10:31:14,986; send took 471 millis
2017-05-30 10:33:44,971 INFO [Clustering Tasks Thread-2] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-30 10:33:44,404 and sent to be-bi-nifi-441.soleocommunications.com:8085 at 2017-05-30 10:33:44,971; send took 566 millis
2017-05-30 10:34:15,280 INFO [Clustering Tasks Thread-3] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-30 10:34:15,122 and sent to be-bi-nifi-441.soleocommunications.com:8085 at 2017-05-30 10:34:15,280; send took 157 millis
2017-05-30 10:36:21,204 INFO [Clustering Tasks Thread-3] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-30 10:36:20,673 and sent to be-bi-nifi-441.soleocommunications.com:8085 at 2017-05-30 10:36:21,204; send took 530 millis

This cluster worked fine yesterday and nothing changed on the system. Can anyone give me some insight into why the heartbeats are not created as configured? Thank you very much in advance! Xi Sanderson
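A quick way to keep watching the heartbeat cadence from the shell while the UI is down (both paths below are assumptions for a typical HDF layout; point them at your actual nifi-app.log and nifi.properties):

grep "Heartbeat created" /var/log/nifi/nifi-app.log | tail -10                               # compare timestamps against the 5 sec target
grep "nifi.cluster.protocol.heartbeat.interval" /usr/hdf/current/nifi/conf/nifi.properties   # the configured interval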
Labels:
- Apache NiFi
03-02-2016
04:01 PM
1 Kudo
Problem solved by raising the ulimit on both the service accounts and user accounts: 32k for open files and 64k for processes worked for me.
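A minimal sketch of what that looks like in /etc/security/limits.conf (the account names below are placeholders, not taken from this thread; substitute the service and user accounts on your cluster, and note the new limits only apply after the account logs in again or the service is restarted):

# /etc/security/limits.conf -- example entries only
hive   -   nofile   32768
hive   -   nproc    65536
yarn   -   nofile   32768
yarn   -   nproc    65536

Afterwards, ulimit -n and ulimit -u run as the affected account confirm the new values took effect.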
02-29-2016
02:39 PM
2 Kudos
Hi,
We have some queries that work fine with a small set of data, but when I pull a month's worth of data, I get the following error:

java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "be-bi-secondary-528.soleocommunications.com/10.10.11.6"; destination host is: "be-bi-secondary-528.soleocommunications.com":8020;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
 at org.apache.hadoop.ipc.Client.call(Client.java:1431)
 at org.apache.hadoop.ipc.Client.call(Client.java:1358)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
 at com.sun.proxy.$Proxy16.mkdirs(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
 at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3008)
 at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2978)
 at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
 at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1043)
 at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:226)
 at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1655)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1414)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Couldn't set up IO streams
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:791)
 at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
 at org.apache.hadoop.ipc.Client.call(Client.java:1397)
 ... 39 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:713)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:784)
 ... 42 more
Error launching map-reduce job

These queries used to work with a large data set before. I started seeing this problem after I upgraded HDP from 2.2.4.2 to 2.3.2.
I tried a few things people suggested online, such as increasing the ulimit (from 1024 to 64000) and increasing the map/reduce java.opts (in my Hive session before running the job, from the system setting of -Xmx2867m to -Xmx10240m), but they didn't help. I also saw people talking about tuning max data transfer threads; my system is already set to a pretty high value suggested by SmartSense. Any help will be greatly appreciated! Xi
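Since "unable to create new native thread" usually points at the per-user process/thread limit rather than the heap, a quick diagnostic sketch, run as the account that launches the Hive CLI (the account name below is a placeholder):

ulimit -u                                  # max user processes; new threads count against this
ulimit -n                                  # max open files
ps -L -u query_user --no-headers | wc -l   # query_user is a placeholder: threads that account already has running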
Labels:
- Apache Hive
02-26-2016
01:22 PM
1 Kudo
Hi all, I opened a support ticket and got an answer back regarding the metastore alerts. It is a known bug in the Ambari release I have (2.1.2): https://issues.apache.org/jira/browse/AMBARI-14424 The suggested solution is to edit the script /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py, search for 30 and replace it with 120, then restart the Ambari server. I still have to monitor how the change works out. Thanks for all the help from you guys! Xi
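A hedged sketch of applying that change from the shell (the grep only locates the 30-second timeout mentioned above so it can be changed to 120 by hand, rather than with a blind search-and-replace):

ALERT=/var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py
cp "$ALERT" "$ALERT.bak"    # keep a copy of the stock alert script before editing
grep -n "30" "$ALERT"       # find the timeout value, then edit it manually to 120
ambari-server restart       # restart Ambari so the modified alert script is picked up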
02-25-2016
04:25 PM
1 Kudo
Hi, Yes, we are using SmartSense. I will open a support ticket too. Here is one of the alerts:

Services Reporting Alerts
OK: [HIVE]
CRITICAL: [HIVE]

HIVE
OK - Hive Metastore Process: Metastore OK - Hive command took 9.718s
CRITICAL - Hive Metastore Process: Metastore on be-bi-secondary-528.soleocommunications.com failed (Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/sbin/:/usr/hdp/current/hive-metastore/bin'"'"' ; export HIVE_CONF_DIR='"'"'/usr/hdp/current/hive-metastore/conf/conf.server'"'"' ; hive --hiveconf hive.metastore.uris=thrift://be-bi-secondary-528.soleocommunications.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e '"'"'show databases;'"'"''' was killed due timeout after 30 seconds)

This notification was sent to Ambari Alert From TheOracle
Apache Ambari 2.1.2

Thanks, Xi
02-25-2016
03:49 PM
1 Kudo
Hi Artem, I implemented the suggestion in the thread Neeraj referred to, but I still have the issue. On light days I get 5 or 6; on heavy days, still over 10. I am also getting a lot of Hive Metastore check alerts (... '"'"'show databases;'"'"''' was killed due timeout after 30 seconds) with OK and CRITICAL in the same email. Last night I got hundreds of those. It seems to be related to the load on the cluster. Any help is appreciated! Xi
01-20-2016
07:19 PM
1 Kudo
Hi Neeraj, Thank you very much for the link. I will give it a try. Xi