Dr-elephant is not collecting the data

Hi Team,

We restarted YARN and Dr. Elephant, but we are still getting the error below.

02-14-2017 10:24:08 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : The list of RM IDs are rm1,rm2
02-14-2017 10:24:08 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Checking RM URL: http://ylpd269.kmdc.att.com:8088/ws/v1/cluster/info
02-14-2017 10:24:08 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : ylpd269.kmdc.att.com:8088 is ACTIVE
02-14-2017 10:24:08 INFO com.linkedin.drelephant.ElephantRunner : Fetching analytic job list...
02-14-2017 10:24:08 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Fetching recent finished application runs between last time: 1487085728997, and current time: 1487085788998
02-14-2017 10:24:08 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : The succeeded apps URL is http://ylpd269.kmdc.att.com:8088/ws/v1/cluster/apps?finalStatus=SUCCEEDED&finishedTimeBegin=14870857...
02-14-2017 10:24:09 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1486843207585_79341
02-14-2017 10:24:09 INFO com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : The failed apps URL is http://ylpd269.kmdc.att.com:8088/ws/v1/cluster/apps?finalStatus=FAILED&finishedTimeBegin=14870857289...
02-14-2017 10:24:09 INFO com.linkedin.drelephant.ElephantRunner : Job queue size is 4432
02-14-2017 10:24:09 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 3 analyzing MAPREDUCE application_1486843207585_79340
02-14-2017 10:24:11 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1486843207585_79343
02-14-2017 10:24:12 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1486843207585_79344
02-14-2017 10:24:13 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing MAPREDUCE application_1486843207585_79384
02-14-2017 10:24:14 INFO com.linkedin.drelephant.ElephantRunner : Executor thread 2 analyzing SPARK application_1486843207585_79387
02-14-2017 10:24:14 ERROR com.linkedin.drelephant.ElephantRunner :
02-14-2017 10:24:14 ERROR com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.net.ConnectException: Connection refused
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
    at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:99)
    at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:99)
    at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:48)
    at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:232)
    at com.linkedin.drelephant.ElephantRunner$ExecutorThread.run(ElephantRunner.java:151)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:686)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:638)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:711)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:559)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:588)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:584)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1436)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:312)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:524)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:545)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:801)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:709)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:559)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:588)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:584)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:948)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:963)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.spark.deploy.history.SparkFSFetcher.org$apache$spark$deploy$history$SparkFSFetcher$$isLegacyLogDirectory(SparkFSFetcher.scala:186)
    at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:143)
    at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:99)
    ... 13 more

02-14-2017 10:24:14 ERROR com.linkedin.drelephant.ElephantRunner : Add analytic job id [application_1486843207585_79387] into the retry list.

Please help me with this. Awaiting your reply.

Thanks & Regards,
Shyam Gurram