Member since: 04-13-2016
Posts: 80
Kudos Received: 12
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3158 | 03-17-2017 06:10 PM
05-15-2016 12:39 PM
Hi All, I recently set up email notifications in Ambari (2.2.1.0) to receive notifications when alert states change. I've found that throughout each day the YARN alert "App Timeline Web UI: Connection Failed" switches from CRITICAL (connection timed out on port 8188) to OK. This happens 12 to 14 times a day and I'm unsure why. I have a basic 2-node cluster; the App Timeline and History Server components are on node 1, and the Resource Manager is also on node 1, if that helps. Any thoughts on why this might happen? It doesn't seem to affect performance. Mike
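The alert in question is essentially a TCP connectivity check against the Application Timeline Server on port 8188. A minimal sketch of the same probe in Python (the hostname below is a placeholder for your own node 1) can help establish whether the service really stops answering or whether the alert's timeout is just too aggressive:

```python
import socket

def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within
    timeout, mirroring the connectivity check behind the Ambari alert."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the App Timeline Server the way the alert does; running this in a
# loop (e.g. from cron) shows when and for how long port 8188 stops answering.
# probe("node1.example.com", 8188)
```

If the probe only fails briefly under load, raising the alert's connection timeout in the Ambari alert definition may be enough to stop the flapping.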
Labels:
- Apache Ambari
- Apache YARN
05-05-2016 09:58 PM
This was it. I had put some Hive SerDe UDF jar files on the classpath with dependencies that caused a version mismatch. Thanks for pointing me in the right direction.
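For anyone hitting the same NoSuchMethodError: one way to confirm this kind of conflict is to check whether a UDF jar bundles its own copies of Hadoop classes, which can shadow the cluster's Hadoop jars at runtime. A sketch (the auxlib path in the comment is only an example; use wherever your SerDe jars live):

```python
import zipfile
from pathlib import Path

def bundled_hadoop_classes(jar_path):
    """List Hadoop classes packaged inside a jar; any hits can shadow the
    cluster's own Hadoop jars and trigger NoSuchMethodError at runtime."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if name.startswith("org/apache/hadoop/") and name.endswith(".class")]

# Example: scan every jar in a Hive aux-jars directory (path is illustrative).
# for jar in Path("/usr/hdp/current/hive-client/auxlib").glob("*.jar"):
#     hits = bundled_hadoop_classes(jar)
#     if hits:
#         print(f"{jar} bundles {len(hits)} Hadoop classes")
```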
05-05-2016 05:16 PM
HDP 2.4.0.0-169
OS: Ubuntu 14.04
The only change I have recently made is upgrading Java from 1.7 to 1.8.0_91.
05-05-2016 05:05 PM
...also, when I run Hive on MapReduce it works fine.
05-05-2016 04:49 PM
I Googled this line: "INFO : Dag submit failed due to org.apache.hadoop.fs.FSOutputSummer", but couldn't find anything.
05-05-2016 04:46 PM
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into default.test(node...VALUES('dd')(Stage-1)
INFO : Dag submit failed due to org.apache.hadoop.fs.FSOutputSummer.<init>(Ljava/util/zip/Checksum;II)V
	at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1340)
	at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1369)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1401)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1382)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1307)
	at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:384)
	at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:380)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:380)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:324)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:852)
	at org.apache.tez.dag.history.recovery.RecoveryService.handleSummaryEvent(RecoveryService.java:393)
	at org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:310)
	at org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:104)
	at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2204)
	at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1225)
	at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
stack trace: [
	org.apache.hadoop.ipc.Client.call(Client.java:1427),
	org.apache.hadoop.ipc.Client.call(Client.java:1358),
	org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229),
	com.sun.proxy.$Proxy43.submitDAG(Unknown Source),
	org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:517),
	org.apache.tez.client.TezClient.submitDAG(TezClient.java:434),
	org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439),
	org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180),
	org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160),
	org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89),
	org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1720),
	org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1477),
	org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1254),
	org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118),
	org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113),
	org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154),
	org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71),
	org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206),
	java.security.AccessController.doPrivileged(Native Method),
	javax.security.auth.Subject.doAs(Subject.java:422),
	org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657),
	org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218),
	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
	java.util.concurrent.FutureTask.run(FutureTask.java:266),
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
	java.lang.Thread.run(Thread.java:745)] retrying...
ERROR : Failed to execute tez graph.
org.apache.hadoop.ipc.RemoteException(java.lang.NoSuchMethodError): org.apache.hadoop.fs.FSOutputSummer.<init>(Ljava/util/zip/Checksum;II)V
...any ideas on how to debug this? The logs tell me nothing. Mike
Labels:
- Apache Hive
- Apache Tez
04-18-2016 10:04 AM
I think this was the issue: Ambari had auto-configured only one vCore. When I increased this, it seemed to solve the problem.
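That symptom is consistent with the hang: with a single vCore on the NodeManager, the Zeppelin application master consumes it and no executor container can ever be allocated, so the job waits forever. A back-of-the-envelope check (not YARN's exact scheduler arithmetic; the default container sizes below are assumptions for illustration):

```python
def executor_slots(node_vcores, node_mem_mb,
                   exec_cores=1, exec_mem_mb=1024, mem_overhead_mb=384,
                   am_cores=1, am_mem_mb=512):
    """Rough count of Spark executors one NodeManager can host after the
    application master takes its share. A result of 0 matches the hang:
    'Initial job has not accepted any resources'."""
    free_cores = node_vcores - am_cores
    free_mem_mb = node_mem_mb - am_mem_mb
    if free_cores < exec_cores or free_mem_mb < exec_mem_mb + mem_overhead_mb:
        return 0
    return min(free_cores // exec_cores,
               free_mem_mb // (exec_mem_mb + mem_overhead_mb))

# One auto-configured vCore: the AM takes it, leaving room for zero executors.
# executor_slots(1, 8192)   # -> 0
# executor_slots(4, 8192)   # -> 3
```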
04-14-2016 01:15 PM
zeppelin Zeppelin SPARK default Wed Apr 13 17:41:12 +0100 2016 N/A RUNNING UNDEFINED
It says that it's still running and is using 66% of the queue/cluster memory.
04-14-2016 01:03 PM
I can run the same job from the pyspark shell with no problems; it executes immediately.
04-13-2016 05:29 PM
3 Kudos
Hi, when I run a Spark job through Zeppelin I get the following output and the job just hangs and never returns. Does anyone have any idea how I could debug and address this problem? I'm running Spark 1.6 and HDP 2.4. Thanks, Mike
INFO [2016-04-13 18:01:17,746] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4 stored as values in memory (estimated size 305.3 KB, free 983.8 KB)
INFO [2016-04-13 18:01:17,860] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4_piece0 stored as bytes in memory (estimated size 25.9 KB, free 1009.7 KB)
INFO [2016-04-13 18:01:17,876] ({dispatcher-event-loop-0} Logging.scala[logInfo]:58) - Added broadcast_4_piece0 in memory on 148.88.72.84:56438 (size: 25.9 KB, free: 511.0 MB)
INFO [2016-04-13 18:01:17,893] ({Thread-65} Logging.scala[logInfo]:58) - Created broadcast 4 from textFile at NativeMethodAccessorImpl.java:-2
INFO [2016-04-13 18:01:18,162] ({Thread-65} FileInputFormat.java[listStatus]:249) - Total input paths to process : 1
INFO [2016-04-13 18:01:18,279] ({Thread-65} Logging.scala[logInfo]:58) - Starting job: count at <string>:3
INFO [2016-04-13 18:01:18,317] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Got job 2 (count at <string>:3) with 2 output partitions
INFO [2016-04-13 18:01:18,321] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Final stage: ResultStage 2 (count at <string>:3)
INFO [2016-04-13 18:01:18,322] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Parents of final stage: List()
INFO [2016-04-13 18:01:18,325] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Missing parents: List()
INFO [2016-04-13 18:01:18,333] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting ResultStage 2 (PythonRDD[8] at count at <string>:3), which has no missing parents
INFO [2016-04-13 18:01:18,366] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5 stored as values in memory (estimated size 6.2 KB, free 1015.9 KB)
INFO [2016-04-13 18:01:18,406] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.7 KB, free 1019.6 KB)
INFO [2016-04-13 18:01:18,407] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Added broadcast_5_piece0 in memory on 148.88.72.84:56438 (size: 3.7 KB, free: 511.0 MB)
INFO [2016-04-13 18:01:18,410] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Created broadcast 5 from broadcast at DAGScheduler.scala:1006
INFO [2016-04-13 18:01:18,416] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting 2 missing tasks from ResultStage 2 (PythonRDD[8] at count at <string>:3)
INFO [2016-04-13 18:01:18,417] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Adding task set 2.0 with 2 tasks
INFO [2016-04-13 18:01:18,428] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Added task set TaskSet_2 tasks to pool default
WARN [2016-04-13 18:01:23,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
WARN [2016-04-13 18:01:38,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Labels:
- Apache Spark
- Apache Zeppelin