Member since
05-30-2017
5
Posts
1
Kudos Received
0
Solutions
06-01-2017
09:05 PM
1 Kudo
And the solution appears to be we had some files .../hadoop/yarn/local/usercache/hive/filecache with odd permissions. Flushing that directory on the data nodes and the query runs.
... View more
06-01-2017
08:36 PM
Can you elaborate on the missing permissions? I am running into a similar issue. Thanks.
... View more
05-31-2017
02:34 PM
Both reports are with SSL enabled=false. Set in the Custom tez-site section. Sometimes the error reports differently: --------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 9 9 0 0 7 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 4 0
Reducer 3 RUNNING 1 0 1 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/03 [=======================>>---] 90% ELAPSED TIME: 35.85 s
--------------------------------------------------------------------------------
Status: Failed
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01, diagnostics=[Vertex vertex_1496156002153_0008_4_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1496156002153_0008_4_01 [Reducer 2] failed as task task_1496156002153_0008_4_01_000000 failed after vertex succeeded.]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_4_00Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01Vertex failed, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_4_01, diagnostics=[Vertex vertex_1496156002153_0008_4_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1496156002153_0008_4_01 [Reducer 2] failed as task task_1496156002153_0008_4_01_000000 failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
... View more
05-31-2017
02:29 PM
That did not change the issue. It is inconsistent. I ran this 3 times and it failed 2 of the three times. The current failure came out on the console as: hive> select profile_id,count(*) cnt from webordersbase group by profile_id order by cnt desc limit 100 ;
Query ID = my_user_20170531102337_ddee6338-5e67-496c-b2af-3e380288396e
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1496156002153_0008)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 9 9 0 0 5 0
Reducer 2 RUNNING 1 0 0 1 4 0
Reducer 3 INITED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 01/03 [=====================>>-----] 81% ELAPSED TIME: 31.84 s
--------------------------------------------------------------------------------
Status: Failed
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_3_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_3_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_3_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1496156002153_0008_3_00
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_3_01
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_3_01
Vertex re-running, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_3_01
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1496156002153_0008_3_01, diagnostics=[Task failed, taskId=task_1496156002153_0008_3_01_000000, diagnostics=[TaskAttempt 0 failed, info=[attempt_1496156002153_0008_3_01_000000_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1496156002153_0008_3_01_000000_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1496156002153_0008_3_01_000000_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_1} #5
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:382)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:693)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1567)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:253)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:356)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:264)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_1} #5
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:382)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
...
... View more
05-30-2017
06:53 PM
Trying to run query: select
profile_id,count(*) cnt from mydb group by profile_id order by cnt
desc limit 100; Getting Vertex_Failure due to null pointer. /etc/hosts looks fine. I can connect from one node to the other on various ports. They have a lot of RAM, and Disk available at this point. I have increased a bunch of buffers in Hive and some other settings from searching the Internet. Still no resolution. I tried query with TEZ disabled and ran into errors. 2017-05-30 14:27:32,270 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1496156002153_0004_2][Event:TASK_ATTEMPT_FINISHED]: vertexName=Reducer 2, taskAttemptId=attempt_1496156002153_0004_2_01_000000_1, creationTime=1496168846938, allocationTime=1496168846940, startTime=1496168847079, finishTime=1496168852270, timeTaken=5191, status=FAILED, errorEnum=INPUT_READ_ERROR, diagnostics=Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_1} #14
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:382)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:693)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1567)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:253)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:356)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:264)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_1} #14
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:382)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:693)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1567)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:253)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:356)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:264)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
, nodeHttpAddress=datalake.node.com:8042, counters=Counters: 60, File System Counters, FILE_BYTES_READ=1978141, org.apache.tez.common.counters.TaskCounter, REDUCE_INPUT_GROUPS=0, REDUCE_INPUT_RECORDS=0, COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=18, NUM_SKIPPED_INPUTS=0, NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=18, GC_TIME_MILLIS=0, CPU_MILLISECONDS=750, PHYSICAL_MEMORY_BYTES=8445231104, VIRTUAL_MEMORY_BYTES=11002957824, COMMITTED_HEAP_BYTES=8445231104, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=72, SHUFFLE_BYTES=1594950, SHUFFLE_BYTES_DECOMPRESSED=3398576, SHUFFLE_BYTES_TO_MEM=0, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=1594950, NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, SHUFFLE_PHASE_TIME=23, MERGE_PHASE_TIME=5060, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=17, Shuffle Errors, BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0, Shuffle Errors_Reducer_2_INPUT_Map_1, BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0, TaskCounter_Reducer_2_INPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=72, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, COMBINE_INPUT_RECORDS=0, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=17, MERGED_MAP_OUTPUTS=18, MERGE_PHASE_TIME=5060, NUM_DISK_TO_DISK_MERGES=0, NUM_FAILED_SHUFFLE_INPUTS=0, NUM_MEM_TO_DISK_MERGES=0, NUM_SHUFFLED_INPUTS=18, NUM_SKIPPED_INPUTS=0, REDUCE_INPUT_GROUPS=0, REDUCE_INPUT_RECORDS=0, SHUFFLE_BYTES=1594950, SHUFFLE_BYTES_DECOMPRESSED=3398576, SHUFFLE_BYTES_DISK_DIRECT=1594950, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=0, SHUFFLE_PHASE_TIME=23, SPILLED_RECORDS=0
2017-05-30 14:27:32,271 [INFO] [Dispatcher thread {Central}] |impl.TaskImpl|: Scheduling new attempt for task: task_1496156002153_0004_2_01_000000, currentFailedAttempts: 2, maxFailedAttempts: 4
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Tez