Created on 10-14-2019 02:10 PM - last edited on 10-14-2019 04:16 PM by ask_bill_brooks
My PySpark script runs an INSERT OVERWRITE on AWS EMR and fails with the error below. The same script works fine outside AWS. If I run only the SELECT statement and leave out the INSERT OVERWRITE part, it also runs successfully on EMR.
Vertex failed, vertexName=Map 1, vertexId=vertex_1571059195492_0049_1_00, diagnostics=[Vertex vertex_1571059195492_0049_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: wo initializer failed, vertex=vertex_1571059195492_0049_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://cci-edo-data-source/raw/edw/wo_cmpltd_tc_sr_dly_fact/wo_completion_dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-12
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:260)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1571059195492_0049_1_00, diagnostics=[Vertex vertex_1571059195492_0049_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: wo initializer failed, vertex=vertex_1571059195492_0049_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://cci-edo-data-source/raw/edw/wo_cmpltd_tc_sr_dly_fact/wo_completion_dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-12
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:260)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
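For anyone triaging a trace like the one above, a first step might be to pull out the distinct paths the job reported as missing, so they can be compared against what is actually in the bucket. The following is only a minimal sketch: the helper names `missing_input_paths` and `partition_values` and the `sample` text are hypothetical, and it assumes each missing path appears after the literal text "Input path does not exist:" and ends in a `key=value` partition segment, as in the log above.

```python
import re

# Matches each "Input path does not exist: <uri>" entry in the diagnostics.
_MISSING_PATH = re.compile(r"Input path does not exist:\s*(\S+)")


def missing_input_paths(log_text):
    """Return the distinct missing input paths found in log_text, sorted."""
    return sorted(set(_MISSING_PATH.findall(log_text)))


def partition_values(paths):
    """Map each path to the (key, value) pair in its last path segment.

    Paths whose last segment has no '=' are skipped.
    """
    result = {}
    for path in paths:
        last_segment = path.rstrip("/").rsplit("/", 1)[-1]
        if "=" in last_segment:
            key, value = last_segment.split("=", 1)
            result[path] = (key, value)
    return result


# Hypothetical sample in the same shape as the redacted lines of the trace.
sample = """\
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
"""

if __name__ == "__main__":
    for path, (key, value) in partition_values(missing_input_paths(sample)).items():
        print(key, value, path)
```

The resulting list only restates what the error already says, de-duplicated. Whether those partition directories were never written, were removed, or are simply not visible to the cluster is something the log alone does not settle.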
Created on 10-14-2019 04:22 PM - edited 10-14-2019 04:22 PM
@datana, your question would be much more likely to elicit a useful response if you posted it on one of the many community forums AWS maintains for discussing EMR.