INSERT OVERWRITE: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist

My PySpark script runs an INSERT OVERWRITE on AWS EMR and fails with the error below. The same script works fine outside AWS. Also, if I run only the SELECT statement and leave out the INSERT OVERWRITE part, it runs successfully.

Vertex failed, vertexName=Map 1, vertexId=vertex_1571059195492_0049_1_00, diagnostics=[Vertex vertex_1571059195492_0049_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: wo initializer failed, vertex=vertex_1571059195492_0049_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://cci-edo-data-source/raw/edw/wo_cmpltd_tc_sr_dly_fact/wo_completion_dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-12
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:260)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
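To narrow down which partitions Hive could not find, the distinct missing paths can be pulled out of the log above. The following is a minimal Python sketch, not a fix: the function name `missing_partitions` is hypothetical, it assumes the last path component is a Hive-style `key=value` directory (as in `dt_key=2019-10-11`), and it only parses log text without checking S3.

```python
import re
from typing import List, Tuple

# Matches the Hive/Tez diagnostic lines, e.g.
#   Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
MISSING_PATH_RE = re.compile(r"Input path does not exist: (\S+)")


def missing_partitions(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (path, partition_key, partition_value) for each distinct missing path.

    Assumes the last path component is a Hive-style "key=value" directory.
    Paths are returned in order of first appearance; repeats (Hive prints the
    diagnostics more than once) are skipped.
    """
    seen = set()
    result = []
    for path in MISSING_PATH_RE.findall(log_text):
        if path in seen:
            continue
        seen.add(path)
        last_component = path.rstrip("/").rsplit("/", 1)[-1]
        key, sep, value = last_component.partition("=")
        if sep:  # ignore paths whose last component is not key=value
            result.append((path, key, value))
    return result


if __name__ == "__main__":
    # Hypothetical sample built from the redacted lines in the log above.
    sample_log = """\
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
"""
    for path, key, value in missing_partitions(sample_log):
        print(f"{key}={value}\t{path}")
```

Each `key=value` pair can then be compared against the output of `SHOW PARTITIONS <table>` in Hive and against an actual listing of the S3 prefix. A partition that Hive still lists but whose S3 directory is gone would be consistent with the failure happening during split generation (`HiveInputFormat.getSplits` in the trace), though the post does not confirm that this is the cause.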

@datana, your question would be much more likely to elicit a useful response if you posted it on one of the many community forums AWS maintains for discussing EMR.

Bill Brooks, Community Moderator