INSERT OVERWRITE:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:

New Contributor

My PySpark script runs an INSERT OVERWRITE on AWS EMR. The same script works fine outside AWS. Also, if I run only the SELECT statement and leave out the INSERT OVERWRITE part, it runs successfully.

Vertex failed, vertexName=Map 1, vertexId=vertex_1571059195492_0049_1_00, diagnostics=[Vertex vertex_1571059195492_0049_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: wo initializer failed, vertex=vertex_1571059195492_0049_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://cci-edo-data-source/raw/edw/wo_cmpltd_tc_sr_dly_fact/wo_completion_dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-12
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:260)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1571059195492_0049_1_00, diagnostics=[Vertex vertex_1571059195492_0049_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: wo initializer failed, vertex=vertex_1571059195492_0049_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://cci-edo-data-source/raw/edw/wo_cmpltd_tc_sr_dly_fact/wo_completion_dt_key=2019-10-13
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14
Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-12
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:260)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:208)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
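
For anyone triaging a similar trace: the error names each input path the split generator could not find, so a first step is to pull those paths out of the log and check them against S3 and the table's partition list. Below is a minimal sketch in plain Python, assuming the log text has already been captured as a string; the helper names `missing_input_paths` and `partition_spec` are hypothetical and not part of Hive, Tez, or PySpark.

```python
import re

# Matches the path that follows Hive's "Input path does not exist:" message.
_MISSING = re.compile(r"Input path does not exist:\s*(\S+)")


def missing_input_paths(log_text):
    """Return the unique missing paths, in order of first appearance."""
    seen = []
    for path in _MISSING.findall(log_text):
        if path not in seen:  # the trace repeats the same paths, so de-duplicate
            seen.append(path)
    return seen


def partition_spec(path):
    """Extract key=value partition segments from a path (hypothetical helper)."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


if __name__ == "__main__":
    sample = (
        "Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-11\n"
        "Input path does not exist: s3://xxxx/raw/edw/xxxxxx/dt_key=2019-10-14\n"
    )
    for p in missing_input_paths(sample):
        print(p, partition_spec(p))
```

The resulting list can then be compared with what actually exists under the table location; this sketch only parses the log and does not contact S3 or the metastore.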

1 REPLY

Re: INSERT OVERWRITE:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:

Community Manager

@datana, your question would be much more likely to elicit a useful response if you posted it on one of the many community forums AWS maintains for discussing EMR.

Bill Brooks, Community Manager