Hive query on-prem writing to S3 fails because of return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Guru

Issue

The issue I am having is the one described here, but setting the two configs does not work for me (it seems to work for some and not others): https://forums.aws.amazon.com/message.jspa?messageID=768332

The full description is below.

Goal:

I am running the following test query with a small data size to write results to S3.

INSERT OVERWRITE DIRECTORY 's3a://demo/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
select * from demo_table;

Notes:

  • This query runs successfully when the output is written to an HDFS directory.
  • I can create an external table locally against the remote S3 bucket, so my S3 configuration is working (CREATE EXTERNAL TABLE ... LOCATION 's3a://demo/'; works). A sketch of such a table is shown after this list.
  • Only when writing query output to S3 do I get the failure below.
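
For reference, here is a minimal sketch of the kind of external-table read that does work in this setup. The table name and columns are hypothetical; only the s3a LOCATION mirrors the question.

-- Hypothetical illustration only: the table name and columns are made up.
-- A read like this succeeding suggests the s3a credential/endpoint
-- configuration itself is fine; only the write path is failing.
CREATE EXTERNAL TABLE demo_table_s3 (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://demo/';

SELECT COUNT(*) FROM demo_table_s3;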

Error:

The error when attempting to write the query output to S3 is:

2018-02-12 01:12:58,790 INFO  [HiveServer2-Background-Pool: Thread-363]: log.PerfLogger (PerfLogger.java:PerfLogEnd(177)) - </PERFLOG method=releaseLocks start=1518397978790 end=1518397978790 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2018-02-12 01:12:58,791 ERROR [HiveServer2-Background-Pool: Thread-363]: operation.Operation (SQLOperation.java:run(258)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:199)
	at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
	at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2018-02-12 01:12:58,794 INFO  [HiveServer2-Handler-Pool: Thread-106]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(342)) - We are setting the hadoop caller context to 5e6f48a9-7014-4d15-b02c-579557b5fb98 for thread HiveServer2-Handler-Pool: Thread-106

Additional note:

The query writes the temporary staging files to 's3a://demo/' but then fails with the above error. The temporary files look like this:

[hdfs@gkeys0 centos]$ hdfs dfs -ls -R s3a://demo/
drwxrwxrwx   - hdfs hdfs          0 2018-02-12 02:12 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1
drwxrwxrwx   - hdfs hdfs          0 2018-02-12 02:12 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000
-rw-rw-rw-   1 hdfs hdfs      38106 2018-02-12 02:09 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000/000000_0
-rw-rw-rw-   1 hdfs hdfs       6570 2018-02-12 02:09 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000/000001_0

Am I missing a config to set, or something like that?

1 ACCEPTED SOLUTION

Cloudera Employee

Greg,

See if you can write to a folder inside the bucket rather than writing directly into the root of the bucket.

INSERT OVERWRITE DIRECTORY 's3a://demo/testdata' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE select * from demo_table;
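
Once the write to a subfolder succeeds, the results can be read back the same way if needed. This is only a sketch; the column list is hypothetical and would have to match demo_table's layout.

-- Sketch only: columns are hypothetical and must match demo_table.
CREATE EXTERNAL TABLE demo_results (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://demo/testdata';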


3 REPLIES

Guru

Thanks @Sreekanth Munigati, that worked!

s3a://demo/ does not work

s3a://demo/folder does!


Let's just say there's "ambiguity" about how root directories are treated in object stores and filesystems, and rename() is a key trouble spot everywhere. It's known that there are quirks here, but since normal s3/wasb/adl usage goes to subdirectories, nobody has ever sat down with HDFS to argue the subtleties of renaming something into the root directory.