Support Questions
Find answers, ask questions, and share your expertise

Hive query on-prem writing to S3 fails because of return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Guru

Issue

The issue I am having is the same one described here, but setting the two configs mentioned there does not work for me (it seems to work for some people and not others): https://forums.aws.amazon.com/message.jspa?messageID=768332

Below is the full description.

Goal:

I am running this test query, with a small data size, to output results to S3.

INSERT OVERWRITE DIRECTORY 's3a://demo/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
select * from demo_table;

Notes:

  • This query succeeds when the output directory is on HDFS.
  • I can create an external table locally against S3, so my configuration is working (CREATE EXTERNAL TABLE ... LOCATION 's3a://demo/'; works).
  • Only when outputting query results to S3 do I get the failure below.
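One way to narrow this down (a hypothetical sanity check, not from the thread; it requires a cluster with the same S3A configuration, and the file name and subdirectory are placeholders) is to try plain filesystem writes to the bucket root and to a subdirectory, outside of Hive:

```shell
# Write a small file directly to the bucket root, then to a subdirectory.
# If both succeed, the failure is specific to Hive's MoveTask rename step
# rather than to S3A write access in general.
hdfs dfs -put /etc/hosts s3a://demo/root_write_test
hdfs dfs -mkdir -p s3a://demo/testdir
hdfs dfs -put /etc/hosts s3a://demo/testdir/subdir_write_test
hdfs dfs -ls -R s3a://demo/
```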

Error:

The error when attempting the query to output to S3 is:

2018-02-12 01:12:58,790 INFO  [HiveServer2-Background-Pool: Thread-363]: log.PerfLogger (PerfLogger.java:PerfLogEnd(177)) - </PERFLOG method=releaseLocks start=1518397978790 end=1518397978790 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2018-02-12 01:12:58,791 ERROR [HiveServer2-Background-Pool: Thread-363]: operation.Operation (SQLOperation.java:run(258)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:199)
	at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
	at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2018-02-12 01:12:58,794 INFO  [HiveServer2-Handler-Pool: Thread-106]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(342)) - We are setting the hadoop caller context to 5e6f48a9-7014-4d15-b02c-579557b5fb98 for thread HiveServer2-Handler-Pool: Thread-106

Additional note:

The query writes its temporary files to 's3a://demo/' but then fails with the error above. The temp files look like:

[hdfs@gkeys0 centos]$ hdfs dfs -ls -R s3a://demo/
drwxrwxrwx   - hdfs hdfs          0 2018-02-12 02:12 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1
drwxrwxrwx   - hdfs hdfs          0 2018-02-12 02:12 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000
-rw-rw-rw-   1 hdfs hdfs      38106 2018-02-12 02:09 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000/000000_0
-rw-rw-rw-   1 hdfs hdfs       6570 2018-02-12 02:09 s3a://demo/.hive-staging_hive_2018-02-12_02-08-27_090_2945283769634970656-1/-ext-10000/000001_0

Am I missing a config to set, or something like that?

1 ACCEPTED SOLUTION

Cloudera Employee

Greg,

See if you can write to a folder inside the bucket rather than writing directly into the root of the bucket.

INSERT OVERWRITE DIRECTORY 's3a://demo/testdata'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
SELECT * FROM demo_table;

View solution in original post

3 REPLIES


Guru

Thanks @Sreekanth Munigati, that worked!

s3a://demo/ does not work

s3a://demo/folder does!

Let's just say there is "ambiguity" about how root directories are treated in object stores versus filesystems, and rename() is a key trouble spot everywhere. The quirks here are known, but since normal S3/WASB/ADL usage goes to subdirectories, nobody has ever sat down with HDFS to argue the subtleties of renaming something into the root directory.
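Putting the thread's finding together, the working pattern is to point INSERT OVERWRITE DIRECTORY at a subdirectory rather than the bucket root, so the final MoveTask rename has a real destination directory. A sketch (the bucket, directory, and table names follow the thread's examples; the read-back external table and its column list are illustrative assumptions, not from the thread):

```sql
-- Write query results to a subdirectory, not the bucket root:
INSERT OVERWRITE DIRECTORY 's3a://demo/testdata'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
SELECT * FROM demo_table;

-- Hypothetical follow-up: read the results back through an external table
-- over the same path (columns are placeholders).
CREATE EXTERNAL TABLE demo_results (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://demo/testdata';
```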