
Loading to S3 Fails - CDH 5.3.0

New Contributor

Since upgrading our cluster from 5.1.2 to 5.3.0, we have been unable to load data into a Hive table that points to S3. It fails with the following error:

 

Loading data to table schema.table_name partition (dt=null)
Failed with exception Wrong FS: s3n://<s3_bucket>/converted_installs/.hive-staging_hive_2015-01-26_11-05-32_849_2677145287515034575-1/-ext-10000/dt=2015-01-25/000000_0.gz, expected: hdfs://<name_node>:8020
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The table itself was created with the following DDL (I've omitted the column definitions, as they aren't relevant here):

 

...
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE
LOCATION 's3n://<s3_bucket>/data/warehouse_v1/converted_installs';

We don't have any issues writing to tables that reside on HDFS locally, but for some reason, writing to S3 fails. Anyone have an idea how to fix this?
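For reference, the statement that triggers it is an ordinary dynamic-partition insert, roughly like the following (the column names and staging table are illustrative, not our real schema):

-- dt is a dynamic partition, hence "partition (dt=null)" in the error above
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE schema.table_name PARTITION (dt)
SELECT col1, col2, dt
FROM schema.staging_table;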

27 Replies

Cloudera Employee

CDH 5.4.5 was released on Aug 18; it should include the S3 fix.

Just verified it's working!

Thanks!

-B

New Contributor

I upgraded to 5.4.5 but am still facing the problem. Are there any config changes we need to make?

15/08/20 14:14:42 INFO BlockManagerMasterActor: Registering block manager ip-10-224-15-31.aws.chotel.com:34645 with 530.0 MB RAM, BlockManagerId(1, ip-10-224-15-31.aws.chotel.com, 34645)
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3a://spark-poc-1/in/paceCYShell.parquet, expected: hdfs://ip-10-224-15-26.aws.chotel.com:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)

It's working from the Hive CLI; I'm not sure about Spark.
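For anyone checking configuration first: basic s3a access only needs the standard Hadoop credential properties, which can be set per session as below (values are placeholders). These by themselves wouldn't fix a Wrong FS error, though, since that error comes from path resolution rather than authentication.

-- standard s3a credentials; placeholder values, unrelated to the Wrong FS bug itself
SET fs.s3a.access.key=<access_key>;
SET fs.s3a.secret.key=<secret_key>;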

Explorer

I am facing the same issue; I tried running the query from both Beeline and the Hive CLI.

Hive version -- /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/hive-common-1.1.0-cdh5.4.5.jar!/hive-log4j.properties

Beeline version -- Beeline version 1.1.0-cdh5.4.5 by Apache Hive

CREATE TABLE IF NOT EXISTS <s3_table>
STORED AS AVRO
LOCATION 's3a://***/***/***/***/'
AS
SELECT * FROM <hdfs_table>;

 

--------------------

HIVE LOGS:

Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1442849170635_4506, Tracking URL = http://********************/
Kill Command = /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job -kill job_1442849170635_4506
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-09-29 22:16:53,027 Stage-1 map = 0%, reduce = 0%
2015-09-29 22:17:12,722 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.28 sec
MapReduce Total cumulative CPU time: 3 seconds 280 msec
Ended Job = job_***************
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://**************/hive_hive_2015-09-29_22-16-37_095_1255538331105842480-1/-ext-10001
Moving data to: s3a://******************
Failed with exception Wrong FS: s3a://***********, expected: hdfs://*************
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.28 sec HDFS Read: 2455930 HDFS Write: 2746940 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 280 msec

-------------------------------------------

BEELINE LOGS

 

INFO : Moving data to: s3a://********** from hdfs://**********/hive_hive_2015-09-29_22-28-16_959_3663235137172518120-507/-ext-10001
ERROR : Failed with exception Wrong FS: s3a://************, expected: hdfs://*************
java.lang.IllegalArgumentException: Wrong FS: s3a://*********, expected: hdfs://************
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:724)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2471)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1398)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1182)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1048)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)

---------------------------------------------

 

Any help is appreciated!

 

Contributor

Yes, I think there are multiple issues; sorry for the inconvenience. This one involves moving data between tables on different file systems (S3 and non-S3). It's HIVE-7476, fixed in CDH 5.5.
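To make the failure mode concrete, this is roughly what happens during a CTAS like the one above (table names are placeholders; the comments map to the MoveTask lines in the logs):

-- CTAS into a table whose LOCATION is on a different file system (s3a)
-- than the default warehouse (hdfs):
CREATE TABLE s3_table
STORED AS AVRO
LOCATION 's3a://<bucket>/<path>/'
AS
SELECT * FROM hdfs_table;
-- Hive first writes the query result to an HDFS staging directory (the
-- ".../-ext-10001" path in the logs), then MoveTask moves that output to
-- the s3a LOCATION. On affected versions the move calls getFileStatus on
-- the HDFS FileSystem object with the s3a path, which fails the checkPath
-- test and raises "Wrong FS".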

Explorer

Thank you for the reply.

FYI: instead of CTAS, if I create an external or managed table on S3 first and then INSERT into it from HDFS, it works.
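For anyone who needs a workaround before upgrading, a minimal sketch of that approach (table, column, and bucket names are placeholders):

-- 1. Create the S3-backed table up front instead of via CTAS.
CREATE EXTERNAL TABLE IF NOT EXISTS s3_table (
  col1 STRING,
  col2 BIGINT
)
STORED AS AVRO
LOCATION 's3a://<bucket>/<path>/';

-- 2. Populate it with a plain INSERT ... SELECT from the HDFS-backed table.
INSERT OVERWRITE TABLE s3_table
SELECT col1, col2 FROM hdfs_table;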

Contributor

Nice, thanks for the update.