New Contributor
Posts: 4
Registered: 01-26-2015

Loading to S3 Fails - CDH 5.3.0

Since upgrading our cluster from 5.1.2 to 5.3.0, we have been unable to load data to a Hive table that points to S3. It fails with the following error:

 

Loading data to table schema.table_name partition (dt=null)
Failed with exception Wrong FS: s3n://<s3_bucket>/converted_installs/.hive-staging_hive_2015-01-26_11-05-32_849_2677145287515034575-1/-ext-10000/dt=2015-01-25/000000_0.gz, expected: hdfs://<name_node>:8020
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The table itself was created using the following DDL (I removed the columns, since they are not very important):

 

...
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE
LOCATION 's3n://<s3_bucket>/data/warehouse_v1/converted_installs';
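
The statement that triggers the failure is a dynamic-partition insert along the following lines (a simplified sketch with placeholder database, table, and column names, not the actual query):

-- Simplified sketch; database, table, and column names are placeholders.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE my_schema.my_table PARTITION (dt)
SELECT col1, col2, dt
FROM my_schema.staging_source   -- placeholder for the HDFS-backed source table
WHERE dt = '2015-01-25';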

We don't have any issues writing to tables that reside on HDFS locally, but for some reason, writing to S3 fails. Anyone have an idea how to fix this?

Cloudera Employee
Posts: 30
Registered: 12-09-2014

Re: Loading to S3 Fails - CDH 5.3.0

Can you please check the HiveServer2 (HS2) logs and paste any relevant exception here? Thanks.

 

Also, did this work before the upgrade, on 5.1.2?

New Contributor
Posts: 4
Registered: 01-26-2015

Re: Loading to S3 Fails - CDH 5.3.0

I'm afraid I do not see anything in the HiveServer2 logs related to this problem.

 

Loading to S3 worked fine in CDH4.

New Contributor
Posts: 4
Registered: 01-26-2015

Re: Loading to S3 Fails - CDH 5.3.0

We tried loading data to S3 on a second cluster and encountered the same problem. Here's the HiveServer2 log output:

 

2015-01-30 18:04:15,497 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Moving tmp dir: s3n://<s3_bucket>/test_table/.hive-staging_hive_2015-01-30_18-03-09_871_2770339221568578012-1/_tmp.-ext-10000 to: s3n://<s3_bucket>/test_table/.hive-staging_hive_2015-01-30_18-03-09_871_2770339221568578012-1/-ext-10000
2015-01-30 18:04:20,350 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=task.MOVE.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 18:04:20,359 INFO org.apache.hadoop.hive.ql.exec.Task: Loading data to table test.test_table from s3n://<s3_bucket>/test_table/.hive-staging_hive_2015-01-30_18-03-09_871_2770339221568578012-1/-ext-10000
2015-01-30 18:04:20,564 INFO org.apache.hive.service.cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0ad4737c-4d4a-4585-86c1-5717fc72fd40]: getLog()
2015-01-30 18:04:24,727 ERROR org.apache.hadoop.hive.ql.exec.Task: Failed with exception Wrong FS: s3n://<s3_bucket>/test_table/.hive-staging_hive_2015-01-30_18-03-09_871_2770339221568578012-1/-ext-10000/000000_0, expected: hdfs://vm-cluster-node1:8020
java.lang.IllegalArgumentException: Wrong FS: s3n://<s3_bucket>/test_table/.hive-staging_hive_2015-01-30_18-03-09_871_2770339221568578012-1/-ext-10000/000000_0, expected: hdfs://vm-cluster-node1:8020
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:192)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1877)
	at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
	at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:961)
	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2280)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2356)
	at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1493)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:284)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
	at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

2015-01-30 18:04:24,727 ERROR org.apache.hadoop.hive.ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
2015-01-30 18:04:24,728 INFO org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=Driver.execute start=1422640997058 end=1422641064728 duration=67670 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 18:04:24,728 INFO org.apache.hadoop.hive.ql.Driver: MapReduce Jobs Launched: 
2015-01-30 18:04:24,728 INFO org.apache.hadoop.hive.ql.Driver: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.29 sec   HDFS Read: 5598 HDFS Write: 0 SUCCESS
2015-01-30 18:04:24,730 INFO org.apache.hadoop.hive.ql.Driver: Total MapReduce CPU Time Spent: 5 seconds 290 msec
2015-01-30 18:04:24,730 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 18:04:24,730 INFO ZooKeeperHiveLockManager:  about to release lock for test/test_table
2015-01-30 18:04:24,738 INFO ZooKeeperHiveLockManager:  about to release lock for test
2015-01-30 18:04:24,744 INFO ZooKeeperHiveLockManager:  about to release lock for default/clicks
2015-01-30 18:04:24,749 INFO ZooKeeperHiveLockManager:  about to release lock for default
2015-01-30 18:04:24,753 INFO org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=releaseLocks start=1422641064730 end=1422641064753 duration=23 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 18:04:24,756 ERROR org.apache.hive.service.cli.operation.Operation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
	at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-01-30 18:04:24,760 INFO org.apache.hive.service.cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0ad4737c-4d4a-4585-86c1-5717fc72fd40]: getLog()
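
For completeness, the test on the second cluster was roughly the following (bucket name redacted, columns simplified); any statement that makes Hive move files into the S3-backed location fails the same way:

-- Rough sketch of the test table; the real columns differ.
CREATE EXTERNAL TABLE test.test_table (id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3n://<s3_bucket>/test_table';

INSERT OVERWRITE TABLE test.test_table
SELECT id, value
FROM test.source_table;   -- placeholder for an HDFS-backed source table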

Is this a bug in CDH 5.3.0, or did we set up our clusters incorrectly?

New Contributor
Posts: 1
Registered: 01-31-2015

Re: Loading to S3 Fails - CDH 5.3.0

Hello,

I am experiencing a similar issue with the Google Cloud Storage connector for Hadoop, together with Hive, on CDH 5.3.

It appears as though Hive expects to be able to write only to the local HDFS, even though it is able to read from and write to the remote filesystem. I get the same error when reading from or writing to a gs://<bucket> location.

 

Not sure if this is a Hive bug or a configuration issue. 

 

Has there been any progress on determining whether this is a bug with Hive/S3 on CDH 5.3?

New Contributor
Posts: 4
Registered: 01-26-2015

Re: Loading to S3 Fails - CDH 5.3.0

We were unable to get Hive tables pointing to S3 to work in CDH 5.3.0, so we downgraded to CDH 5.2.0, which works fine.

Explorer
Posts: 6
Registered: 03-16-2015

Re: Loading to S3 Fails - CDH 5.3.0

I experienced this problem too, and it has proven disastrous for a number of workflows that use S3 locations for external Hive tables. As a workaround, I tried a number of configuration changes as well as manually uploading freshly compiled hadoop-s3a binaries to the cluster.

 

There's no documentation or info on how to use the S3A filesystem bundled inside 5.3.0; it just says that the new S3A filesystem is supported. Quick tests proved that untrue. I'm guessing I may have to move items into the classpath for it to work, but with zero reference documentation on how to get this feature working, I manually copied the hadoop-s3a project binaries onto the path.

 

The stack trace I get happens inside HiveServer2 during the moveFile/copyFiles step, the same as above. This is either a regression or a new bug that blocks other filesystems for external tables, which largely defeats the purpose of having external tables.

 

Is there any solution in 5.3.2, or should I follow the example of the user above and nuke my cluster to install an older version? That would be a lot of wasted time.

Explorer
Posts: 6
Registered: 03-16-2015

Re: Loading to S3 Fails - CDH 5.3.0

Here's the relevant stack trace:

 

15/03/13 18:08:15 INFO log.PerfLogger: <PERFLOG method=task.MOVE.Stage-4 from=org.apache.hadoop.hive.ql.Driver>
15/03/13 18:08:15 INFO exec.Task: Moving data to: s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10000 from s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10002
15/03/13 18:08:15 INFO s3a.S3AFileSystem: Getting path status for s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10002 (tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10002)
15/03/13 18:08:16 INFO s3a.S3AFileSystem: Getting path status for s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1 (tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1)
15/03/13 18:08:16 INFO s3a.S3AFileSystem: Delete path s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10000 - recursive true
15/03/13 18:08:16 INFO s3a.S3AFileSystem: Getting path status for s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10000 (tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10000)
15/03/13 18:08:16 ERROR exec.Task: Failed with exception Wrong FS: s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10002, expected: hdfs://smq-cloudera-04.dpcloud.local:8020
java.lang.IllegalArgumentException: Wrong FS: s3a://datapipe-usage/tmp/hive-staging_hive_2015-03-13_18-07-13_821_8375490564348952212-1/-ext-10002, expected: hdfs://smq-cloudera-04.dpcloud.local:8020
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:192)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1877)
	at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
	at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:961)
	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2280)
	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:92)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:209)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
	at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

 

New Contributor
Posts: 2
Registered: 07-15-2015

Re: Loading to S3 Fails - CDH 5.3.0

This thread sort of trailed off. Has there been any resolution for this? We are experiencing this exact issue in CDH 5.4.3.

Explorer
Posts: 6
Registered: 03-16-2015

Re: Loading to S3 Fails - CDH 5.3.0

[ Edited ]

I reverted to 5.2.4 for a period, which did not have this issue. Unfortunately, I had to bring the cluster up to 5.4.2 for other reasons, so I implemented a workaround using a staging HDFS location and distcp.
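
Roughly, the workaround looks like this (table names, paths, and the bucket below are illustrative, not my actual ones): write the output to a staging table on HDFS, copy the finished files to S3 with distcp, then register the partition on the S3-backed external table.

-- Illustrative sketch only; names and paths are made up.
-- 1. Write the output to a staging table whose location is on local HDFS.
INSERT OVERWRITE TABLE staging_db.usage_staging PARTITION (dt='2015-03-13')
SELECT col1, col2
FROM source_db.usage_raw
WHERE dt = '2015-03-13';

-- 2. Copy the finished files to S3 outside of Hive, e.g.:
--    hadoop distcp /user/hive/warehouse/staging_db.db/usage_staging/dt=2015-03-13 \
--                  s3a://<s3_bucket>/usage/dt=2015-03-13

-- 3. Register the new partition on the S3-backed external table.
ALTER TABLE s3_db.usage ADD IF NOT EXISTS PARTITION (dt='2015-03-13')
LOCATION 's3a://<s3_bucket>/usage/dt=2015-03-13';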

 

The issue remains. Reading the stack trace and looking around online, I'm pretty sure it has to do with the new HDFS encryption support.
