unable to write hive query output to s3

I am on HDP 2.5. When I try to write Hive query output to S3, I get the exception below.



Caused by: java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:342)
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:332)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2761)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:348)
	at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.initializeOp(VectorFileSinkOperator.java:70)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:489)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:231)
	... 15 more
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 34 more

This is what I ran from the Hive shell:

INSERT OVERWRITE DIRECTORY 's3n://santhosh.aws.com/tmp'
SELECT * FROM REGION;

Is the jets3t library part of the Hive classpath?

Accepted Solutions

Re: unable to write hive query output to s3

S3N is really old and pretty much deprecated. Can you change your URL to "s3a://santhosh.aws.com/tmp" and ensure that you have "fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"? If you do not have InstanceProfileCredentialsProvider, you have to configure "fs.s3a.access.key" and "fs.s3a.secret.key".
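
As a rough illustration of the change suggested above, the sketch below rewrites the s3n:// target from the question to s3a:// and prints the resulting statement. The helper name to_s3a is hypothetical, and the commented-out hive invocation is only an assumption about how the properties might be passed per session with --hiveconf; on many clusters they would be set in the cluster configuration instead, and the access/secret key values are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: swap the s3n:// scheme for s3a:// in a single URI.
to_s3a() {
  printf '%s\n' "$1" | sed 's|^s3n://|s3a://|'
}

# Target directory taken from the original question.
target=$(to_s3a 's3n://santhosh.aws.com/tmp')

# Same statement as in the question, now pointing at the s3a:// URI.
query="INSERT OVERWRITE DIRECTORY '${target}' SELECT * FROM REGION"
printf '%s\n' "$query"

# Assumed invocation (not run here); the key values are placeholders and are
# only needed when instance profile credentials are not available:
# hive --hiveconf fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
#      --hiveconf fs.s3a.access.key=YOUR_ACCESS_KEY \
#      --hiveconf fs.s3a.secret.key=YOUR_SECRET_KEY \
#      -e "$query"
```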

11 Replies

Re: unable to write hive query output to s3

What version of the jets3t library is on your classpath? jets3t 0.9.0 introduced ServiceException; if you have an older library, you need to upgrade it.

Re: unable to write hive query output to s3

@Rajkumar Singh

Thank you for your reply. Shouldn't HDP take care of packaging this correctly? I see this issue in HDP 2.5.

Re: unable to write hive query output to s3

@Santhosh B Gowda

I can see Hive picking up the right jar from these locations. Are you seeing a different jar version on your classpath?

java    25940 hive  mem    REG              252,1   539735  1180054 /usr/hdp/2.5.0.0-1133/hadoop-mapreduce/jets3t-0.9.0.jar
java    25940 hive  mem    REG              252,1   539735  1179933 /usr/hdp/2.5.0.0-1133/hadoop-yarn/lib/jets3t-0.9.0.jar
java    25940 hive  mem    REG              252,1   539735  1053479 /usr/hdp/2.5.0.0-1133/hadoop/lib/jets3t-0.9.0.jar
java    25940 hive  183r   REG              252,1   539735  1053479 /usr/hdp/2.5.0.0-1133/hadoop/lib/jets3t-0.9.0.jar
java    25940 hive  297r   REG              252,1   539735  1179933 /usr/hdp/2.5.0.0-1133/hadoop-yarn/lib/jets3t-0.9.0.jar
java    25940 hive  415r   REG              252,1   539735  1180054 /usr/hdp/2.5.0.0-1133/hadoop-mapreduce/jets3t-0.9.0.jar

Re: unable to write hive query output to s3

@Rajkumar Singh

I can see the jars in the specified location. How do we check whether Hive is actually loading them?

ls -lrt /usr/hdp/2.5.3.0-14/hadoop/lib/jets3t-0.9.0.jar
-rw-r--r--. 1 root root 539735 Nov 10 18:00 /usr/hdp/2.5.3.0-14/hadoop/lib/jets3t-0.9.0.jar

Re: unable to write hive query output to s3

@Santhosh B Gowda

If you are using the Hive CLI or HiveServer2, get the process ID and check:

lsof -p <pid> | grep jets3t

This will tell you which jets3t jar the process has picked up from the classpath.
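
To make the lsof output easier to read, a small filter can reduce it to the distinct jar paths. This is only a sketch: the function name jets3t_jars is hypothetical, and the commented usage assumes the HiveServer2 process can be found with pgrep -f HiveServer2, which may not match every installation.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and print each distinct
# jets3t jar path once (the path is the last column of an lsof line).
jets3t_jars() {
  awk '/jets3t/ { print $NF }' | sort -u
}

# Assumed usage (not run here):
# pid=$(pgrep -f HiveServer2 | head -n 1)
# lsof -p "$pid" | jets3t_jars
```

If this prints nothing, no jets3t jar is open in that process; if it prints several versions, the classpath is carrying conflicting copies.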

Re: unable to write hive query output to s3

@Rajkumar Singh Thanks. I can see that jets3t-0.9.0.jar is loaded.

Also, following @Rajesh Balamohan's suggestion to move from s3n to s3a, I got it working.

Re: unable to write hive query output to s3

@Santhosh B Gowda

Was this a fresh install or an upgrade from an older version of HDP? If this was an upgrade, this thread may be useful:

http://stackoverflow.com/questions/33852044/why-can-i-not-read-from-the-aws-s3-in-spark-application-...

In your last post you mention the path /usr/hdp/2.5.3.0-14/hadoop/lib/jets3t-0.9.0.jar. Could you also run the following and post the result?

ls -lrt /usr/hdp/

Re: unable to write hive query output to s3

See the link below to learn why s3a is a better option than s3n, although that may not be the cause of your issue.

https://wiki.apache.org/hadoop/AmazonS3

Re: unable to write hive query output to s3

@Constantin Stanca I see this issue on both fresh and upgraded systems, and moving from s3n to s3a let me upload to S3.