Created on 12-10-2014 06:15 PM - edited 09-16-2022 02:14 AM
I downloaded the s3distcp jar from s3://elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar and ran it as follows:
hadoop jar ~/s3distcp.jar --dest hdfs://resource-manager.localhost:8020/user/ec2-user/2015/ --src s3n://test/2015/
But it failed with the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/services/s3/AmazonS3Client
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.createAmazonS3Client(S3DistCp.java:456)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.createInputFileListS3(S3DistCp.java:405)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.createInputFileList(S3DistCp.java:380)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:640)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:523)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I had already copied aws-java-sdk-s3-1.9.7.jar to /opt/cloudera/parcels/CDH/jars on every node, added "/opt/cloudera/parcels/CDH/jars/*" to mapreduce.application.classpath via the Cloudera Manager web console, and restarted the cluster.
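One thing worth checking here: the stack trace runs through RunJar.main in thread "main", so the class is missing in the client JVM started by `hadoop jar`, while mapreduce.application.classpath only applies to the task containers. The sketch below is a hedged way to confirm which jars in a directory actually contain the missing class. The function names `class_to_entry` and `jars_containing_class` are made up for illustration, and the script assumes `unzip` is installed on the node.

```shell
#!/bin/sh
# Convert a dotted (or already slashed) class name into the entry path
# used inside a jar, e.g. a.b.C -> a/b/C.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Print every jar under directory $1 that contains class $2.
# Assumption: unzip is available; jars it cannot read are skipped silently.
jars_containing_class() {
  search_dir="$1"
  entry="$(class_to_entry "$2")"
  find "$search_dir" -type f -name '*.jar' | while IFS= read -r jar; do
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `jars_containing_class /opt/cloudera/parcels/CDH/jars com.amazonaws.services.s3.AmazonS3Client` should list the SDK jar if it really provides that class. If it does, exporting `HADOOP_CLASSPATH="/opt/cloudera/parcels/CDH/jars/*"` before running `hadoop jar` may be enough to put it on the client classpath, though that has not been verified on this cluster.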
Created 12-10-2014 06:16 PM
I have used this method to resolve NoClassDefFoundError problems before and it worked every time, but this time it didn't 😞
Created 12-18-2014 04:44 PM
I have resolved this. Download aws-java-sdk from http://sdk-for-java.amazonwebservices.com/latest/aws-java-sdk.zip, unzip it, and copy every jar in aws-java-sdk/lib/ and aws-java-sdk/third-party/ to /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/hadoop on your datanodes.
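The copy step can be sketched roughly as below. This is only an illustration: `copy_sdk_jars` is a hypothetical helper name, and it assumes the SDK has already been downloaded and unzipped, and that jars under third-party/ may sit in nested subdirectories (hence `find` rather than a plain glob).

```shell
#!/bin/sh
# Copy every jar found under SDK_DIR/lib and SDK_DIR/third-party
# (including nested subdirectories) into DEST_DIR.
# $1 = unzipped SDK directory, $2 = destination directory
copy_sdk_jars() {
  sdk_dir="$1"
  dest_dir="$2"
  for sub in lib third-party; do
    # Skip a subdirectory that does not exist in this SDK layout.
    [ -d "$sdk_dir/$sub" ] || continue
    find "$sdk_dir/$sub" -type f -name '*.jar' -exec cp {} "$dest_dir/" \;
  done
}
```

On each node this would be called as `copy_sdk_jars aws-java-sdk /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/hadoop`, adjusting the first argument if the unzipped directory has a different name.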
Created 12-18-2014 05:34 PM
Besides, there is some dirty work you need to do, since CDH 5.2 doesn't have the class
org/apache/hadoop/fs/s3native/ProgressableResettableBufferedFileInputStream
You need to extract this class from Amazon EMR and repackage it. Or you can compile one from its source code here: