In short, I'm trying to create a test stack for Spark on Windows - the aim being to read a file from one S3 bucket and then write it to another.
I was repeatedly hitting a ClassNotFoundException when trying to access s3 or s3n, even though the implementing classes were declared in core-site.xml as fs.s3.impl and fs.s3n.impl.
I added hadoop/share/tools/lib to the classpath to no avail. I then copied the `aws-java-sdk` and `hadoop-aws` jars into the share/hadoop/common folder, and I am now able to list the contents of a bucket using hadoop on the command line.
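For reference, the jar copying step looked roughly like this (the exact version numbers are assumptions - use whatever ships in your share/hadoop/tools/lib; for Hadoop 2.6 that is typically hadoop-aws-2.6.0 and aws-java-sdk-1.7.4):

```
copy %HADOOP_HOME%\share\hadoop\tools\lib\hadoop-aws-2.6.0.jar %HADOOP_HOME%\share\hadoop\common\
copy %HADOOP_HOME%\share\hadoop\tools\lib\aws-java-sdk-1.7.4.jar %HADOOP_HOME%\share\hadoop\common\
```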
`hadoop fs -ls "s3n://bucket"` shows me the contents - great news 🙂 In my mind, the Hadoop configuration should be picked up by Spark, so solving one should solve the other. However, when I run spark-shell and try to save a file to S3, I get the usual ClassNotFoundException, shown below. I'm still quite new to this and unsure if I've missed something obvious - hopefully someone can help me solve the riddle? Any help is greatly appreciated, thanks. The exception:
```
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
```

My core-site.xml (which I believe to be correct now, as hadoop can access s3):

```xml
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>
```

And finally the hadoop-env.cmd showing the classpath (which is seemingly ignored):

```
set HADOOP_CONF_DIR=C:\Spark\hadoop\etc\hadoop

@rem ## added as s3 filesystem not found: http://stackoverflow.com/questions/28029134/how-can-i-access-s3-s3n-from-a-local-hadoop-2-6-installation
@rem Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
if exist %HADOOP_HOME%\contrib\capacity-scheduler (
  if not defined HADOOP_CLASSPATH (
    set HADOOP_CLASSPATH=%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
  ) else (
    set HADOOP_CLASSPATH=%HADOOP_CLASSPATH%;%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
  )
)
```
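For context, the failing attempt in spark-shell was roughly the following (bucket names and credentials are placeholders, not my real values). I believe properties can also be set directly on `sc.hadoopConfiguration`, which would bypass any question of whether core-site.xml is being picked up:

```scala
// Run inside spark-shell, so `sc` (the SparkContext) already exists.
// "source-bucket", "dest-bucket", ACCESS_KEY and SECRET_KEY are placeholders.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY")

// Read from one bucket...
val data = sc.textFile("s3n://source-bucket/input.txt")

// ...and write to the other - this is where the ClassNotFoundException is thrown.
data.saveAsTextFile("s3n://dest-bucket/output")
```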