Support Questions

I am getting the exception below when using distcp to copy HDFS data to S3 with the s3a protocol.

Super Collaborator

15/12/30 08:55:10 INFO mapreduce.Job: Task Id : attempt_1451465507406_0001_m_000001_2, Status : FAILED
Error: java.lang.IllegalArgumentException
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1307)
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1230)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:274)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.tools.mapred.CopyMapper.setup(CopyMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

1 ACCEPTED SOLUTION

Accepted Solutions

Re: I am getting the exception below when using distcp to copy HDFS data to S3 with the s3a protocol.

According to the stack trace, there was an IllegalArgumentException while trying to create a ThreadPoolExecutor. This is the relevant source code from the S3AFileSystem class:

    int maxThreads = conf.getInt(MAX_THREADS, DEFAULT_MAX_THREADS);
    int coreThreads = conf.getInt(CORE_THREADS, DEFAULT_CORE_THREADS);
    if (maxThreads == 0) {
      maxThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    if (coreThreads == 0) {
      coreThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    long keepAliveTime = conf.getLong(KEEPALIVE_TIME, DEFAULT_KEEPALIVE_TIME);
    LinkedBlockingQueue<Runnable> workQueue =
      new LinkedBlockingQueue<>(maxThreads *
        conf.getInt(MAX_TOTAL_TASKS, DEFAULT_MAX_TOTAL_TASKS));
    threadPoolExecutor = new ThreadPoolExecutor(
        coreThreads,
        maxThreads,
        keepAliveTime,
        TimeUnit.SECONDS,
        workQueue,
        newDaemonThreadFactory("s3a-transfer-shared-"));
    threadPoolExecutor.allowCoreThreadTimeOut(true);

The various arguments passed to the ThreadPoolExecutor are pulled from Hadoop configuration, such as the core-site.xml file. The defaults for these are defined in core-default.xml:

<property>
  <name>fs.s3a.threads.max</name>
  <value>256</value>
  <description> Maximum number of concurrent active (part)uploads,
    which each use a thread from the threadpool.</description>
</property>
<property>
  <name>fs.s3a.threads.core</name>
  <value>15</value>
  <description>Number of core threads in the threadpool.</description>
</property>
<property>
  <name>fs.s3a.threads.keepalivetime</name>
  <value>60</value>
  <description>Number of seconds a thread can be idle before being
    terminated.</description>
</property>
<property>
  <name>fs.s3a.max.total.tasks</name>
  <value>1000</value>
  <description>Number of (part)uploads allowed to the queue before
    blocking additional uploads.</description>
</property>

Is it possible that you have overridden one of these configuration properties to an invalid value, such as a negative number?
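To make the failure mode concrete, here is a minimal standalone sketch (not taken from the Hadoop code base; the class name and the specific values are hypothetical stand-ins for the fs.s3a.* properties above). It shows how an override that leaves fs.s3a.threads.max smaller than fs.s3a.threads.core, for example, surfaces as an IllegalArgumentException from the ThreadPoolExecutor constructor, matching the stack trace in the question:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class S3AThreadPoolDemo {
  public static void main(String[] args) {
    // Hypothetical stand-ins for the four fs.s3a.* properties listed above.
    int coreThreads = 15;      // fs.s3a.threads.core (default)
    int maxThreads = 10;       // fs.s3a.threads.max overridden below the core size
    long keepAliveTime = 60L;  // fs.s3a.threads.keepalivetime, in seconds
    int maxTotalTasks = 1000;  // fs.s3a.max.total.tasks (default)

    try {
      // Mirrors the S3AFileSystem.initialize() code quoted above.
      LinkedBlockingQueue<Runnable> workQueue =
          new LinkedBlockingQueue<>(maxThreads * maxTotalTasks);
      ThreadPoolExecutor pool = new ThreadPoolExecutor(
          coreThreads, maxThreads, keepAliveTime, TimeUnit.SECONDS, workQueue);
      pool.shutdown();
    } catch (IllegalArgumentException e) {
      // ThreadPoolExecutor rejects coreThreads < 0, maxThreads <= 0,
      // maxThreads < coreThreads, and a negative keepAliveTime;
      // LinkedBlockingQueue rejects a capacity <= 0.
      System.out.println("Pool creation failed: " + e);
    }
  }
}

In the same way, any override that makes fs.s3a.threads.max non-positive or smaller than the effective fs.s3a.threads.core, sets fs.s3a.threads.keepalivetime negative, or pushes the computed queue capacity (maxThreads * fs.s3a.max.total.tasks) to zero or below would trigger this exception, so those four properties are worth double-checking in your core-site.xml and in any -D overrides on the distcp command line.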


3 REPLIES

Re: I am getting the exception below when using distcp to copy HDFS data to S3 with the s3a protocol.

Mentor

Can you provide the command you executed? You can follow the advice in this thread: https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist....

Re: I am getting the exception below when using distcp to copy HDFS data to S3 with the s3a protocol.

Super Collaborator

Yes, it's working now. I had set fs.s3a.max.total.tasks to 10, which is why it was throwing the exception.

If possible, could you please reply to the query below?

I have a total of 6 TB (3 x 2 TB) of hard drives in each node, and HDFS is using 5 TB.

I need to upload 5 TB of data into an S3 bucket.

I am using the s3a client and I am getting a "No Space Left in Device" error.