Support Questions


I am getting the exception below when using DistCp to copy HDFS data to S3 over the s3a protocol.

Super Collaborator

15/12/30 08:55:10 INFO mapreduce.Job: Task Id : attempt_1451465507406_0001_m_000001_2, Status : FAILED
Error: java.lang.IllegalArgumentException
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1307)
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1230)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:274)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.tools.mapred.CopyMapper.setup(CopyMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

1 ACCEPTED SOLUTION

According to the stack trace, there was an IllegalArgumentException while trying to create a ThreadPoolExecutor. This is the relevant source code from the S3AFileSystem class:

    int maxThreads = conf.getInt(MAX_THREADS, DEFAULT_MAX_THREADS);
    int coreThreads = conf.getInt(CORE_THREADS, DEFAULT_CORE_THREADS);
    if (maxThreads == 0) {
      maxThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    if (coreThreads == 0) {
      coreThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    long keepAliveTime = conf.getLong(KEEPALIVE_TIME, DEFAULT_KEEPALIVE_TIME);
    LinkedBlockingQueue<Runnable> workQueue =
      new LinkedBlockingQueue<>(maxThreads *
        conf.getInt(MAX_TOTAL_TASKS, DEFAULT_MAX_TOTAL_TASKS));
    threadPoolExecutor = new ThreadPoolExecutor(
        coreThreads,
        maxThreads,
        keepAliveTime,
        TimeUnit.SECONDS,
        workQueue,
        newDaemonThreadFactory("s3a-transfer-shared-"));
    threadPoolExecutor.allowCoreThreadTimeOut(true);

The various arguments passed to the ThreadPoolExecutor are pulled from Hadoop configuration, such as the core-site.xml file. The defaults for these are defined in core-default.xml:

<property>
  <name>fs.s3a.threads.max</name>
  <value>256</value>
  <description> Maximum number of concurrent active (part)uploads,
    which each use a thread from the threadpool.</description>
</property>
<property>
  <name>fs.s3a.threads.core</name>
  <value>15</value>
  <description>Number of core threads in the threadpool.</description>
</property>
<property>
  <name>fs.s3a.threads.keepalivetime</name>
  <value>60</value>
  <description>Number of seconds a thread can be idle before being
    terminated.</description>
</property>
<property>
  <name>fs.s3a.max.total.tasks</name>
  <value>1000</value>
  <description>Number of (part)uploads allowed to the queue before
    blocking additional uploads.</description>
</property>
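
If you want to double-check which values your client is actually picking up, one option (assuming the standard Hadoop client configuration directory is on the classpath) is to query the loaded configuration directly, for example:

hdfs getconf -confKey fs.s3a.threads.max
hdfs getconf -confKey fs.s3a.threads.core
hdfs getconf -confKey fs.s3a.max.total.tasks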

Is it possible that you have overridden one of these configuration properties to an invalid value, such as a negative number?
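
For reference, the ThreadPoolExecutor constructor throws a bare IllegalArgumentException (no message, as in your stack trace) when corePoolSize is negative, maximumPoolSize is not positive, maximumPoolSize is less than corePoolSize, or keepAliveTime is negative, and LinkedBlockingQueue likewise rejects a non-positive capacity. The standalone sketch below, with made-up numbers standing in for the fs.s3a.* settings, reproduces that failure outside of Hadoop:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class S3AThreadPoolCheck {
      public static void main(String[] args) {
        // Illustrative values standing in for the fs.s3a.* settings; the point
        // is that maxThreads < coreThreads is one combination the constructor rejects.
        int coreThreads = 15;      // fs.s3a.threads.core
        int maxThreads = 10;       // fs.s3a.threads.max (less than coreThreads)
        long keepAliveTime = 60;   // fs.s3a.threads.keepalivetime
        int maxTotalTasks = 1000;  // fs.s3a.max.total.tasks

        // LinkedBlockingQueue also throws IllegalArgumentException if its
        // capacity (maxThreads * maxTotalTasks) works out to zero or negative.
        LinkedBlockingQueue<Runnable> workQueue =
            new LinkedBlockingQueue<>(maxThreads * maxTotalTasks);

        // Throws java.lang.IllegalArgumentException with no message, just like
        // the stack trace above, because maximumPoolSize < corePoolSize.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            coreThreads, maxThreads, keepAliveTime, TimeUnit.SECONDS, workQueue);
      }
    }

The same validation rejects negative values for any of the three thread-pool settings, so it is worth checking every fs.s3a.* override in effect on the cluster.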

3 REPLIES

Master Mentor

Can you provide the command you executed? You can follow the advice in this thread: https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist....
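
For comparison, a typical DistCp invocation over s3a looks something like this (the bucket name, paths, and credentials are placeholders, and the keys can also be set in core-site.xml instead of on the command line):

hadoop distcp \
  -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
  -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
  hdfs:///user/example/source \
  s3a://example-bucket/target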

Super Collaborator

Yes, it's working now. I had given fs.s3a.max.total.tasks a value of 10, which is why it was throwing the exception.

If possible, could you please reply to the query below?

I have a total of 6 TB (3 x 2 TB) of hard drives in each node. HDFS is using 5 TB.

I need to upload 5 TB of data into an S3 bucket.

I am using the s3a client and I am getting a "No Space Left in Device" error!