<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: I am getting below exception by using distcp to copy hdfs data into s3 using s3a protocol. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101026#M64011</link>
    <description>Cloudera Community thread: an IllegalArgumentException thrown from S3AFileSystem while using distcp to copy HDFS data into S3 over the s3a protocol, ultimately traced to an invalid fs.s3a.* thread-pool configuration value.</description>
    <pubDate>Thu, 31 Dec 2015 02:23:29 GMT</pubDate>
    <dc:creator>cnauroth</dc:creator>
    <dc:date>2015-12-31T02:23:29Z</dc:date>
    <item>
      <title>I am getting below exception by using distcp to copy hdfs data into s3 using s3a protocol.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101024#M64009</link>
      <description>&lt;P&gt;15/12/30 08:55:10 INFO mapreduce.Job: Task Id : attempt_1451465507406_0001_m_000001_2, Status : FAILED
Error: java.lang.IllegalArgumentException
        at java.util.concurrent.ThreadPoolExecutor.&amp;lt;init&amp;gt;(ThreadPoolExecutor.java:1307)
        at java.util.concurrent.ThreadPoolExecutor.&amp;lt;init&amp;gt;(ThreadPoolExecutor.java:1230)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:274)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.tools.mapred.CopyMapper.setup(CopyMapper.java:112)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)&lt;/P&gt;</description>
      <pubDate>Wed, 30 Dec 2015 17:07:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101024#M64009</guid>
      <dc:creator>subhash_parise3</dc:creator>
      <dc:date>2015-12-30T17:07:03Z</dc:date>
    </item>
    <item>
      <title>Re: I am getting below exception by using distcp to copy hdfs data into s3 using s3a protocol.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101025#M64010</link>
      <description>&lt;P&gt;Can you provide the command you executed? You can follow the advice in this thread: &lt;A href="https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist.html" target="_blank"&gt;https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Dec 2015 21:55:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101025#M64010</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-12-30T21:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: I am getting below exception by using distcp to copy hdfs data into s3 using s3a protocol.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101026#M64011</link>
      <description>&lt;P&gt;According to the stack trace, there was an IllegalArgumentException while trying to create a ThreadPoolExecutor.  This is the relevant source code from the &lt;A href="https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L262-L281"&gt;S3AFileSystem&lt;/A&gt; class:&lt;/P&gt;&lt;PRE&gt;    int maxThreads = conf.getInt(MAX_THREADS, DEFAULT_MAX_THREADS);
    int coreThreads = conf.getInt(CORE_THREADS, DEFAULT_CORE_THREADS);
    if (maxThreads == 0) {
      maxThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    if (coreThreads == 0) {
      coreThreads = Runtime.getRuntime().availableProcessors() * 8;
    }
    long keepAliveTime = conf.getLong(KEEPALIVE_TIME, DEFAULT_KEEPALIVE_TIME);
    LinkedBlockingQueue&amp;lt;Runnable&amp;gt; workQueue =
      new LinkedBlockingQueue&amp;lt;&amp;gt;(maxThreads *
        conf.getInt(MAX_TOTAL_TASKS, DEFAULT_MAX_TOTAL_TASKS));
    threadPoolExecutor = new ThreadPoolExecutor(
        coreThreads,
        maxThreads,
        keepAliveTime,
        TimeUnit.SECONDS,
        workQueue,
        newDaemonThreadFactory("s3a-transfer-shared-"));
    threadPoolExecutor.allowCoreThreadTimeOut(true);&lt;/PRE&gt;&lt;P&gt;The various arguments passed to the ThreadPoolExecutor are pulled from Hadoop configuration, such as the core-site.xml file.  The defaults for these are defined in &lt;A href="https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L780-L805"&gt;core-default.xml&lt;/A&gt;:&lt;/P&gt;&lt;PRE&gt;&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;fs.s3a.threads.max&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;256&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt; Maximum number of concurrent active (part)uploads,
    which each use a thread from the threadpool.&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;fs.s3a.threads.core&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;15&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;Number of core threads in the threadpool.&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;fs.s3a.threads.keepalivetime&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;60&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;Number of seconds a thread can be idle before being
    terminated.&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;fs.s3a.max.total.tasks&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;1000&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;Number of (part)uploads allowed to the queue before
    blocking additional uploads.&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;&lt;/PRE&gt;&lt;P&gt;
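The pool construction above can be exercised outside Hadoop with a small standalone sketch (the class name S3aPoolArgsDemo and the helper poolArgsValid are illustrative, not part of S3AFileSystem):&lt;/P&gt;

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class S3aPoolArgsDemo {

    // Attempts to build an executor the same way S3AFileSystem does,
    // reporting whether the supplied settings are accepted.
    static boolean poolArgsValid(int coreThreads, int maxThreads,
                                 long keepAliveSeconds, int maxTotalTasks) {
        try {
            // A non-positive capacity (e.g. from a negative maxThreads)
            // makes this constructor throw IllegalArgumentException.
            LinkedBlockingQueue<Runnable> workQueue =
                new LinkedBlockingQueue<>(maxThreads * maxTotalTasks);
            ThreadPoolExecutor executor = new ThreadPoolExecutor(
                coreThreads, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
                workQueue, Thread::new);
            executor.allowCoreThreadTimeOut(true);
            executor.shutdown();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults from core-default.xml construct the pool cleanly.
        System.out.println(poolArgsValid(15, 256, 60, 1000));  // -> true
        // An invalid override such as fs.s3a.threads.max = -1 fails with
        // the same java.lang.IllegalArgumentException as the stack trace.
        System.out.println(poolArgsValid(15, -1, 60, 1000));   // -> false
    }
}
```

&lt;P&gt;With the defaults from core-default.xml the executor is constructed normally, while a negative override drives the queue capacity negative and construction fails.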
Is it possible that you have overridden one of these configuration properties to an invalid value, such as a negative number?&lt;/P&gt;</description>
      <pubDate>Thu, 31 Dec 2015 02:23:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101026#M64011</guid>
      <dc:creator>cnauroth</dc:creator>
      <dc:date>2015-12-31T02:23:29Z</dc:date>
    </item>
    <item>
      <title>Re: I am getting below exception by using distcp to copy hdfs data into s3 using s3a protocol.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101027#M64012</link>
      <description>&lt;P&gt;Yes, it's working now. I had set fs.s3a.max.total.tasks to 10, which is why it was throwing the exception.&lt;/P&gt;&lt;P&gt;If possible, could you please reply to the query below?&lt;/P&gt;&lt;P&gt;I have a total of 6 TB (3 x 2 TB) of hard drives in each node, and HDFS is using 5 TB.&lt;/P&gt;&lt;P&gt;I need to upload 5 TB of data into an S3 bucket.&lt;/P&gt;&lt;P&gt;I am using the s3a client and I am getting "No space left on device" ... !&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jan 2016 22:45:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-below-exception-by-using-distcp-to-copy-hdfs/m-p/101027#M64012</guid>
      <dc:creator>subhash_parise3</dc:creator>
      <dc:date>2016-01-05T22:45:32Z</dc:date>
    </item>
  </channel>
</rss>