<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: We are experiencing an intermittent issue with our Spark load jobs. We use Python to launch multiple Spark submit jobs that load data from source files into HDFS. We noticed these Spark submit jobs fail intermittently. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-are-experiencing-an-intermittent-issue-with-our-Spark/m-p/105887#M54519</link>
    <description>&lt;P&gt;OK, found the problem. Strange, but this is what fixed it: setting mapreduce.input.fileinputformat.split.minsize in the mapred config to 64 MB (larger than any JSON file we have) resolved the issue. It seems the JSON files were being split, and that caused the problem.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Feb 2017 19:20:34 GMT</pubDate>
    <dc:creator>amitrai2012</dc:creator>
    <dc:date>2017-02-20T19:20:34Z</dc:date>
    <item>
      <title>We are experiencing an intermittent issue with our Spark load jobs. We use Python to launch multiple Spark submit jobs that load data from source files into HDFS. We noticed these Spark submit jobs fail intermittently.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-are-experiencing-an-intermittent-issue-with-our-Spark/m-p/105886#M54518</link>
      <description>&lt;P&gt;For example, the Python framework launches a number of Spark submit 
jobs, and some of them fail with this exception. If we re-run the 
framework to re-launch the failed jobs, some of them may fail again while 
others succeed. If we keep re-running the failed jobs, 
eventually all of them succeed, so the issue is intermittent.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Error message in the YARN application log:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;17/02/13 22:38:55 ERROR yarn.ApplicationMaster: User class threw exception: com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of com.mgl.dh.silfspark.configs.FileCnfg out of VALUE_STRING token&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Error in the YARN NodeManager log:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt; 2017-02-13 22:38:58,131 WARN  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(313)) - Exception from container-launch with container ID: container_1486736461720_22516_01_000084 and exit code: 15
   ExitCodeException exitCode=15:
           at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
           at org.apache.hadoop.util.Shell.run(Shell.java:456)
           at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
           at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:297)
           at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
           at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Environment:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;HDFS 2.7.1 &lt;/P&gt;&lt;P&gt;YARN 2.7.1 &lt;/P&gt;&lt;P&gt;Spark 1.5.1&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 16:04:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-are-experiencing-an-intermittent-issue-with-our-Spark/m-p/105886#M54518</guid>
      <dc:creator>amitrai2012</dc:creator>
      <dc:date>2017-02-15T16:04:16Z</dc:date>
    </item>
    <item>
      <title>Re: We are experiencing an intermittent issue with our Spark load jobs. We use Python to launch multiple Spark submit jobs that load data from source files into HDFS. We noticed these Spark submit jobs fail intermittently.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-are-experiencing-an-intermittent-issue-with-our-Spark/m-p/105887#M54519</link>
      <description>&lt;P&gt;OK, found the problem. Strange, but this is what fixed it: setting mapreduce.input.fileinputformat.split.minsize in the mapred config to 64 MB (larger than any JSON file we have) resolved the issue. It seems the JSON files were being split, and that caused the problem.&lt;/P&gt;</description>
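For anyone who cannot edit the cluster-wide mapred config, here is a minimal sketch of passing the same property per job from a Python launcher. It relies on Spark copying properties prefixed with spark.hadoop. into the Hadoop configuration. The helper name, main class, and jar name below are hypothetical (they are not from the original post), and whether the setting takes effect depends on how the job actually reads its input, so treat this as an assumption to verify rather than a confirmed fix.

```python
# 64 MB in bytes, matching the value reported in the post above.
MIN_SPLIT_BYTES = 64 * 1024 * 1024

# Hadoop property named in the post; the "spark.hadoop." prefix asks Spark
# to forward it into the Hadoop configuration used by the job.
SPLIT_MINSIZE_KEY = "spark.hadoop.mapreduce.input.fileinputformat.split.minsize"


def build_spark_submit_cmd(main_class, app_jar, app_args=(),
                           min_split_bytes=MIN_SPLIT_BYTES):
    """Hypothetical helper: build a spark-submit argument list that raises
    the minimum input split size so small JSON files are not split."""
    cmd = [
        "spark-submit",
        "--class", main_class,
        "--conf", "%s=%d" % (SPLIT_MINSIZE_KEY, min_split_bytes),
        app_jar,
    ]
    # Application arguments go after the jar, as spark-submit expects.
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Placeholder class and jar names, for illustration only.
    print(build_spark_submit_cmd("com.example.LoadJob", "load-job.jar",
                                 ["input.json"]))
```

The returned list can be handed to subprocess.call in the launcher. Master and deploy-mode flags are left out on purpose, since the original post does not state them.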
      <pubDate>Mon, 20 Feb 2017 19:20:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-are-experiencing-an-intermittent-issue-with-our-Spark/m-p/105887#M54519</guid>
      <dc:creator>amitrai2012</dc:creator>
      <dc:date>2017-02-20T19:20:34Z</dc:date>
    </item>
  </channel>
</rss>

