<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Bug in spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar, I know the fix, but unable to apply. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173438#M37632</link>
    <description>&lt;P&gt;You might have to distribute the new binary across the cluster. It looks like you're hitting SPARK-8032, whose fix was committed after Spark 1.3.1; you should consider upgrading your cluster to the latest HDP. &lt;A href="https://github.com/apache/spark/commit/22703dd79fecc844d68033358f3201fd8a8f95cb" target="_blank"&gt;https://github.com/apache/spark/commit/22703dd79fecc844d68033358f3201fd8a8f95cb&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 14 Aug 2016 00:40:56 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-08-14T00:40:56Z</dc:date>
    <item>
      <title>Bug in spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar, I know the fix, but unable to apply.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173437#M37631</link>
      <description>&lt;P&gt;We are using Spark 1.3 on HDP 2.2.4, and I found a bug in the spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar that ships with Spark: the MLlib check for the NumPy version is incorrect, and MLlib throws an exception.&lt;/P&gt;&lt;P&gt;I know the fix; I have to change the below file in the jar:&lt;/P&gt;&lt;PRE&gt;mllib/__init__.py&lt;/PRE&gt;&lt;P&gt;Below is the current code in the above-mentioned Python file:&lt;/P&gt;&lt;PRE&gt;import numpy 
if numpy.version.version &amp;lt; '1.4': 
    raise Exception("MLlib requires NumPy 1.4+")&lt;/PRE&gt;&lt;P&gt;It can be fixed by changing it to:&lt;/P&gt;&lt;PRE&gt;import numpy 
ver = [int(x) for x in numpy.version.version.split('.')[:2]] 
if ver &amp;lt; [1, 4]: 
    raise Exception("MLlib requires NumPy 1.4+")&lt;/PRE&gt;
&lt;P&gt;I have tried editing spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar to correct the code: I unzipped the jar file, fixed the code, and repacked it using zip.&lt;/P&gt;&lt;P&gt;But after placing the fix, it gives an EOF error:&lt;/P&gt;&lt;PRE&gt;Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, xxxxxx.xxxx.uk.hxxx): org.apache.spark.SparkException: 
Error from python worker: 
  /opt/anaconda/envs/sparkAnaconda/bin/python: No module named pyspark 
PYTHONPATH was: 
  /data/4/hadoop/yarn/local/usercache/xxxxxxxx/filecache/33/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar 
java.io.EOFException 
  at java.io.DataInputStream.readInt(DataInputStream.java:392) 
  at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163) 
  at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86) 
  at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62) 
  at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105) 
  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) 
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)&lt;/PRE&gt;
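&lt;P&gt;A quick way to check whether the repacking is the problem: the "No module named pyspark" line suggests the Python files lost their original entry paths when the archive was re-zipped. A minimal sketch that lists the relevant entries (a jar is just a zip archive; the exact entry paths are assumptions about the assembly layout):&lt;/P&gt;&lt;PRE&gt;import zipfile

jar_path = "spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar"
with zipfile.ZipFile(jar_path) as jar:
    names = set(jar.namelist())
    # The Python worker imports pyspark straight from the jar root,
    # so these entries must keep their full paths after repacking.
    for entry in ("pyspark/__init__.py", "pyspark/mllib/__init__.py"):
        print(entry, "present" if entry in names else "MISSING")&lt;/PRE&gt;</description>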
      <pubDate>Thu, 11 Aug 2016 22:45:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173437#M37631</guid>
      <dc:creator>Rakesh Gupta</dc:creator>
      <dc:date>2016-08-11T22:45:33Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar, I know the fix, but unable to apply.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173438#M37632</link>
      <description>&lt;P&gt;You might have to distribute the new binary across the cluster. It looks like you're hitting SPARK-8032, whose fix was committed after Spark 1.3.1; you should consider upgrading your cluster to the latest HDP. &lt;A href="https://github.com/apache/spark/commit/22703dd79fecc844d68033358f3201fd8a8f95cb" target="_blank"&gt;https://github.com/apache/spark/commit/22703dd79fecc844d68033358f3201fd8a8f95cb&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Aug 2016 00:40:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173438#M37632</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-08-14T00:40:56Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar, I know the fix, but unable to apply.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173439#M37633</link>
      <description>&lt;P&gt;Thanks Artem, you are correct, but due to some constraints we cannot wait for the upgrade. I am unable to find a fix for this in the meantime.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Aug 2016 04:51:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Bug-in-spark-assembly-1-3-1-2-3-0-0-2557-hadoop2-7-1-2-3-0-0/m-p/173439#M37633</guid>
      <dc:creator>Rakesh Gupta</dc:creator>
      <dc:date>2016-08-15T04:51:10Z</dc:date>
    </item>
  </channel>
</rss>