Member since: 10-04-2017
Posts: 4
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 47310 | 10-04-2017 08:20 PM
10-04-2017 08:20 PM
1 Kudo
OK, I've fixed it as described below, although I'm not sure why it only works when I update the file directly and not when I set it through Cloudera Manager. I ran

sudo gedit /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/conf/spark-env.sh

and added the following line at the very end of the script:

export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

Running the test Python code again worked fine.
10-04-2017 07:12 PM
Hi All, I've downloaded the QuickStart VM with CDH 5.10 on VirtualBox. Everything went smoothly, but I've noticed that every time I shut the machine down and turn it back on later, the IP address for the host keeps changing between 127.0.0.1 and the actual VM IP address, 10.0.2.15 (the VM has a static IP address, in my case 10.0.2.15). As a result I have to restart the Cloudera Manager services and the stale services as well, and deploy the client configuration again via Cloudera Manager. Is there a way to keep the host IP address always pointing to 10.0.2.15 so it doesn't flip? (I've sketched below the kind of static mapping I mean.) Thanks, Osama
Labels:
- Quickstart VM
10-04-2017 03:51 PM
Hi All, I'm using the Cloudera QuickStart VM with CDH 5.10 and 10 GB of RAM, running on VirtualBox. I've installed Anaconda using parcels as described here; everything went well and I have Anaconda version 4.2.0 installed. When I start pyspark from the terminal, it loads with the output below, which tells me (as far as I understand) that everything is OK:

[cloudera@quickstart ~]$ pyspark
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark) overrides detected (/usr/lib/spark).
WARNING: Running pyspark from user-defined location.
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/10/04 15:29:12 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth2)
17/10/04 15:29:12 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Using Python version 2.7.12 (default, Jul 2 2016 17:42:40)
SparkContext available as sc, HiveContext available as sqlContext.

Now I'm trying to run a small program to make sure it's working fine, so I've executed the code below:

>>> intRdd = sc.parallelize([1, 2, 3, 4])
>>> intRdd.first()

and... dang... I got the exception below:

17/10/04 15:30:50 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.2.15, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/worker.py", line 64, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17/10/04 15:30:50 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/rdd.py", line 1315, in first
rs = self.take(1)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/rdd.py", line 1297, in take
res = self.context.runJob(self, takeUpToNumLeft, p)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 939, in runJob
port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.0.2.15, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/worker.py", line 64, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1644)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1603)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1592)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1840)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1853)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1866)
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:393)
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/worker.py", line 64, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more

I've searched this forum for exceptions similar to "Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions" and found something similar here and here, but unfortunately that didn't help solve my problem. (A quick comparison of the driver and worker interpreters is sketched below.) I'm quite new to Cloudera, so if anyone can help me sort out this issue with some detailed steps, that would be appreciated. Thanks, Osama
Labels:
- Apache Spark
- Quickstart VM