07-26-2016 01:49 PM
Hi @Artem Ervits, @Michael Young, thanks for your replies. After more investigation, we found that the issue is not critical for our results: it seems to be raised by the underlying HDP stack and only pollutes our logs. We found our correct results in the noise and could continue our development.

PYSPARK_PYTHON and LD_LIBRARY_PATH are correctly set. We found a problem with PYTHONHASHSEED, but corrected it by setting it to a value.

So I could mark the thread as resolved, but how would you explain the typical Python version error (the 'SyntaxError' on 'print' without parentheses in the hdp-select code) coming from HDP stack code? Could it be some coupling between Spark / YARN and other HDP stack modules that breaks Python 3 compatibility?
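For readers wondering why setting PYTHONHASHSEED matters: Python 3 randomizes string hashes per process unless that variable is set, so separate Python processes can disagree on hash("some key"). The sketch below is not taken from our cluster setup; it only illustrates the effect locally by launching fresh interpreters. The helper name string_hash_in_fresh_interpreter and the sample string "rack-a" are hypothetical.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a newly started Python process.

    seed=None removes PYTHONHASHSEED from the child's environment;
    any other value is passed through as PYTHONHASHSEED.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


# With the same seed, two independent processes agree on the hash.
first = string_hash_in_fresh_interpreter("rack-a", seed=0)
second = string_hash_in_fresh_interpreter("rack-a", seed=0)
print(first == second)
```

On a real cluster the variable also has to reach the executors, for example through a Spark setting of the form spark.executorEnv.PYTHONHASHSEED; whether that is the right place to set it on a given HDP installation is an assumption to verify.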
07-21-2016 01:59 PM
Hello,

We have a cluster with HDP 2.4.2.0 and we face an issue when running a Python 3 script that uses spark-submit in spark-client mode. When Spark is activated, a Python exception is raised in hdp-select, and we deduced that it is a Python 2 vs Python 3 problem. A follow-up question: is there a trick or a right way to run Python 3 scripts with pyspark on HDP?

See the following trace:

  File "/usr/bin/hdp-select", line 202
    print "ERROR: Invalid package - " + name
    ^
SyntaxError: Missing parentheses in call to 'print'
[...]
WARN ScriptBasedMapping: Exception running /etc/hadoop/conf/topology_script.py 172.28.15.90
ExitCodeException exitCode=1:   File "/etc/hadoop/conf/topology_script.py", line 62
    print rack
    ^
SyntaxError: Missing parentheses in call to 'print'
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:292)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:284)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:284)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:196)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
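
Both SyntaxError messages in the trace point at Python 2 style print statements being parsed by a Python 3 interpreter. The following is a minimal sketch, not part of HDP, that reproduces that check with the built-in compile(); the function name compiles_under_this_interpreter is hypothetical, and the two sample lines are modeled on line 202 of /usr/bin/hdp-select from the trace.

```python
import sys


def compiles_under_this_interpreter(source):
    """Return True if `source` parses as valid code for the running Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Python 2 style print statement, as in the trace.
py2_style = 'print "ERROR: Invalid package - " + name'
# Equivalent call syntax, accepted by both Python 2 and Python 3.
py3_style = 'print("ERROR: Invalid package - " + name)'

print("Running Python %d.%d" % sys.version_info[:2])
print("statement form parses:", compiles_under_this_interpreter(py2_style))
print("call form parses:", compiles_under_this_interpreter(py3_style))
```

Run under Python 3, the statement form is rejected at parse time, before any line of the script executes, which would be consistent with hdp-select and topology_script.py failing immediately when they end up being launched with the Python 3 interpreter.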