02-25-2019
03:12 PM
In my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. I get the error below when I try to import numpy in PySpark:

ImportError: No module named numpy

I am invoking PySpark through Oozie. I tried to point it at the custom Python library path in the following ways.

Using Oozie launcher properties:

<property>
    <name>oozie.launcher.mapreduce.map.env</name>
    <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>

Using the Spark options tag:

<spark-opts>--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Nothing works. A plain Python script runs fine; the problem only appears when it goes through PySpark. I even set the interpreter in the PySpark script's shebang:

#! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths below:

['/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']

Kindly suggest a solution.
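One thing worth double-checking, offered as a hedged suggestion rather than a confirmed fix: on YARN, spark.yarn.appMasterEnv.* only sets the environment of the driver/application master, while the executors read spark.executorEnv.*, so numpy can be missing on the executor side even when the driver is configured correctly. A minimal <spark-opts> sketch that sets both, reusing the Anaconda path from above (whether it applies depends on the deploy mode):

<spark-opts>--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.executorEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Printing sys.executable alongside sys.path inside the job shows which interpreter each process actually picked up.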
Labels: Apache Oozie
02-25-2019
03:07 PM
I am trying to connect to Phoenix through PySpark. Everything looks fine, but the error below occurs. My Phoenix table "namespace:test" exists and is accessible.
py4j.protocol.Py4JJavaError: An error occurred while calling o81.load.
: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=namespace:test
I am using the code below:
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "\"namespace:test\"").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
I also tried it as below, but then it resolves the table name in upper case and fails with "Table undefined. tableName=NAMESPACE:TEST":
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "namespace:test").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
The same JDBC URL works fine from Java.
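For reference, a hedged sketch rather than a confirmed fix: Phoenix folds unquoted SQL identifiers to upper case, and in Phoenix SQL the schema separator is a dot (the colon form is the physical HBase name used with namespace mapping). If the table was created with lower-case, case-sensitive identifiers, the option may need each part quoted separately, as in this sketch (result and the URL are taken from the post above):

# Double quotes preserve identifier case per part; "schema"."table" is the
# Phoenix SQL form of the namespace-mapped table namespace:test.
zk_url = "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab"

result.write.format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", '"namespace"."test"') \
    .option("zkUrl", zk_url) \
    .save()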
Labels: Apache Phoenix, Apache Spark
02-15-2019
01:13 PM
1 Kudo
I am running a spark-submit action in Oozie. When I pass spark.driver.extraClassPath or spark.executor.extraClassPath on the spark-submit command line, the job runs fine, but when I put the same options in the Oozie <spark-opts> tag, it does not run. For example:

--conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

The above runs fine with spark-submit but not in Oozie. In Oozie, if I copy those jars into workflow/lib, it works fine. Even --files /etc/hbase/conf/hbase-site.xml does not work; I end up passing hbase-site.xml from workflow/lib, which is not the right way. So what is the point of having the spark.executor.extraClassPath option in Oozie?
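For what it's worth, one pattern that tends to behave more predictably under Oozie is to let Oozie localize the jars and config into the container's working directory with <file> elements, and then reference them by their localized (relative) names; extraClassPath entries are node-local paths, which the Oozie launcher container does not necessarily resolve the way a spark-submit shell does. A rough sketch, where the HDFS lib path and jar name are placeholders:

<spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <mode>client</mode>
    <name>Spark with localized jars</name>
    <jar>sparkPhoenix.py</jar>
    <!-- Relative jar names can work here because Oozie copies each
         <file> below into the container's working directory -->
    <spark-opts>--conf spark.executor.extraClassPath=phoenix-client.jar --conf spark.driver.extraClassPath=phoenix-client.jar</spark-opts>
    <file>${nameNode}/user/myuser/lib/phoenix-client.jar</file>
    <file>file:///etc/hbase/conf/hbase-site.xml</file>
</spark>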
Labels: Apache Oozie, Apache Spark
02-13-2019
02:07 PM
I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrated with Oozie. Instead, I pass hbase-site.xml with a file tag in the Oozie Spark action, as below, and it works fine now:

<file>file:///etc/hbase/conf/hbase-site.xml</file>
02-11-2019
10:11 AM
Hi @Josh Elser Thank you so much for the answer. I checked what you suggested and everything looks fine. I am using the JDBC URL below as zkUrl when accessing Phoenix. My cluster is Kerberized, so I pass all the credentials, as below:

jdbc:phoenix:ip-node1,ip-node2,ip-node3:2181:/hbase-secure:hbaseuser@HCL.COM:/home/hbaseuser/hbaseuser.keytab

The problem is that when I execute my PySpark job with this JDBC URL using spark-submit, it works fine. If I execute the same code in an Oozie workflow, it throws the exception below because of an HBase connectivity issue:

java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Mon Feb 11 07:33:05 UTC 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68427: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101.us-west-2.compute.internal,16020,1545291237502, seqNum=0

How can the same code work fine with spark-submit but not in the Oozie workflow? I copied all the dependency jars into the workflow/lib folder in HDFS. How can I debug this further?
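One low-effort check, sketched under the assumption of a live SparkSession named spark (note that _jvm is PySpark's internal py4j gateway, not a public API): ask the JVM inside the Oozie container whether hbase-site.xml is actually visible on its classpath, since a missing or stale hbase-site.xml commonly produces exactly this kind of SocketTimeoutException against hbase:meta.

# Prints the classpath location of hbase-site.xml as the JVM sees it;
# None means the Oozie container never received the file.
url = spark.sparkContext._jvm.java.lang.Thread.currentThread() \
    .getContextClassLoader().getResource("hbase-site.xml")
print(url)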
02-08-2019
02:58 PM
I am connecting to and ingesting data into a Phoenix table using PySpark with the code below:

dataframe.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "tablename").option("zkUrl", "localhost:2181").save()

When I run this with spark-submit, it works fine using the command below:

spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" sparkPhoenix.py

When I run it with Oozie, I get the error below:

.ConnectionClosingException: Connection to ip-172-31-44-101.us-west-2.compute.internal/172.31.44.101:16020 is closing. Call id=9, waitTime=3 row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101

Below is the workflow:

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>local</master>
<mode>client</mode>
<name>Spark Example</name>
<jar>sparkPhoenix.py</jar>
<spark-opts>--py-files Leia.zip --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar</spark-opts>
</spark>
<ok to="successEmailaction"/>
<error to="failEmailaction"/>
</action>

With spark-submit I initially got the same error and fixed it by passing the required jars. In Oozie, even though I pass the same jars, it still throws the error.
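Since the action declares cred="hbase", the workflow.xml should also contain a matching <credentials> section; for completeness, here is a rough sketch of what that section typically looks like for a Kerberized HBase. Every value below is a placeholder to be copied from the cluster's own hbase-site.xml, not a known setting of this cluster:

<credentials>
    <credential name="hbase" type="hbase">
        <!-- Placeholder values; mirror them from hbase-site.xml -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>ip-172-31-44-101.us-west-2.compute.internal</value>
        </property>
        <property>
            <name>hbase.security.authentication</name>
            <value>kerberos</value>
        </property>
        <property>
            <name>hbase.master.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
        <property>
            <name>hbase.regionserver.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
    </credential>
</credentials>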
Labels: Apache HBase, Apache Phoenix
11-13-2018
06:32 PM
I have a shell script like the one below:

ssh -q -v -i id_rsa -o "StrictHostKeyChecking no" user@remotemachine script > file
hdfs dfs -put -f file hdfspath

When I run this script in an Oozie shell action, the file is copied from the remote machine to my machine. It is more than 2 KB. But when I move it to HDFS with the hdfs dfs -put command, it throws the error below:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exception invoking main(), Output data exceeds its limit [2048]
org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
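For background, this error usually traces to <capture-output/> on the shell action: Oozie caps captured stdout at oozie.action.max.output.data, which defaults to 2048 bytes. If nothing downstream consumes the captured output, removing <capture-output/> from the action (or redirecting the script's stdout to a file) avoids the cap entirely; otherwise the limit can be raised in oozie-site.xml, as in this sketch (8192 is an arbitrary example value):

<property>
    <name>oozie.action.max.output.data</name>
    <value>8192</value>
</property>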
Labels: Apache Hadoop, Apache Oozie