Member since
11-13-2018
9
Posts
1
Kudos Received
0
Solutions
02-25-2019
03:12 PM
In my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. When I try to use numpy in PySpark, I get the error below:

ImportError: No module named numpy

I am invoking PySpark through Oozie. I tried passing the custom Python path in the following ways.

Using Oozie properties:

<property>
  <name>oozie.launcher.mapreduce.map.env</name>
  <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>

Using the spark-opts tag:

<spark-opts>spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Neither works. A plain Python script runs fine; the problem appears only when the interpreter has to be passed to PySpark. I also set the shebang line of the PySpark script to:

#! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths:

['/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']

Is there any solution?
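To narrow down where the wrong interpreter is picked up, it may help to report the interpreter on the driver and inside the executors separately, since they can differ. The sketch below is only a diagnostic, not a fix; the helper names interpreter_report and executor_reports are hypothetical, and executor_reports assumes a live SparkContext is passed in as sc and that the script is run as the main PySpark program.

```python
import sys


def interpreter_report(module_name="numpy"):
    """Describe the running interpreter and whether module_name can be imported."""
    try:
        __import__(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,        # path of the Python binary in use
        "version": sys.version.split()[0],   # e.g. "2.7.5"
        "importable": importable,
    }


def executor_reports(sc, module_name="numpy", partitions=2):
    """Run the same check inside executor Python workers.

    sc is assumed to be an existing SparkContext (hypothetical parameter name).
    """
    rdd = sc.parallelize(range(partitions), partitions)
    return rdd.map(lambda _: interpreter_report(module_name)).collect()


if __name__ == "__main__":
    # Driver-side view; compare with executor_reports(sc) on the cluster.
    print(interpreter_report("numpy"))
```

If the driver reports the Anaconda binary but the executors report /usr/bin/python (or the reverse), that shows which of the two settings is not being honored.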
Labels: Apache Oozie
02-25-2019
03:07 PM
I am trying to connect to Phoenix through PySpark. Everything else works, but the error below occurs.
My Phoenix table "namespace:test" exists, and I have access to it.
py4j.protocol.Py4JJavaError: An error occurred while calling o81.load.
: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=namespace:test
I am using the code below:
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "\"namespace:test\"").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
I also tried it without the inner quotes, as below. In that case the table name is converted to upper case, and the error becomes "Table undefined. tableName=NAMESPACE:TEST".
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "namespace:test").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
I used the same JDBC URL from Java, and it works fine there.
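One thing that might be worth trying, assuming namespace mapping (phoenix.schema.isNamespaceMappingEnabled) is enabled: Phoenix SQL addresses a namespace-mapped table as schema.table rather than with the HBase colon form, and it upper-cases identifiers unless each part is quoted separately. The helper below is a hypothetical sketch that converts the HBase-style name into that quoted form. Whether this version of the phoenix-spark connector passes the quoted name through unchanged is not confirmed here.

```python
def phoenix_table_name(hbase_name):
    """Turn an HBase-style 'namespace:table' into a quoted Phoenix identifier.

    Each part is quoted on its own so Phoenix keeps the lower-case spelling.
    """
    namespace, sep, table = hbase_name.partition(":")
    if not sep:
        # No namespace part: quote the bare table name only.
        return '"{0}"'.format(hbase_name)
    return '"{0}"."{1}"'.format(namespace, table)


if __name__ == "__main__":
    # The result would be passed as the "table" option of the writer.
    print(phoenix_table_name("namespace:test"))
```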
Labels: Apache Phoenix, Apache Spark
02-15-2019
01:13 PM
1 Kudo
I am running a spark-submit action in Oozie. When I pass spark.driver.extraClassPath or spark.executor.extraClassPath on the spark-submit command line, it runs fine. But when I pass the same options in the Oozie <spark-opts> tag, the job does not run. For example:

--conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

This works with the spark-submit command but not in Oozie. In Oozie it works only if I copy those jars into workflow/lib. Even --file /etc/hbase/conf/hbase-site.xml does not work, so I am passing hbase-site.xml from workflow/lib, which is not the right way. What, then, is the point of the spark.executor.extraClassPath option in Oozie?
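One possible cause to rule out: extraClassPath entries are local filesystem paths, resolved on whichever node the driver or executor happens to run, and an Oozie launcher can be placed on a node other than the one where spark-submit was tested by hand. The sketch below checks which entries of a classpath string are absent on the current node; missing_classpath_entries is a hypothetical helper name, and the ":" separator assumes Linux hosts.

```python
import os


def missing_classpath_entries(classpath):
    """Return the classpath entries that do not exist on this machine."""
    missing = []
    for entry in classpath.split(":"):
        if not entry:
            continue
        # A trailing "/*" means "all jars in this directory", so check the directory.
        path = entry[:-2] if entry.endswith("/*") else entry
        if not os.path.exists(path):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    cp = ("/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:"
          "/usr/hdp/current/phoenix-client/phoenix-client.jar")
    print(missing_classpath_entries(cp))
```

Running this on each worker node (or from inside the job itself) shows whether the Phoenix client jars are actually installed everywhere the containers can be scheduled.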
Labels: Apache Oozie, Apache Spark
11-13-2018
06:32 PM
I have a shell script like the one below:

ssh -q -v -i id_rsa -o "StrictHostKeyChecking no" user@remotemachine script > file
hdfs dfs -put -f file hdfspath

When I run this script in an Oozie shell action with "", the file is copied from the remote machine to my machine. The file is larger than 2 KB. But when I move it to HDFS with the hdfs dfs -put command, it throws the error below:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exception invoking main(), Output data exceeds its limit [2048]
org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
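This error is normally raised when an action captures its standard output and the captured text is larger than oozie.action.max.output.data, which defaults to 2048. Assuming that is what is happening here, one option is to print only a few short key=value lines on stdout and send everything else to files or stderr. The sketch below illustrates that idea; format_captured_output is a hypothetical helper, and Oozie's exact size accounting may differ from a plain character count.

```python
DEFAULT_LIMIT = 2048  # Oozie's default for oozie.action.max.output.data


def format_captured_output(pairs, limit=DEFAULT_LIMIT):
    """Render key=value lines for stdout, refusing to exceed the capture limit."""
    text = "".join("{0}={1}\n".format(key, value) for key, value in pairs)
    if len(text) > limit:
        raise ValueError(
            "captured output is {0} characters, limit is {1}".format(len(text), limit)
        )
    return text


if __name__ == "__main__":
    # Emit only a short pointer to the data, not the data itself.
    print(format_captured_output([("hdfs_path", "hdfspath")]), end="")
```

If the action does not need to read anything back from the script, dropping output capture from the shell action, or raising the limit in oozie-site.xml, would be the alternatives.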
Labels: Apache Hadoop, Apache Oozie