02-25-2019
03:12 PM
In my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. I get the error below when I try to import numpy in PySpark:

ImportError: No module named numpy

I am invoking PySpark through Oozie. I tried to point it at the custom Python library path in the following ways.

Using Oozie launcher properties:

<property>
    <name>oozie.launcher.mapreduce.map.env</name>
    <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>

Using the Spark options tag:

<spark-opts>--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Nothing works. A plain Python script runs fine; the problem only appears when it goes through PySpark. I even set the interpreter in the PySpark script's shebang:

#! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths below:

['/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']

Kindly suggest a solution.
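One thing worth double-checking, offered as a hedged suggestion rather than a confirmed fix: on YARN, spark.yarn.appMasterEnv.* only sets the environment of the driver/application master, while the executors read spark.executorEnv.*, so numpy can be missing on the executor side even when the driver is configured correctly. A minimal <spark-opts> sketch that sets both, reusing the Anaconda path from above (whether it applies depends on the deploy mode):

<spark-opts>--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.executorEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Printing sys.executable alongside sys.path inside the job shows which interpreter each process actually picked up.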
Labels: Apache Oozie
02-25-2019
03:07 PM
I am trying to connect to Phoenix through PySpark. Everything looks fine, but the error below occurs. My Phoenix table "namespace:test" exists and is accessible.
py4j.protocol.Py4JJavaError: An error occurred while calling o81.load.
: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=namespace:test
I am using the code below:
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "\"namespace:test\"").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
I also tried it as below, but then it resolves the table name in upper case and fails with "Table undefined. tableName=NAMESPACE:TEST":
result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "namespace:test").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()
The same JDBC URL works fine from Java.
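For reference, a hedged sketch rather than a confirmed fix: Phoenix folds unquoted SQL identifiers to upper case, and in Phoenix SQL the schema separator is a dot (the colon form is the physical HBase name used with namespace mapping). If the table was created with lower-case, case-sensitive identifiers, the option may need each part quoted separately, as in this sketch (result and the URL are taken from the post above):

# Double quotes preserve identifier case per part; "schema"."table" is the
# Phoenix SQL form of the namespace-mapped table namespace:test.
zk_url = "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab"

result.write.format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", '"namespace"."test"') \
    .option("zkUrl", zk_url) \
    .save()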
Labels: Apache Phoenix, Apache Spark
02-15-2019
01:13 PM
1 Kudo
I am running a spark-submit action in Oozie. When I pass spark.driver.extraClassPath or spark.executor.extraClassPath on the spark-submit command line, the job runs fine, but when I put the same options in the Oozie <spark-opts> tag, it does not run. For example:

--conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

The above runs fine with spark-submit but not in Oozie. In Oozie, if I copy those jars into workflow/lib, it works fine. Even --files /etc/hbase/conf/hbase-site.xml does not work; I end up passing hbase-site.xml from workflow/lib, which is not the right way. So what is the point of having the spark.executor.extraClassPath option in Oozie?
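For what it's worth, one pattern that tends to behave more predictably under Oozie is to let Oozie localize the jars and config into the container's working directory with <file> elements, and then reference them by their localized (relative) names; extraClassPath entries are node-local paths, which the Oozie launcher container does not necessarily resolve the way a spark-submit shell does. A rough sketch, where the HDFS lib path and jar name are placeholders:

<spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <mode>client</mode>
    <name>Spark with localized jars</name>
    <jar>sparkPhoenix.py</jar>
    <!-- Relative jar names can work here because Oozie copies each
         <file> below into the container's working directory -->
    <spark-opts>--conf spark.executor.extraClassPath=phoenix-client.jar --conf spark.driver.extraClassPath=phoenix-client.jar</spark-opts>
    <file>${nameNode}/user/myuser/lib/phoenix-client.jar</file>
    <file>file:///etc/hbase/conf/hbase-site.xml</file>
</spark>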
Labels: Apache Oozie, Apache Spark
02-13-2019
02:07 PM
I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrated with Oozie. Instead, I pass hbase-site.xml with a file tag in the Oozie Spark action, as below, and it works fine now:

<file>file:///etc/hbase/conf/hbase-site.xml</file>
02-11-2019
10:11 AM
Hi @Josh Elser Thank you so much for the answer. I checked what you suggested and everything looks fine. I am using the JDBC URL below as zkUrl when accessing Phoenix. My cluster is Kerberized, so I pass all the credentials, as below:

jdbc:phoenix:ip-node1,ip-node2,ip-node3:2181:/hbase-secure:hbaseuser@HCL.COM:/home/hbaseuser/hbaseuser.keytab

The problem is that when I execute my PySpark job with this JDBC URL using spark-submit, it works fine. If I execute the same code in an Oozie workflow, it throws the exception below because of an HBase connectivity issue:

java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Mon Feb 11 07:33:05 UTC 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68427: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101.us-west-2.compute.internal,16020,1545291237502, seqNum=0

How can the same code work fine with spark-submit but not in the Oozie workflow? I copied all the dependency jars into the workflow/lib folder in HDFS. How can I debug this further?
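One low-effort check, sketched under the assumption of a live SparkSession named spark (note that _jvm is PySpark's internal py4j gateway, not a public API): ask the JVM inside the Oozie container whether hbase-site.xml is actually visible on its classpath, since a missing or stale hbase-site.xml commonly produces exactly this kind of SocketTimeoutException against hbase:meta.

# Prints the classpath location of hbase-site.xml as the JVM sees it;
# None means the Oozie container never received the file.
url = spark.sparkContext._jvm.java.lang.Thread.currentThread() \
    .getContextClassLoader().getResource("hbase-site.xml")
print(url)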
02-08-2019
02:58 PM
I am connecting to and ingesting data into a Phoenix table using PySpark with the code below:

dataframe.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "tablename").option("zkUrl", "localhost:2181").save()

When I run this with spark-submit, it works fine using the command below:

spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" sparkPhoenix.py

When I run it with Oozie, I get the error below:

.ConnectionClosingException: Connection to ip-172-31-44-101.us-west-2.compute.internal/172.31.44.101:16020 is closing. Call id=9, waitTime=3 row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101

Below is the workflow:

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>local</master>
<mode>client</mode>
<name>Spark Example</name>
<jar>sparkPhoenix.py</jar>
<spark-opts>--py-files Leia.zip --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar</spark-opts>
</spark>
<ok to="successEmailaction"/>
<error to="failEmailaction"/>
</action>

With spark-submit I initially got the same error and fixed it by passing the required jars. In Oozie, even though I pass the same jars, it still throws the error.
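Since the action declares cred="hbase", the workflow.xml should also contain a matching <credentials> section; for completeness, here is a rough sketch of what that section typically looks like for a Kerberized HBase. Every value below is a placeholder to be copied from the cluster's own hbase-site.xml, not a known setting of this cluster:

<credentials>
    <credential name="hbase" type="hbase">
        <!-- Placeholder values; mirror them from hbase-site.xml -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>ip-172-31-44-101.us-west-2.compute.internal</value>
        </property>
        <property>
            <name>hbase.security.authentication</name>
            <value>kerberos</value>
        </property>
        <property>
            <name>hbase.master.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
        <property>
            <name>hbase.regionserver.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
    </credential>
</credentials>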
Labels: Apache HBase, Apache Phoenix
11-13-2018
06:32 PM
I have a shell script like the one below:

ssh -q -v -i id_rsa -o "StrictHostKeyChecking no" user@remotemachine script > file
hdfs dfs -put -f file hdfspath

When I run this script in an Oozie shell action, the file is copied from the remote machine to my machine. It is more than 2 KB. But when I move it to HDFS with the hdfs dfs -put command, it throws the error below:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exception invoking main(), Output data exceeds its limit [2048]
org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
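For background, this error usually traces to <capture-output/> on the shell action: Oozie caps captured stdout at oozie.action.max.output.data, which defaults to 2048 bytes. If nothing downstream consumes the captured output, removing <capture-output/> from the action (or redirecting the script's stdout to a file) avoids the cap entirely; otherwise the limit can be raised in oozie-site.xml, as in this sketch (8192 is an arbitrary example value):

<property>
    <name>oozie.action.max.output.data</name>
    <value>8192</value>
</property>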
Labels: Apache Hadoop, Apache Oozie