Member since 11-13-2018
9 Posts
1 Kudos Received
0 Solutions
			
    
	
		
		
02-25-2019 03:12 PM
In my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. I get the error below when I try to import numpy in PySpark:

    ImportError: No module named numpy

I am invoking PySpark through Oozie. I tried to pass the custom Python path in the following ways.

Using Oozie tags:

    <property>
      <name>oozie.launcher.mapreduce.map.env</name>
      <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
    </property>

Using the spark-opts tag:

    <spark-opts>spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>

Neither approach works. A plain Python script runs fine; the problem is only with passing the interpreter path to PySpark. I also put the path in the shebang line of the PySpark script:

    #! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths:

    [ '/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']

Please suggest a solution.
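To narrow this down, it may help to log which interpreter is actually running, not only sys.path. The sketch below is a diagnostic only, not a fix; `interpreter_report` is a hypothetical helper name, and the Anaconda prefix is the one quoted in the post. It should run under both Python 2.7 and Python 3.

```python
import sys


def interpreter_report(expected_prefix):
    """Summarize which interpreter is running and whether it lives under expected_prefix."""
    # sys.executable can be empty or None if Python cannot determine its own path.
    executable = sys.executable or ""
    return {
        "executable": executable,
        "matches_expected": executable.startswith(expected_prefix),
        # sys.path entries that come from the expected installation, if any.
        "path_entries_under_prefix": [
            p for p in sys.path if p.startswith(expected_prefix)
        ],
    }


if __name__ == "__main__":
    # Prefix taken from the post; adjust it for another installation.
    report = interpreter_report("/var/opt/teradata/anaconda2")
    for key in sorted(report):
        print("%s: %s" % (key, report[key]))
```

Calling the same function at the top of the PySpark script and inside a function run on the executors would show whether the driver and the executors pick up different interpreters.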
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Oozie
			
    
	
		
		
02-25-2019 03:07 PM
 I am trying to connect to Phoenix through PySpark. Everything else works, but the error below occurs.
 My Phoenix table "namespace:test" exists and I have access to it.
 py4j.protocol.Py4JJavaError: An error occurred while calling o81.load.
: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=namespace:test

 I am using the code below:
 result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "\"namespace:test\"").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()

 I also tried it without the escaped quotes, but then the table name is converted to upper case: "Table undefined. tableName=NAMESPACE:TEST"
 result.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "namespace:test").option("zkUrl", "jdbc:phoenix:ip-172-31-45-15.us-west-2.compute.internal:2181:/hbase-secure:hbaseuser@ex.COM:/home/hbaseuser/hbaseuser.keytab").save()

 The same JDBC URL works fine from Java.
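
The two different tableName values in the errors above are consistent with the usual SQL identifier rule that Phoenix follows: an unquoted identifier is folded to upper case, while a double-quoted one keeps its case. The sketch below only mimics that rule for a single identifier string; `normalize_identifier` is a hypothetical helper for illustration and is not part of Phoenix or the phoenix-spark connector.

```python
def normalize_identifier(name):
    """Mimic identifier case folding: quoted names keep their case, unquoted names are upper-cased."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Double-quoted: strip the quotes and preserve case exactly.
        return name[1:-1]
    # Unquoted: fold to upper case.
    return name.upper()


if __name__ == "__main__":
    print(normalize_identifier('namespace:test'))    # unquoted form from the post
    print(normalize_identifier('"namespace:test"'))  # quoted form from the post
```

This accounts for why the reported name changes between the two attempts. It does not by itself explain why the quoted, lower-case name is still reported as undefined.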
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Phoenix, Apache Spark
			
    
	
		
		
02-15-2019 01:13 PM
1 Kudo
I am running a spark-submit action in Oozie. When I pass spark.driver.extraClassPath or spark.executor.extraClassPath on the spark-submit command line, the job runs fine. But when I put the same options in the <spark-opts> tag in Oozie, it does not run.

For example:

    --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

This works with the spark-submit command but not in Oozie. In Oozie, it works if I copy those jars into workflow/lib.

Even --file /etc/hbase/conf/hbase-site.xml does not work. I am passing hbase-site.xml from workflow/lib, which is not the right way to do it.

So what is the point of having the spark.executor.extraClassPath option in Oozie?
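
One thing that may be worth ruling out: extraClassPath entries are plain local paths that Spark adds to the JVM classpath on whichever node runs the driver or executor; they are not shipped with the job. Under Oozie the job may be launched from a different node than the one used for the manual spark-submit test. The sketch below lists entries that do not exist on the machine where it is run. `missing_classpath_entries` is a hypothetical helper name, and whether a missing path is the actual cause here is an assumption, not something the post confirms.

```python
import os


def missing_classpath_entries(classpath, sep=":"):
    """Return the classpath entries that do not exist on this machine.

    Note: wildcard entries such as /some/dir/* are not expanded here and
    would be reported as missing.
    """
    entries = [entry for entry in classpath.split(sep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]


if __name__ == "__main__":
    # Classpath value quoted in the post.
    classpath = (
        "/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:"
        "/usr/hdp/current/phoenix-client/phoenix-client.jar"
    )
    for entry in missing_classpath_entries(classpath):
        print("missing on this node: %s" % entry)
```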
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Oozie, Apache Spark
			
    
	
		
		
11-13-2018 06:32 PM
I have a shell script like the one below:

    ssh -q -v -i id_rsa -o "StrictHostKeyChecking no" user@remotemachine script > file
    hdfs dfs -put -f file hdfspath

When I run this script in an Oozie shell action with "", the file is copied from the remote machine to my machine. The file is larger than 2 KB. But when I move it to HDFS with the hdfs dfs -put command, it throws the error below:

    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exception invoking main(), Output data exceeds its limit [2048]
    org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
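
The error names a 2048 limit on "output data". Assuming this refers to what the action writes to standard output (an assumption, since the setting named in the quotes above is missing from the post), a quick way to check a command is to measure how many bytes it prints. `stdout_size` and `exceeds_limit` are hypothetical helper names used only for this sketch.

```python
import subprocess
import sys


def stdout_size(command):
    """Run a command (given as an argument list) and return the byte count of its stdout."""
    output = subprocess.check_output(command)
    return len(output)


def exceeds_limit(command, limit=2048):
    """Return True if the command prints more than `limit` bytes to stdout."""
    return stdout_size(command) > limit


if __name__ == "__main__":
    # Stand-in command that prints 3000 characters; replace with the real script.
    demo = [sys.executable, "-c", "print('x' * 3000)"]
    print("exceeds limit: %s" % exceeds_limit(demo))
```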
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Hadoop, Apache Oozie
 
        







