Member since: 06-23-2016

136 Posts | 8 Kudos Received | 8 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3275 | 11-24-2017 08:17 PM |
| | 4041 | 07-28-2017 06:40 AM |
| | 1699 | 07-05-2017 04:32 PM |
| | 1959 | 05-11-2017 03:07 PM |
| | 6253 | 02-08-2017 02:49 PM |

05-07-2018 08:14 PM

Duh! Of course it's a different machine! I'll check it in the morning. Thanks!!

05-07-2018 02:29 PM

I am trying to follow the procedure here, but /usr/hdp/current/kafka-broker is a broken symlink and kafka-topics.sh is nowhere to be found. HDP shows the Kafka service running OK. TIA!
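A minimal diagnostic sketch for the broken symlink, assuming the usual HDP layout under /usr/hdp; the 2.6.0.3-8 version string is taken from the later posts and may differ on your cluster.

```bash
# Hedged diagnostics: inspect before changing anything.
ls -l /usr/hdp/current/kafka-broker              # where does the link point?
ls /usr/hdp/                                     # which version directories exist?
find /usr/hdp -name kafka-topics.sh 2>/dev/null  # locate the script, if present
# If the Kafka directory exists but the link is stale, it could be repointed
# (the version below is an assumption taken from the other posts, not confirmed):
# sudo ln -sfn /usr/hdp/2.6.0.3-8/kafka /usr/hdp/current/kafka-broker
```
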
12-06-2017 10:29 AM

@Jay Kumar SenSharma Thanks! Sorry, I forgot to say that I am trying to run Spark 2.2 as an independent service that uses HDP 2.6. I assume this won't work for it.

12-06-2017 10:06 AM

Thanks! Unfortunately it already has that line.

12-06-2017 07:52 AM

EDIT: I forgot to say that I am trying to run Spark 2.2 as an independent service that uses HDP 2.6.

Please help, I am running out of time! I run:

```bash
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    examples/jars/spark-examples*.jar \
    10 --executor-cores 4 --num-executors 11 \
    --driver-java-options="-Dhdp.version=2.6.0.3-8" \
    --conf "spark.executor.extraJavaOptions=-Dhdp.version=2.6.0.3-8"
```

In YARN cluster mode I get this error:

Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I have tried all the fixes I can find except:

1. Classpath issues. Where do I set this, and to what?
2. This question suggests it may be due to missing jars. Which jars do I need, and what do I do with them?

TIA!
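Note that spark-submit treats everything after the application jar as arguments to the application itself, so the flags following the `10` above are likely never seen by spark-submit. Separately, a common cause of the missing-ApplicationMaster error on HDP is `${hdp.version}` expanding to nothing in the AM classpath; a minimal sketch of the usual workaround, assuming the standalone Spark install path quoted in the 12-05-2017 post below:

```bash
# Hedged sketch, not a confirmed fix: pin hdp.version for the AM and driver.
SPARK_HOME=/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7   # path from the post below
echo "-Dhdp.version=2.6.0.3-8" > "$SPARK_HOME/conf/java-opts"
# and/or in $SPARK_HOME/conf/spark-defaults.conf:
# spark.driver.extraJavaOptions   -Dhdp.version=2.6.0.3-8
# spark.yarn.am.extraJavaOptions  -Dhdp.version=2.6.0.3-8
```
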
12-05-2017 05:19 PM

I've also tried the -Dhdp.version= fixes from here. I've not put the new Spark on my other machines; could that be the problem, and if so, where do I put it? I created a new folder on the master, but if I use the same folder on the nodes, how does the master know about it?
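For what it's worth, with --master yarn the worker nodes should not need a local Spark install: unless spark.yarn.jars or spark.yarn.archive is set, the client uploads the jars from SPARK_HOME for every job (the WARN quoted in the post below). A hedged sketch of staging them once in HDFS instead; the HDFS path is an illustrative assumption:

```bash
# Stage the standalone Spark's jars in HDFS once, then point Spark at them.
hdfs dfs -mkdir -p /apps/spark2.2-jars
hdfs dfs -put /home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/jars/* /apps/spark2.2-jars/
# then in conf/spark-defaults.conf:
# spark.yarn.jars hdfs:///apps/spark2.2-jars/*
```
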
12-05-2017 07:54 AM

I am trying to run Spark 2.2 with HDP 2.6. I stop Spark2 from Ambari, then I run:

```bash
/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/bin/spark-shell \
    --jars /home/ed/.ivy2/jars/stanford-corenlp-3.6.0-models.jar,/home/ed/.ivy2/jars/jersey-bundle-1.19.1.jar \
    --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 \
    --master yarn --deploy-mode client \
    --driver-memory 4g --executor-memory 4g \
    --executor-cores 2 --num-executors 11 \
    --conf spark.hadoop.yarn.timeline-service.enabled=false
```

It used to run fine; then it started giving me:

Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Now it just hangs after:

17/12/05 07:41:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

I can run it OK without --master yarn --deploy-mode client, but then I get only the driver as executor. I have tried spark.hadoop.yarn.timeline-service.enabled = true. yarn.nodemanager.vmem-check-enabled and pmem-check-enabled are set to false. Can anyone help or point me to where to look for errors? TIA!

PS spark-defaults.conf:

```
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address master.royble.co.uk:18081
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
# spark.eventLog.dir hdfs:///spark-history
# spark.eventLog.enabled true
# spark.history.fs.logDirectory hdfs:///spark-history
# spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
# spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
spark.yarn.historyServer.address spark-server:18081
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.jars.packages com.databricks:spark-csv_2.11:1.4.0
spark.io.compression.codec lzf
spark.yarn.queue default
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
```
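When an application dies or hangs at AM launch like this, the YARN application log usually contains the real error; a short diagnostic sketch (the application id is a placeholder):

```bash
# List recent YARN applications to find the failed or hung one.
yarn application -list -appStates ALL | head -n 20
# Dump its logs; replace the placeholder id with the real one.
yarn logs -applicationId application_0000000000000_0000 | less
```
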
11-29-2017 09:34 PM

I am getting desperate here! My Spark2 jobs take hours, then get stuck! I have a 4-node cluster, each node with 16GB RAM and 8 cores. I run HDP 2.6, Spark 2.1 and Zeppelin 0.7. I have:

```
spark.executor.instances 11
spark.executor.cores 2
spark.executor.memory 4G
yarn.nodemanager.resource.memory-mb=14336
yarn.nodemanager.resource.cpu-vcores=7
```

Via Zeppelin (same notebook) I do an INSERT into a Hive table:

```scala
dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")
```

for a 50-column table with about 12 million records. This gets split into 3 stages of 75, 75 and 200 tasks. The two 75-task stages get stuck at tasks 73 and 74, and garbage collection lasts for hours. Any idea what I can try?

EDIT: I have not looked at tweaking partitions; can anyone give me pointers on how to do that, please?
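On the partition question in the EDIT: repartitioning before the write spreads the insert across more, smaller tasks, which can lower per-task memory pressure and GC time. A hedged Scala sketch; 200 partitions is an illustrative starting point (a small multiple of the 22 executor cores configured above), not a tuned value:

```scala
import org.apache.spark.sql.SaveMode

dfPredictions
  .repartition(200)   // more, smaller tasks; try a multiple of total executor cores
  .write
  .mode(SaveMode.Append)
  .insertInto("default.predictions")
```

A plain repartition(n) redistributes rows evenly across the new partitions, which also smooths out the skew that tends to make the last couple of tasks in a stage hang.
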
11-29-2017 10:25 AM

Wow, thanks! I'll try these tomorrow when my latest slow job finishes.

11-29-2017 09:45 AM

My Hive query fails with:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I cannot see any logs in the Tez view. It looks like the parse stage is hidden off to the right, but I cannot access it. How do I look at it, and where are the logs? TIA!! For some reason I cannot upload a jpg, so there is a png here.
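The "return code 1 from TezTask" message is generic; the underlying error usually shows up in the HiveServer2 log or in the YARN logs for the Tez session. A hedged sketch, assuming default HDP log locations (the application id is a placeholder):

```bash
# HiveServer2's own log often records the statement-level failure.
less /var/log/hive/hiveserver2.log
# Tez runs inside a YARN application; find it and dump its logs.
yarn application -list -appStates ALL | grep -i -e tez -e hive
yarn logs -applicationId application_0000000000000_0000 | less
```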