Member since 10-07-2015

107 Posts | 73 Kudos Received | 23 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3218 | 02-23-2017 04:57 PM |
|  | 2558 | 12-08-2016 09:55 AM |
|  | 10034 | 11-24-2016 07:24 PM |
|  | 4845 | 11-24-2016 02:17 PM |
|  | 10303 | 11-24-2016 09:50 AM |

08-04-2016 05:00 PM
1 Kudo

Assume you have an ORC table "test" in Hive that matches the CSV file "test.csv". With Spark SQL:

sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .load("/tmp/test.csv")
  .insertInto("test")

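A slightly more explicit variant of the same idea, as a sketch: it assumes Spark 1.6 with a HiveContext behind sqlContext, the spark-csv package on the classpath, and that the ORC table "test" already exists with columns in the same order as the CSV (insertInto matches columns by position, not by name):

// read the CSV into a DataFrame first, then append it into the Hive ORC table
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .load("/tmp/test.csv")

df.write.mode("append").insertInto("test")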
						
					
08-04-2016 04:40 PM

Does the Ambari server see all virtual machines on the other machine, i.e. are they in the same network, and can the Ambari server machine resolve the hostnames of the other machine? If so, can root on the Ambari server machine log into the virtual machines on the other machine without a password? These are a few things that need to work during host registration.

08-04-2016 04:31 PM
1 Kudo

Assume you have a file /tmp/test.csv like

Col1|Col2|Col3|Col4
12|34|"56|78"|9A
"AB"|"CD"|EF|"GH:"|:"IJ"

If I load it with Spark I get

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true")
                   .option("delimiter", "|").option("escape", ":").load("/tmp/test.csv")
df.show()
+----+----+-----+-------+
|Col1|Col2| Col3|   Col4|
+----+----+-----+-------+
|  12|  34|56|78|     9A|
|  AB|  CD|   EF|GH"|"IJ|
+----+----+-----+-------+

So the example contains delimiters inside quotes as well as escaped quotes. I use ":" to escape quotes; you can use many other characters (but don't use e.g. "#"). Is this something you want to achieve?

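If the data also needs to be written back out in the same dialect, a minimal sketch (the output path is illustrative; it assumes the same spark-csv package, whose write path accepts the same delimiter and escape options):

// write the parsed DataFrame back with the same delimiter and escape character
df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", ":")
  .save("/tmp/test_out")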
						
					
07-19-2016 07:14 AM

Example from the Spark doc page (http://spark.apache.org/docs/latest/submitting-applications.html):

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

--executor-memory is the flag you want to adapt.

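The same setting can also be supplied programmatically instead of on the command line; a minimal sketch in Scala, assuming you construct the SparkContext yourself (the values are the ones from the example above):

import org.apache.spark.{SparkConf, SparkContext}

// equivalent to --executor-memory 20G and --total-executor-cores 100 (standalone mode)
val conf = new SparkConf()
  .setAppName("SparkPi")
  .set("spark.executor.memory", "20g")
  .set("spark.cores.max", "100")
val sc = new SparkContext(conf)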
						
					
07-18-2016 09:27 AM
2 Kudos

Have you tried to avoid folders with empty files? As an idea, instead of using

<DStream>.saveAsTextFiles("/tmp/results/ts", "json")

(which creates folders with empty files if nothing gets streamed from the source), I tried

<DStream>.foreachRDD(rdd => {
  try {
    val f = rdd.first() // fails for empty RDDs
    rdd.saveAsTextFile(s"/tmp/results/ts-${System.currentTimeMillis}.json")
  } catch {
    case e: Exception => println("empty rdd")
  }
})

It seems to work for me: no more folders with empty files.

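A variant of the same idea without the exception handling, as a sketch; it assumes Spark 1.3 or later, where RDD.isEmpty is available:

<DStream>.foreachRDD(rdd => {
  // skip batches that produced no data, so no empty output folders are created
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"/tmp/results/ts-${System.currentTimeMillis}.json")
  }
})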
						
					
07-15-2016 11:46 AM

This might help: https://community.hortonworks.com/questions/30288/oozie-spark-action-on-hdp-24-nosuchmethoderror-org.html

07-15-2016 11:44 AM

It looks like you are executing the job as user hadoop; however, Spark wants to use staging data under /user/yarn (which can only be accessed by yarn). How did you start the job, and as which user? I am surprised that Spark uses /user/yarn as the staging dir for user hadoop. Is there any staging dir configuration in your system (SPARK_YARN_STAGING_DIR)?

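A quick way to double-check which user the Spark driver actually runs as (and therefore whose HDFS home directory is used for staging by default); a sketch using the Hadoop API that ships with Spark, e.g. from spark-shell:

import org.apache.hadoop.security.UserGroupInformation

// prints the effective Hadoop user, e.g. "hadoop" or "yarn"
println(UserGroupInformation.getCurrentUser.getShortUserName)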
						
					
07-14-2016 07:21 AM
1 Kudo

I don't know where the TFS bit comes from, maybe some dependency problem.

For including all dependencies in the workflow I would recommend building a fat jar (assembly). In Scala with sbt you can see the idea in "Creating fat jars with sbt"; the same works with Maven's maven-assembly-plugin. You should be able to call your code as

spark-submit --master yarn-cluster \
  --num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 \
  --class com.SparkSqlExample \
  /home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT-with-depencencies.jar

If this works, the jar with dependencies should be the one referenced in the Oozie Spark action.

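For the sbt route, a minimal setup might look like the following sketch (the plugin and Spark versions are illustrative; Spark itself is marked as "provided" so it is not bundled into the fat jar, since the cluster already supplies it):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
name := "SparkParquetExample"
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided"
)

Running "sbt assembly" then produces a single jar under target/scala-2.10/ that can be passed to spark-submit and referenced from the Oozie Spark action.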
						
					
07-13-2016 04:20 PM

I installed it manually; it was quite straightforward. However, you need Maven 3.3, otherwise some npm steps will fail. I just ran "mvn clean package -DskipTests".

I then copied conf/zeppelin-env.sh.template to conf/zeppelin-env.sh and added

export JAVA_HOME=/usr/jdk64/jdk1.8.0_60/
export SPARK_HOME=/usr/hdp/current/spark-client
export HADOOP_HOME=/usr/hdp/current/hadoop-client

and copied zeppelin-site.xml.template to zeppelin-site.xml and changed the port to 9995. In addition, in Zeppelin I changed the Spark interpreter's "master" property to yarn-client. This seems to work for me on an HDP 2.4.2 cluster.

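As a quick sanity check of the interpreter wiring, a trivial paragraph in a Zeppelin note can confirm both the Spark version and the master setting (a sketch; sc is the SparkContext the Spark interpreter provides):

// run in a Zeppelin note with the Spark interpreter
println(sc.version)   // should print the HDP Spark version
println(sc.master)    // should print "yarn-client"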
						
					