Member since: 09-25-2015

230 Posts · 276 Kudos Received · 39 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 28054 | 07-05-2016 01:19 PM |
|  | 10607 | 04-01-2016 02:16 PM |
|  | 2907 | 02-17-2016 11:54 AM |
|  | 7370 | 02-17-2016 11:50 AM |
|  | 14726 | 02-16-2016 02:08 AM |
---

11-15-2015 01:09 AM · 2 Kudos

The easiest way to change the number of mappers to a desired number is:

```
set tez.grouping.split-count = YOUR-NUMBER-OF-TASKS;
```

As pointed out by Andrew Grande, this is documented here: https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
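For intuition, here is a rough sketch (my own arithmetic, not Tez source code) of how a desired task count can drive the grouped split size, following the parallelism doc linked above:

```python
# Hypothetical sketch of split grouping: the desired task count implies a
# target size per grouped split, and the task count falls out of that.
# Function and parameter names are mine, not Tez's.
def grouped_task_count(total_input_bytes, original_splits, desired_tasks):
    """Estimate how many tasks result when grouping splits toward a target count."""
    # Target bytes per grouped split, derived from the desired task count.
    target_size = max(1, total_input_bytes // desired_tasks)
    # Ceiling division gives the number of grouped splits; you can never get
    # more tasks than there are original splits to group.
    return max(1, min(original_splits, -(-total_input_bytes // target_size)))
```

In practice Tez also applies group-size bounds (tez.grouping.min-size / tez.grouping.max-size), so the actual task count can differ from the requested one.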
---

11-13-2015 03:06 AM · 1 Kudo

@Scott Shaw, @Sourygna Luangsay I created a "minimum-viable-serde" implementing what you described. See if it is what you need.

PS: I'm assuming your last column will be a map<string,string>; I haven't done data type handling for the last column yet. For the key columns, it will respect the data type you declare when creating the table.

From the shell:

```
wget https://github.com/gbraccialli/HiveUtils/raw/master/target/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar -O /tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar
echo "a,b,c,adsfa,adfa" > /tmp/testserde.txt
echo "1,2,3,asdfasdf,sdfasd" >> /tmp/testserde.txt
echo "4,5,6,adfas,adf,d" >> /tmp/testserde.txt
hadoop fs -mkdir /tmp/testserde/
hadoop fs -put -f /tmp/testserde.txt /tmp/testserde/
hive
```

From hive:

```
add jar /tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar;
drop table testserde;
create external table testserde (
  field1 string,
  field2 int,
  field3 double,
  maps map<string,string>
)
ROW FORMAT SERDE 'com.github.gbraccialli.hive.serde.NKeys_MapKeyValue'
WITH SERDEPROPERTIES (
  "delimiter" = ","
)
LOCATION '/tmp/testserde/';
select * from testserde;
```

Source code is here:
https://github.com/gbraccialli/HiveUtils
https://github.com/gbraccialli/HiveUtils/blob/master/src/main/java/com/github/gbraccialli/hive/serde/NKeys_MapKeyValue.java

PS2: there are lots of TODOs yet.
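To make the intended behavior concrete, here is a hedged Python sketch of the parsing idea (my reading of the SerDe, not its actual Java code; in particular, the alternating key/value pairing of the trailing fields is an assumption on my part):

```python
# Sketch of the NKeys_MapKeyValue idea: the first N delimited fields map to
# the declared key columns, and the remaining fields are folded into a
# map<string,string>. Pairing rest[i] -> rest[i+1] is assumed; an unpaired
# trailing field gets an empty value here (also an assumption).
def parse_row(line, num_keys, delimiter=","):
    fields = line.split(delimiter)
    keys = fields[:num_keys]
    rest = fields[num_keys:]
    mapping = {rest[i]: (rest[i + 1] if i + 1 < len(rest) else "")
               for i in range(0, len(rest), 2)}
    return keys, mapping
```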
---

11-12-2015 07:33 PM

@hrongali@hortonworks.com I think a Hive UDF could implement the same logic, but it would be easier to consume than a map-reduce program. I think this UDF from brickhouse does this: https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/hbase/CachedGetUDF.java
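The caching idea behind that UDF can be sketched generically (a memoized-lookup pattern in Python; the real brickhouse UDF is Java and reads from HBase):

```python
from functools import lru_cache

# Generic "cached get" pattern: wrap a fetch function so repeated keys within
# a query hit an in-process cache instead of going back to the store.
# make_cached_get and the fetch signature are illustrative, not brickhouse API.
def make_cached_get(fetch):
    @lru_cache(maxsize=10000)
    def cached_get(key):
        return fetch(key)
    return cached_get
```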
---

11-11-2015 06:08 PM · 1 Kudo

@azeltov@hortonworks.com I think the issues are:

1- You have to use file:// for local files.

2- Using pyspark, you have to use print before the expression.

See the example below (working for me):

```
%pyspark
base_rdd = sc.textFile("file:///usr/hdp/current/spark-client/data/mllib/sample_libsvm_data.txt")
print base_rdd.count()
print base_rdd.take(3)
```
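Point 2 can be illustrated outside Zeppelin too (a hedged Python 3 sketch, not Zeppelin internals): when code runs through exec rather than an interactive REPL, bare expression values are discarded, so only explicit prints reach the output.

```python
import io
import contextlib

# Run a snippet the way a non-interactive interpreter would, capturing stdout.
def run(snippet):
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(snippet)
    return buf.getvalue()

# A bare expression produces no output; an explicit print does.
no_output = run("1 + 1")
with_output = run("print(1 + 1)")
```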
---

11-11-2015 02:40 PM · 1 Kudo

4- Execute the SQL, using the sql interpreter:

```
%sql
select geohash_encode(1.11,1.11,3) from sample_07 limit 10
```

It fails with the sql interpreter + zeppelin:

```
java.lang.ClassNotFoundException: com.github.gbraccialli.hive.udf.UDFGeohashEncode
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
...
```
---

11-11-2015 02:40 PM

Second, the same with zeppelin:

1- Restart the interpreter.

2- Load dependencies:

```
%dep
z.reset()
z.load("com.github.gbraccialli:HiveUtils:1.0-SNAPSHOT")
```

3- Execute the SQL, using the same Scala code:

```
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
sqlContext.sql("""create temporary function geohash_encode as 'com.github.gbraccialli.hive.udf.UDFGeohashEncode'""");
sqlContext.sql("""select geohash_encode(1.11,1.11,3) from sample_07 limit 10""").collect().foreach(println);
```

It worked with Scala code + zeppelin!
---

11-11-2015 02:39 PM

2- Run spark-shell with the dependency:

```
spark-shell --master yarn-client --packages "com.github.gbraccialli:HiveUtils:1.0-SNAPSHOT"
```

3- Run the Spark code:

```
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
sqlContext.sql("""create temporary function geohash_encode as 'com.github.gbraccialli.hive.udf.UDFGeohashEncode'""");
sqlContext.sql("""select geohash_encode(1.11,1.11,3) from sample_07 limit 10""").collect().foreach(println);
```

spark-shell worked fine!
---

11-11-2015 02:39 PM · 1 Kudo

@azeltov@hortonworks.com See what I tried.

First, with spark-shell:

1- Download the jar and register it to the local maven repository:

```
su - zeppelin
wget https://raw.githubusercontent.com/gbraccialli/HiveUtils/master/target/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar -O /tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar
mvn org.apache.maven.plugins:maven-install-plugin:2.5.2:install-file \
  -Dfile=/tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar \
  -DgroupId=com.github.gbraccialli \
  -DartifactId=HiveUtils \
  -Dversion=1.0-SNAPSHOT \
  -Dpackaging=jar
```
---

11-11-2015 02:23 PM

@Neeraj I needed to add the credential to hive-site for wasb to work inside Hive. Did it work for you with only hdfs-site?
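For reference, the WASB credential property typically looks like this (YOUR_ACCOUNT and YOUR_KEY are placeholders; the property name follows the hadoop-azure convention):

```xml
<!-- hive-site.xml (or core-site.xml); YOUR_ACCOUNT / YOUR_KEY are placeholders -->
<property>
  <name>fs.azure.account.key.YOUR_ACCOUNT.blob.core.windows.net</name>
  <value>YOUR_KEY</value>
</property>
```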
---

11-11-2015 02:59 AM

See this jira: https://issues.apache.org/jira/browse/ZEPPELIN-150