Member since 04-25-2016

579 Posts | 609 Kudos Received | 111 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2922 | 02-12-2020 03:17 PM |
| | 2136 | 08-10-2017 09:42 AM |
| | 12470 | 07-28-2017 03:57 AM |
| | 3407 | 07-19-2017 02:43 AM |
| | 2520 | 07-13-2017 11:42 AM |
06-02-2016 11:09 AM | 1 Kudo

numpy is missing here; install it with `pip install numpy`.
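After installing, a quick sanity check (a minimal sketch; the printed version will vary with your environment):

```python
# confirm numpy is now importable
import numpy as np

print(np.__version__)
```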
06-01-2016 04:29 PM

Is there a ZooKeeper client running on your local machine that is trying to connect to the zk-server running on the VM?
06-01-2016 04:19 PM | 3 Kudos

All of these values are picked up from your environment variables (see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.zookeeper/zookeeper/3.3.1/org/apache/zookeeper/Environment.java#Environment). Please check your environment variables.
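A quick way to inspect them, sketched in Python (the variable names listed here are illustrative; the Environment.java link above shows what ZooKeeper actually records):

```python
import os

# print the environment variables most often involved in ZooKeeper startup
for name in ("JAVA_HOME", "CLASSPATH", "ZOOCFGDIR", "ZOO_LOG_DIR"):
    print(f"{name} = {os.environ.get(name, '<not set>')}")
```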
06-01-2016 12:17 PM | 3 Kudos

If you commit the offset based on a timestamp, you can start consuming from Kafka at the next batch cycle, like this:

```python
from kafka import KafkaConsumer, TopicPartition

# commit the last consumed offset
consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition(topic, partition)
consumer.assign([tp])    # the partition must be assigned before seeking
consumer.seek(tp, end)   # 'end' is the last consumed offset
consumer.commit()

# now start consuming from the committed offset when the job restarts at the next batch cycle
consumer.assign([tp])
start = consumer.committed(tp)
```
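The snippet above assumes the target offset is already known; for the lookup by timestamp itself, kafka-python provides offsets_for_times (a minimal sketch; the topic name, partition, and epoch-millisecond timestamp are made up, and the broker must support it, 0.10.1+):

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition('my-topic', 0)  # hypothetical topic and partition
consumer.assign([tp])

# map an epoch-millisecond timestamp to the first offset at or after it
offsets = consumer.offsets_for_times({tp: 1464800000000})
if offsets[tp] is not None:
    consumer.seek(tp, offsets[tp].offset)
    consumer.commit()  # commits the sought position for the next run
```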
06-01-2016 11:06 AM

With "transactional"="true" you should not be able to compile this DDL statement; a transactional table won't allow a sorted column. Are you able to execute this statement successfully?
06-01-2016 10:50 AM | 5 Kudos

@Sanjeev Verma you can use the following ways to get external configuration inside the topology:

1. Pass the arguments on the command line, like this: `storm jar storm-jar topology-name -c sKey=sValue -c key1=value1 -c key2=value2 >/tmp/storm.txt`
2. Create a simple Java resource file (a properties file) and pass it as an argument to your topology's main class; in the main method, read the properties from the file and build the Storm configuration object using conf.put().
3. Create a separate YAML file and read it through the Utils method provided by the Storm API, Utils.findAndReadConfigFile(); for more documentation see https://nathanmarz.github.io/storm/doc/backtype/storm/utils/Utils.html
06-01-2016 10:09 AM | 4 Kudos

As you are using a transactional table, you cannot take advantage of SORTED BY on the fechaoprcnf column. Apart from partitioning, try creating a storage index on the table using `tblproperties ("orc.create.index"="true", "orc.compress"="ZLIB", "orc.stripe.size"="268435456", "orc.row.index.stride"="10000")`. The ORC stripe size and index stride used here are the defaults; try tuning these values and compare the performance results.
06-01-2016 09:44 AM | 2 Kudos

Considering you are using an ORC table: if you are not using an ACID table, it would be good to modify the table DDL to `clustered by (codnrbeenf) sorted by (fechaoprcnf)`. Further to this, you can create a storage-based index on the ORC table by specifying `orc.create.index=true`.
06-01-2016 05:16 AM | 4 Kudos

@akeezhadath it seems you are not calling an action, so the job is never actually triggered. Spark transformations are lazily evaluated; can you run a terminal operation such as count or collect on filterwords and check whether you then see the incremented accumulator values?
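A minimal PySpark sketch of the same point (the names and input data are made up for illustration):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "accumulator-demo")
acc = sc.accumulator(0)

def keep(word):
    acc.add(1)                 # side effect runs only when an action forces evaluation
    return len(word) > 3

filter_words = sc.parallelize(["spark", "is", "lazily", "evaluated"]).filter(keep)
print(acc.value)               # 0 -- filter() is a transformation, nothing has run yet
filter_words.count()           # action: triggers the job
print(acc.value)               # 4 -- keep() was called once per element
```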
05-31-2016 05:01 PM | 1 Kudo

Looking at this exception:

java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.setHadoopSessionContext(Ljava/lang/String;)V

it seems a wrong version of the HadoopShims jar is on your classpath, one that either has no setHadoopSessionContext implementation or has a different method signature. To troubleshoot:

```sh
# list every jar the HiveServer2 process has open
lsof -p <HS2 process id> | grep -i jar | awk '{ print $9 }' > class-jar.txt

# find which jars contain the HadoopShims class (there may be multiple shim jars)
for jar in `cat class-jar.txt`; do
  echo "$jar"
  jar -tvf "$jar" | grep --color 'org/apache/hadoop/hive/shims/HadoopShims'
done
```

For each jar that contains HadoopShims, extract the class and inspect it:

```sh
jar xvf <jar> org/apache/hadoop/hive/shims/HadoopShims.class
javap -classpath . org.apache.hadoop.hive.shims.HadoopShims
```

to verify the availability and signature of the setHadoopSessionContext method.