Member since: 10-28-2016

392 Posts | 7 Kudos Received | 20 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3467 | 03-12-2018 02:28 AM |
| | 5185 | 12-18-2017 11:41 PM |
| | 3639 | 07-17-2017 07:01 PM |
| | 2567 | 07-13-2017 07:20 PM |
| | 8210 | 07-12-2017 08:31 PM |
			
    
	
		
		
12-18-2017 11:27 PM
I'm setting up Kafka MirrorMaker to transfer data between two clusters, but MirrorMaker does not seem to be able to pull data from the source (producer) cluster. Here is the command used:

```
$CONFLUENT/bin/kafka-mirror-maker --consumer.config $CONFLUENT/mprops/mmConsumer-qa.properties --producer.config $CONFLUENT/mprops/mmProducer-qa.properties --whitelist="mmtest" --new.consumer --num.streams 4
```

mmConsumer-qa.properties:

```
bootstrap.servers=localhost:7092,localhost:7082,localhost:7072
group.id=mmtest
client.id=mm_consumer
```

mmProducer-qa.properties:

```
bootstrap.servers=localhost:8092
acks=1
batch.size=100
client.id=mm_producer
linger.ms=5
```

Any ideas on how to debug/fix this?
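A first sanity check (a sketch, not from the original post; it assumes the standard console tools that ship with the Confluent distribution, and that `mmtest` is the topic name) is to consume the topic directly from the source cluster and then inspect the MirrorMaker consumer group:

```bash
# If this prints records, the source brokers and topic are healthy and
# the problem is on the MirrorMaker side (ports copied from the
# mmConsumer-qa.properties above).
$CONFLUENT/bin/kafka-console-consumer --bootstrap-server localhost:7092 \
  --topic mmtest --from-beginning --max-messages 10

# Describe the MirrorMaker consumer group: a large, non-shrinking LAG
# with no committed offsets points at the consumer side.
$CONFLUENT/bin/kafka-consumer-groups --bootstrap-server localhost:7092 \
  --describe --group mmtest
```

Also worth checking: the `--whitelist` value is a regular expression, so a topic name that does not match it exactly will silently mirror nothing.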
						
					
- Labels:
- Apache Kafka
    
	
		
		
12-14-2017 11:46 PM

Thanks, that helps!
						
					
    
	
		
		
12-14-2017 09:27 PM
Hello - is there a way to get the YARN logs for an applicationId for a specific period? When I use the command:

```
yarn logs -applicationId <applicationId> -log_files spark.log
```

... it does not give me the older log files (e.g., log files from 2 days ago). Is there any way to get those log files without having to go through the consolidated YARN ResourceManager log files? By the way, the YARN log retention is set to 30 days in yarn-site.xml.

Another question: what is the `-log_files` option used for, and what values can I provide for it?
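One way to see what is actually retained for a given period (a sketch, not from the thread; it assumes log aggregation is enabled and uses `/app-logs` as the aggregated-log root, i.e. whatever `yarn.nodemanager.remote-app-log-dir` is set to in yarn-site.xml):

```bash
# Aggregated container logs live in HDFS per user and application; the
# listing's modification times let you pick files from a specific
# period without fetching everything (/app-logs is an assumption --
# check yarn.nodemanager.remote-app-log-dir for the real root).
hdfs dfs -ls /app-logs/<user>/logs/<applicationId>

# -log_files filters the dump by log file name (spark.log, stdout,
# stderr, ...); the special value ALL fetches every log file.
yarn logs -applicationId <applicationId> -log_files ALL
```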
						
					
- Labels:
- Apache YARN
			
    
	
		
		
12-12-2017 07:23 AM
@bkosaraju - I re-checked this, and the issue seems to occur when I include the column `deviceid bigint` in the query.
						
					
    
	
		
		
12-12-2017 07:22 AM
FYI, I re-checked this, and the issue seems to occur when I include the column `deviceid bigint` in the query.
						
					
    
	
		
		
12-12-2017 06:54 AM
@bkosaraju - this is the query that was fired:

```
select deviceid, devicename, indicatorname, topic_k, partition_k, offset_k from powerpoll where year=2017 and month=12 and day=11 limit 5;
```

There is no column called `format`; can you please clarify what you meant?
						
					
    
	
		
		
12-12-2017 12:52 AM
@bkosaraju, @mqureshi - any ideas on this?
						
					
    
	
		
		
12-12-2017 12:32 AM
Hello - I'm getting an error when querying a Hive table (data in Parquet format):

```
select * from powerpoll_k1 where year=2017 and month=12 and day=11 limit 5;
```

```
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 102.0 failed 4 times, most recent failure: Lost task 0.3 in stage 102.0 (TID 20602, msc02-jag-dn-016.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

The table DDL is shown below:

```
CREATE EXTERNAL TABLE `powerpoll_k1`(
  `topic_k` varchar(255),
  `partition_k` int,
  `offset_k` bigint,
  `timestamp_k` timestamp,
  `deviceid` bigint,
  `devicename` varchar(50),
  `deviceip` varchar(128),
  `peerid` int,
  `objectid` int,
  `objectname` varchar(256),
  `objectdesc` varchar(256),
  `oid` varchar(50),
  `pduoutlet` varchar(50),
  `pluginid` int,
  `pluginname` varchar(255),
  `indicatorid` int,
  `indicatorname` varchar(255),
  `format` int,
  `snmppollvalue` varchar(128) COMMENT 'value in kafka avsc',
  `time` double,
  `clustername` varchar(50) COMMENT 'rpp or power',
  `peerip` varchar(50))
COMMENT 'external table at /apps/hive/warehouse/amp.db/sevone_power'
PARTITIONED BY (`year` int, `month` int, `day` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://application/apps/hive/warehouse/amp.db/power'
TBLPROPERTIES ('transient_lastDdlTime' = '1513022286')
```

Any ideas on what the issue is?
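Not part of the original post, but this `decodeToLong` / `PlainIntegerDictionary` failure is the classic signature of a Parquet/DDL type mismatch: the files store a column (given the later replies, most likely `deviceid`) as INT32 while the table declares it `bigint` (INT64), so Spark's vectorized reader cannot decode it as a long. A sketch of how one might confirm this, assuming the `parquet-tools` utility is available; the file path is hypothetical, derived from the table LOCATION and partition columns above:

```bash
# Pull one Parquet file from the failing partition (hypothetical path
# and file name -- list the partition directory to find a real one).
hdfs dfs -get /apps/hive/warehouse/amp.db/power/year=2017/month=12/day=11/part-00000.parquet .

# Print the schema stored in the file itself; if deviceid appears as
# int32 while the Hive DDL says bigint, that mismatch is the cause.
parquet-tools schema part-00000.parquet
```

If the types do disagree, the usual fixes are to alter the table column to match what the files actually contain, or to rewrite the data with the declared type.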
						
					
- Labels:
- Apache Hive
    
	
		
		
12-11-2017 11:42 PM
@bkosaraju I'm getting the following error when querying the table, any ideas?

```
0: jdbc:hive2://msc02-jag-hve-002.uat.gdcs.ap> select deviceid, devicename, indicatorname, topic_k, partition_k, offset_k from powerpoll where year=2017 and month=12 and day=11 limit 5;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 86.0 failed 4 times, most recent failure: Lost task 0.3 in stage 86.0 (TID 19049, msc02-jag-dn-011.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
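Not from the thread, but since the stack shows the failure inside Spark's vectorized Parquet reader (`OnHeapColumnVector.getLong`), one quick experiment is to rerun the query with vectorization disabled; if it then succeeds or fails differently, the problem is a type mismatch between the Parquet files and the table DDL rather than the query itself. A sketch, assuming beeline against the same Spark Thrift/HiveServer2 endpoint (the JDBC URL is truncated in the post, so substitute the real one):

```bash
# Hypothetical connection string; spark.sql.parquet.enableVectorizedReader
# is a standard Spark 2.x SQL conf, settable per session.
beeline -u "jdbc:hive2://<thrift-server-host>:<port>" -e "
  set spark.sql.parquet.enableVectorizedReader=false;
  select deviceid, devicename, indicatorname, topic_k, partition_k, offset_k
  from powerpoll where year=2017 and month=12 and day=11 limit 5;"
```

Note this only changes or sidesteps the failure mode; the underlying file/DDL type disagreement still needs fixing.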
						
					
    
	
		
		
12-11-2017 10:52 PM
Thanks @bkosaraju - that worked!
						
					