Member since 07-19-2017
53 Posts
3 Kudos Received
3 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 2258 | 08-23-2019 06:51 AM |
| | 5529 | 08-23-2019 06:45 AM |
| | 4142 | 08-20-2019 02:06 PM |
			
    
	
		
		
06-24-2021 04:06 AM

I'm seeing the same issue. I can see "Transition from state INITIALIZING to error state FATAL_ERROR" once I set "Use Transactions"="true" and "Delivery Guarantee"="Guarantee Replicated Delivery".
						
					
02-19-2020 04:44 PM
1 Kudo

@WilsonLozano,

As this thread is older and was marked 'Solved' back in August of 2019, you would have a better chance of receiving a resolution by starting a new thread. A new thread will also give you the opportunity to provide details specific to your environment, version of CDH, etc., that could help others provide a more accurate answer to your question.
						
					
01-06-2020 09:26 AM

Hi,

As mentioned in the previous posts, did you try increasing the memory, and did that solve the issue? Please let us know if you are still facing any issues.

Thanks,
AKR
						
					
11-20-2019 12:57 PM
1 Kudo

This issue would really require further debugging. For whatever reason, something happened with the user ID resolution at that particular time. We've seen customers with similar issues when tools like SSSD are being used:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sssd-system-uids

One idea is to create a shell script that runs the commands 'id ptz0srv0z50' and 'id -Gn ptz0srv0z50' in a loop at some interval, say 10, 20 or 30 seconds. When the problem occurs, go over the output of that script and see whether anything looks different at the time of the issue.
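
The loop described above could look roughly like the following. This is only a minimal sketch: the function names `log_id_snapshot` and `watch_id`, the optional iteration count, and the log file path in the usage note are illustrative assumptions, not something from the original thread. Only `id <user>` and `id -Gn <user>` come from the post.

```shell
#!/usr/bin/env bash
# Sketch: record UID/group resolution for one user at a fixed interval.
# Function names and parameters are hypothetical, chosen for illustration.

# Print one timestamped snapshot of `id` and `id -Gn` for the given user.
log_id_snapshot() {
  local user="$1"
  local ts
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  # Capture stderr as well, so failed lookups show up in the log.
  echo "[$ts] id: $(id "$user" 2>&1)"
  echo "[$ts] id -Gn: $(id -Gn "$user" 2>&1)"
}

# Repeat the snapshot every <interval> seconds.
# count=0 (the default) means "run until interrupted".
watch_id() {
  local user="$1" interval="${2:-10}" count="${3:-0}"
  local i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    log_id_snapshot "$user"
    i=$((i + 1))
    # Skip the sleep after the final iteration.
    if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Example (left commented out so sourcing this file does not start the loop):
# watch_id ptz0srv0z50 10 >> /tmp/id_watch.log
```

When the problem shows up again, comparing the snapshots around that timestamp with earlier ones should make any change in the UID or group list easy to spot.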
						
					
11-12-2019 04:45 PM

Hi w@leed

Thanks for replying.

I did test the job with all three collectors: ParallelGC, CMS and G1GC.

I tested the following options with G1GC:

    -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and with CMS:

    -XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+CMSConcurrentMTEnabled -XX:ParallelCMSThreads=10 -XX:ConcGCThreads=8 -XX:ParallelGCThreads=16

With G1GC defaults, I could see the following:

    Desired survivor size 1041235968 bytes, new threshold 5 (max 15)
    [PSYoungGen: 1515304K->782022K(3053056K)] 2750361K->2017087K(6371840K), 1.5875321 secs] [Times: user=4.72 sys=0.74, real=1.59 secs]
    Heap after GC invocations=9 (full 3):
    PSYoungGen total 3053056K, used 782022K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
    eden space 2270720K, 0% used [0x0000000580000000,0x0000000580000000,0x000000060a980000)
    from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
    to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
    ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
    object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
    Metaspace used 55055K, capacity 55638K, committed 55896K, reserved 1097728K
    class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
    }
    {Heap before GC invocations=10 (full 3):
    PSYoungGen total 3053056K, used 3052742K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
    eden space 2270720K, 100% used [0x0000000580000000,0x000000060a980000,0x000000060a980000)
    from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
    to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
    ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
    object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
    Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
    class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
    42.412: [GC (Allocation Failure)
    Desired survivor size 1653080064 bytes, new threshold 4 (max 15)
    [PSYoungGen: 3052742K->1016800K(3422720K)] 4287807K->2985385K(6741504K), 4.0304873 secs] [Times: user=11.87 sys=1.77, real=4.03 secs]
    Heap after GC invocations=10 (full 3):
    PSYoungGen total 3422720K, used 1016800K [0x0000000580000000, 0x0000000727a80000, 0x0000000800000000)
    eden space 2405888K, 0% used [0x0000000580000000,0x0000000580000000,0x0000000612d80000)
    from space 1016832K, 99% used [0x0000000612d80000,0x0000000650e78240,0x0000000650e80000)
    to space 1614336K, 0% used [0x00000006c5200000,0x00000006c5200000,0x0000000727a80000)
    ParOldGen total 3318784K, used 1968584K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
    object space 3318784K, 59% used [0x0000000080000000,0x00000000f8272318,0x000000014a900000)
    Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
    class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K

With all the collectors, the only difference I could see was a delayed full GC. I am considering changing the YoungGen now. Will update if I see a difference.

On a parallel note:
1. I also saw that some objects in memory persist across GC cycles, for example scala.Tuple2 and java.lang.Long.
2. These are Java RDDs.

Regards
						
					
10-02-2019 07:23 PM

							 @ravikiran_sharm we've passed along your concerns and note of frustration to the relevant parties internally and they are actively working on your case. They say they are working with you directly to get this resolved. 
   
   
						
					
08-23-2019 06:51 AM

@paleerbccm

Briefly looking at the message, I would assume 'error_code=0' actually means that no errors occurred. It would take quite a bit of digging in the code to confirm, but generally speaking, I wouldn't worry too much about TRACE level logs.

Ideally, and especially since this is a production environment, you would set the logging level to INFO, and that's about all you would need. Unless you have intimate knowledge of the code and are chasing a specific issue, it's rare that you would ever need TRACE level logs.
						
					
08-21-2019 11:28 AM

Thanks, that does show more information.

What I find weird, though, is that the same query ran earlier with a large load (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).

Regards
						
					
08-21-2019 06:10 AM

							 Now I am really clear about the situation. Thanks a lot for your replies.  
						
					
08-20-2019 02:18 PM

Hi @sauravsuman689

A common issue that people have when using the kafka-consumer-groups command line tool is that they do not set it up to communicate over Kerberos like any other Kafka client (i.e. consumers and producers).

The security.protocol output you shared based on the cat command doesn't look right:

    cat /tmp/grouprop.properties
    security.protocol=PLAINTEXTSASL

This should instead be:

    security.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka

You can use the same instructions outlined in the following link, starting with step number 5:
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html#concept_lcn_4mm_s5

I understand you're using HDP, but the steps should be pretty much the same. You will of course use the same command line tool command you're already using, as opposed to the consumer command mentioned in the link:

    [kafka@XXX ~]$ /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --bootstrap-server xxxx:6667,xxxx:6667,xxxx:6667 --list --command-config /tmp/grouprop.properties

EDIT:

It seems like HDP works a bit differently, so your security.protocol parameter aligns with what the HDP platform would expect.
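
As a rough sketch of the client configuration discussed above, the two properties can be written to the file passed via --command-config. The helper name `write_client_props` is hypothetical and only for illustration; the property names and the /tmp/grouprop.properties path come from the post. The protocol value is left as a parameter because, per the EDIT above, HDP may expect PLAINTEXTSASL rather than SASL_PLAINTEXT.

```shell
#!/usr/bin/env bash
# Sketch: write the Kerberos-related client properties for kafka-consumer-groups.sh.
# write_client_props is a hypothetical helper, not part of Kafka or HDP.

write_client_props() {
  local file="$1"
  # Default follows the post's suggestion; pass PLAINTEXTSASL on platforms that expect it.
  local protocol="${2:-SASL_PLAINTEXT}"
  printf 'security.protocol=%s\nsasl.kerberos.service.name=kafka\n' "$protocol" > "$file"
}

# Example (commented out so sourcing this file has no side effects):
# write_client_props /tmp/grouprop.properties
# cat /tmp/grouprop.properties
```

The resulting file is then passed unchanged to the kafka-consumer-groups.sh command shown in the post.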
						
					