Member since: 07-19-2017
Posts: 53
Kudos Received: 3
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 953 | 08-23-2019 06:51 AM |
| | 1716 | 08-23-2019 06:45 AM |
| | 1570 | 08-20-2019 02:06 PM |
06-24-2021
04:06 AM
I'm seeing the same issue. I can see "Transition from state INITIALIZING to error state FATAL_ERROR" once I set "Use Transactions" = "true" and "Delivery Guarantee" = "Guarantee Replicated Delivery".
02-19-2020
04:44 PM
1 Kudo
@WilsonLozano,
As this thread is older and was marked 'Solved' back in August of 2019, you would have a better chance of receiving a resolution by starting a new thread. A new thread also gives you the opportunity to provide details specific to your environment, version of CDH, etc., which could help others give a more accurate answer to your question.
01-06-2020
09:26 AM
Hi, As mentioned in the previous posts, did you try increasing the memory, and did it solve the issue? Please let us know if you are still facing any problems. Thanks, AKR
11-20-2019
12:57 PM
1 Kudo
This issue would really require further debugging. For whatever reason, at that particular time something happened with the user ID resolution. We've seen customers before that had similar issues when tools like SSSD are being used: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sssd-system-uids One idea here is to create a shell script that runs the commands 'id ptz0srv0z50' and 'id -Gn ptz0srv0z50' in a loop at some interval, say 10, 20 or 30 seconds. When the problem occurs, go over the output of that script and see whether anything looks different around the time of the issue.
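To spell that out, a minimal sketch of such a monitoring script; the user name is taken from the post, while the interval and log path are arbitrary placeholders you would adjust:

```bash
#!/bin/bash
# Log the output of 'id' for the affected user at a fixed interval so the
# results can be compared against the time the problem occurred.
USER_TO_CHECK="ptz0srv0z50"   # user from the original post
INTERVAL=30                   # seconds between checks (placeholder)
LOGFILE=/var/tmp/id-check.log # placeholder path

while true; do
    {
        date '+%Y-%m-%d %H:%M:%S'
        id "$USER_TO_CHECK"
        id -Gn "$USER_TO_CHECK"
        echo "---"
    } >> "$LOGFILE" 2>&1
    sleep "$INTERVAL"
done
```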
11-12-2019
04:45 PM
Hi w@leed,

Thanks for replying. I tested the job with all three collectors - ParallelGC, CMS and G1GC.

Options I tested with G1GC:
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and with CMS:
-XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+CMSConcurrentMTEnabled -XX:ParallelCMSThreads=10 -XX:ConcGCThreads=8 -XX:ParallelGCThreads=16

With G1GC defaults, I could see the following:

Desired survivor size 1041235968 bytes, new threshold 5 (max 15)
[PSYoungGen: 1515304K->782022K(3053056K)] 2750361K->2017087K(6371840K), 1.5875321 secs] [Times: user=4.72 sys=0.74, real=1.59 secs]
Heap after GC invocations=9 (full 3):
PSYoungGen total 3053056K, used 782022K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
  eden space 2270720K, 0% used [0x0000000580000000,0x0000000580000000,0x000000060a980000)
  from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
  to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
  object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55055K, capacity 55638K, committed 55896K, reserved 1097728K
  class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
}
{Heap before GC invocations=10 (full 3):
PSYoungGen total 3053056K, used 3052742K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
  eden space 2270720K, 100% used [0x0000000580000000,0x000000060a980000,0x000000060a980000)
  from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
  to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
  object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
  class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
42.412: [GC (Allocation Failure)
Desired survivor size 1653080064 bytes, new threshold 4 (max 15)
[PSYoungGen: 3052742K->1016800K(3422720K)] 4287807K->2985385K(6741504K), 4.0304873 secs] [Times: user=11.87 sys=1.77, real=4.03 secs]
Heap after GC invocations=10 (full 3):
PSYoungGen total 3422720K, used 1016800K [0x0000000580000000, 0x0000000727a80000, 0x0000000800000000)
  eden space 2405888K, 0% used [0x0000000580000000,0x0000000580000000,0x0000000612d80000)
  from space 1016832K, 99% used [0x0000000612d80000,0x0000000650e78240,0x0000000650e80000)
  to space 1614336K, 0% used [0x00000006c5200000,0x00000006c5200000,0x0000000727a80000)
ParOldGen total 3318784K, used 1968584K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
  object space 3318784K, 59% used [0x0000000080000000,0x00000000f8272318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
  class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K

With all the collectors, the only difference I could see was a delayed full GC. I am considering changing the YoungGen size now and will update if I see a difference.

On a parallel note:
1. I also see that some objects in memory remain persistent across GC cycles - for example scala.Tuple2 and java.lang.Long.
2. These are Java RDDs.

Regards
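For context, a hedged sketch of one way GC options like the ones above are typically passed to a Spark job (the post mentions RDDs, so a Spark job is assumed); the class and jar names are placeholders, and the flag set simply mirrors the G1 options listed above:

```bash
# Illustrative only: pass GC and GC-logging flags to the Spark driver and executors.
# com.example.MyJob and my-job.jar are placeholder names.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class com.example.MyJob \
  my-job.jar
```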
10-04-2019
03:59 AM
Hmm, I can now reproduce the issue. After creating the privilege:

kafka-sentry --config /etc/sentry/conf -gpr -r eric-test -p 'HOST=*->CLUSTER=kafka-cluster->action=clusteraction'

it is stored as "cluster_action":

kafka-sentry --config /etc/sentry/conf -lp -r eric-test
...
HOST=*->CLUSTER=kafka-cluster->action=cluster_action

And when I try to drop it, it fails with the error you are seeing. I need a bit more time to look into why.
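For anyone following along, a hedged sketch of the drop attempt being described; the -rpr (revoke privilege from role) flag and the idea of trying both action spellings are my assumptions, not something verified in this thread:

```bash
# Attempt to revoke the privilege first with the action string it was granted with,
# then with the underscored form it is actually stored as (both are assumptions).
kafka-sentry --config /etc/sentry/conf -rpr -r eric-test \
  -p 'HOST=*->CLUSTER=kafka-cluster->action=clusteraction'
kafka-sentry --config /etc/sentry/conf -rpr -r eric-test \
  -p 'HOST=*->CLUSTER=kafka-cluster->action=cluster_action'
```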
10-02-2019
07:23 PM
@ravikiran_sharm we've passed along your concerns and note of frustration to the relevant parties internally and they are actively working on your case. They say they are working with you directly to get this resolved.
09-30-2019
08:50 PM
Hi @anbazhagan_muth You don't need to worry about those two configurations unless you're using Kafka MirrorMaker:

Destination Broker List: bootstrap.servers
Source Broker List: source.bootstrap.servers

Kafka MirrorMaker is used to replicate data from one Kafka service to another. With that said, the configurations should be self-explanatory: the source broker list (source.bootstrap.servers) is the list of brokers in the source Kafka service that MirrorMaker reads data from, and the destination broker list (bootstrap.servers) is the list of brokers in the destination Kafka service that MirrorMaker writes data to. Each is a comma-separated list in a format like:

BROKER1_HOSTNAME:PORT_NUMBER,BROKER2_HOSTNAME:PORT_NUMBER

PORT_NUMBER is going to be either 9092 for PLAINTEXT or SASL_PLAINTEXT, or 9093 for SSL or SASL_SSL.
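As a concrete illustration, hedged example values for the two lists; the hostnames and ports below are placeholders, not values from this thread:

```
# Placeholder hostnames and ports only
source.bootstrap.servers=src-broker1.example.com:9092,src-broker2.example.com:9092
bootstrap.servers=dest-broker1.example.com:9093,dest-broker2.example.com:9093
```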
08-23-2019
06:51 AM
@paleerbccm Briefly looking at the message, I would assume 'error_code=0' actually means that no errors occurred. It would take quite a bit of digging into the code to know for sure, but generally speaking, I wouldn't worry too much about TRACE level logs. Ideally, and especially since this is a production environment, you would normally set the logging level to INFO, and that's about all you would need. Unless you have intimate knowledge of the code and you're chasing a specific issue, it's rare that you would ever need TRACE level logs.
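Purely to illustrate the INFO-level recommendation, a sketch of the relevant line in a typical Kafka log4j.properties; the appender name is an assumption, and on a managed cluster you would normally change the logging threshold through the management UI rather than editing this file by hand:

```
# Keep the broker's root logger at INFO unless actively chasing a specific issue
log4j.rootLogger=INFO, kafkaAppender
```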
08-23-2019
06:29 AM
Hi @HarpreetSingh31 It's not clear to me what the issue is when you say you have problems running the producer and consumer. The output you're seeing is normal and it looks to be functioning as expected. I'm going to assume the problem you're referring to is that you can't seem to read the messages you're sending to the topic.

Based on the information you posted, where you highlighted that you're only running one Kafka broker, I believe the problem here is that you need to change the Kafka configuration offsets.topic.replication.factor and set it to 1. In Cloudera Manager we have always set that to 3 by default, and I have filed an improvement internally to ensure we do not set it to 3 when a user installs a new Kafka service with fewer brokers than that. If you look at the Kafka broker log you'll see an error like the one below or something similar:

Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet.

You can make the change from:
Cloudera Manager > Kafka > Configuration > Search for 'offsets.topic.replication.factor'

After you change this value you will need to restart your Kafka service. Be aware that once you set this to 1, your internal __consumer_offsets topic (used by consumers to commit their offsets) will be created with a replication factor of 1, and this won't change even as you add more brokers to your cluster. If in the future you add more brokers, you will have to expand the replication factor for this topic using the kafka-reassign-partitions tool: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools Hope this helps!
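To make that last step more concrete, a hedged sketch of expanding the replication factor for __consumer_offsets with the reassignment tool; the broker IDs (1, 2, 3), the single partition shown, and the ZooKeeper address are placeholders, and a real __consumer_offsets topic has 50 partitions by default, each of which would need an entry:

```bash
# Build a reassignment file for the internal offsets topic.
# Broker IDs and the single partition shown are placeholders only.
cat > increase-offsets-rf.json <<'EOF'
{"version":1,
 "partitions":[
   {"topic":"__consumer_offsets","partition":0,"replicas":[1,2,3]}
 ]}
EOF

# Run the reassignment (ZooKeeper connect string is a placeholder)
kafka-reassign-partitions --zookeeper zk1.example.com:2181 \
  --reassignment-json-file increase-offsets-rf.json --execute
```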
08-21-2019
11:45 PM
Hi Feloix, Yes, I also tried kinit before executing spark-submit, but it failed with the same error. The Spark job is accepted in YARN; I can see the job reach accepted status in the logs. If Kerberos authentication fails, it usually fails before YARN accepts the job. It seems the error is caused by the authentication token not being passed correctly to the ResourceManager or NodeManagers. Only when I define a name service and use it in the fs.defaultFS parameter can the Spark job complete successfully in YARN.
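For anyone comparing the two setups, a hedged illustration of the difference being described; nameservice1, the NameNode host, and the jar name are placeholders, and overriding fs.defaultFS per job via spark.hadoop.* is just one way to test the idea, not the fix described in this thread:

```bash
# Failing case described above: fs.defaultFS pointing at a single NameNode host,
#   e.g. hdfs://namenode1.example.com:8020
# Working case: fs.defaultFS pointing at a defined HA nameservice,
#   e.g. hdfs://nameservice1

# One way to test the override for a single job (placeholder values):
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.hadoop.fs.defaultFS=hdfs://nameservice1 \
  my-app.jar
```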
08-21-2019
11:28 AM
Thanks, that does show more information. Though what I find weird is that the same query has run with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space). Regards
Tags: Spark
08-21-2019
09:08 AM
I have a C# microservice running in the cloud that continuously receives data from about 20k devices in the field. Every time this microservice receives data, it passes it to the NiFi PublishKafka processor, which in turn is configured to create a new topic with the pattern TL-<DeviceSerialNo>. PublishKafka places the device data on the topic and publishes it to Kafka. So let's say TL-0001 ... TL-1000, TL-1299 ... TL-20000 are the Kafka topics that are supposed to be created by NiFi. But when I go to the Kafka broker host, what I find is that after TL-99 there are no new topics created.
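For anyone debugging something similar, a hedged sketch of checking which TL-* topics the brokers actually know about; the ZooKeeper address is a placeholder, and newer Kafka clients would use --bootstrap-server instead:

```bash
# List and count the TL-* topics Kafka has actually created (placeholder ZK address)
kafka-topics --zookeeper zk1.example.com:2181 --list | grep '^TL-' | sort | tail -20
kafka-topics --zookeeper zk1.example.com:2181 --list | grep -c '^TL-'
```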
08-21-2019
06:10 AM
Now I am really clear about the situation. Thanks a lot for your replies.
08-20-2019
02:18 PM
Hi @sauravsuman689 A common issue that people have when using the kafka-consumer-groups command line tool is that they do not set it up to communicate over Kerberos like any other Kafka client (i.e. consumers and producers). The security.protocol output you shared based on the cat command doesn't look right:

cat /tmp/grouprop.properties
security.protocol=PLAINTEXTSASL

This should instead be:

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

You can use the same instructions outlined in the following link, starting with step number 5: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html#concept_lcn_4mm_s5 I understand you're using HDP, but it should be pretty much the same steps. You will of course use the same command line tool you're already using, as opposed to the consumer command mentioned in the link:

[kafka@XXX ~]$ /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --bootstrap-server xxxx:6667,xxxx:6667,xxxx:6667 --list --command-config /tmp/grouprop.properties

EDIT: It seems HDP works a bit differently, so your security.protocol parameter actually aligns with what the HDP platform expects.
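To round out the client-side setup, a hedged sketch of the pieces that usually accompany those two properties when a CLI client talks to a Kerberized cluster; the principal, file paths, and broker address are placeholders, and the authoritative steps are in the linked documentation:

```bash
# Authenticate, then point the client JVM at a JAAS config that uses the ticket cache
kinit your_principal@EXAMPLE.COM

# Contents of a client JAAS file such as /tmp/kafka_client_jaas.conf (placeholder path):
#   KafkaClient {
#     com.sun.security.auth.module.Krb5LoginModule required
#     useTicketCache=true;
#   };

export KAFKA_OPTS="-Djava.security.auth.login.config=/tmp/kafka_client_jaas.conf"
/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
  --bootstrap-server broker1.example.com:6667 \
  --list --command-config /tmp/grouprop.properties
```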
12-06-2018
01:33 PM
1 Kudo
Yes, we only tried deleting the out-of-sync partition, and it did not work. After a lot of research we came to the conclusion to increase replica.lag.time.max.ms to 8 days, as it had been around 8 days that a few replicas were out of sync. This resolved our issue, though it took a few hours for the followers to fetch and replicate the 7 days of data. https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ helped in understanding the ISRs.
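For reference, the arithmetic behind that value and the broker property it maps to; the property-file form is illustrative, and on a managed cluster you would set it through your management tooling rather than editing server.properties directly:

```
# 8 days in milliseconds: 8 * 24 * 3600 * 1000 = 691200000
replica.lag.time.max.ms=691200000
```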
07-19-2018
01:55 PM
Hello ChelouteMS, When I look at the chart and compare it with what you described, where you noticed the scheduling delay increasing once the number of devices publishing data to Kafka topics increased, I don't see anything alarming. It's a classic issue where batches begin to pile up and scheduling delays grow as you increase the amount of data that needs to be processed. The odd thing I see is that you seem to have 51 containers allocated, even though you mentioned that you specifically asked for 12 executors with 1 core each. I would then expect your application to have 13 containers in total (12 executors + 1 for the Application Master). How are you specifying the number of executors? Do you have dynamic allocation enabled? If so, I would try disabling that: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/spark_streaming.html#section_nhw_jpp_45 The next place to look is the output of your spark2-submit command (the Driver output, since you're running in client mode) and the Application Master's log, to confirm whether the application did in fact ask for that many containers.
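To make that concrete, a hedged sketch of pinning the executor count for a streaming app; the class and jar names are placeholders, and the spark.streaming.dynamicAllocation property is my reading of the setting the linked page refers to:

```bash
# Disable dynamic allocation and ask for exactly 12 single-core executors
# (com.example.StreamingApp and streaming-app.jar are placeholder names).
spark2-submit \
  --master yarn --deploy-mode client \
  --num-executors 12 \
  --executor-cores 1 \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.streaming.dynamicAllocation.enabled=false \
  --class com.example.StreamingApp \
  streaming-app.jar
```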