Member since: 07-19-2017
Posts: 53
Kudos Received: 3
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1996 | 08-23-2019 06:51 AM |
| | 3679 | 08-23-2019 06:45 AM |
| | 3317 | 08-20-2019 02:06 PM |
06-24-2021
04:06 AM
I'm seeing the same issue. I can see "Transition from state INITIALIZING to error state FATAL_ERROR" once I set "Use Transactions"="true" and "Delivery Guarantee"="Guarantee Replicated Delivery".
02-19-2020
04:44 PM
1 Kudo
@WilsonLozano,
As this thread is older and was marked 'Solved' back in August of 2019, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment, version of CDH, etc. that could help others give a more accurate answer to your question.
01-06-2020
09:26 AM
Hi, As mentioned in the previous posts, did you try increasing the memory, and did it solve the issue? Please let us know if you are still facing any issues. Thanks AKR
11-20-2019
12:57 PM
1 Kudo
This issue would really require further debugging. For whatever reason, something happened with the user ID resolution at that particular time. We've seen customers with similar issues when tools like SSSD are being used: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sssd-system-uids One idea here is to create a shell script that runs the commands 'id ptz0srv0z50' and 'id -Gn ptz0srv0z50' in a loop at some interval, say 10, 20 or 30 seconds. When the problem occurs, go over the output of that script and see if you notice anything different at the time of the issue.
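A minimal sketch of such a loop, in case it helps. The function names (log_id_snapshot, watch_id), the timestamp format and the optional iteration count are my own assumptions for illustration, not part of any existing tool; only the two id commands come from the suggestion above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record `id` and `id -Gn` for a user at a fixed interval
# so that the output around the time of a failure can be compared later.

log_id_snapshot() {
  local user="$1"
  local ts
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  # stderr is captured as well, so a failed lookup (e.g. "no such user") is logged too
  printf '%s id: %s\n' "$ts" "$(id "$user" 2>&1)"
  printf '%s groups: %s\n' "$ts" "$(id -Gn "$user" 2>&1)"
}

watch_id() {
  local user="$1"
  local interval="${2:-10}"   # seconds between snapshots
  local count="${3:-0}"       # 0 means: run until interrupted
  local i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    log_id_snapshot "$user"
    i=$((i + 1))
    # no need to sleep after the final snapshot
    if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

After sourcing the file, something like `watch_id ptz0srv0z50 10 >> /tmp/id_watch.log` (the log path is an arbitrary choice) would run until interrupted. When the problem occurs, compare the lines logged around that timestamp with the earlier ones.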
11-12-2019
04:45 PM
Hi w@leed

Thanks for replying. I tested the job with all three collectors: ParallelGC, CMS and G1GC.

I tested the following options with G1GC:
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and with CMS:
-XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+CMSConcurrentMTEnabled -XX:ParallelCMSThreads=10 -XX:ConcGCThreads=8 -XX:ParallelGCThreads=16

With G1GC defaults, I could see the following:

Desired survivor size 1041235968 bytes, new threshold 5 (max 15)
[PSYoungGen: 1515304K->782022K(3053056K)] 2750361K->2017087K(6371840K), 1.5875321 secs] [Times: user=4.72 sys=0.74, real=1.59 secs]
Heap after GC invocations=9 (full 3):
PSYoungGen total 3053056K, used 782022K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
eden space 2270720K, 0% used [0x0000000580000000,0x0000000580000000,0x000000060a980000)
from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55055K, capacity 55638K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
}
{Heap before GC invocations=10 (full 3):
PSYoungGen total 3053056K, used 3052742K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
eden space 2270720K, 100% used [0x0000000580000000,0x000000060a980000,0x000000060a980000)
from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
42.412: [GC (Allocation Failure)
Desired survivor size 1653080064 bytes, new threshold 4 (max 15)
[PSYoungGen: 3052742K->1016800K(3422720K)] 4287807K->2985385K(6741504K), 4.0304873 secs] [Times: user=11.87 sys=1.77, real=4.03 secs]
Heap after GC invocations=10 (full 3):
PSYoungGen total 3422720K, used 1016800K [0x0000000580000000, 0x0000000727a80000, 0x0000000800000000)
eden space 2405888K, 0% used [0x0000000580000000,0x0000000580000000,0x0000000612d80000)
from space 1016832K, 99% used [0x0000000612d80000,0x0000000650e78240,0x0000000650e80000)
to space 1614336K, 0% used [0x00000006c5200000,0x00000006c5200000,0x0000000727a80000)
ParOldGen total 3318784K, used 1968584K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 59% used [0x0000000080000000,0x00000000f8272318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K

With all the collectors, the only difference I could see was a delayed full GC. I am considering changing the YoungGen now. Will update if I do see a difference.

On a parallel note:
1. I also saw that some objects in memory persist across GC cycles, for example scala.Tuple2 and java.lang.Long.
2. These are Java RDDs.

Regards
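For comparing runs across collectors, it can help to pull just the pause times out of each log. This is only a rough sketch, and it assumes the log contains "[Times: user=... sys=... real=... secs]" entries like the ones above. The function name gc_pauses is made up for illustration.

```shell
# Hypothetical helper: print the wall-clock ("real") pause of every GC event
# found in a log file, one value per line, in the order they were logged.
gc_pauses() {
  local log_file="$1"
  # Match e.g. "real=4.03 secs" and strip everything except the number
  grep -o 'real=[0-9.]* secs' "$log_file" | sed 's/real=//; s/ secs//'
}
```

For example, `gc_pauses gc.log | sort -n | tail -n 1` (gc.log being whatever file the GC output was written to) would show the longest pause in that run.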
10-02-2019
07:23 PM
@ravikiran_sharm we've passed along your concerns and note of frustration to the relevant parties internally and they are actively working on your case. They say they are working with you directly to get this resolved.
08-23-2019
06:51 AM
@paleerbccm Briefly looking at the message, I would assume 'error_code=0' actually means that no errors occurred. It would take quite a bit of digging in the code to confirm, but generally speaking, I wouldn't worry too much about TRACE level logs. Ideally, and especially since this is a production environment, you would normally set the logging level to INFO, and that's about all you would need. Unless you have an intimate knowledge of the code and you're chasing a specific issue, it's rare that you would ever need TRACE level logs.
08-21-2019
11:28 AM
Thanks, that does show more information. What I find weird, though, is that the same query ran with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space). Regards
08-21-2019
06:10 AM
Now I am really clear about the situation. Thanks a lot for your replies.
08-20-2019
02:18 PM
Hi @sauravsuman689

A common issue that people have when using the kafka-consumer-groups command line tool is that they do not set it up to communicate over Kerberos like any other Kafka client (i.e. consumers and producers).

The security.protocol output you shared based on the cat command doesn't look right:

cat /tmp/grouprop.properties
security.protocol=PLAINTEXTSASL

This should instead be:

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

You can use the same instructions outlined in the following link, starting with step number 5: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html#concept_lcn_4mm_s5

I understand you're using HDP, but it should be pretty much the same steps. You will of course just use the same command line tool command you're using, as opposed to the consumer command mentioned in the link:

[kafka@XXX ~]$ /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --bootstrap-server xxxx:6667,xxxx:6667,xxxx:6667 --list --command-config /tmp/grouprop.properties

EDIT: It seems like HDP works a bit differently, so your security.protocol parameter aligns with what the HDP platform would expect.
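As a quick sanity check before running the tool, the properties file can be inspected with something like the sketch below. The function name check_security_protocol is hypothetical, and the accepted values are the ones Apache Kafka clients define for security.protocol (PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL). Per the EDIT above, HDP appears to expect PLAINTEXTSASL, so on HDP an "unexpected" result for that value is not necessarily a problem.

```shell
# Hypothetical helper: report whether security.protocol in a client properties
# file is one of the values defined by Apache Kafka clients.
check_security_protocol() {
  local file="$1"
  local value
  # Take the last security.protocol entry, in case the key appears more than once
  value="$(sed -n 's/^security\.protocol=//p' "$file" | tail -n 1)"
  case "$value" in
    PLAINTEXT|SSL|SASL_PLAINTEXT|SASL_SSL)
      echo "ok: $value"
      ;;
    "")
      echo "missing: security.protocol is not set"
      return 1
      ;;
    *)
      echo "unexpected: $value"
      return 1
      ;;
  esac
}
```

Usage would be along the lines of `check_security_protocol /tmp/grouprop.properties`, run before passing the file to --command-config.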