Member since: 02-18-2016
Posts: 141
Kudos Received: 19
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
| 5282 | 12-18-2019 07:44 PM
| 5312 | 12-15-2019 07:40 PM
| 1859 | 12-03-2019 06:29 AM
| 1878 | 12-02-2019 06:47 AM
| 5997 | 11-28-2019 02:06 AM
11-14-2019
08:53 PM
@bdelpizzo Can you see any errors in the Kafka logs/MirrorMaker logs? It is possible that MirrorMaker is not able to process messages because of their size: if a message is larger than the configured/default limit, it can get stuck in the queue. Check the message.max.bytes property (a quick check is sketched below).
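A minimal sketch of that check, assuming an HDP-style install; the file paths and the MirrorMaker property file names are assumptions, and the client-side limits (max.request.size on the producer, max.partition.fetch.bytes on the consumer) must also allow the larger messages.

```bash
# Broker-side limit (roughly 1 MB if not set explicitly).
grep -i "message.max.bytes" /etc/kafka/conf/server.properties

# MirrorMaker side: the property file names below are placeholders; use
# whatever files your mirror maker command actually points at.
grep -iE "max.request.size|max.partition.fetch.bytes" \
  /path/to/mm-producer.properties /path/to/mm-consumer.properties
```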
11-14-2019
08:17 PM
@deekshant To debug the NameNode issue, check the following:
1. Active NameNode (NN) logs, around the time it was rebooted.
2. Active NN ZKFC logs for the same time window, for any issue.
3. Standby NN logs at the same time, for any errors.
4. Standby NN ZKFC logs for errors at the same timestamp.
5. Active NN .out file for any warnings/errors.
6. System log (/var/log/messages) for any issue at that particular moment.
You will find the error in one of the above files and can proceed with the RCA accordingly (a grep sketch follows below). Do revert if you need further help.
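A minimal sketch of steps 1-6, assuming default HDP log locations; the paths, file-name patterns, and the timestamp value are assumptions, so adjust them to your cluster.

```bash
# Hour of the reboot to inspect (hypothetical value; match your log's timestamp format).
TS="2019-11-14 20"

# 1-4. NN and ZKFC logs (run on both the active and the standby NN host).
grep "$TS" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | grep -iE "fatal|error|warn"
grep "$TS" /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log     | grep -iE "fatal|error"

# 5. NN .out file (JVM-level output: OOM messages, ulimit warnings, crashes).
grep -iE "error|warn|outofmemory" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.out

# 6. OS log (note: syslog uses a different timestamp format, e.g. "Nov 14 20:").
grep -iE "oom|out of memory|error" /var/log/messages | tail -n 100
```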
11-14-2019
06:03 AM
1 Kudo
Not to take away from the entire conversation above, which was in fact a very detailed and specific comparison. The major takeaway in your pro/con evaluation needs to be physical disk compared to network or some level of shared disk. Also, in a big HA system there is usually more than one disk (which is not the same as more than one partition). When you go past dev- and POC-level benchmarking, deep into performance tuning, physical disks in high-availability arrays on a physical machine will outperform the cloud or VMs for large-volume and large-data processes. To get more specific you have to compare all the nuts and bolts, as well as evaluate the performance best practices for each platform, service, and component, all the way down to application design. This is a great debate, and one that I have at every customer. That said, I have led production cluster installs in the cloud: Amazon, Azure, IBM Cloud, Google Cloud, and private cloud and VM systems.
11-12-2019
11:45 PM
I agree with @sagarshimpi. Please provide the full log trace from the DataNode and check whether you can find any disk-level exception in the logs or not. You can also open the NN UI --> "Datanode Volume Failures" tab to confirm this. Leaving safe mode manually will not solve the problem: the NN did not receive block reports from these DataNodes, so if you leave safe mode manually you will get missing-block alerts for all the blocks you have (approx. 34 million blocks). Did you change any configuration at the HDFS level before restarting the whole HDFS service? If yes, please share the property name. Perform a disk I/O check on all 4 DataNodes using the iostat command and confirm that the disks are working fine without any heavy %util (see the sketch below).
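A minimal sketch of that disk check; iostat comes from the sysstat package, and the interval/count values are just examples.

```bash
# Extended device stats, 5-second interval, 6 samples; run on each DataNode.
iostat -x 5 6

# Disks pinned near 100 in the %util column point to an I/O bottleneck or a
# failing drive; cross-check the kernel log for hardware errors.
dmesg | grep -iE "i/o error|sector|ata" | tail -n 50
```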
11-12-2019
08:37 PM
@VamshiDevraj If you are still facing the issue, can you share details about the error or a screenshot of it?
11-12-2019
08:09 PM
1. Did the job fail due to the above reason? If "NO", does the error appear in the logs for all Spark jobs, or just for this job?
11-11-2019
11:56 PM
Hi Vinay, Do you see any errors in the logs while running the "reassign partition tool"? That might help debug the issue. Were all the brokers healthy and the ISRs in good shape before you ran the tool? ***When I ran this tool, it was stuck with one partition and it hung there for more than a day. The cluster performance was severely impacted, and we had to restart the entire cluster. >> If there are many topics or a lot of data, I would suggest reassigning a subset of topics to avoid load on the cluster: you can provide a list of topics that should be moved to the new set of brokers and a target list of new brokers (see the sketch below). ***I don't see a way even to stop the tool when it's taking a long time. >> You can abort the reassignment by deleting the "/admin/reassign_partitions" znode on your ZooKeeper cluster using the zookeeper shell, and then move the partitions that were assigned to the dead broker to new nodes. Thanks, Sagar S
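A minimal sketch of both suggestions, assuming Kafka's stock CLI scripts and a ZooKeeper-based cluster (2019-era Kafka; newer releases use --bootstrap-server instead of --zookeeper); the broker IDs, topic names, and ZooKeeper connect string are placeholders.

```bash
# 1. Move only a subset of topics: list them in a JSON file ...
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "topic-a"}, {"topic": "topic-b"}], "version": 1}
EOF

# ... then generate a candidate plan targeting the new brokers (IDs 4,5,6 here).
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "4,5,6" --generate

# 2. Abort a stuck reassignment by removing the znode the controller watches,
#    then re-run the tool with a corrected plan.
zookeeper-shell.sh zk1:2181 delete /admin/reassign_partitions
```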
09-26-2018
04:21 PM
I tried the process below and it worked:
1. Stop AMS.
2. Move the contents of the AMS "tmp.dir" to a backup location.
3. Move the contents of the AMS "root.dir" to a backup location.
4. Remove the AMS znode from ZooKeeper.
5. Start AMS.
AMS is working fine now (rough commands are sketched below).
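A minimal sketch of those steps, assuming an embedded-mode AMS on HDP with typical defaults; the directories correspond to hbase.tmp.dir and hbase.rootdir in ams-hbase-site, and the znode name and embedded ZooKeeper port depend on your configuration, so verify all of them in Ambari before running anything.

```bash
# 1. Stop the Ambari Metrics Collector from Ambari (Services > Ambari Metrics > Stop).

# 2-3. Back up (do not delete) the AMS HBase tmp and root directories.
mv /var/lib/ambari-metrics-collector/hbase-tmp /var/lib/ambari-metrics-collector/hbase-tmp.bak
mv /var/lib/ambari-metrics-collector/hbase     /var/lib/ambari-metrics-collector/hbase.bak

# 4. Remove the AMS znode (name varies by security mode, e.g. /ams-hbase-unsecure;
#    61181 is the usual embedded AMS ZooKeeper port).
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:61181 <<'EOF'
rmr /ams-hbase-unsecure
quit
EOF

# 5. Start AMS again from Ambari and watch the collector log for a clean start.
```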
08-24-2018
01:22 PM
@pjoseph @Nanda Kumar Please share your views.
02-28-2018
10:26 PM
@Sagar Shimpi Thank you for sharing the corrected command.