Member since: 02-18-2016
Posts: 141
Kudos Received: 19
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
| 5282 | 12-18-2019 07:44 PM
| 5312 | 12-15-2019 07:40 PM
| 1859 | 12-03-2019 06:29 AM
| 1878 | 12-02-2019 06:47 AM
| 5997 | 11-28-2019 02:06 AM
11-14-2019
08:53 PM
@bdelpizzo Can you see any errors in the Kafka logs/MirrorMaker logs? It is possible that MirrorMaker is not able to process messages because of their size: if a message is larger than the configured/default limit, it can get stuck in the queue. Check the message.max.bytes property (a quick check is sketched below).
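A minimal sketch of that check, assuming an HDP-style install; the file paths and the MirrorMaker property file names are assumptions, and the client-side limits (max.request.size on the producer, max.partition.fetch.bytes on the consumer) must also allow the larger messages.

```bash
# Broker-side limit (roughly 1 MB if not set explicitly).
grep -i "message.max.bytes" /etc/kafka/conf/server.properties

# MirrorMaker side: the property file names below are placeholders; use
# whatever files your mirror maker command actually points at.
grep -iE "max.request.size|max.partition.fetch.bytes" \
  /path/to/mm-producer.properties /path/to/mm-consumer.properties
```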
11-14-2019
08:17 PM
@deekshant To debug the NameNode issue, check the following:
1. Active NameNode (NN) logs, around the time it was rebooted.
2. Active NN ZKFC logs for the same time window, for any issue.
3. Standby NN logs at the same time, for any errors.
4. Standby NN ZKFC logs for errors at the same timestamp.
5. Active NN .out file for any warnings/errors.
6. System log (/var/log/messages) for any issue at that particular moment.
You will find the error in one of the above files and can proceed with the RCA accordingly (a grep sketch follows below). Do revert if you need further help.
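A minimal sketch of steps 1-6, assuming default HDP log locations; the paths, file-name patterns, and the timestamp value are assumptions, so adjust them to your cluster.

```bash
# Hour of the reboot to inspect (hypothetical value; match your log's timestamp format).
TS="2019-11-14 20"

# 1-4. NN and ZKFC logs (run on both the active and the standby NN host).
grep "$TS" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | grep -iE "fatal|error|warn"
grep "$TS" /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log     | grep -iE "fatal|error"

# 5. NN .out file (JVM-level output: OOM messages, ulimit warnings, crashes).
grep -iE "error|warn|outofmemory" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.out

# 6. OS log (note: syslog uses a different timestamp format, e.g. "Nov 14 20:").
grep -iE "oom|out of memory|error" /var/log/messages | tail -n 100
```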
11-14-2019
06:03 AM
1 Kudo
Not to take away from the entire conversation above, which was in fact a very detailed and specific comparison. The major takeaway in your pro/con evaluation needs to be physical disk compared to network or some level of shared disk. Also, in a big HA system there is usually more than one disk (which is not the same as more than one partition). When you go past dev- and POC-level benchmarking, deep into performance tuning, physical disks in high-availability arrays on a physical machine will outperform the cloud or VMs for large-volume and large-data processes. To get more specific you have to compare all the nuts and bolts, as well as evaluate the performance best practices for each platform, service, and component, all the way down to application design. This is a great debate, and one that I have at every customer. That said, I have led production cluster installs in the cloud: Amazon, Azure, IBM Cloud, Google Cloud, and private cloud and VM systems.
11-12-2019
11:45 PM
I agree with @sagarshimpi. Please provide the full log trace from the DataNode and check whether you can find any disk-level exception in the logs or not. You can also open the NN UI --> "Datanode Volume Failures" tab to confirm this. Leaving safe mode manually will not solve the problem: the NN did not receive block reports from these DataNodes, so if you leave safe mode manually you will get missing-block alerts for all the blocks you have (approx. 34 million blocks). Did you change any configuration at the HDFS level before restarting the whole HDFS service? If yes, please share the property name. Perform a disk I/O check on all 4 DataNodes using the iostat command and confirm that the disks are working fine without any heavy %util (see the sketch below).
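A minimal sketch of that disk check; iostat comes from the sysstat package, and the interval/count values are just examples.

```bash
# Extended device stats, 5-second interval, 6 samples; run on each DataNode.
iostat -x 5 6

# Disks pinned near 100 in the %util column point to an I/O bottleneck or a
# failing drive; cross-check the kernel log for hardware errors.
dmesg | grep -iE "i/o error|sector|ata" | tail -n 50
```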
11-12-2019
08:37 PM
@VamshiDevraj If you are still facing the issue, can you share details about the error or a screenshot of it?
11-12-2019
08:09 PM
1. Did the job fail due to the above reason? If "NO", does the error appear in the logs for all Spark jobs, or just for this job?
11-11-2019
11:56 PM
Hi Vinay, Do you see any errors in the logs while running the "reassign partition tool"? That might help debug the issue. Were all the brokers healthy and the ISRs in good shape before you ran the tool? ***When I ran this tool, it was stuck with one partition and it hung there for more than a day. The cluster performance was severely impacted, and we had to restart the entire cluster. >> If there are many topics or a lot of data, I would suggest reassigning a subset of topics to avoid load on the cluster: you can provide a list of topics that should be moved to the new set of brokers and a target list of new brokers (see the sketch below). ***I don't see a way even to stop the tool when it's taking a long time. >> You can abort the reassignment by deleting the "/admin/reassign_partitions" znode on your ZooKeeper cluster using the zookeeper shell, and then move the partitions that were assigned to the dead broker to new nodes. Thanks, Sagar S
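A minimal sketch of both suggestions, assuming Kafka's stock CLI scripts and a ZooKeeper-based cluster (2019-era Kafka; newer releases use --bootstrap-server instead of --zookeeper); the broker IDs, topic names, and ZooKeeper connect string are placeholders.

```bash
# 1. Move only a subset of topics: list them in a JSON file ...
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "topic-a"}, {"topic": "topic-b"}], "version": 1}
EOF

# ... then generate a candidate plan targeting the new brokers (IDs 4,5,6 here).
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "4,5,6" --generate

# 2. Abort a stuck reassignment by removing the znode the controller watches,
#    then re-run the tool with a corrected plan.
zookeeper-shell.sh zk1:2181 delete /admin/reassign_partitions
```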
09-26-2018
04:21 PM
I tried the process below and it worked:
1. Stop AMS.
2. Move the contents of the AMS "tmp.dir" to a backup location.
3. Move the contents of the AMS "root.dir" to a backup location.
4. Remove the AMS znode from ZooKeeper.
5. Start AMS.
AMS is working fine now (rough commands are sketched below).
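A minimal sketch of those steps, assuming an embedded-mode AMS on HDP with typical defaults; the directories correspond to hbase.tmp.dir and hbase.rootdir in ams-hbase-site, and the znode name and embedded ZooKeeper port depend on your configuration, so verify all of them in Ambari before running anything.

```bash
# 1. Stop the Ambari Metrics Collector from Ambari (Services > Ambari Metrics > Stop).

# 2-3. Back up (do not delete) the AMS HBase tmp and root directories.
mv /var/lib/ambari-metrics-collector/hbase-tmp /var/lib/ambari-metrics-collector/hbase-tmp.bak
mv /var/lib/ambari-metrics-collector/hbase     /var/lib/ambari-metrics-collector/hbase.bak

# 4. Remove the AMS znode (name varies by security mode, e.g. /ams-hbase-unsecure;
#    61181 is the usual embedded AMS ZooKeeper port).
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:61181 <<'EOF'
rmr /ams-hbase-unsecure
quit
EOF

# 5. Start AMS again from Ambari and watch the collector log for a clean start.
```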
08-24-2018
01:22 PM
@pjoseph @Nanda Kumar Please share your views.
02-28-2018
10:26 PM
@Sagar Shimpi Thank you for sharing the corrected command.