Member since: 02-18-2016
Posts: 141
Kudos Received: 19
Solutions: 18
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8519 | 12-18-2019 07:44 PM |
| | 8549 | 12-15-2019 07:40 PM |
| | 2974 | 12-03-2019 06:29 AM |
| | 2993 | 12-02-2019 06:47 AM |
| | 9285 | 11-28-2019 02:06 AM |
11-15-2019
06:10 AM
@mike_bronson7 You just need to back up /hadoop/hdfs/namenode/current from the active NameNode. Also, if you take the backup a week before the activity and, say, your first cluster keeps serving client requests, then you will lose the data written after the backup. So it is best to run saveNamespace and take the backup at the time of the activity, with clients frozen so they are not accessing the cluster.
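A minimal sketch of that sequence on the active NameNode, assuming the metadata directory is /hadoop/hdfs/namenode as above (adjust to your dfs.namenode.name.dir and your own backup location):

```bash
# Block new writes while the namespace is flushed
hdfs dfsadmin -safemode enter

# Write the latest fsimage/edits to the metadata directory
hdfs dfsadmin -saveNamespace

# Back up the current directory (backup path is just an example)
tar -czf /backup/nn-current-$(date +%F).tar.gz -C /hadoop/hdfs/namenode current

# Resume normal operation
hdfs dfsadmin -safemode leave
```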
11-15-2019
04:53 AM
By backup I mean copying the NameNode current directory only. First turn safe mode on and then save the namespace. Once both commands are executed, take a backup of the NameNode current directory from the active node. You can copy it to the destination/new cluster using any command (like scp) or tool; scp will be the simplest option.
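For the copy step, a hedged sketch (the hostname and destination path are placeholders; run it after the safe mode + saveNamespace steps described above):

```bash
# Copy the metadata directory from the active NameNode to the new cluster's NameNode host
scp -r /hadoop/hdfs/namenode/current root@new-namenode-host:/hadoop/hdfs/namenode/

# On the destination, make sure the HDFS service user owns the copied files, e.g.:
# ssh root@new-namenode-host 'chown -R hdfs:hadoop /hadoop/hdfs/namenode/current'
```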
11-15-2019
03:13 AM
1 Kudo
1. If you can back up the metadata from the original cluster (where the DataNodes existed at first) and copy that metadata to the new cluster, that is the best option.
2. If you are not able to go with point 1, then you can probably try the "hadoop namenode -recover" option. The links below might be useful:
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
https://clouderatemp.wpengine.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
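A quick sketch of option 2 (run it on the NameNode host with the NameNode stopped; it prompts interactively about any corruption it finds):

```bash
# Recovery mode walks the edit log and asks how to handle problems it finds
hdfs namenode -recover

# Older releases use the deprecated form:
# hadoop namenode -recover
```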
11-15-2019
02:06 AM
@mike_bronson7 What I got from your scenario is: on the second, from-scratch installation your master nodes (i.e. active/standby NameNode) are freshly installed, and you are only adding the DataNodes that already hold data from the other cluster, right? -- In that case it is not possible to bring the cluster up with the data on the restored disks, since the new NameNode will not have any information about the blocks lying in the block storage on the DataNode disks. If you have a support subscription with Cloudera, you can approach them for a DR scenario where they can help you add the existing data on the DataNodes back into the cluster (not sure if it can be recovered/added back 100%). The same applies to Kafka.
11-14-2019
12:37 AM
2 Kudos
@TheBroMeister I will try to comment my views inline.

1.) How different would the setup and configuration be for physical servers compared to VMs? Yes, setting up the VMs would be faster than the physical ones, but are there any additional configurations or settings that we would need to look into?
-- In terms of general configuration, the points below should be taken into account when looking at performance: a. Disks, b. Network, c. Memory/CPU, d. SLA.

2.) We've read that one possible issue with setting up the cluster on VMs is data locality and redundancy: no two replicas should sit on the same physical node, but since one physical node may house several VMs, is there a way around this issue?
-- A VM backed by external storage (like SAN) will impact data locality. You can go with dedicated disks for the VMs, which is a good hybrid approach. Also, for data locality, add-on components from virtualization vendors (like VMware) are available, such as BDE (Big Data Extensions); on the network side, NSX technology helps avoid performance impacts. But you need to take the licensing cost into account.

3.) Since the specs of the VMs are restricted by the specs of the physical node, with its resources split across however many VMs it houses, wouldn't it be better to have separate servers, each housing one node of the cluster, to get better performance? And would having several VMs on one physical node affect the parallelism of the jobs that run on the cluster?
-- It is difficult to make that decision up front; it depends purely on your SLA. At the start, while running Hadoop applications, you might not know how long your application takes to process or whether it meets the SLA. Take a POC-based approach: test and run benchmarking before you go for the actual dev/UAT/prod implementations (a sample run is sketched after the links below). The benchmarking results will give you a fair idea of performance and computational stats, and the decision becomes much easier.

Please check the links below, which might be useful:
https://community.cloudera.com/t5/Support-Questions/Virtual-Machines-in-Hadoop-cluster/td-p/119675
https://www.kdnuggets.com/2015/12/myths-virtualizing-hadoop-vsphere-explained.html
https://pubs.vmware.com/bde-2/index.jsp
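If it helps, one common way to get comparable numbers between a VM layout and a physical layout is a TeraGen/TeraSort run. This is only a sketch: the examples jar path assumes an HDP install, and the HDFS paths and data size are placeholders you should adapt.

```bash
# Assumed HDP location of the MapReduce examples jar
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar

# Generate ~100 GB of synthetic rows (1 billion rows x 100 bytes)
hadoop jar "$EXAMPLES_JAR" teragen 1000000000 /benchmarks/teragen

# Sort the data; compare wall-clock time and resource usage between the two layouts
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort

# Verify the output is correctly sorted
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```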
11-13-2019
11:06 PM
@TheBroMeister Every technology has its pros and cons; the topic above is very broad and could turn into a never-ending discussion. Do you have any specific question/issue regarding implementation/architecture? I will try to comment accordingly.
11-12-2019
08:37 PM
@VamshiDevraj If you are still facing the issue, can you share details about the error or a screenshot of the same?
11-12-2019
02:21 AM
Can you also check the heap size utilization of the Ambari server? You might need to revisit the Ambari server heap configuration if needed. Check this link for details: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/administering-ambari/content/amb_adjust_ambari_server_heap_size.html
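For reference, the heap is set via AMBARI_JVM_ARGS in ambari-env.sh, as described in that doc. The values below are only illustrative; size them to your environment.

```bash
# /var/lib/ambari-server/ambari-env.sh (edit the -Xms/-Xmx values on the existing AMBARI_JVM_ARGS line)
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Xms2048m -Xmx4096m"

# Restart so the new heap settings take effect
ambari-server restart
```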
08-23-2018
12:56 PM
Nice and very useful article, @Rajkumar Singh.
02-23-2018
11:50 AM
@Kuldeep Kulkarni Please add the "deploy JCE policies" steps as a prerequisite. I tried without JCE and it failed for me. Let me know if I am missing anything.
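For anyone hitting the same thing, a hedged sketch of the manual JCE deployment for JDK 8 (download jce_policy-8.zip from Oracle first; $JAVA_HOME is assumed to point at the JDK used by the cluster, and this must be done on every host):

```bash
# Unpack the unlimited-strength policy files downloaded from Oracle
unzip jce_policy-8.zip

# Replace the default policy jars under the cluster JDK
cp UnlimitedJCEPolicyJDK8/local_policy.jar \
   UnlimitedJCEPolicyJDK8/US_export_policy.jar \
   "$JAVA_HOME/jre/lib/security/"

# Restart Ambari (and the affected services) so the new policies are picked up
ambari-server restart
```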