Member since 09-02-2019 | 13 Posts | 1 Kudos Received | 0 Solutions
11-14-2019 01:17 AM
Hi @sagarshimpi, Thanks, this will shed some light on our discussion. If we have follow-up questions, would it be alright with you if I tag you here in the thread?
11-13-2019 11:37 PM
Hi @sagarshimpi, Right now the team is more inclined to use virtual machines, since the hyperconverged (HCI) servers are already set up, rather than buying and setting up new physical servers. I do not yet have the specs of the HCI servers that have been set up. At the moment, the big questions we are asking are:
1.) How different would the setup and configuration be for physical servers compared to VMs? Setting up the VMs would be faster than setting up physical servers, but are there any additional configurations or settings we would need to look into?
2.) We've read that one possible issue with running the cluster on VMs is data locality and redundancy: no two replicas of a block should be on the same physical node, but since one physical node may host several VMs, is there a way around this issue?
3.) Since each VM is limited to the resources of its physical node, and those resources are split among however many VMs the node hosts, wouldn't it be better to have separate servers, each hosting one node of the cluster, to get better performance? And would having several VMs on one physical node affect the parallelism of the jobs that run on the cluster?
I am unfamiliar with hyperconverged infrastructure and how it affects the functionality and performance of VMs compared to a traditional server architecture. Also, some blogs I've read say that VM clusters are good for development because they are more flexible (easy to create and destroy), but that in production it is better to run the cluster on physical servers. Thanks.
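One common mitigation for question 2, sketched here under assumptions, is HDFS rack awareness. Hadoop can call an external topology script (configured through the `net.topology.script.file.name` property) that receives DataNode hostnames or IPs as arguments and prints one rack path per argument. If each VM is mapped to a "rack" named after the physical HCI host it runs on, the default block placement policy puts a block's replicas on at least two different physical hosts, so losing one host does not lose every copy. It does not guarantee three distinct hosts, because the default policy places the second and third replicas on the same rack. The VM hostnames, physical host names, and mapping below are hypothetical; in a Cloudera Manager deployment, rack IDs are normally assigned per host in Cloudera Manager rather than with a hand-written script.

```python
#!/usr/bin/env python3
"""Hypothetical HDFS topology script: maps each DataNode VM to a rack named
after the physical HCI host that runs it."""
import sys

# Hypothetical inventory: VM hostname (or IP) -> physical HCI host.
VM_TO_PHYSICAL_HOST = {
    "cdh-vm1.example.com": "hci-host-a",
    "cdh-vm2.example.com": "hci-host-a",
    "cdh-vm3.example.com": "hci-host-b",
    "cdh-vm4.example.com": "hci-host-b",
}

# Rack reported for any node missing from the inventory.
DEFAULT_RACK = "/default-rack"


def rack_for(node: str) -> str:
    """Return the rack path for one hostname or IP address."""
    physical_host = VM_TO_PHYSICAL_HOST.get(node)
    if physical_host is None:
        return DEFAULT_RACK
    return "/" + physical_host


def main(argv: list[str]) -> None:
    # Hadoop passes one or more hostnames/IPs; print one rack per argument,
    # in the same order, separated by spaces.
    print(" ".join(rack_for(node) for node in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as `python3 topology.py cdh-vm1.example.com cdh-vm3.example.com` (the file name is hypothetical), it prints `/hci-host-a /hci-host-b`. Because Hadoop may pass IP addresses instead of hostnames, a real inventory would likely need entries for both.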
11-13-2019 10:19 PM
Hi @npandey, I currently have no access to the cluster, but I will get back to you once I do.
11-13-2019 10:18 PM
My colleagues and I are discussing the pros and cons of running a Cloudera cluster on physical servers versus running it on several virtual machines on hyperconverged servers.
Labels: Apache Hadoop
10-24-2019 09:58 PM
I have a 4-node CDH cluster and want to add CDSW. Does adding CDSW require adding a new host to the cluster, or does CDSW run on top of the existing CDH cluster?
10-24-2019 06:45 PM
Hi Ganesh, I've run the HDFS Disk Balancer, and the results did not turn out as I expected. There is still a disparity between the disks on some of the DataNodes. It did reduce some of the load compared to the previous check, but I am not sure whether this is the best plan the disk balancer could produce or whether the plan did not run properly.
After running the Disk Balancer (df columns: Filesystem, Size, Used, Avail, Use%, Mounted on):
Host 2:
/dev/sdb1  135G   94G  35G  74%  /cmdisk/sdb
/dev/sdc1  135G   65G  64G  51%  /cmdisk/sdc
/dev/sdd1  135G   65G  64G  51%  /cmdisk/sdd
/dev/sde1  135G   69G  60G  54%  /cmdisk/sde
/dev/sdf1  135G   69G  59G  54%  /cmdisk/sdf
/dev/sdg1  135G   67G  62G  52%  /cmdisk/sdg
/dev/sdh1  135G   65G  64G  51%  /cmdisk/sdh
Host 3:
/dev/sdb1  135G  111G  17G  87%  /cmdisk/sdb
/dev/sdc1  135G   59G  70G  46%  /cmdisk/sdc
/dev/sdd1  135G   58G  71G  45%  /cmdisk/sdd
/dev/sde1  135G   59G  69G  47%  /cmdisk/sde
/dev/sdf1  135G   59G  70G  46%  /cmdisk/sdf
/dev/sdg1  135G   65G  64G  51%  /cmdisk/sdg
/dev/sdh1  135G   59G  70G  46%  /cmdisk/sdh
Host 4:
/dev/sdb1  135G  110G  18G  86%  /cmdisk/sdb
/dev/sdc1  135G   58G  71G  45%  /cmdisk/sdc
/dev/sdd1  135G   59G  70G  46%  /cmdisk/sdd
/dev/sde1  135G   64G  64G  50%  /cmdisk/sde
/dev/sdf1  135G   60G  69G  47%  /cmdisk/sdf
/dev/sdg1  135G   59G  70G  46%  /cmdisk/sdg
/dev/sdh1  135G   59G  70G  46%  /cmdisk/sdh
Thanks again for helping; it is much appreciated. Regards, Jan
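To put a number on the remaining skew, the Apache HDFS Disk Balancer documentation describes a "volume data density" metric, as I read it: the node's ideal storage ratio is total used divided by total capacity, and each volume's density is that ideal ratio minus the volume's own used/capacity ratio (negative means over-utilized). A "node data density", the sum of the absolute volume densities, summarizes how unbalanced the whole node is. The sketch below applies those formulas to the Host 3 `df` figures in the post above; using `df` sizes instead of the capacity and DFS usage the DataNode itself reports is an approximation.

```python
"""Approximate the HDFS Disk Balancer's volume/node data density metrics
from df-style figures (capacity and used, in GB)."""

# Host 3 after the disk balancer run, from the df output above
# (every volume is 135G; values are the "Used" column in GB).
HOST3_USED_GB = {
    "sdb": 111, "sdc": 59, "sdd": 58, "sde": 59,
    "sdf": 59, "sdg": 65, "sdh": 59,
}
CAPACITY_GB = 135


def volume_densities(used_gb: dict[str, float], capacity_gb: float) -> dict[str, float]:
    """Ideal used ratio minus each volume's used ratio; negative = over-utilized.

    Assumes every volume has the same capacity.
    """
    ideal = sum(used_gb.values()) / (capacity_gb * len(used_gb))
    return {vol: ideal - used / capacity_gb for vol, used in used_gb.items()}


def node_density(densities: dict[str, float]) -> float:
    """Sum of absolute volume densities; 0.0 means perfectly balanced."""
    return sum(abs(d) for d in densities.values())


if __name__ == "__main__":
    dens = volume_densities(HOST3_USED_GB, CAPACITY_GB)
    # Most over-utilized volume first.
    for vol, d in sorted(dens.items(), key=lambda item: item[1]):
        print(f"{vol}: {d:+.3f}")
    print(f"node data density: {node_density(dens):.3f}")
```

For Host 3 it lists sdb first at -0.325, meaning sdb sits roughly 32 percentage points above the node's ideal ratio while the other six disks sit slightly below it.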
10-24-2019 06:40 PM
Hi @npandey, I've checked the output of "hdfs dfsadmin -report", and Non DFS Used is 0 on all of my DataNodes. The DFS Used% values are not all the same: Host 1 has 47%, Host 2 has 56%, Host 3 has 54%, and Host 4 has 53%. After running the disk balancer, I checked the disk usage on the mount points of all my hosts, and they still do not seem as balanced as I would hope.
$ du -h results (per mount point, plus the total for all disks):
         sdb   sdc   sdd   sde   sdf   sdg   sdh   total
Host 1   55G   60G   61G   60G   63G   59G   64G   418G
Host 2   94G   65G   65G   68G   69G   67G   65G   489G
Host 3  111G   59G   58G   59G   59G   65G   59G   467G
Host 4  110G   58G   59G   64G   60G   59G   59G   465G
I am unsure whether this is an acceptable level of balance and how else to proceed. Regards, Jan
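The DFS Used% spread across DataNodes is a separate question from the per-disk spread: inter-node balance is handled by the HDFS Balancer (`hdfs balancer`), not the Disk Balancer. Per the Apache HDFS documentation, the Balancer considers the cluster balanced when every DataNode's utilization is within a threshold (its `-threshold` option, 10 percentage points by default) of the overall cluster utilization. The sketch below checks the four DFS Used% values from this post against that rule; it assumes all four DataNodes have equal capacity, so cluster utilization is the plain average.

```python
"""Check per-DataNode DFS Used% against an HDFS Balancer-style threshold."""

# DFS Used% per DataNode, from "hdfs dfsadmin -report" as quoted above.
DFS_USED_PCT = {"Host 1": 47.0, "Host 2": 56.0, "Host 3": 54.0, "Host 4": 53.0}


def out_of_balance(used_pct: dict[str, float], threshold: float = 10.0) -> dict[str, float]:
    """Return nodes whose utilization deviates from the cluster average by more
    than `threshold` percentage points, mapped to their signed deviation.

    Assumes equal DataNode capacities, so cluster utilization is the plain
    mean; with unequal capacities it should be weighted by capacity instead.
    """
    cluster_pct = sum(used_pct.values()) / len(used_pct)
    deviations = {node: pct - cluster_pct for node, pct in used_pct.items()}
    return {node: dev for node, dev in deviations.items() if abs(dev) > threshold}


if __name__ == "__main__":
    print(out_of_balance(DFS_USED_PCT))                 # default 10-point threshold
    print(out_of_balance(DFS_USED_PCT, threshold=5.0))  # stricter threshold
```

The average here is 52.5%, so with the default threshold nothing is flagged (it prints `{}`); with a 5-point threshold only Host 1, at -5.5 points, would be.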
10-24-2019 12:09 AM
Thanks, Ganesh. I will run it later, when no one is using the cluster, and let you know whether it resolves my issue. Regards, Jan
10-23-2019 10:29 PM
Hi Ganesh, Thanks for replying. I have a follow-up question: will running this command while the cluster is active (e.g., while Spark jobs and Hive queries are running) affect them in any way, or will the plan run in the background so that no processes are affected? Thanks, Jan
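One factor in how much a disk balancer run competes with running jobs is its I/O throttle: `hdfs-default.xml` defines `dfs.disk.balancer.max.disk.throughputInMBperSec` (10 MB/s by default), and `hdfs diskbalancer -plan` accepts a `-bandwidth` option to set it for a plan. The back-of-the-envelope sketch below estimates how long a throttled move takes. It assumes the plan moves an over-full volume down to the node's ideal ratio and that the throttle is the only bottleneck; both are simplifications of what the planner and the disks actually do, and the example inputs are only illustrative.

```python
"""Back-of-the-envelope estimate of how long a throttled disk balancer move takes."""


def excess_gb(used_gb: float, capacity_gb: float, ideal_ratio: float) -> float:
    """GB a volume holds above the node's ideal used ratio (0 if at or below it)."""
    return max(0.0, used_gb - ideal_ratio * capacity_gb)


def move_time_minutes(gb_to_move: float, bandwidth_mb_per_s: float = 10.0) -> float:
    """Minutes to copy `gb_to_move` GB at a steady throttle (1 GB = 1024 MB).

    Assumes the throttle is the only bottleneck; real runs also depend on
    disk contention from other I/O (e.g. Spark/Hive) on the same DataNode.
    """
    return gb_to_move * 1024 / bandwidth_mb_per_s / 60


if __name__ == "__main__":
    # Illustrative inputs: a 135 GB volume holding 111 GB on a node whose
    # ideal used ratio is 470/945 (about 0.497).
    extra = excess_gb(111, 135, 470 / 945)
    print(f"excess: {extra:.1f} GB")
    for bw in (10, 50):
        print(f"{bw} MB/s -> {move_time_minutes(extra, bw):.0f} min")
```

With these inputs it prints an excess of 43.9 GB and about 75 minutes at 10 MB/s (15 minutes at 50 MB/s); a higher bandwidth finishes sooner but leaves less disk throughput for concurrent jobs.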