Member since: 02-08-2018
Posts: 3
Kudos Received: 1
Solutions: 0
02-16-2018 07:31 PM
Hi, Umar. I would recommend starting with the HDP Sandbox, which covers the installation of Ambari + Hortonworks Data Platform and includes several tutorials and guides for learning Hadoop administration mostly from scratch (although some solid Linux skills are needed, of course): https://hortonworks.com/products/sandbox/

As for books, a few colleagues have recommended Expert Hadoop Administration by Sam R. Alapati: https://www.bookdepository.com/Expert-Hadoop-Administration-Sam-R-Alapati/9780134597195

Regarding videos, here are a few good ones:
- Interesting TED Talks on what Big Data is (mostly informational, but interesting enough): https://www.youtube.com/watch?v=8pHzROP1D-w and https://www.youtube.com/watch?v=0Q3sRSUYmys
- Overview of what Hadoop is (by Hortonworks): https://youtu.be/6UtD53BzDNk
- Overview of the Hadoop core components (by Hortonworks): https://youtu.be/a0hY3pyWQ4U
- Overview of Hive (by Hortonworks): https://youtu.be/Pn7Sp2-hUXE
- Series of videos about the Hadoop ecosystem and a detailed view of its components (by Cloudera): https://www.cloudera.com/more/training/library/hadoop-essentials.html

Hope it helps! Cheers, Juan
02-14-2018 03:01 PM
Hi, Lester. Thanks for your response. I didn't know about HBase's snapshot feature; I'll dig into it. Regarding distcp, I was also thinking of using it. I'm just not sure how long it will take to copy all the data, but I'll definitely look into it as well. Best regards, Juan
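P.S. In case it is useful to anyone following the thread, here is a minimal sketch of what both approaches could look like from the command line. Everything here is an assumption on my side: the hostnames (old-nn, new-nn), ports, paths, table and snapshot names, and the mapper counts are placeholders, and the exact flags should be double-checked against the CDH 5.7 source and the target HDP release.

# HDFS data with DistCp, launched from the destination (HDP) cluster.
# Reading the source over webhdfs:// tends to be safer than hdfs:// RPC when
# the two clusters run different Hadoop builds; -update allows incremental
# re-runs, -p preserves permissions/ownership/timestamps, -m caps the number
# of parallel map tasks so the copy doesn't saturate the old cluster.
hadoop distcp -update -p -m 100 \
    webhdfs://old-nn:50070/data \
    hdfs://new-nn:8020/data

# HBase tables via the snapshot feature: take a snapshot on the source
# cluster, export it into the new cluster's HBase root directory, then
# clone it into a live table on the destination.
echo "snapshot 'my_table', 'my_table_snapshot'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snapshot \
    -copy-to hdfs://new-nn:8020/hbase \
    -mappers 16
echo "clone_snapshot 'my_table_snapshot', 'my_table'" | hbase shell

As a rough back-of-the-envelope figure (my own estimate, not a measurement): 1.5 PB over a fully saturated 10 Gbit/s link works out to roughly two weeks of transfer time, so the plan would probably be several -update DistCp passes while the old cluster stays live, with only a final incremental pass and the HBase snapshot cutover inside the downtime window.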
02-08-2018 03:36 PM
1 Kudo
Hi, folks. We have a Hadoop cluster with ~1.5 PB of data (i.e. ~1500 TB), running on bare metal with CDH 5.7 and without Cloudera Manager. We're planning to decommission the cluster and set up a new one from scratch (bare metal as well, not cloud), probably switching to Hortonworks (HDP) this time. We're also moving out of the datacenter where it's currently located, so the new cluster will be in a different location. The idea is to keep all the data (all 1.5 PB is relevant, so unfortunately we can't get rid of anything). Just to clarify, we're talking about HDFS data as well as HBase databases/tables.

That being said, my question is: assuming we have our brand-new cluster set up and ready to ingest the data, what would be the best method to migrate all 1.5 PB of it to the new cluster? Needless to say, we need the least possible downtime while doing all this.

Below are our current cluster's resources:
- 2 NameNodes in HA --> 2.80 GHz 6-core / 24 GB RAM
- 49 DataNodes:
  - 5 of them --> 2.4 GHz 6-core / 72 GB RAM
  - 38 of them --> 2.3 GHz 16-core / 128 GB RAM
  - 6 of them --> 2.4 GHz 32-core / 128 GB RAM

Thanks in advance!
Labels:
- Apache Hadoop
- Apache HBase