Member since: 11-24-2015
Posts: 223
Kudos Received: 10
Solutions: 0
02-28-2017
02:08 PM
If we are going to decommission a data node on which a region server is also running:

1. First, is that even a viable option? Can a data node be decommissioned while a region server is still running on that host? Doesn't the region server have/use data on that data node, and if so, how can the data node be decommissioned?

2. Suppose that, due to some severe hardware error, the node has to be decommissioned. Is it then necessary to shift the region server onto another node? Or (if there is no real issue with the hardware, etc.) can a region server exist on a machine without a data node on it?

3. Is there anything to be done to HBase/the region server (maintenance mode, etc.) when you decommission a data node?

Appreciate the insights.
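For reference, the rough sequence I have in mind looks like this (hostnames and the exclude-file path are illustrative assumptions, not from our cluster):

```bash
# 1. Drain regions off the host first, so HBase is not serving from it
#    (graceful_stop.sh ships with HBase and moves regions before stopping).
$HBASE_HOME/bin/graceful_stop.sh dn1.example.com

# 2. Add the host to the HDFS exclude file referenced by dfs.hosts.exclude
echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 3. Tell the NameNode to re-read the include/exclude lists; the DataNode
#    then enters "Decommission In Progress" while its blocks are
#    re-replicated elsewhere.
hdfs dfsadmin -refreshNodes

# 4. Watch progress until the node shows "Decommissioned"
hdfs dfsadmin -report
```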
Labels: Apache HBase
02-14-2017
04:12 PM
Another related question: if cluster replication is enabled for HBase/Hive for HA, is HDFS replication still required? In such cases, isn't the default replication factor of 3 overkill? Is it possible to reduce the HDFS replication factor to 2 (i.e., the original block plus one copy) in such cases?
Any insights on what the standard practice across the industry is? Appreciate the feedback.
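For concreteness, lowering the factor would look something like this (the warehouse path is an illustrative assumption):

```bash
# For new files, set dfs.replication = 2 in hdfs-site.xml.

# For existing files under a path, change them explicitly;
# -w waits until the re-replication is complete.
hdfs dfs -setrep -w 2 /apps/hive/warehouse/mydb.db
```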
02-14-2017
03:40 PM
Are 000000_0_copy_1, 000000_0_copy_2, 000000_0_copy_3 the HDFS replication copies of 000000_0? Or are they independent tables that you had created? Appreciate the feedback.
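One way to check (the path is an illustrative assumption): block replicas never appear as separate file names in a directory listing; they only show up at the block level.

```bash
# The second column of -ls is the file's replication factor
hdfs dfs -ls /apps/hive/warehouse/mydb.db/mytable/

# fsck shows the individual block replicas and where they live
hdfs fsck /apps/hive/warehouse/mydb.db/mytable/000000_0 -files -blocks -locations
```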
02-14-2017
02:50 PM
1 Kudo
My question is NOT about Hive/HBase replication across clusters, but rather about whether the default HDFS replication factor affects Hive and HBase data, since they sit on top of HDFS. So within a single cluster, on a Hive or HBase setup, are there three copies (the default replication factor) of each Hive/HBase table sitting across HDFS? Appreciate the insights.
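A quick way to see this for yourself (the paths below are the usual HDP defaults, which is an assumption about your layout):

```bash
# Hive table files and HBase store files are ordinary HDFS files, so the
# replication column of -ls applies to them like any other file.
hdfs dfs -ls -R /apps/hive/warehouse/mydb.db/mytable | head
hdfs dfs -ls -R /apps/hbase/data/default/mytable | head
```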
Labels: Apache HBase, Apache Hive
02-07-2017
03:35 PM
But distcp is only a point-in-time copy, right? You need to run the command every time you want a copy to be made. It is not automatic, such that a change to an existing file, or a newly created file, is copied across on its own. So I am not sure why you refer to it as replication, which normally means that any change in data is reflected at the replica site automatically. Appreciate the clarification.
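What people usually seem to mean is scheduling incremental distcp runs, roughly like this (cluster addresses and the schedule are illustrative assumptions):

```bash
# -update copies only files that are new or changed since the last run;
# -delete removes files at the target that no longer exist at the source.
# Run from cron (e.g., hourly) to approximate continuous replication.
hadoop distcp -update -delete \
  hdfs://nn-prod.example.com:8020/data \
  hdfs://nn-dr.example.com:8020/data
```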
02-07-2017
03:20 PM
One related question here is also the time it would take to copy a large file into HDFS. So if that file is lost or corrupted in HDFS, other than snapshots is there any other efficient/quick way to get the data loaded back into HDFS (provided we have an OS copy of that file, of course)?
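For the reload itself, the basic path is just a put from the local copy (paths are illustrative):

```bash
# Re-load the OS-level copy; -f overwrites the corrupt file if it still exists
hdfs dfs -put -f /backup/bigfile.dat /data/bigfile.dat
```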
02-07-2017
03:15 PM
I read that Hive can be backed up using snapshots in an incremental way: take a snapshot of the Hive data at one point in time, take further snapshots from then on, and use the diff feature to get the incrementals between the current snapshot and the previous one. Data can then be recovered by loading the full snapshot plus the incrementals up to a point in time (like RDBMS recovery). Is this workable? Appreciate the insights.
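The mechanics I am picturing, using the standard HDFS snapshot commands (the warehouse path is an illustrative assumption):

```bash
# Allow snapshots on the warehouse directory (one-time, as an admin)
hdfs dfsadmin -allowSnapshot /apps/hive/warehouse/mydb.db

# Take snapshots over time
hdfs dfs -createSnapshot /apps/hive/warehouse/mydb.db s1
hdfs dfs -createSnapshot /apps/hive/warehouse/mydb.db s2

# List what changed between the two snapshots (the "incremental")
hdfs snapshotDiff /apps/hive/warehouse/mydb.db s1 s2
```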
02-07-2017
03:09 PM
Thanks for the insights on snapshots. Can I ask where you use snapshots, i.e. what files/directories do you keep copies of?

> the mechanism for backups and disaster recovery is replication (both for Hive and HDFS as well as for HBase).

By replication, do you mean a distcp copy across two clusters? But distcp copies are not real time, so can they really be called replication? I have read that Flume can be used to copy from a source to two different clusters, but even such a configuration would exist outside HDFS. Is this what you meant by replication? Though I guess by 'real time' I am talking about changes to data in HDFS, which is not really a design feature of Hadoop. Appreciate the feedback.
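For HBase specifically, there is built-in near-real-time replication configured from the shell, roughly like this (the peer ID, ZooKeeper quorum, table, and column family names are illustrative assumptions):

```bash
# Run on the source cluster; HBase shell commands fed via a heredoc
hbase shell <<'EOF'
  # Register the DR cluster as a replication peer
  add_peer '1', CLUSTER_KEY => "zk1.example.com,zk2.example.com:2181:/hbase"

  # Enable replication on the table's column family (scope 1 = replicate)
  disable 'mytable'
  alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
  enable 'mytable'
EOF
```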
02-06-2017
06:17 PM
Can I have a response on what is required for the edge node to connect to the cluster, please? Appreciate the feedback.
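In case it helps frame the question, my understanding is that an edge node just needs the client binaries plus the cluster's client configs, along these lines (paths are illustrative assumptions):

```bash
# Install the Hadoop client packages on the edge node, then point
# HADOOP_CONF_DIR at copies of the cluster's client configs
# (core-site.xml, hdfs-site.xml, yarn-site.xml).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Sanity checks: the edge node can reach the NameNode and ResourceManager
hdfs dfs -ls /
yarn application -list
```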