Member since: 01-24-2014
Posts: 101
Kudos Received: 32
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 24309 | 02-06-2017 11:23 AM
 | 6577 | 11-30-2016 12:56 AM
 | 7497 | 11-29-2016 11:57 PM
 | 3061 | 08-16-2016 11:45 AM
 | 3430 | 05-10-2016 01:55 PM
08-16-2016
11:45 AM
1 Kudo
Ah, I see. Then yes, I agree: replication does impact the fsimage and memory footprint, and that documentation should be updated to reflect it.
07-21-2016
12:08 AM
Hi, I have a requirement regarding HBase replication. I have two clusters, one PROD and one DR, both having the same IPs and server names but on different networks. The customer wants to assign different IPs for the replication between PROD and DR. So, for example, PROD (my source) will have one IP which will communicate with my DR cluster, which will have another IP; PROD and DR will communicate over these new IPs. In this scenario, what configuration do we need to put in place, and in which files on the source side and the target side? Please suggest.
05-25-2016
08:28 AM
Okay, looking through the errors:

16/05/25 11:37:48 ERROR orm.CompilationManager: Could not make directory: /home/bigdata/.

It looks like /home in HDFS may not have the correct permissions to allow users to create their home directory if it doesn't exist already.

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection

This is again indicative of a network issue; this time it's complaining about the network adapter. This is a total guess, but it could be because of the tnsnames.ora or listener.ora configuration?
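If it really is an HDFS permissions issue, a minimal way to check and create the user's home directory might look like the following (a sketch only; the bigdata user and /home/bigdata path come from the error message above, and on many clusters home directories live under /user/<name> instead):

```sh
# Inspect ownership and permissions of the parent directory in HDFS
hdfs dfs -ls /home

# As the HDFS superuser, create the home directory and hand it to the user
sudo -u hdfs hdfs dfs -mkdir -p /home/bigdata
sudo -u hdfs hdfs dfs -chown bigdata:bigdata /home/bigdata
```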
05-24-2016
12:41 PM
1 Kudo
Glad to hear it!
05-10-2016
01:55 PM
1 Kudo
Short answer: No, to my knowledge it is not possible to balance with respect to CPU/memory resources out of the box today.

Longer answer: You can write a custom balancer, either external to HBase or using the HBase balancer protocol. Using region_mover.rb as an example, you can write your own JRuby script that can be run by the shell. Unfortunately, you will ultimately likely be better off without the underpowered nodes in the cluster than with them in it. Perhaps keep them in for HDFS storage and run just YARN there?
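As a starting point, the stock region_mover.rb already does the region-shuffling part, and a custom script would follow the same pattern. A sketch (the script path and hostname are illustrative and vary by distribution and HBase version):

```sh
# Drain all regions off an underpowered RegionServer; they get redistributed
# across the remaining nodes in the cluster
hbase org.jruby.Main /usr/lib/hbase/bin/region_mover.rb unload weak-node-01.example.com

# Move them back later if needed
hbase org.jruby.Main /usr/lib/hbase/bin/region_mover.rb load weak-node-01.example.com
```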
04-16-2016
01:24 PM
1 Kudo
So I've fixed it by adjusting the following YARN settings:

yarn.scheduler.maximum-allocation-mb = 8 GiB
mapreduce.map.memory.mb = 4 GiB
mapreduce.reduce.memory.mb = 4 GiB

And I've got the test example running with the following command:

sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100

Thanks for the comments.
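For reference, the two MapReduce memory settings can also be passed per job rather than cluster-wide; a sketch (values are in MB, so 4 GiB = 4096; yarn.scheduler.maximum-allocation-mb still has to be raised in yarn-site.xml):

```sh
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.reduce.memory.mb=4096 \
  10 100
```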
04-12-2016
08:17 AM
Thank you! Indeed, I recreated all the tables... Since I have trash disabled, there was nothing in the trash... Still, this is a very complete reply. Thank you!
04-12-2016
08:12 AM
I'm not sure that I can change all the sources in order to post to all my Flume agents, but this is an interesting solution. Thank you!
04-11-2016
11:08 PM
1 Kudo
Doing some quick searching, this blog seems to be doing what I think is your intent: taking logs, storing them in Kafka, and distributing them to various consumers, one of those consumers being Cloudera Search (Solr) [1]. You could make it simpler and store directly to Solr [2] if you aren't planning on consuming the same data from multiple sources. Instead of Logstash you could also use Flume [3] [4].

[1] https://www.elastic.co/blog/logstash-kafka-intro
[2] https://github.com/lucidworks/solrlogmanager
[3] http://www.cloudera.com/documentation/archive/search/1-3-0/Cloudera-Search-User-Guide/csug_flume_solr_sink.html
[4] http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
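If you go the Flume-to-Cloudera-Search route from [3]/[4], the command-line side looks roughly like this (a sketch; the collection name logs, the instancedir path, and the agent config file are hypothetical placeholders):

```sh
# Create a Solr collection for the log events (Cloudera Search / solrctl)
solrctl instancedir --generate $HOME/logs_config
solrctl instancedir --create logs_config $HOME/logs_config
solrctl collection --create logs -s 1

# Start a Flume agent whose config wires a Kafka source to a MorphlineSolrSink
flume-ng agent -n agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/kafka-to-solr.properties
```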
04-11-2016
09:33 PM
From the context I'm assuming you have set up a one-node test cluster? HDFS replicates data between different nodes; /data/1, /data/2, and /data/3 are just different drives. HDFS will use each of those drives to store blocks and will replicate those blocks to other nodes in the cluster. Deleting /data/1 deleted the blocks on that drive; /data/2 and /data/3 won't have those blocks.

If you have more than one node, HDFS will have replicated a copy of the blocks that were stored on /data/1 to the other nodes, likely spread out among all the available drives on each node. When /data/1 was deleted in that case, HDFS would detect that those blocks had gone missing the next time the DataNode checked in and would start automatically repairing the under-replicated blocks.

Missing blocks implies that the only copy of a block has gone missing, so in that case the only way to recover them would have been to perform drive recovery operations on that drive. This will be the case in single-node test clusters, hence the assumption above.
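To see where things stand after losing a drive, a couple of standard checks (a sketch; run as the HDFS superuser):

```sh
# Summary of missing, corrupt, and under-replicated blocks across the filesystem
sudo -u hdfs hdfs fsck /

# List the files whose blocks are missing or corrupt
sudo -u hdfs hdfs fsck / -list-corruptfileblocks

# Per-DataNode capacity and health report
sudo -u hdfs hdfs dfsadmin -report
```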