Member since
02-03-2016
20 Posts
18 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1600 | 05-10-2016 10:46 AM |
05-02-2016
05:15 PM
We have stopped the services, but we are still getting a warning for the memory validation threshold. How do we rectify that?
05-02-2016
01:48 PM
Hi, We are trying to do a PoC of an idea we have. We are trying to leverage the Cloudera single-node cluster hosted on VirtualBox. We need Spark 1.6, HBase 1.2, and Hive to run our PoC. Where can I find instructions on how to shut down all but the required services via Cloudera Manager? We are running into memory issues with so many services running. Any pointers/help would be appreciated. Thanks, Roy
03-19-2016
06:13 AM
2 Kudos
Hi Lester, Thanks for sharing your presentation. Our need is a little more stringent: keeping the mainframe system and Hadoop in sync under an SLA would be a challenge. I am thinking of using a CDC product to go after the DB2 logs on the mainframe and then generate a message to update HBase. What do you think? Thanks, Roy
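To make the idea concrete, here is a minimal sketch of the consumer side, assuming an HBase 1.x client; the `ChangeRecord` shape, table name, and column family are invented for illustration and are not from any specific CDC product:

```java
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CdcApplier {

    /** Invented shape for a CDC message derived from the DB2 log; illustrative only. */
    public interface ChangeRecord {
        String key();                   // DB2 primary key, reused as the HBase row key
        boolean isDelete();             // true for a DELETE captured from the log
        Map<String, String> columns();  // changed column name -> new value
    }

    /** Apply one captured change to HBase so it stays in sync with the mainframe. */
    public static void apply(ChangeRecord rec) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mainframe_mirror"))) {
            if (rec.isDelete()) {
                table.delete(new Delete(Bytes.toBytes(rec.key())));
            } else {
                Put put = new Put(Bytes.toBytes(rec.key()));
                for (Map.Entry<String, String> e : rec.columns().entrySet()) {
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes(e.getKey()),
                                  Bytes.toBytes(e.getValue()));
                }
                table.put(put);
            }
        }
    }
}
```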
03-10-2016
07:33 AM
3 Kudos
What is the easiest way to update data in MDM and Hadoop? Is Hive 0.14 with update support, or updating a DataFrame via Spark, the recommended option? A sketch of the Hive route follows the labels below.
Labels:
- Apache HBase
- Apache Hive
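For context on the Hive side: Hive 0.14's UPDATE support requires a transactional table (ORC, bucketed, `transactional=true`) with ACID enabled in hive-site.xml. A hedged sketch via JDBC, with the table and column names invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveUpdateExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host/port and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {
            // UPDATE only works on a transactional table: ORC format, bucketed,
            // created with TBLPROPERTIES ('transactional'='true'), and with
            // hive.txn.manager and the compactor configured in hive-site.xml.
            stmt.executeUpdate(
                "UPDATE customer_mdm SET status = 'inactive' WHERE id = 42");
        }
    }
}
```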
03-09-2016
08:03 PM
3 Kudos
I have data coming in small files from a set of 5000 data providers. Since the data comes from external providers, the files have no id field, so I need to read each line and search HBase to find the id. For the moment I am avoiding/ignoring the complexity of creating a new id if none is found by the search. Since the incoming data has varying numbers of columns and formats, I have to create a common format for storing all the data in an HBase table. My question: is there a tool or trick that can help me efficiently map fields from these 5000 data formats to a common format? Also, how will I manage things when a data provider modifies its data format? If anybody has implemented such a system or has recommendations, I would be glad to hear them.
Labels:
- Apache Hadoop
- Apache HBase
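For illustration, a minimal sketch of a config-driven mapper, assuming per-provider mappings (source field name to common field name) are loaded from some external config; all names here are hypothetical. A provider changing its format then becomes an edit to that provider's mapping entry rather than a code change:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldMapper {
    // Per-provider mapping: source field name -> common field name.
    private final Map<String, Map<String, String>> mappingsByProvider;

    public FieldMapper(Map<String, Map<String, String>> mappingsByProvider) {
        this.mappingsByProvider = mappingsByProvider;
    }

    /** Rename one provider record's fields into the common format; unmapped fields are dropped. */
    public Map<String, String> toCommon(String providerId, Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>();
        Map<String, String> mapping = mappingsByProvider.get(providerId);
        if (mapping == null) {
            return out; // unknown provider: empty result here, or throw instead
        }
        for (Map.Entry<String, String> e : record.entrySet()) {
            String commonName = mapping.get(e.getKey());
            if (commonName != null) {
                out.put(commonName, e.getValue());
            }
        }
        return out;
    }
}
```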
02-05-2016
11:41 PM
2 Kudos
We have an HBase table with only one column family and a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?
Labels:
- Apache HBase
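One common approach, sketched below assuming an HBase 1.x client: the column-family TTL is evaluated against each cell's timestamp, so rewriting a row's current cells with a fresh timestamp ("touching" the row) restarts its 15-day clock. On recent versions, a per-cell TTL via Mutation#setTTL is another option for long-retention rows.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class TtlToucher {

    /** Re-write a row's current cells so they carry a fresh timestamp,
     *  which resets the column-family TTL countdown for that row. */
    public static void touch(Table table, byte[] rowKey) throws IOException {
        Result result = table.get(new Get(rowKey));
        if (result.isEmpty()) {
            return; // row already expired or never existed
        }
        Put put = new Put(rowKey); // new cells default to the current timestamp
        for (Cell cell : result.rawCells()) {
            put.addColumn(CellUtil.cloneFamily(cell),
                          CellUtil.cloneQualifier(cell),
                          CellUtil.cloneValue(cell));
        }
        table.put(put);
    }
}
```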
02-04-2016
06:58 PM
1 Kudo
Would there be a negative impact if the salt were, say, 5 digits? How does increasing the split file size from 10GB to 40GB, or even 100GB, affect performance? If you have 12 disks (4TB each) per node in a 40-node cluster and you want to store 400TB, would you say (400*1024)/(40*10) ~ 1024 regions per node with 10GB files would be better, or 103 regions with 100GB files?
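Working through the arithmetic in the question, assuming one HFile per region at the target split size:

```java
public class RegionMath {
    public static void main(String[] args) {
        double totalGb = 400 * 1024.0; // 400 TB of data, expressed in GB
        int nodes = 40;
        for (int regionGb : new int[] {10, 100}) {
            double regionsPerNode = totalGb / (nodes * regionGb);
            System.out.printf("%3d GB regions -> ~%.0f regions per node%n",
                              regionGb, regionsPerNode);
        }
        // ~1024 regions per node at 10 GB; ~102 (the question rounds to 103) at 100 GB
    }
}
```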
02-03-2016
10:58 PM
What about mounted NAS storage?
02-03-2016
10:24 PM
We currently do not have a DR site or WANdisco. Are there any other alternatives?
02-03-2016
10:13 PM
1 Kudo
We are constantly running out of space on the Hadoop nodes. Is it recommended to send Hadoop logs to HDFS mounted as NFS on the data nodes? Or is it better to mount a NAS drive to the nodes for storing log files? Are there any challenges?
Labels:
- Labels:
-
Apache Hadoop
02-03-2016
09:35 PM
1 Kudo
We have a process which pulls messages from MQ and puts them in HBase. Since the messages have a 10-second expiry, we cannot afford to have the cluster down. What do people do in such situations? We need to enable NameNode HA on the Hortonworks cluster without taking the cluster offline.
02-03-2016
09:27 PM
2 Kudos
Our table has one column family with a TTL of 15 days, so rows expire on a consistent basis. We are seeing that the number of regions keeps going up; somehow the regions are not getting reused. We are currently at over 41k regions, of which 19k are empty. Any pointers as to why regions are not getting reused? Our row key design is similar to what you mentioned: a 2-digit salt (0 to 63), followed by the hashcode of the reverse timestamp in seconds, followed by a service id, and finally a counter. Our cluster is 41 nodes and we are writing rows at a rate of 4k to 6k TPS. The average row size is about 35KB.
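For reference, a minimal sketch of the row key layout described above (2-digit salt from 0 to 63, reverse timestamp in seconds, service id, counter); the exact field widths and the choice of hash are assumptions:

```java
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {
    private static final int SALT_BUCKETS = 64;              // salt range 0..63
    private static final long MAX_SECONDS = 9_999_999_999L;  // bound for the reverse timestamp

    public static byte[] build(long epochSeconds, String serviceId, long counter) {
        long reverseTs = MAX_SECONDS - epochSeconds;         // newest rows sort first
        int salt = Math.floorMod(Long.hashCode(reverseTs), SALT_BUCKETS);
        // 2-digit salt + 10-digit reverse timestamp + service id + 8-digit counter
        String key = String.format("%02d%010d%s%08d", salt, reverseTs, serviceId, counter);
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```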
02-03-2016
07:48 PM
1 Kudo
We have a process which pulls messages from MQ and puts them in HBase. Since the messages have a 10-second expiry, we cannot afford to have the cluster down. What do people do in such situations?
02-03-2016
07:21 PM
2 Kudos
Currently we have no HA on the NameNode in an HDP 2.2 cluster in production. We are trying to determine if we can add a new node and enable it as a standby NameNode. Our objective is to enable NameNode HA on the Hortonworks cluster without taking the cluster offline.
Labels:
- Apache Hadoop