Member since 02-03-2016
20 Posts
18 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2623 | 05-10-2016 10:46 AM |
05-02-2016
05:15 PM
We have stopped the services but are still getting a warning about the memory validation threshold. How can we rectify that?
05-02-2016
01:48 PM
Hi, We are trying to do a PoC of an idea we have, using the Cloudera single-node cluster hosted on VirtualBox. We need Spark 1.6, HBase 1.2 and Hive to run the PoC. Where can I find instructions on how to shut down all but the required services via Cloudera Manager? We are running into memory issues with so many services running. Any pointers/help would be appreciated. Thanks, Roy
03-19-2016
06:13 AM
2 Kudos
Hi Lester, Thanks for sharing your presentation. Our need is a little more stringent: meeting an SLA for keeping the mainframe system and Hadoop in sync would be a challenge. I am thinking of using a CDC product to read the DB2 logs on the mainframe and then generate a message to update HBase. What do you think? Thanks, Roy
03-10-2016
07:33 AM
3 Kudos
What is the easiest way to update data in MDM and Hadoop? Is Hive 0.14 with update support the recommended option, or is it updating a DataFrame via Spark?
Labels: Apache HBase, Apache Hive
03-09-2016
08:03 PM
3 Kudos
I have data coming in as small files from a set of 5000 data providers. Since the data comes from external providers, the files have no id field, so I need to read each line and search HBase to find the id. At the moment I am ignoring the complexity of creating a new id if none is found by the search. Since the incoming data has different numbers of columns and different formats, I have to create a common format for storing all of it in an HBase table. My question: is there a tool or trick that can help me efficiently map the fields from these 5000 data formats to a common format? Also, how do I manage things when a data provider modifies its format? If anybody has implemented such a system or has recommendations, I would be glad to hear them.
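To make the mapping problem concrete, here is a minimal sketch of one possible approach: keep a per-provider table that maps source column names to common field names, and apply it while reading each file. Everything in it is an assumption for illustration only: the provider ids, the column names, the common fields, and the choice of CSV input are all hypothetical, and the HBase id lookup is left out.

```python
import csv
import io

# Hypothetical per-provider mappings: source column name -> common field name.
# In practice these could live in a config file or a table, one entry per provider.
PROVIDER_MAPPINGS = {
    "provider_a": {"fname": "first_name", "lname": "last_name", "dob": "birth_date"},
    "provider_b": {"First": "first_name", "Last": "last_name"},
}

# Hypothetical common format shared by all providers.
COMMON_FIELDS = ("first_name", "last_name", "birth_date")


def to_common(provider, row):
    """Translate one source row (a dict) into the common format.

    Fields the provider does not supply stay None; unmapped source
    columns are dropped.
    """
    mapping = PROVIDER_MAPPINGS[provider]
    record = {field: None for field in COMMON_FIELDS}
    for source_col, value in row.items():
        target = mapping.get(source_col)
        if target is not None:
            record[target] = value
    return record


def read_provider_file(provider, text):
    """Parse CSV text with a header row and map every line to the common format."""
    reader = csv.DictReader(io.StringIO(text))
    return [to_common(provider, row) for row in reader]


if __name__ == "__main__":
    sample = "First,Last\nAda,Lovelace\n"
    print(read_provider_file("provider_b", sample))
```

With this layout, a format change by one provider only touches that provider's mapping entry, not the code that writes the common format.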
Labels: Apache Hadoop, Apache HBase
02-05-2016
11:41 PM
2 Kudos
We have an HBase table with only one column family and a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?
Labels: Apache HBase
02-04-2016
06:58 PM
1 Kudo
Would there be a negative impact if the salt was, say, 5 digits? How does increasing the split file size from 10GB to 40GB or more, say 100GB, affect performance? If you have 12 disks (4TB each) per node in a 40-node cluster and you want to store 400TB, would you say (400*1024)/(40*10) = 1024 regions per node with 10GB files would be better, or about 103 regions per node with 100GB files?
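The arithmetic in the question can be checked with a short sketch. It assumes the formula is meant as a per-node region count (total data divided by nodes times region size), and it ignores replication, compression, and uneven key distribution; the function name is made up for illustration.

```python
import math


def regions_per_node(total_tb, nodes, region_gb):
    """Regions each node would host if the data were spread evenly.

    Assumes 1 TB = 1024 GB and that every region grows to the full
    region_gb size before splitting; replication is not counted.
    """
    total_gb = total_tb * 1024
    return math.ceil(total_gb / (nodes * region_gb))


if __name__ == "__main__":
    for size_gb in (10, 40, 100):
        print(size_gb, "GB regions ->", regions_per_node(400, 40, size_gb), "per node")
```

For the numbers in the post this gives 1024 regions per node at 10GB and 103 at 100GB, matching the figures quoted.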
02-03-2016
10:58 PM
What about mounted NAS storage?
02-03-2016
10:24 PM
We currently do not have a DR site or WANdisco. Are there any other alternatives?