Member since
02-03-2016
20 Posts
18 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1600 | 05-10-2016 10:46 AM |
05-02-2016
05:15 PM
We have stopped the services, but we are still getting a warning for the memory validation threshold. How do we rectify that?
05-02-2016
01:48 PM
Hi, We are trying to do a PoC of an idea we have. We are trying to leverage the Cloudera single-node cluster hosted on VirtualBox. We need Spark 1.6, HBase 1.2, and Hive to run our PoC. Where can I find instructions on how to shut down all but the required services via Cloudera Manager? We are running into memory issues with so many services running. Any pointers/help would be appreciated. Thanks, Roy
03-19-2016
06:13 AM
2 Kudos
Hi Lester, Thanks for sharing your presentation. Our need is a little more stringent: keeping the mainframe system and Hadoop in sync under an SLA would be a challenge. I am thinking of using a CDC product to go after the DB2 logs on the mainframe and then generate a message to update HBase. What do you think? Thanks, Roy
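To make the idea concrete, here is a minimal sketch of the consumer side, assuming an HBase 1.x client; the `ChangeRecord` shape, table name, and column family are invented for illustration and are not from any specific CDC product:

```java
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CdcApplier {

    /** Invented shape for a CDC message derived from the DB2 log; illustrative only. */
    public interface ChangeRecord {
        String key();                   // DB2 primary key, reused as the HBase row key
        boolean isDelete();             // true for a DELETE captured from the log
        Map<String, String> columns();  // changed column name -> new value
    }

    /** Apply one captured change to HBase so it stays in sync with the mainframe. */
    public static void apply(ChangeRecord rec) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mainframe_mirror"))) {
            if (rec.isDelete()) {
                table.delete(new Delete(Bytes.toBytes(rec.key())));
            } else {
                Put put = new Put(Bytes.toBytes(rec.key()));
                for (Map.Entry<String, String> e : rec.columns().entrySet()) {
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes(e.getKey()),
                                  Bytes.toBytes(e.getValue()));
                }
                table.put(put);
            }
        }
    }
}
```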
03-10-2016
07:33 AM
3 Kudos
What is the easiest way to update data in MDM and Hadoop? Is Hive 0.14 with update support, or updating a DataFrame via Spark, the recommended option? A sketch of the Hive route follows the labels below.
Labels:
- Apache HBase
- Apache Hive
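For context on the Hive side: Hive 0.14's UPDATE support requires a transactional table (ORC, bucketed, `transactional=true`) with ACID enabled in hive-site.xml. A hedged sketch via JDBC, with the table and column names invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveUpdateExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host/port and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {
            // UPDATE only works on a transactional table: ORC format, bucketed,
            // created with TBLPROPERTIES ('transactional'='true'), and with
            // hive.txn.manager and the compactor configured in hive-site.xml.
            stmt.executeUpdate(
                "UPDATE customer_mdm SET status = 'inactive' WHERE id = 42");
        }
    }
}
```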
03-09-2016
08:03 PM
3 Kudos
I have data coming in small files from a set of 5000 data providers. Since the data comes from external providers, the files have no id field, so I need to read each line and search HBase to find the id. For the moment I am avoiding/ignoring the complexity of creating a new id if none is found by the search. Since the incoming data has varying numbers of columns and formats, I have to create a common format for storing all the data in an HBase table. My question: is there a tool or trick that can help me efficiently map fields from these 5000 data formats to a common format? Also, how will I manage things when a data provider modifies its data format? If anybody has implemented such a system or has recommendations, I would be glad to hear them.
Labels:
- Apache Hadoop
- Apache HBase
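For illustration, a minimal sketch of a config-driven mapper, assuming per-provider mappings (source field name to common field name) are loaded from some external config; all names here are hypothetical. A provider changing its format then becomes an edit to that provider's mapping entry rather than a code change:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldMapper {
    // Per-provider mapping: source field name -> common field name.
    private final Map<String, Map<String, String>> mappingsByProvider;

    public FieldMapper(Map<String, Map<String, String>> mappingsByProvider) {
        this.mappingsByProvider = mappingsByProvider;
    }

    /** Rename one provider record's fields into the common format; unmapped fields are dropped. */
    public Map<String, String> toCommon(String providerId, Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>();
        Map<String, String> mapping = mappingsByProvider.get(providerId);
        if (mapping == null) {
            return out; // unknown provider: empty result here, or throw instead
        }
        for (Map.Entry<String, String> e : record.entrySet()) {
            String commonName = mapping.get(e.getKey());
            if (commonName != null) {
                out.put(commonName, e.getValue());
            }
        }
        return out;
    }
}
```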
02-05-2016
11:41 PM
2 Kudos
We have an HBase table with only one column family and a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?
Labels:
- Apache HBase
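One common approach, sketched below assuming an HBase 1.x client: the column-family TTL is evaluated against each cell's timestamp, so rewriting a row's current cells with a fresh timestamp ("touching" the row) restarts its 15-day clock. On recent versions, a per-cell TTL via Mutation#setTTL is another option for long-retention rows.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class TtlToucher {

    /** Re-write a row's current cells so they carry a fresh timestamp,
     *  which resets the column-family TTL countdown for that row. */
    public static void touch(Table table, byte[] rowKey) throws IOException {
        Result result = table.get(new Get(rowKey));
        if (result.isEmpty()) {
            return; // row already expired or never existed
        }
        Put put = new Put(rowKey); // new cells default to the current timestamp
        for (Cell cell : result.rawCells()) {
            put.addColumn(CellUtil.cloneFamily(cell),
                          CellUtil.cloneQualifier(cell),
                          CellUtil.cloneValue(cell));
        }
        table.put(put);
    }
}
```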
02-04-2016
06:58 PM
1 Kudo
Would there be a negative impact if the salt were, say, 5 digits? How does increasing the split file size from 10GB to 40GB, or even 100GB, affect performance? If you have 12 disks (4TB each) per node in a 40-node cluster and you want to store 400TB, would you say (400*1024)/(40*10) ~ 1024 regions per node with 10GB files would be better, or 103 regions with 100GB files?
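Working through the arithmetic in the question, assuming one HFile per region at the target split size:

```java
public class RegionMath {
    public static void main(String[] args) {
        double totalGb = 400 * 1024.0; // 400 TB of data, expressed in GB
        int nodes = 40;
        for (int regionGb : new int[] {10, 100}) {
            double regionsPerNode = totalGb / (nodes * regionGb);
            System.out.printf("%3d GB regions -> ~%.0f regions per node%n",
                              regionGb, regionsPerNode);
        }
        // ~1024 regions per node at 10 GB; ~102 (the question rounds to 103) at 100 GB
    }
}
```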
02-03-2016
10:58 PM
What about mounted NAS storage?
02-03-2016
10:24 PM
We currently do not have a DR site or WANdisco. Are there any other alternatives?
02-03-2016
10:13 PM
1 Kudo
We are constantly running out of space on the Hadoop nodes. Is it recommended to send Hadoop logs to HDFS mounted as NFS on the data nodes? Or is it better to mount a NAS drive to the nodes for storing log files? Are there any challenges?
Labels:
- Labels:
-
Apache Hadoop
02-03-2016
09:35 PM
1 Kudo
We have a process which pulls messages from MQ and puts them in HBase. Since the messages have a 10-second expiry, we cannot afford to have the cluster down. What do people do in such situations? We need to enable NameNode HA on the Hortonworks cluster without taking the cluster offline.
02-03-2016
09:27 PM
2 Kudos
Our table has one column family with a TTL of 15 days, so rows expire on a consistent basis. We are seeing that the number of regions keeps going up; somehow the regions are not getting reused. We are currently at over 41k regions, of which 19k are empty. Any pointers as to why regions are not getting reused? Our row key design is similar to what you mentioned: a 2-digit salt (0 to 63), followed by the hashcode of the reverse timestamp in seconds, followed by a service id, and finally a counter. Our cluster is 41 nodes and we are writing rows at a rate of 4k to 6k TPS. The average row size is about 35KB.
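For reference, a minimal sketch of the row key layout described above (2-digit salt from 0 to 63, reverse timestamp in seconds, service id, counter); the exact field widths and the choice of hash are assumptions:

```java
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {
    private static final int SALT_BUCKETS = 64;              // salt range 0..63
    private static final long MAX_SECONDS = 9_999_999_999L;  // bound for the reverse timestamp

    public static byte[] build(long epochSeconds, String serviceId, long counter) {
        long reverseTs = MAX_SECONDS - epochSeconds;         // newest rows sort first
        int salt = Math.floorMod(Long.hashCode(reverseTs), SALT_BUCKETS);
        // 2-digit salt + 10-digit reverse timestamp + service id + 8-digit counter
        String key = String.format("%02d%010d%s%08d", salt, reverseTs, serviceId, counter);
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```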
02-03-2016
07:48 PM
1 Kudo
We have a process which pulls messages from MQ and puts them in HBase. Since the messages have a 10-second expiry, we cannot afford to have the cluster down. What do people do in such situations?
02-03-2016
07:21 PM
2 Kudos
Currently we have no HA on the NameNode in an HDP 2.2 cluster in production. We are trying to determine if we can add a new node and enable it as a standby NameNode. Our objective is to enable NameNode HA on the Hortonworks cluster without taking the cluster offline.
Labels:
- Apache Hadoop