
Advice please :-)


We have a large integrated database containing a very diverse range of variables. We identify a master population using simple business rules that check whether people meet initial parameters, then keep narrowing the population with further logic until we know whether they meet the given requirement. This logic is not terribly complex.

After the population is identified, we build a large list of transactions associated with these people, then apply further business rules to those transactions to determine whether each person is ‘in’ or ‘out’ of the final population of interest.
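To make the shape of the workload concrete, here is a minimal sketch of the two steps described above: staged filtering of people, then a join to transactions and a per-person in/out decision. The `Person` and `Transaction` fields, the rules in `STAGE_RULES`, and the spend `threshold` are all hypothetical stand-ins for the real business rules. It is plain Python so it runs anywhere, but each stage corresponds to a Spark DataFrame operation (`filter`, `join`, `groupBy(...).agg(...)`), which is what would let a cluster run it in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    age: int        # hypothetical attribute
    region: str     # hypothetical attribute


@dataclass(frozen=True)
class Transaction:
    person_id: int
    amount: float


# Hypothetical business rules; each stage narrows the population further.
STAGE_RULES = [
    lambda p: p.age >= 18,          # initial parameters
    lambda p: p.region == "north",  # further narrowing logic
]


def identify_population(people):
    """Apply each rule in turn, keeping only people who pass every stage."""
    population = list(people)
    for rule in STAGE_RULES:
        population = [p for p in population if rule(p)]
    return population


def final_population(people, transactions, threshold=100.0):
    """Join transactions to the population, then classify each person as in/out."""
    ids = {p.person_id for p in identify_population(people)}
    totals = defaultdict(float)
    for t in transactions:
        if t.person_id in ids:  # keep only transactions for people in the population
            totals[t.person_id] += t.amount
    # Hypothetical transaction rule: 'in' if total spend reaches the threshold.
    return {pid: ("in" if totals[pid] >= threshold else "out") for pid in sorted(ids)}


if __name__ == "__main__":
    people = [Person(1, 30, "north"), Person(2, 16, "north"),
              Person(3, 40, "south"), Person(4, 25, "north")]
    txns = [Transaction(1, 60.0), Transaction(1, 50.0),
            Transaction(3, 500.0), Transaction(4, 20.0)]
    print(final_population(people, txns))  # {1: 'in', 4: 'out'}
```

Keeping each rule as a separate filter mirrors the "keep narrowing" description and makes it easy to port stage by stage to Spark or Hive.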

Our current T-SQL-based solution does not perform well, and we're looking for alternatives in the Hadoop stack, particularly ways to use Hadoop's parallel processing capabilities to speed things up and handle larger data problems.
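The reason this kind of problem suits Hadoop or Spark is that each person's in/out decision depends only on that person's own transactions. A minimal sketch, assuming transactions arrive as hypothetical `(person_id, amount)` pairs and a hypothetical spend threshold as the rule: hash-partition by person, classify each partition independently, then merge. Threads stand in for cluster workers here; in Spark, a `groupBy` on the person key performs the equivalent shuffle across executors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(transactions, num_partitions):
    """Hash-partition (person_id, amount) pairs so all rows for one person land together."""
    partitions = [[] for _ in range(num_partitions)]
    for person_id, amount in transactions:
        partitions[hash(person_id) % num_partitions].append((person_id, amount))
    return partitions


def classify_partition(rows, threshold):
    """Per-partition work: total each person's transactions and mark them in/out."""
    totals = defaultdict(float)
    for person_id, amount in rows:
        totals[person_id] += amount
    return {pid: ("in" if total >= threshold else "out") for pid, total in totals.items()}


def classify_parallel(transactions, num_partitions=4, threshold=100.0):
    partitions = partition_by_key(transactions, num_partitions)
    result = {}
    # Partitions share no state, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(lambda rows: classify_partition(rows, threshold), partitions):
            result.update(partial)  # person keys are disjoint across partitions
    return result


if __name__ == "__main__":
    txns = [(1, 60.0), (2, 10.0), (1, 50.0), (3, 500.0)]
    print(classify_parallel(txns))  # {1: 'in', 2: 'out', 3: 'in'}
```

Because no partition needs data from another, adding nodes scales the work roughly with data volume, which is the property a row-by-row T-SQL approach tends to lack.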

We're looking for suggestions on which Hadoop or related technologies (Spark, R integrated with Hadoop, etc.) would help us with this problem and others like it.

We would really appreciate it :-)


Re: Advice please :-)


You could look at SparkR from AMPLab (https://github.com/amplab-extras/SparkR-pkg). With it, users can run distributed Spark applications from an R client.