02-09-2017
03:45 AM
Thanks Josh! Yep, I did read that excellent blog post. I was mostly interested to hear if anyone else has been in a similar situation, and how they resolved it. We'll proceed as best we can and I'll report back here in the future to let everyone know how it goes.
02-01-2017
11:54 AM
We have an application backed by a 'small' database (~1TB of data in Oracle SE). The database is growing slowly at the moment (about 1GB per month), but that will increase in the future. A lot of this data is historical / cold.

We are considering using a product called Gluent to help us offload data to a Hadoop cluster, but Hadoop is completely new to us and initially it seems like overkill. That said, I can see many advantages to having a Hadoop cluster as a 'data lake' for both the database data and various application-related data that is not stored in the database at the moment (e.g. data files that currently reside on the application server).

Based on my (very limited) understanding of Gluent, for our usage the majority of the processing would still happen in Oracle, with only occasional queries against the 'cold' data in Hadoop, so responsiveness is not a high priority. In other words, I believe we might be able to get away with 'low end' specs.

My question is: what are the minimum hardware specs for a small cluster to fit our scenario? I'm sure there's an element of "it depends" in the answer, but I want to verify that it is feasible to start a 'production' cluster with minimal resources (e.g. balanced nodes with ~4-6 CPU cores, ~16-32GB RAM, and ~500GB - 1TB of disk per node), with the ability to scale up in the years to come.
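For what it's worth, here is the rough back-of-envelope sketch I've been using for the disk side; the 3x HDFS replication factor, ~30% free-space headroom and 5-year horizon are just my own assumptions, not vendor sizing guidance:

# Rough disk-capacity sketch for our scenario. The replication factor,
# headroom and time horizon below are my own assumptions, not vendor guidance.

def raw_hdfs_tb_needed(logical_tb, years, growth_tb_per_month=0.001,
                       replication=3, headroom=0.30):
    """Raw disk (TB) needed across all worker nodes after `years` of growth."""
    logical = logical_tb + growth_tb_per_month * 12 * years
    return logical * replication / (1 - headroom)

# ~1TB offloaded today, growing ~1GB per month, planned over 5 years
print(round(raw_hdfs_tb_needed(1.0, years=5), 1))  # ~4.5 TB raw across the cluster

On raw disk alone, a handful of the 'low end' nodes above would seem to cover us for a few years; I'm less sure about CPU/RAM, particularly for the master and management services.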
Labels:
- Apache Impala
- HDFS
02-01-2017
08:48 AM
Ed, what did you choose in the end? We're in a similar position here: we don't have BIG data yet (only a couple of TB), but we're planning for the future and thinking of using Impala on top.