Created on 05-29-2015 11:25 PM - edited 09-16-2022 02:30 AM
Sean,
Following up on some scenarios I posted before, in a separate thread.
I am using Oryx 1.0 with Hadoop (CDH 5.4.1). It ran slowly, so I tuned mapper-memory-mb and reducer-memory-mb.
That did not help.
Is it possible to tune the Oryx config to (1) set the number of map and reduce tasks appropriately, and (2) use LZO compression for map output?
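For reference, here is a minimal sketch of the standard Hadoop 2 (MRv2/YARN) properties behind these two questions, rendered as mapred-site.xml entries. It is an assumption, not confirmed in this thread, that Oryx 1.0 picks up the Hadoop client configuration of the machine that launches its jobs. A job that sets its own reducer count in code would override `mapreduce.job.reduces`. The LZO codec class comes from the separate hadoop-lzo package, which has to be installed on every node; `org.apache.hadoop.io.compress.SnappyCodec` is a built-in alternative. The helper name `mapred_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def mapred_site_xml(num_reducers: int, lzo_map_output: bool) -> str:
    """Hypothetical helper: render job properties as mapred-site.xml <property> entries."""
    props = {
        # Number of reduce tasks per job; the mapper count is left to Hadoop,
        # which derives it from the input splits.
        "mapreduce.job.reduces": str(num_reducers),
        # Compress intermediate map output before it is shuffled to reducers.
        "mapreduce.map.output.compress": "true" if lzo_map_output else "false",
    }
    if lzo_map_output:
        # Codec class from the hadoop-lzo package (assumed to be installed on all nodes).
        props["mapreduce.map.output.compress.codec"] = "com.hadoop.compression.lzo.LzoCodec"

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(mapred_site_xml(num_reducers=8, lzo_map_output=True))
```

The output can be pasted into the `<configuration>` element of a client-side mapred-site.xml. Whether a fixed reducer count helps at all depends on the data size, per the reply below.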
Thanks.
Created 05-30-2015 04:13 AM
As I say, I don't think more memory helps unless you are memory-bound; by itself it does not increase performance. In general you should let Hadoop choose the number of mappers. It would help to know more about your data and problem before recommending where to look. It sounds like your data is so small that the run time is mostly Hadoop overhead, and 'tuning' it doesn't help because that does not reflect how a large data set would behave.
Created 05-30-2015 10:13 AM
OK.
Understood: 3-4 GB of data is too small to see the benefits of Hadoop, because of its overhead.
We are collecting data and it is growing fast.
We will see whether the Hadoop-based computation scales well with much larger data.
Thanks.
Created on 06-02-2015 08:22 AM - edited 06-08-2015 07:17 PM
Sean,
Two more questions, after checking the Hadoop logs and the Oryx computation logs. We want to understand how Oryx computation works with Hadoop.
(1) When it computes X or Y (with Hadoop), the Oryx logs show, for example, "number of splits:2" and "Total input paths to process : 11".
Are these numbers determined automatically by Hadoop, or by Oryx? I checked the Oryx code and could not find where they are set.
(2) Does the Oryx code control how many reducers run simultaneously on each node?
For example, does it override "mapreduce.tasktracker.reduce.tasks.maximum"?
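On CDH 5 with YARN (MRv2), `mapreduce.tasktracker.reduce.tasks.maximum` is an MRv1 TaskTracker setting. Per-node concurrency instead follows from how many containers fit into each NodeManager's `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`, given each task's `mapreduce.reduce.memory.mb` (or `mapreduce.map.memory.mb`) and vcores. Below is a rough sketch of that arithmetic. It assumes requests are rounded up to a 1024 MB allocation increment and ignores the ApplicationMaster container and other running jobs. `containers_per_node` is a hypothetical helper, not a YARN API.

```python
import math


def containers_per_node(nm_memory_mb: int, nm_vcores: int, task_memory_mb: int,
                        task_vcores: int = 1, increment_mb: int = 1024,
                        count_vcores: bool = True) -> int:
    """Rough upper bound on how many same-sized task containers fit on one NodeManager."""
    # The scheduler rounds each memory request up to a multiple of its allocation increment.
    granted_mb = math.ceil(task_memory_mb / increment_mb) * increment_mb
    by_memory = nm_memory_mb // granted_mb
    if not count_vcores:
        # Some scheduler setups (e.g. memory-only resource calculation) ignore vcores.
        return by_memory
    return min(by_memory, nm_vcores // task_vcores)


print(containers_per_node(16384, 8, 2048))  # 2 GB reducers on a 16 GB / 8-vcore node
print(containers_per_node(16384, 8, 4096))  # same node, 4 GB reducers
```

With 16 GB and 8 vcores per node, 2048 MB reducers allow up to 8 at once, while 4096 MB reducers allow only 4. This is one reason raising reducer-memory-mb (presumably mapped onto `mapreduce.reduce.memory.mb`) does not speed up a job that is not memory-bound.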
Created 06-02-2015 08:34 AM