Tuning Hadoop parameters with Oryx 1.0
Created on 05-29-2015 11:25 PM - edited 09-16-2022 02:30 AM
Sean,
Following up on some scenarios I posted before, but posting this in a separate thread...
I am using Oryx 1.0 with Hadoop (CDH 5.4.1). It ran slowly, so I tuned mapper-memory-mb and reducer-memory-mb, but that didn't help.
Is it possible to tune the Oryx config to (1) set the number of map and reduce tasks appropriately, and (2) use LZO compression for map output?
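In case it helps a future reader, here is a minimal sketch of the standard MapReduce properties involved, set from the client side; these are generic Hadoop settings, not Oryx-specific config keys, and LZO assumes the hadoop-lzo codec and its native libraries are installed on the cluster:

```java
import org.apache.hadoop.conf.Configuration;

public class MapOutputLzoSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Compress intermediate map output to cut shuffle I/O.
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Use LZO for the intermediate data (requires hadoop-lzo on every node).
        conf.set("mapreduce.map.output.compress.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
        // Explicit reducer count, if you want to override the default.
        conf.setInt("mapreduce.job.reduces", 11);
        System.out.println(conf.get("mapreduce.map.output.compress.codec"));
    }
}
```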
Thanks.
Created 05-30-2015 04:13 AM
As I said, I don't think more memory helps unless you are memory-bound; it does not increase performance by itself. In general you should let Hadoop choose the number of mappers. It would be more helpful to know something about your data and problem in order to recommend where to look. It sounds like your data is so small that this is all Hadoop overhead, and 'tuning' doesn't help in that it does not reflect how a large data set would behave.
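For reference, a minimal sketch of how Hadoop picks the mapper count when left to choose: FileInputFormat sizes splits from the HDFS block size, and each split becomes one map task. The 128 MB block size and ~4 GB input below are assumptions for illustration:

```java
public class SplitCountSketch {
    // Same formula as the new-API FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L << 20;   // assumed 128 MB HDFS block size
        long inputBytes = 4L << 30;    // assumed ~4 GB of input data
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long splits = (inputBytes + splitSize - 1) / splitSize;  // ceiling division
        System.out.println(splits + " splits -> " + splits + " map tasks");
    }
}
```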
Created 05-30-2015 10:13 AM
OK.
Understood that 3-4 GB of data is "small" in this context, too small to see the benefits of using Hadoop (due to the overhead).
We are collecting data and it is growing fast.
We will see whether Hadoop-based computation scales well with much larger data.
Thanks.
Created on 06-02-2015 08:22 AM - edited 06-08-2015 07:17 PM
Sean,
Two more questions, after checking the Hadoop logs and the Oryx computation logs. We want to understand how the Oryx computation works with Hadoop.
(1) When it computes X or Y (with Hadoop), the Oryx logs show, for example, "number of splits:2" and "Total input paths to process : 11".
Are those numbers determined automatically by Hadoop, or by Oryx? I checked the Oryx code and cannot find where they are set.
(2) Does the Oryx code control how many reducers run on each node simultaneously?
For example, is "mapreduce.tasktracker.reduce.tasks.maximum" overridden?
Created 06-02-2015 08:34 AM
11 is the default number of reducer tasks, which you can change. (For various reasons a prime number is a good choice.) Yes, you will see as many run simultaneously as you have reducer slots. That is determined by MapReduce and defaults to 1 per machine, but it can be raised if you know the machine can handle more.
This is all just Hadoop machinery, yeah, not specific to this app.
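To make that concrete, a minimal sketch of setting both knobs from the job side. The slot property below is the MR1-style one the question mentions; under YARN (as in CDH 5.4) concurrent reducers are governed by container memory rather than fixed slots, so treat this as illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-node concurrent reducer cap; only honored on MR1-style clusters.
        conf.setInt("mapreduce.tasktracker.reduce.tasks.maximum", 4);

        Job job = Job.getInstance(conf, "reducer-tuning-sketch");
        job.setNumReduceTasks(13);  // a prime, per the advice above
        System.out.println("Reducers requested: " + job.getNumReduceTasks());
    }
}
```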
