
Tuning Hadoop parameters with Oryx 1.0


Explorer

Sean,

 

This follows up on some scenarios I posted earlier, but I am putting it in a separate thread.

I am using Oryx 1.0 with Hadoop (CDH 5.4.1). It ran slowly, so I tuned mapper-memory-mb and reducer-memory-mb, but that did not help.

Is it possible to tune the Oryx config to (1) set the number of map and reduce tasks appropriately and (2) use LZO compression for map output?

 

Thanks.

 

 

Re: Tuning Hadoop parameters with Oryx 1.0

Master Collaborator

As I said, I don't think more memory helps unless you are memory-bound; otherwise it does not increase performance. In general, you should let Hadoop choose the number of mappers. It would be more helpful to know something about your data and problem in order to recommend where to look. It sounds like your data is so small that this is all Hadoop overhead, and 'tuning' doesn't help because it does not reflect how a large data set would behave.

Re: Tuning Hadoop parameters with Oryx 1.0

Explorer

OK.

 

Understood that 3-4 GB of data is too small to show the benefits of using Hadoop, because of the overhead.

We are collecting data and it is growing fast.

We will see whether the Hadoop-based computation scales well with much larger data.

 

Thanks.


Re: Tuning Hadoop parameters with Oryx 1.0

Explorer

Sean,

 

Two more questions, after checking the Hadoop logs and the Oryx computation logs. We want to understand how the Oryx computation works with Hadoop.

(1) When it computes X or Y (with Hadoop), the Oryx logs show, for example, "number of splits:2" and "Total input paths to process : 11".

Are these numbers determined by Hadoop automatically, or are they determined by Oryx? I checked the Oryx code and cannot find where they are set.

(2) Does the Oryx code control how many reducers run on each node simultaneously?

For example, is "mapreduce.tasktracker.reduce.tasks.maximum" overridden?

Re: Tuning Hadoop parameters with Oryx 1.0

Master Collaborator
Yes, the number of splits, and therefore the number of Mapper tasks, is determined by Hadoop MapReduce; Oryx does not alter or override it.
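
For readers wondering how Hadoop arrives at a split count, here is a rough Python sketch of the arithmetic a plain FileInputFormat-style input applies: the split size is the block size clamped between a minimum and a maximum, and each file is cut into chunks of that size. This is an illustration only, not Oryx or Hadoop code. The function names are made up for the example, and it assumes non-empty, splittable files. A job that uses a combining input format can pack several small files into one split, so its logged numbers may differ from what this sketch predicts.

```python
SPLIT_SLOP = 1.1  # a trailing chunk up to 10% larger than the split size is not split again

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """Count splits for non-empty, splittable files (sizes in bytes).

    Hypothetical helper for illustration; it ignores empty files,
    non-splittable compression, and combining input formats.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    total = 0
    for size in file_sizes:
        remaining = size
        # Carve off full-size splits while clearly more than one split remains.
        while remaining / split_size > SPLIT_SLOP:
            total += 1
            remaining -= split_size
        # Whatever is left becomes the final split for this file.
        if remaining > 0:
            total += 1
    return total

MB = 1024 * 1024
# A 300 MB file and a 130 MB file with 128 MB blocks: 3 splits + 1 split.
print(count_splits([300 * MB, 130 * MB], block_size=128 * MB))  # 4
```

The practical point matches the reply: the count follows from file sizes and block size, so with only a few GB of input there are few splits and little to tune.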

11 is the default number of Reducer tasks, which you can change. (For various reasons a prime number is a good choice.) Yes, you will see as many run simultaneously as you have reducer slots. This is determined by MapReduce and defaults to 1 per machine, but it can be raised if you know the machine can handle more.
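
One plausible reason a prime reducer count helps (not necessarily the one meant above) is how hash partitioning spreads keys. Hadoop's default hash partitioner assigns a key to reducer `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below imitates that arithmetic with plain integers standing in for key hash codes; the function names and the sample keys are invented for the illustration.

```python
def partition_for(hash_code, num_reducers):
    """Mimic hash partitioning: clear the sign bit, then take the remainder."""
    return (hash_code & 0x7FFFFFFF) % num_reducers

def partitions_used(hash_codes, num_reducers):
    """Return the sorted set of reducer indexes that receive at least one key."""
    return sorted({partition_for(h, num_reducers) for h in hash_codes})

# Hypothetical keys whose hash codes are all multiples of 4.
hash_codes = range(0, 4000, 4)

print(partitions_used(hash_codes, 8))        # [0, 4]
print(len(partitions_used(hash_codes, 11)))  # 11
```

With 8 reducers, the regular pattern in the hash codes leaves six reducers idle. With 11, every reducer gets work, because 11 shares no factor with the stride of the keys.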

This is all just Hadoop machinery, yeah, not specific to this app.