Created 08-02-2017 03:44 AM
Hello,
Created 08-02-2017 06:16 AM
One approach I can think of is increasing the number of input splits. Also check whether the split size handled by each mapper matches the HDFS block size, so that more map tasks can run in parallel on different blocks. Check whether the data is distributed evenly across splits. If you can compress the source files, try LZO compression on them so that the amount of I/O is reduced, which in turn determines how fast the MapReduce jobs run. These are all high-level checks you can perform.
Check this post for more information:
https://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
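To make the first two checks concrete, here is a minimal driver sketch. It assumes plain Hadoop 2 MapReduce and the third-party hadoop-lzo package (LzoCodec ships separately and must be installed on the cluster); the class name, block size, and paths are just illustrative placeholders, not your actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitTuningDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Cap the split size at the HDFS block size (128 MB here) so each
        // mapper reads roughly one block and stays data-local.
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize",
                     128L * 1024 * 1024);

        // Compress map output to cut shuffle I/O. LzoCodec comes from the
        // separate hadoop-lzo package and must be present on the cluster.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapreduce.map.output.compress.codec",
                 "com.hadoop.compression.lzo.LzoCodec");

        Job job = Job.getInstance(conf, "split-tuning-example");
        job.setJarByClass(SplitTuningDriver.class);
        // job.setMapperClass(...);  // plug in your own mapper/reducer here
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}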
Created 08-02-2017 10:10 AM
The number of splits is already 192. Should I still increase it above 192? My splits are not of equal size, because the length of each line in my data set is not fixed. I used the nlinespermap property so that every mapper gets the same number of lines to process, but since line lengths vary, the split sizes are not the same across all mappers.
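For reference, this is roughly how that setup looks in a driver, assuming the standard NLineInputFormat from Hadoop 2 is what is meant by the nlinespermap property (setNumLinesPerSplit sets mapreduce.input.lineinputformat.linespermap under the hood); the 10,000-line figure is just an illustrative value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineSetup {
    public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "nline-example");
        // NLineInputFormat hands each mapper exactly N input lines. Splits
        // are counted in lines, not bytes, so variable line lengths produce
        // unevenly sized splits -- the imbalance described above.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 10000); // illustrative value
        return job;
    }
}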
Is it good or bad to utilize all the cores available in the cluster for map tasks in this situation? And how about using Tachyon for this problem: will I see any performance improvement if I use Tachyon? Thanks in advance for your replies.