Created 03-10-2016 09:26 PM
Created 03-10-2016 09:28 PM
Yes. I will write an article on this, but for now you can control the number of reducers with the following parameter:
hive.exec.reducers.bytes.per.reducer
If you decrease it, you get more reducers.
If you increase it, you get fewer. By default it is 1 GB in older Hive releases (256 MB from Hive 0.14 onward).
Try starting from 256 MB and see how many reducers are created.
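As a rough illustration, a session using this parameter might look like the sketch below. The table `sales` and the query are hypothetical. The estimate in the comments assumes the classic behaviour where Hive divides the input size by the bytes-per-reducer value and caps the result at hive.exec.reducers.max; on Tez the actual count can differ.

```sql
-- Ask Hive for roughly one reducer per 256 MB of input (256 * 1024 * 1024 bytes).
SET hive.exec.reducers.bytes.per.reducer=268435456;

-- Hypothetical query on a hypothetical table.
-- With about 10 GB of input the estimate is roughly 10 GB / 256 MB = 40 reducers,
-- still subject to the cap in hive.exec.reducers.max.
SELECT customer_id, COUNT(*) AS orders
FROM sales
GROUP BY customer_id;
```

The reducer count Hive actually chose shows up in the job output, so you can adjust the value and rerun to compare.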
Link to article
Created 03-11-2016 09:24 AM
In addition to what Ancil wrote, you can also simply set the number of reducers directly:
set mapred.reduce.tasks=xxx;
The Hive guys don't like it too much because it can obviously result in bad performance if you don't know what you are doing. But it is useful in edge cases: I use it, for example, to determine the number of ORC files during big loads.
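A minimal sketch of that ORC use case, assuming two hypothetical tables, `staging_events` and an unpartitioned `events_orc` stored as ORC. The idea is that each reducer writes its own output file, so fixing the reducer count fixes the file count, provided the insert actually has a reduce stage. DISTRIBUTE BY is used here to force one, and the column `event_id` is also an assumption.

```sql
-- Fix the reducer count for this session.
SET mapred.reduce.tasks=8;

-- DISTRIBUTE BY forces a reduce stage so the setting takes effect.
-- Expect up to 8 ORC files in the target table.
INSERT OVERWRITE TABLE events_orc
SELECT * FROM staging_events
DISTRIBUTE BY event_id;

-- Return to automatic estimation afterwards (-1 lets Hive decide).
SET mapred.reduce.tasks=-1;
```

Resetting the value at the end keeps later queries in the same session from inheriting a reducer count that was tuned for one load.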
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data