We are using dynamic partitioning on a 250 GB table. The query works on small tables but fails on large data sets of more than 100 GB. We followed the Apache wiki and rewrote the query using "DISTRIBUTE BY", but that still didn't help. The query fails in the reducer phase. Any guidance or ideas, guys?
Thanks for posting. I can't say much without seeing the actual errors, but it sounds like you may need to adjust some of the properties related to dynamic partitioning.
I would strongly suggest reading https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-DynamicpartitionInsert
The three properties you probably want to tune are:
The defaults are intentionally kept low to conservatively prevent JVM heap size errors.
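For reference, here is a rough sketch of raising the dynamic-partition limits described in the tutorial linked above, on the assumption that those are the properties in question. The values are illustrative guesses, not recommendations; size them to the number of partitions and files your insert actually produces.

```sql
-- Illustrative values only; tune them to your own partition and file counts.

-- Total dynamic partitions one statement may create (default 1000).
SET hive.exec.max.dynamic.partitions=5000;

-- Dynamic partitions each mapper or reducer may create (default 100).
SET hive.exec.max.dynamic.partitions.pernode=1000;

-- Total files all mappers and reducers may create in one job (default 100000).
SET hive.exec.max.created.files=250000;
```

Raising these only lifts the limits. If the reducers are actually running out of memory, the task logs should say so, and that would need a separate fix.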
Also, I am assuming you are already setting the mode correctly by doing something like:
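A minimal sketch of what that usually looks like, followed by an insert that uses DISTRIBUTE BY on the partition column. The table names (`sales_staging`, `sales_partitioned`) and column names are hypothetical placeholders, not taken from your query.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;

-- nonstrict lets every partition column be dynamic;
-- strict mode requires at least one static partition column.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables and columns, for illustration only.
-- The dynamic partition column (dt) must come last in the SELECT list.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (dt)
SELECT order_id, amount, dt
FROM sales_staging
-- Route all rows for a given dt to the same reducer, so each reducer
-- writes to only a few partitions instead of opening files for all of them.
DISTRIBUTE BY dt;
```

If one partition value holds most of the data, a single reducer will still receive most of the rows, so it is worth checking how evenly the partition column is distributed.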