So, I managed to fix my problem. The first hint was the "GC overhead limit exceeded" message. I quickly found out that this can be caused by a lack of heap space for the JVM. After digging into the YARN configuration in Cloudera Manager and comparing it to the settings in an Amazon Elastic MapReduce cluster (where my Pig scripts did work), I found that, even though each node had 30 GB of memory, most YARN components had very low heap settings.
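
For anyone hitting the same thing: "GC overhead limit exceeded" means the JVM spent nearly all of its time in garbage collection while reclaiming almost no memory, which usually points to a heap that is too small for the workload rather than a real leak. These are the stock Hadoop/YARN properties that control the sizes involved (the property names are standard; the descriptions are mine, and I'm deliberately not listing exact values):

    yarn.nodemanager.resource.memory-mb    - total memory YARN may hand out per node
    yarn.scheduler.maximum-allocation-mb   - largest single container YARN will grant
    mapreduce.map.memory.mb                - container size for each map task
    mapreduce.reduce.memory.mb             - container size for each reduce task
    mapreduce.map.java.opts                - JVM heap (-Xmx) inside the map container
    mapreduce.reduce.java.opts             - JVM heap (-Xmx) inside the reduce container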

 

I updated the heap sizes for the NodeManagers, the ResourceManager and the containers, and I also raised the maximum heap for mappers and reducers somewhat, keeping in mind the total amount of memory available on each node (and the other services running there, like Impala). Now my Pig scripts work again!
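
Just as an illustration (the numbers below are made up for a 30 GB node that also runs other services; they are not my exact configuration), the relevant pieces in yarn-site.xml / mapred-site.xml could look something like this:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>20480</value> <!-- leave ~10 GB for the OS, Impala, etc. -->
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value> <!-- container size per map task -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1638m</value> <!-- ~80% of the container, leaving non-heap headroom -->
    </property>

The common rule of thumb is to set the -Xmx in the *.java.opts properties to roughly 80% of the matching *.memory.mb, so the JVM's non-heap overhead still fits inside the container and YARN doesn't kill the task. The NodeManager and ResourceManager heaps themselves are separate "Java Heap Size" settings on each role in Cloudera Manager.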

 

Two issues I want to mention in case a Cloudera engineer reads this:

  • I find it a bit strange that Cloudera Manager doesn't pick saner default heap sizes based on the total amount of RAM available on each node
  • The fact that not everything runs under YARN yet makes memory harder to manage: you have to budget it manually across services. If Impala ran under YARN, there would be less manual memory management, I think 🙂
