We are working on some proofs of concept in our Hadoop dev environments and are running into perceived performance and memory issues using Hive and SQL. Is there a way to run something like an "explain" on the SQL, or otherwise assess the environment? We need to determine where the bottlenecks might be. It takes about 20 minutes to compute an average in SQL over about 26 million rows, and when we increase that volume we run out of memory. We need to find the root cause.
Yes, Hive has the EXPLAIN operator: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
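For example, a minimal sketch assuming a hypothetical table `sales_fact` with a numeric column `amount` (substitute your own names). Prefixing the query with EXPLAIN prints the stage plan without running the query. EXPLAIN EXTENDED adds more detail to the same plan.

```sql
-- Hypothetical table and column names; replace with your own.
-- Print the stage plan without executing the query.
EXPLAIN
SELECT AVG(amount) FROM sales_fact;

-- Same plan with additional detail (e.g. file paths and table properties).
EXPLAIN EXTENDED
SELECT AVG(amount) FROM sales_fact;
```

In the output, check how many stages the query produces and where the Group By Operator runs. An average over a single table should normally need only a small number of stages.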
Can you provide more information on your environment? How many nodes do you have? What is the server configuration (CPU, memory)? Is Hive on Tez enabled?
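To answer the engine and memory questions, you can run `SET` with a property name and no value in a Hive CLI or Beeline session. This prints the current setting. The sketch below is only a starting point. Which properties matter most depends on your distribution and on the engine in use.

```sql
-- Prints the execution engine in use: mr, tez, or spark.
SET hive.execution.engine;
-- Container memory (MB) for Tez tasks; -1 means it falls back to the mapper size.
SET hive.tez.container.size;
-- Memory (MB) requested for MapReduce map and reduce tasks.
SET mapreduce.map.memory.mb;
SET mapreduce.reduce.memory.mb;
```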