ISSUE : Simple select query against an ORC table without limit clause is failing with the below exception.
2017-06-26 16:00:35,228 ERROR [main]: CliDriver (SessionState.java:printError(993)) - Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1765) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:237) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:371) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:303) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:458) ... 15 more Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1253) ... 19 more Caused by: java.lang.OutOfMemoryError: Java heap space
We can see the query is failing when it is trying to generate ORC splits.
WORKAROUND:
hive.exec.orc.split.strategy=BI
What strategy ORC should use to create splits for execution. The available options are "BI", "ETL" and "HYBRID".
Default setting is HYBRID
The HYBRID mode reads the footers for all files if there are fewer files than expected mapper count, switching over to
generating 1 split per file if the average file sizes are smaller than the default HDFS blocksize. ETL strategy always reads the
ORC footers before generating splits, while the BI strategy generates per-file splits fast without reading any data from HDFS.