Is CombineHiveInputFormat deprecated by OrcInputFormat?

Master Mentor

I'm trying to run a TABLESAMPLE query with PERCENTAGE and I'm getting:

Error: Error while compiling statement: FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '2' (state=42000,code=40000)

String inputFormat = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEINPUTFORMAT);
-      if (!inputFormat.equals(
-        CombineHiveInputFormat.class.getName())) {
-        throw new SemanticException(generateErrorMessage((ASTNode) tabref.getChild(1),
-            "Percentage sampling is not supported in " + inputFormat));
-      }

The above is from test code I found that references the error, and it's from Hive 0.12 at the latest. So I guess my real question is: is TABLESAMPLE with PERCENTAGE still supported, and if yes, can it be used with ORC?

1 ACCEPTED SOLUTION

Master Mentor

I'm going to close this as it's a confirmed bug in Hive 1.2.1. I opened a Jira: https://issues.apache.org/jira/browse/HIVE-13312

Master Guru

Very weird. Someone from the dev team might have a better idea, but OrcInputFormat actually implements CombineHiveInputFormat. CombineHiveInputFormat is the layer between much of Hive and the most commonly used underlying input formats. So OrcInputFormat should be a CombineHiveInputFormat, and this should not be happening.

Update: I think TABLESAMPLE (PERCENTAGE) is broken in general. I just tried it with two tables, one in text format and one in ORC, and neither works; both fail with the error message you got. It's weird, because a CombineHiveInputFormat is also a HiveInputFormat (which the error says I have).

public class OrcInputFormat  implements .... CombineHiveInputFormat

Master Mentor

Can you try to reproduce? It is a CSV dataset loaded into an ORC table using CTAS. Try to run TABLESAMPLE with a percentage.

Master Guru

@Artem Ervits

I tried to reproduce it. I used the sample_07 data and CTASed it as a TEXT table and as an ORC table. I get the same error message for both.

It sounds weird, but my guess would be that this syntax has not worked for a long time. The code checks whether the class name equals CombineHiveInputFormat, and since all the classes extend CombineFileInputFormat, I am not sure how that could be true anymore.

My guess would be that in the good old days CombineFileInputFormat was the actual class being used, and now the classes just extend it, so the check doesn't work anymore. But it's just a guess.
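To make the guess above concrete, here is a minimal sketch of why an exact class-name comparison is brittle. It does not use the real Hive classes: `BaseInputFormat`, `CombiningInputFormat`, and `CustomCombiningInputFormat` are hypothetical stand-ins, and `assignableMatch` is only one possible alternative, not what Hive does or what the eventual fix looks like.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Hive classes.
public class InputFormatCheckSketch {
    public static class BaseInputFormat {}                                        // plays the role of a base input format
    public static class CombiningInputFormat extends BaseInputFormat {}           // plays the role of the combining format
    public static class CustomCombiningInputFormat extends CombiningInputFormat {} // a subclass of the combining format

    // Same style as the quoted check: the configured class name must be
    // exactly equal to one specific class name.
    public static boolean exactNameMatch(String configuredClassName) {
        return configuredClassName.equals(CombiningInputFormat.class.getName());
    }

    // Assumed alternative: accept the combining format and any subclass of it.
    public static boolean assignableMatch(String configuredClassName) {
        try {
            Class<?> configured = Class.forName(configuredClassName);
            return CombiningInputFormat.class.isAssignableFrom(configured);
        } catch (ClassNotFoundException e) {
            return false; // unknown class name: treat as unsupported
        }
    }

    public static void main(String[] args) {
        String base = BaseInputFormat.class.getName();
        String combining = CombiningInputFormat.class.getName();
        String custom = CustomCombiningInputFormat.class.getName();

        System.out.println(exactNameMatch(base));       // false
        System.out.println(exactNameMatch(combining));  // true
        System.out.println(exactNameMatch(custom));     // false: a subclass is rejected
        System.out.println(assignableMatch(custom));    // true
        System.out.println(assignableMatch(base));      // false: the base class is still rejected
    }
}
```

Note that in the error message from this thread the configured value is org.apache.hadoop.hive.ql.io.HiveInputFormat, so an exact-name check against CombineHiveInputFormat fails no matter what format the table itself uses.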

Master Mentor

@Benjamin Leonhardi it doesn't matter whether the table is text or ORC; percentage sampling with TABLESAMPLE is not working. @gopal is this a bug?

hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_orc TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_text TABLESAMPLE(20 percent);
FAILED: SemanticException 1:68 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_raw TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'

Master Guru

@Artem Ervits @gopal as I said, from looking at the code I am pretty sure it is. They check for the Hive input format class, but at some point they refactored it to become an interface, so the check doesn't work anymore.

Master Mentor

I confirmed it and opened a Jira, HIVE-13312 (linked in the accepted solution above).

Master Guru

Ah cool, didn't see that!