Created 02-26-2016 02:27 AM
I'm trying to run a TABLESAMPLE query with PERCENTAGE and I'm getting:
Error: Error while compiling statement: FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '2' (state=42000,code=40000)
String inputFormat = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEINPUTFORMAT);
if (!inputFormat.equals(
    CombineHiveInputFormat.class.getName())) {
  throw new SemanticException(generateErrorMessage((ASTNode) tabref.getChild(1),
      "Percentage sampling is not supported in " + inputFormat));
}
The above is from test code I found that references the error, and it's from at most Hive 0.12. So I guess my real question is: is TABLESAMPLE with PERCENTAGE still supported, and if yes, can it be used with ORC?
Created 03-18-2016 08:06 PM
I'm going to close this as it's a confirmed bug in Hive 1.2.1. I opened a Jira https://issues.apache.org/jira/browse/HIVE-13312
Created 02-26-2016 09:57 AM
Very weird. Someone from the dev team might have a better idea, but OrcInputFormat actually implements CombineHiveInputFormat. CombineHiveInputFormat is the layer between much of Hive and the most commonly used underlying input formats. So OrcInputFormat should be a CombineHiveInputFormat, and this should not be happening.
Update: I think TABLESAMPLE (PERCENTAGE) is broken in general. I just tried it with two tables, one in text format and one in ORC, and neither works; both fail with the error message you got. It's weird, because a CombineHiveInputFormat is also a HiveInputFormat (which the error message says I have).
public class OrcInputFormat implements .... CombineHiveInputFormat
Created 02-26-2016 10:08 AM
Can you try to reproduce it? It is a CSV dataset loaded into an ORC table using CTAS. Try to run TABLESAMPLE with a percentage.
Created 02-26-2016 10:55 AM
I tried to reproduce it. I used a sample_07 database and CTASed it as a TEXT table and as an ORC table. I get the same error message for both.
It sounds weird, but my guess would be that this syntax has not worked for a long time. The code checks whether the class name equals CombineHiveInputFormat, and since all classes extend CombineFileInputFormat, I am not sure how that could be true anymore.
My guess would be that in the good old days CombineFileInputFormat was the actual class being used, and now the classes just extend it, so the check doesn't work anymore. But that is just a guess.
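To illustrate the guess above, here is a minimal Java sketch of why an exact class-name comparison rejects anything other than the one named class, while a type-compatibility check also accepts subclasses. The classes BaseInputFormat, CombineInputFormat and OrcLikeInputFormat are hypothetical stand-ins invented for this example, not the real Hive classes, and the sketch does not claim to reproduce Hive's actual class hierarchy or the eventual fix.

```java
public class InputFormatCheckDemo {
    // Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
    static class BaseInputFormat {}
    static class CombineInputFormat extends BaseInputFormat {}
    static class OrcLikeInputFormat extends CombineInputFormat {}

    // Same style as the check quoted in the thread: the configured class name
    // must be exactly the name of CombineInputFormat.
    static boolean exactNameMatch(String configuredClassName) {
        return configuredClassName.equals(CombineInputFormat.class.getName());
    }

    // Alternative (an assumption, not Hive's code): accept CombineInputFormat
    // itself or any subclass of it.
    static boolean typeCompatible(String configuredClassName) {
        try {
            Class<?> configured = Class.forName(configuredClassName);
            return CombineInputFormat.class.isAssignableFrom(configured);
        } catch (ClassNotFoundException e) {
            return false; // unknown class name: treat as not compatible
        }
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {
            BaseInputFormat.class, CombineInputFormat.class, OrcLikeInputFormat.class
        };
        for (Class<?> c : candidates) {
            String name = c.getName();
            System.out.println(name
                + " exactNameMatch=" + exactNameMatch(name)
                + " typeCompatible=" + typeCompatible(name));
        }
    }
}
```

In this sketch only CombineInputFormat passes the exact-name check, whereas the subclass OrcLikeInputFormat passes only the type-compatibility check. The parent class fails both, which matches the error in this thread naming HiveInputFormat as the configured input format.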
Created 03-18-2016 06:13 PM
@Benjamin Leonhardi it doesn't matter whether the table is text or ORC, percentage for TABLESAMPLE is not working. @gopal is this a bug?
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_orc TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_text TABLESAMPLE(20 percent);
FAILED: SemanticException 1:68 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_raw TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
Created 03-19-2016 10:32 PM
@Artem Ervits @gopal as I said, from looking at the code I am pretty sure it is. They check for the Hive input format class, but at some point they refactored it to become an interface, so the check doesn't work anymore.
Created 03-19-2016 11:16 PM
I confirmed it and opened a Jira, see my answer with the HIVE-13312 link.
Created 03-20-2016 02:25 PM
Ah cool didn't see that!