Is CombineHiveInputFormat deprecated by OrcInputFormat?
Labels: Apache Hive
Created ‎02-26-2016 02:27 AM
I'm trying to run a TABLESAMPLE query with PERCENTAGE and I'm getting:
Error: Error while compiling statement: FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '2' (state=42000,code=40000)
    String inputFormat = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEINPUTFORMAT);
    if (!inputFormat.equals(CombineHiveInputFormat.class.getName())) {
      throw new SemanticException(generateErrorMessage((ASTNode) tabref.getChild(1),
          "Percentage sampling is not supported in " + inputFormat));
    }
The above is from test code I found referencing the error, and it's from Hive 0.12 at the latest. So I guess my real question is: is TABLESAMPLE with PERCENTAGE still supported and, if yes, can it be used with ORC?
Created ‎02-26-2016 09:57 AM
Very weird. Someone from the dev team might have a better idea, but OrcInputFormat actually implements CombineHiveInputFormat. CombineHiveInputFormat is the layer between much of Hive and the most commonly used underlying input formats, so OrcInputFormat should be a CombineHiveInputFormat and this should not be happening.
Update: I think TABLESAMPLE (PERCENTAGE) is broken in general. I just tried it with two tables, one text format and one ORC, and neither works; both fail with the error message you got. It's weird, because a CombineHiveInputFormat is also a HiveInputFormat (which the error says I have).
public class OrcInputFormat implements .... CombineHiveInputFormat
Created ‎02-26-2016 10:08 AM
Can you try to reproduce? It is a CSV dataset loaded into an ORC table using CTAS. Try to run TABLESAMPLE with a percentage.
Created ‎02-26-2016 10:55 AM
I tried to reproduce it. I used sample_07 and CTASed it as both a TEXT and an ORC table. I get the same error message for both.
It sounds weird, but my guess would be that this syntax has not worked for a long time. The code checks whether the class name equals CombineHiveInputFormat, and since all the classes extend CombineFileInputFormat I am not sure how that could be true anymore.
My guess would be that in the good old days CombineFileInputFormat was the actual class being used, and now the classes just extend it, so the check doesn't work anymore. But that's just a guess.
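To illustrate the guess above: in plain Java, an exact class-name comparison rejects anything that merely extends the expected class, while a type check accepts it. The sketch below uses made-up stand-in classes (BaseFormat, CombiningFormat, ColumnarFormat and NameCheckSketch are all hypothetical, not Hive's), so it only shows the general language behaviour and says nothing certain about Hive's own class hierarchy.

```java
// Hypothetical stand-ins for illustration only; these are not Hive's classes.
class BaseFormat {}
class CombiningFormat extends BaseFormat {}
class ColumnarFormat extends CombiningFormat {}

class NameCheckSketch {
    // Same style as the quoted check: exact class-name equality.
    static boolean matchesByName(Class<?> configured) {
        return configured.getName().equals(CombiningFormat.class.getName());
    }

    // Alternative: accept the expected class or any subtype of it.
    static boolean matchesByType(Class<?> configured) {
        return CombiningFormat.class.isAssignableFrom(configured);
    }

    public static void main(String[] args) {
        // A subclass fails the name comparison but passes the type check.
        System.out.println(matchesByName(ColumnarFormat.class)); // false
        System.out.println(matchesByType(ColumnarFormat.class)); // true
    }
}
```

If the check really were meant to accept subclasses, something like the second form would be needed; whether that is what the Hive code intends is an open question here.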
Created ‎03-18-2016 06:13 PM
@Benjamin Leonhardi it doesn't matter whether the table is text or ORC; percentage sampling with TABLESAMPLE is not working. @gopal is this a bug?
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_orc TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_text TABLESAMPLE(20 percent);
FAILED: SemanticException 1:68 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
hive> SELECT * FROM medicare_part_b.medicare_part_b_2013_raw TABLESAMPLE(20 percent);
FAILED: SemanticException 1:67 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20'
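All three failures name the same class regardless of how the table is stored. That fits the check quoted in the question, which reads a configured class name (HiveConf.ConfVars.HIVEINPUTFORMAT, which I believe maps to the hive.input.format property) and never looks at the table itself. A rough sketch of that logic, with a plain string standing in for HiveConf; SamplingCheckSketch and checkPercentSampling are hypothetical names, and the quoted code throws a SemanticException rather than returning a message:

```java
class SamplingCheckSketch {
    // Real Hive class names, used here only as plain strings.
    static final String COMBINE = "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat";
    static final String PLAIN = "org.apache.hadoop.hive.ql.io.HiveInputFormat";

    // Hypothetical helper: returns the error text, or null when the check passes.
    // Note that there is no table or storage-format argument at all.
    static String checkPercentSampling(String configuredInputFormat) {
        if (!configuredInputFormat.equals(COMBINE)) {
            return "Percentage sampling is not supported in " + configuredInputFormat;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkPercentSampling(PLAIN));   // error text naming HiveInputFormat
        System.out.println(checkPercentSampling(COMBINE)); // null, check passes
    }
}
```

Under that reading, the message depends on the session's configured input format rather than on whether the table is ORC, text or raw.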
Created ‎03-19-2016 10:32 PM
@Artem Ervits @gopal as I said, from looking at the code I am pretty sure it is. They check for the Hive input format class, but at some point they refactored it to become an interface, so the check doesn't work anymore.
Created ‎03-19-2016 11:16 PM
I confirmed it and opened a Jira, see below.
Created ‎03-20-2016 02:25 PM
Ah cool didn't see that!
Created ‎03-18-2016 08:06 PM
I'm going to close this as it's a confirmed bug in Hive 1.2.1. I opened a Jira: https://issues.apache.org/jira/browse/HIVE-13312
