Member since: 02-01-2018
Posts: 37
Kudos Received: 2
Solutions: 0
03-08-2018
11:34 AM
@Binu Mathew Hey, in my case a lot of mappers are launched when I run a SELECT query on an ORC file. Also, are there particular Hive settings that need to be turned on so that read operations on ORC use PPD? I have tried a lot, but almost all my queries read the same number of bytes as the size of my ORC table, which means the reader is reading the whole ORC file. I run Hive 0.13.
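For anyone landing here: in Hive 0.13, pushing predicates into the ORC reader is gated by a session setting. A minimal sketch (property name from the Hive docs; verify against your build, and the table/column names are hypothetical):

```sql
-- Enables pushing WHERE predicates into the ORC reader so whole
-- stripes/row groups can be skipped using their min/max statistics.
SET hive.optimize.index.filter = true;

-- PPD helps most when the filter column is sorted/clustered
-- within the files; otherwise every stripe may still match.
SELECT COUNT(*) FROM events WHERE event_date = '2018-03-01';
```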
03-05-2018
06:09 AM
Would using DISTRIBUTE BY or SORT BY be helpful?
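A sketch of what that would look like when rewriting a table (hypothetical names): DISTRIBUTE BY routes rows with the same key to the same reducer, and SORT BY orders rows within each reducer, so together they produce clustered, per-file sorted output without funneling everything through a single reducer.

```sql
INSERT OVERWRITE TABLE events_orc
SELECT *
FROM events
DISTRIBUTE BY user_id
SORT BY user_id;
```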
03-05-2018
04:28 AM
My Hive table is in ORC format, and queries on it run fastest when the columns in the WHERE clause are sorted. But in my case they currently are not. What is the syntax to sort a column just before querying?
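One approach is not to sort at query time but to rewrite the table sorted on the filter column, so the ORC stripe-level min/max statistics become selective. A sketch with hypothetical names (note that ORDER BY enforces a total order through a single reducer, which can be slow on very large tables):

```sql
CREATE TABLE events_sorted STORED AS ORC AS
SELECT * FROM events
ORDER BY event_date;
```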
Labels:
- Apache Hadoop
- Apache Hive
03-02-2018
06:39 PM
@owen My numbers of mappers and reducers are almost down to half with ORC for a query, and the bytes read from HDFS are also reduced significantly. But the time taken by the ORC query is still almost the same as for the sequence-file query.
03-02-2018
05:38 PM
1 Kudo
Yup @Michael Young. Another way I found was through hadoop fs -text <file-location>. At the top of the output, INFO compress.CodecPool: Got brand-new decompressor [.snappy] is printed, which I think confirms that Snappy compression was applied.
02-27-2018
06:45 PM
I am using Hive 0.13. I didn't try turning on vectorization yet. It was a SUM of an entire column (of one partition of my table).
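For reference, vectorized execution was introduced in Hive 0.13 and can be enabled per session; it mainly benefits scans and aggregations like SUM over ORC tables. A minimal sketch (hypothetical table and partition names):

```sql
-- Processes rows in batches instead of one at a time;
-- requires the table to be stored as ORC in Hive 0.13.
SET hive.vectorized.execution.enabled = true;

SELECT SUM(amount) FROM sales WHERE dt = '2018-02-01';
```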
02-27-2018
07:57 AM
1 Kudo
So I compressed my table in Hive using Snappy compression, and it did get compressed; the size was reduced. But when I run hadoop fs -lsr /hive/user.db/table_name, I see no files with a .snappy extension. I want to know whether they really were Snappy-compressed or not.
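Worth noting: ORC records the compression codec in the file footer, not in the file name, so the absence of a .snappy extension is expected. One way to check is dumping the ORC metadata (the file name below is hypothetical; use an actual file listed under the table directory):

```shell
# Dumps ORC file metadata; the output includes a "Compression:" line
# that should read SNAPPY if the codec took effect.
hive --orcfiledump /hive/user.db/table_name/000000_0
```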
Labels:
- Apache Hadoop
- Apache Hive
02-26-2018
06:24 PM
Link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC The page linked above has a section named serialization. Can somebody explain what serialization is and what it is used for?
Labels:
- Apache Hadoop
- Apache Hive
02-25-2018
07:36 PM
Are schema changes such as adding, deleting, or renaming columns, or modifying a column's data type, permitted without breaking anything in ORC files in Hive 0.13?
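For context, the DDL itself is metadata-only; the real question is whether existing ORC files stay readable afterwards. Hive 0.13's ORC reader has very limited schema evolution, so data-type changes in particular may break reads of old files. A sketch with a hypothetical table:

```sql
-- Adding a column at the end: old files return NULL for the new column.
ALTER TABLE events ADD COLUMNS (referrer STRING);

-- Renaming while keeping the type: generally safe, since ORC columns
-- are matched by position in Hive 0.13, not by name.
ALTER TABLE events CHANGE clicks click_count BIGINT;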
Labels:
- Apache Hadoop
- Apache Hive
02-23-2018
07:52 AM
Hey, I am pretty confused about which storage format is suited for which type of data. You said "Parquet is well suited for data warehouse kind of solutions where aggregations are required on certain column over a huge set of data.", but I think that is true for ORC too. And as @owen said, ORC contains indexes at three levels (two levels in Parquet), so shouldn't ORC be faster than Parquet for aggregations?