About sergio.pena

ychen · ‎07-20-2015

It should be fixed in CDH 5.4.5.

sergio.pena · ‎05-11-2015

COLUMNS_OLD is a deprecated table where columns used to be stored. Hive might still have some information there for some reason. You can use either COLUMNS_OLD or COLUMNS_V2 when searching for your column.
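A rough sketch of such a column search, for illustration only. It uses an in-memory SQLite database as a stand-in for the metastore, with a heavily simplified layout of TBLS, SDS and COLUMNS_V2. The join keys (SD_ID, CD_ID), the sample rows, and the helper name tables_with_column are assumptions, so check them against your own metastore schema before relying on the query.

```python
import sqlite3

# In-memory stand-in for the metastore database. A real metastore normally
# lives in MySQL, PostgreSQL or similar; only the columns needed for this
# sketch are created, and the layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBLS (TBL_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
    CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER);
    CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT,
                             TYPE_NAME TEXT, INTEGER_IDX INTEGER);
    -- Hypothetical sample data
    INSERT INTO TBLS VALUES (1, 10, 'measures'), (2, 20, 'customers');
    INSERT INTO SDS VALUES (10, 100), (20, 200);
    INSERT INTO COLUMNS_V2 VALUES
        (100, 'pk', 'bigint', 0), (100, 'measure', 'double', 1),
        (200, 'pk', 'bigint', 0), (200, 'name', 'string', 1);
""")


def tables_with_column(conn, column_name):
    """Return (table name, column type) for every table that has the column."""
    return conn.execute(
        """
        SELECT t.TBL_NAME, c.TYPE_NAME
        FROM COLUMNS_V2 c
        JOIN SDS s ON s.CD_ID = c.CD_ID
        JOIN TBLS t ON t.SD_ID = s.SD_ID
        WHERE c.COLUMN_NAME = ?
        ORDER BY t.TBL_NAME
        """,
        (column_name,),
    ).fetchall()


print(tables_with_column(conn, "measure"))
```

The same WHERE filter on the column name could be pointed at COLUMNS_OLD if that table still holds rows in your metastore.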

sergio.pena · ‎05-07-2015

I recommend the 2nd option, where you have only 3 columns: (PK, DATE, MEASURE). You cannot update records in Hive, so having the 365 columns will leave 364 columns unused in every record, and this costs extra storage in your files (separator characters, schema information, etc.). Read performance is also better with 3 columns than with 365. Hive reads the full record every time you run a query, then selects the columns you want and applies the filter from the WHERE clause. This select/filter step happens whether there are 3 or 365 columns, so 3 will be faster. Your queries would also be shorter, as you only need to filter by date (instead of looking for the columns that have measure data). And if you use a columnar storage format (like Parquet), this filter may be faster.
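To make the difference between the two layouts concrete, here is a minimal sketch in plain Python (not Hive) that turns one wide record with a slot per day into (PK, DATE, MEASURE) rows. The function name wide_to_narrow, the sample values, and the choice of 2015 as the year are all hypothetical.

```python
from datetime import date, timedelta


def wide_to_narrow(pk, year, daily_values):
    """Convert one wide record (one slot per day of the year) into
    (pk, date, measure) rows, skipping the days that hold no measure."""
    start = date(year, 1, 1)
    return [
        (pk, (start + timedelta(days=i)).isoformat(), value)
        for i, value in enumerate(daily_values)
        if value is not None
    ]


# Hypothetical wide record: 365 day slots, only one of them populated.
wide = [None] * 365
wide[19] = 42.5  # index 19 is the 20th day of the year

rows = wide_to_narrow(7, 2015, wide)
print(rows)  # [(7, '2015-01-20', 42.5)]

# With the narrow layout, selecting a day is a plain filter on the date field.
matches = [r for r in rows if r[1] == "2015-01-20"]
```

The wide record carries 364 empty slots for a single measure, while the narrow layout stores just the one row that has data.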

Member Since	‎01-27-2015 02:58 PM
Last Visited	‎01-13-2016 05:08 PM
Posts	16
Kudos received	5

Cloudera Community

Re: Accessing Hive Metadata

Re: Column Based Table Vs Row Based Table

Re: Can NOT select from external tables on S3 afte...
