Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala-over-hbase table has columns in alphabetical order instead of as specified

SOLVED Go to solution
Highlighted

Impala-over-hbase table has columns in alphabetical order instead of as specified

Contributor

When I create an impala/hive table over an hbase table, the columns in Impala appear in alphabetical order instead of as defined.  Not a blocker, but really annoying and might become an issue down the road. 

 

Anyone know what could be happening? We're on CDH 5.13, thanks.

 

 

Defined in hbase with:

 

create 'irdw_sandbox:date_dim', {NAME => 'mcf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'true', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX_TREE', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

Defined in beehive with:

 

CREATE EXTERNAL TABLE irdw_sandbox.hbase_date_dim (
key STRING,
id INT,
sqldate TIMESTAMP,
year INT,
quarter_of_year INT,
month_of_year INT,
week_of_year INT,
day_of_year INT,
day_name STRING,
day_of_week INT,
day_of_month INT,
day_type STRING,
month_name STRING,
week_of_year_mtos INT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" =
":key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos"
)
TBLPROPERTIES("hbase.table.name" = "irdw_sandbox:date_dim")

 

and a show create table in hive looks fine:

 

CREATE EXTERNAL TABLE `hbase_date_dim`(
  `key` string COMMENT 'from deserializer', 
  `id` int COMMENT 'from deserializer', 
  `sqldate` timestamp COMMENT 'from deserializer', 
  `year` int COMMENT 'from deserializer', 
  `quarter_of_year` int COMMENT 'from deserializer', 
  `month_of_year` int COMMENT 'from deserializer', 
  `week_of_year` int COMMENT 'from deserializer', 
  `day_of_year` int COMMENT 'from deserializer', 
  `day_name` string COMMENT 'from deserializer', 
  `day_of_week` int COMMENT 'from deserializer', 
  `day_of_month` int COMMENT 'from deserializer', 
  `day_type` string COMMENT 'from deserializer', 
  `month_name` string COMMENT 'from deserializer', 
  `week_of_year_mtos` int COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ( 
  'hbase.columns.mapping'=':key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos', 
  'serialization.format'='1')
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'hbase.table.name'='irdw_sandbox:date_dim', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='0', 

but a show create table in impala (after invalidate metadata to recognize the new table) doesn't:

 

CREATE EXTERNAL TABLE irdw_sandbox.hbase_date_dim (
   key STRING,   
   day_name STRING,   
   day_of_month INT,   
   day_of_week INT,   
   day_of_year INT,   
   day_type STRING,   
   id INT,   
   month_name STRING,   
   month_of_year INT,   
   quarter_of_year INT,   
   sqldate TIMESTAMP,   
   week_of_year INT,   
   week_of_year_mtos INT,   
   year INT 
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos', 'serialization.format'='1') 
TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'hbase.table.name'='irdw_sandbox:date_dim', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler', 'totalSize'='0')

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Impala-over-hbase table has columns in alphabetical order instead of as specified

Master Collaborator

I think this is an area where the mismatch between HBase's data model and the traditional relational model cause some weirdness - from what I understand, HBase doesn't have a idea of an arbitrary order of columns in a table - I think Impala is not preserving the creation order when reloading the table.

 

There are some known issues in this area, maybe we could do better: https://issues.apache.org/jira/browse/IMPALA-886

2 REPLIES 2

Re: Impala-over-hbase table has columns in alphabetical order instead of as specified

Master Collaborator

I think this is an area where the mismatch between HBase's data model and the traditional relational model cause some weirdness - from what I understand, HBase doesn't have a idea of an arbitrary order of columns in a table - I think Impala is not preserving the creation order when reloading the table.

 

There are some known issues in this area, maybe we could do better: https://issues.apache.org/jira/browse/IMPALA-886

Re: Impala-over-hbase table has columns in alphabetical order instead of as specified

Contributor
Thanks very much Tim for looking up the JIRA. Yikes, been open since 2014. As John pointed out there, column order info must be in the metastore since hive show create table displays fine, so seems like this should be a simple change to how impala reads that info. Upvoted the JIRA.