Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

CREATE EXTERNAL HIVE TABLE on existing HBASE Table

avatar

Hi,

I have below Oracle table -

CREATE TABLE ADDRESSES 
(
  HJMPTS NUMBER(20, 0) 
, CREATEDTS TIMESTAMP(3) 
, MODIFIEDTS TIMESTAMP(3) 
, TYPEPKSTRING NUMBER(20, 0) 
, OWNERPKSTRING NUMBER(20, 0) 
, PK NUMBER(20, 0) NOT NULL 
, P_ORIGINAL NUMBER(20, 0) 
, P_DUPLICATE NUMBER(1, 0) 
, P_APPARTMENT VARCHAR2(255 CHAR) 
, P_BUILDING VARCHAR2(255 CHAR) 
, P_CELLPHONE VARCHAR2(255 CHAR) 
, P_COMPANY VARCHAR2(255 CHAR) 
, P_COUNTRY NUMBER(20, 0) 
, P_DEPARTMENT VARCHAR2(255 CHAR) 
, P_DISTRICT VARCHAR2(255 CHAR) 
, P_EMAIL VARCHAR2(255 CHAR) 
, P_FAX VARCHAR2(255 CHAR) 
, P_FIRSTNAME VARCHAR2(255 CHAR) 
, P_LASTNAME VARCHAR2(255 CHAR) 
, P_MIDDLENAME VARCHAR2(255 CHAR) 
, P_MIDDLENAME2 VARCHAR2(255 CHAR) 
, P_PHONE1 VARCHAR2(255 CHAR) 
, P_PHONE2 VARCHAR2(255 CHAR) 
, P_POBOX VARCHAR2(255 CHAR) 
, P_POSTALCODE VARCHAR2(255 CHAR) 
, P_REGION NUMBER(20, 0) 
, P_STREETNAME VARCHAR2(255 CHAR) 
, P_STREETNUMBER VARCHAR2(255 CHAR) 
, P_TITLE NUMBER(20, 0) 
, P_TOWN VARCHAR2(255 CHAR) 
, P_GENDER NUMBER(20, 0) 
, P_DATEOFBIRTH TIMESTAMP(3) 
, P_REMARKS VARCHAR2(255 CHAR) 
, P_URL VARCHAR2(255 CHAR) 
, P_SHIPPINGADDRESS NUMBER(1, 0) 
, P_UNLOADINGADDRESS NUMBER(1, 0) 
, P_BILLINGADDRESS NUMBER(1, 0) 
, P_CONTACTADDRESS NUMBER(1, 0) 
, P_VISIBLEINADDRESSBOOK NUMBER(1, 0) 
, P_STATE VARCHAR2(255 CHAR) 
, P_LANDMARK VARCHAR2(255 CHAR) 
, P_CODELIGIBLE NUMBER(1, 0) 
, ACLTS NUMBER(20, 0) DEFAULT 0 
, PROPTS NUMBER(20, 0) DEFAULT 0 
, P_ISHOMEADDRESS NUMBER(1, 0) 
) 

With column PK as primary key. Same has been replicated to HBASE through Oracle Golden Gate.

Below is the description of same in hbase -

hbase(main):016:0> describe 'FNL:ADDRESSES'
Table FNL:ADDRESSES is ENABLED
FNL:ADDRESSES
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION
 => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

I am trying to create HIVE EXTERNAL TABLE on this table so that I can access the same through SQL. Below is the command I am using -

CREATE EXTERNAL TABLE ADDRESSES
(
 HJMPTS DECIMAL(20, 0) 
, CREATEDTS TIMESTAMP 
, MODIFIEDTS TIMESTAMP 
, TYPEPKSTRING DECIMAL(20, 0) 
, OWNERPKSTRING DECIMAL(20, 0) 
, PK DECIMAL(20, 0) 
, P_ORIGINAL DECIMAL(20, 0) 
, P_DUPLICATE DECIMAL(1, 0) 
, P_APPARTMENT VARCHAR(255) 
, P_BUILDING VARCHAR(255) 
, P_CELLPHONE VARCHAR(255) 
, P_COMPANY VARCHAR(255) 
, P_COUNTRY DECIMAL(20, 0) 
, P_DEPARTMENT VARCHAR(255) 
, P_DISTRICT VARCHAR(255) 
, P_EMAIL VARCHAR(255) 
, P_FAX VARCHAR(255) 
, P_FIRSTNAME VARCHAR(255) 
, P_LASTNAME VARCHAR(255) 
, P_MIDDLENAME VARCHAR(255) 
, P_MIDDLENAME2 VARCHAR(255) 
, P_PHONE1 VARCHAR(255) 
, P_PHONE2 VARCHAR(255) 
, P_POBOX VARCHAR(255) 
, P_POSTALCODE VARCHAR(255) 
, P_REGION DECIMAL(20, 0) 
, P_STREETNAME VARCHAR(255) 
, P_STREETNUMBER VARCHAR(255) 
, P_TITLE DECIMAL(20, 0) 
, P_TOWN VARCHAR(255) 
, P_GENDER DECIMAL(20, 0) 
, P_DATEOFBIRTH TIMESTAMP 
, P_REMARKS VARCHAR(255) 
, P_URL VARCHAR(255) 
, P_SHIPPINGADDRESS DECIMAL(1, 0) 
, P_UNLOADINGADDRESS DECIMAL(1, 0) 
, P_BILLINGADDRESS DECIMAL(1, 0) 
, P_CONTACTADDRESS DECIMAL(1, 0) 
, P_VISIBLEINADDRESSBOOK DECIMAL(1, 0) 
, P_STATE VARCHAR(255) 
, P_LANDMARK VARCHAR(255) 
, P_CODELIGIBLE DECIMAL(1, 0) 
, ACLTS DECIMAL(20, 0) 
, PROPTS DECIMAL(20, 0)
, P_ISHOMEADDRESS DECIMAL(1, 0) 
) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:val")
TBLPROPERTIES("hbase.table.name" = "FNL:ADDRESSES")

Is this the correct way to create external table? When I try to execute the same in HIVE, I am getting error -

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 45 elements while hbase.columns.mapping has 2 elements (counting the key if implicit))
1 ACCEPTED SOLUTION

avatar
Super Collaborator

Your column mapping is wrong, as stated in the error message. The list in the columns mapping must match your list of columns in the external table definition. You simply list all columns in the form "columnFamilyName:columnName". As you seem to have only one column family 'cf', and I assume the oracle columns have all been migrated into the same column name with the column family cf. Then you will need the mapping to be:

"hbase.columns.mapping"="cf:HJMPTS, cf:CREATEDTS, cf:MODIFIEDDTS, 
... ,
cf:PROPTS, cf:P_ISHOMEADDRESS"

View solution in original post

5 REPLIES 5

avatar
Super Collaborator

Your column mapping is wrong, as stated in the error message. The list in the columns mapping must match your list of columns in the external table definition. You simply list all columns in the form "columnFamilyName:columnName". As you seem to have only one column family 'cf', and I assume the oracle columns have all been migrated into the same column name with the column family cf. Then you will need the mapping to be:

"hbase.columns.mapping"="cf:HJMPTS, cf:CREATEDTS, cf:MODIFIEDDTS, 
... ,
cf:PROPTS, cf:P_ISHOMEADDRESS"

avatar

Thanks @Harald Berghoff...should key column 'PK' be part of column family? Also, do we have to mention all column names in column family even if hive external table and hbase table have same column names.

avatar
Super Collaborator

If PK should be part of the column family depends, in most cases if it is just a sequential number without additional info you will not need it, it will be used as the rowkey in Hbase. And yes, you will have to list all columns, actually they don't have the same name, i.e. in hive definition it is just 'HJMPTS' while in Hbase it is 'cf:HJMPTS'.

This is important, as you could now add a new column family, which could contain also a column HJMPTS. The column name without the columnfamily name isn't necessarily unique in Hbase. In your case it is as you have been migrating an Oracle table.

avatar

@Anirudh D, As I am new to HBASE, I just want to know why you are loading data into Hbase rather you can directly load data from Oracle to Hive using any ETL or other tools. Could you please explain so i can get better idea about Hbase purpose and usage.

avatar