Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Best/Optimized Way to move data from Phoenix to Hive/Orc Table

avatar
Cloudera Employee

Can you please provide some recommended way to move data from Phoenix based on certain query to Hive/orc without dumping into text Format

1 ACCEPTED SOLUTION

avatar

I cant say it's the best way, but I would try to create a hive table with Hbase storage handler and then do a create table as select or insert/select.

see a simple example below:

PHOENIX:
CREATE TABLE TEST(
  ID VARCHAR not null,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID)
) ;

HIVE:
CREATE EXTERNAL TABLE TEST(ID int, TEXT string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");

A more complex example, with composite key and numeric data types:

PHOENIX:
CREATE TABLE TEST2(
  ID1 VARCHAR NOT NULL,
  ID2 VARCHAR NOT NULL,
  NUM DOUBLE,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);

HIVE:
CREATE EXTERNAL TABLE TEST2(KEY STRUCT<ID1:STRING, ID2:STRING>, NUM DOUBLE, TEXT STRING) 
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
"hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "TEST2");

View solution in original post

4 REPLIES 4

avatar
Master Mentor

Phoenix creates tables in HBASE so basically, you are asking to integrate HBASE and Hive. Check this blog. Hope this helps.

avatar

I cant say it's the best way, but I would try to create a hive table with Hbase storage handler and then do a create table as select or insert/select.

see a simple example below:

PHOENIX:
CREATE TABLE TEST(
  ID VARCHAR not null,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID)
) ;

HIVE:
CREATE EXTERNAL TABLE TEST(ID int, TEXT string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");

A more complex example, with composite key and numeric data types:

PHOENIX:
CREATE TABLE TEST2(
  ID1 VARCHAR NOT NULL,
  ID2 VARCHAR NOT NULL,
  NUM DOUBLE,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);

HIVE:
CREATE EXTERNAL TABLE TEST2(KEY STRUCT<ID1:STRING, ID2:STRING>, NUM DOUBLE, TEXT STRING) 
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
"hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "TEST2");

avatar
Explorer

nasghar@hortonworks.com when you say "based on certain query" what does that mean? Are you thinking of this -> ingest data in Phoenix/hbase, then ETL it out to ORC, for running deep analytics?

avatar
Expert Contributor

@nasghar:Why you want to move data from phoenix to Hive/Orc? If your intention is to run hive queries on HBase/Phoenix table then you can easily create Hive External tables on top of your existing HBase/Phoenix table. Or you intentionally wants to duplicate the data in Hive internal table? This way you will create 2 set of data which you have to maintain.(One in Hive and other in HBase)