Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Best/Optimized Way to move data from Phoenix to Hive/Orc Table

avatar
Cloudera Employee

Can you please provide some recommended way to move data from Phoenix based on certain query to Hive/orc without dumping into text Format

1 ACCEPTED SOLUTION

avatar

I cant say it's the best way, but I would try to create a hive table with Hbase storage handler and then do a create table as select or insert/select.

see a simple example below:

PHOENIX:
CREATE TABLE TEST(
  ID VARCHAR not null,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID)
) ;

HIVE:
CREATE EXTERNAL TABLE TEST(ID int, TEXT string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");

A more complex example, with composite key and numeric data types:

PHOENIX:
CREATE TABLE TEST2(
  ID1 VARCHAR NOT NULL,
  ID2 VARCHAR NOT NULL,
  NUM DOUBLE,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);

HIVE:
CREATE EXTERNAL TABLE TEST2(KEY STRUCT<ID1:STRING, ID2:STRING>, NUM DOUBLE, TEXT STRING) 
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
"hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "TEST2");

View solution in original post

4 REPLIES 4

avatar
Master Mentor

Phoenix creates tables in HBASE so basically, you are asking to integrate HBASE and Hive. Check this blog. Hope this helps.

avatar

I cant say it's the best way, but I would try to create a hive table with Hbase storage handler and then do a create table as select or insert/select.

see a simple example below:

PHOENIX:
CREATE TABLE TEST(
  ID VARCHAR not null,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID)
) ;

HIVE:
CREATE EXTERNAL TABLE TEST(ID int, TEXT string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");

A more complex example, with composite key and numeric data types:

PHOENIX:
CREATE TABLE TEST2(
  ID1 VARCHAR NOT NULL,
  ID2 VARCHAR NOT NULL,
  NUM DOUBLE,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);

HIVE:
CREATE EXTERNAL TABLE TEST2(KEY STRUCT<ID1:STRING, ID2:STRING>, NUM DOUBLE, TEXT STRING) 
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
"hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "TEST2");

avatar
New Member

nasghar@hortonworks.com when you say "based on certain query" what does that mean? Are you thinking of this -> ingest data in Phoenix/hbase, then ETL it out to ORC, for running deep analytics?

avatar
Expert Contributor

@nasghar:Why you want to move data from phoenix to Hive/Orc? If your intention is to run hive queries on HBase/Phoenix table then you can easily create Hive External tables on top of your existing HBase/Phoenix table. Or you intentionally wants to duplicate the data in Hive internal table? This way you will create 2 set of data which you have to maintain.(One in Hive and other in HBase)