Created 10-16-2015 02:49 AM
Can you please recommend a way to move data from Phoenix, based on a given query, into Hive/ORC without first dumping it to a text format?
Created 10-16-2015 03:06 AM
I can't say it's the best way, but I would try to create a Hive table with the HBase storage handler and then do a CREATE TABLE AS SELECT or INSERT ... SELECT.
See a simple example below:
PHOENIX:
CREATE TABLE TEST(
ID VARCHAR NOT NULL,
TEXT VARCHAR
CONSTRAINT PK PRIMARY KEY (ID)
) ;
HIVE:
CREATE EXTERNAL TABLE TEST(ID string, TEXT string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");
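To complete the move into ORC, a CTAS from that external table might look like the sketch below. The target table name TEST_ORC and the WHERE filter are placeholders; substitute whatever query you need to select the rows to move.

```sql
-- Sketch: copy the Phoenix/HBase-backed data into an ORC-managed Hive table.
-- TEST_ORC and the WHERE clause are illustrative placeholders.
CREATE TABLE TEST_ORC
STORED AS ORC
AS
SELECT ID, TEXT
FROM TEST
WHERE ID IS NOT NULL;
```

An INSERT INTO ... SELECT works the same way if the ORC table already exists.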
A more complex example, with composite key and numeric data types:
PHOENIX:
CREATE TABLE TEST2(
ID1 VARCHAR NOT NULL,
ID2 VARCHAR NOT NULL,
NUM DOUBLE,
TEXT VARCHAR
CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);
HIVE:
CREATE EXTERNAL TABLE TEST2(KEY STRUCT<ID1:STRING, ID2:STRING>, NUM DOUBLE, TEXT STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
"hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "TEST2");
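Since the composite key is mapped as a Hive STRUCT, the copy into ORC can flatten the key fields back into plain columns. A sketch (TEST2_ORC is a placeholder name):

```sql
-- Sketch: flatten the STRUCT rowkey while copying into an ORC table.
CREATE TABLE TEST2_ORC
STORED AS ORC
AS
SELECT KEY.ID1 AS ID1,
       KEY.ID2 AS ID2,
       NUM,
       TEXT
FROM TEST2;
```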
Created 10-16-2015 03:02 AM
Phoenix creates its tables in HBase, so essentially you are asking how to integrate HBase with Hive. Check this blog. Hope this helps.
Created 10-16-2015 06:21 PM
nasghar@hortonworks.com when you say "based on certain query", what do you mean? Are you thinking of this: ingest data into Phoenix/HBase, then ETL it out to ORC for running deep analytics?
Created 12-14-2015 04:50 PM
@nasghar: Why do you want to move data from Phoenix to Hive/ORC? If your intention is to run Hive queries on an HBase/Phoenix table, you can easily create Hive external tables on top of your existing HBase/Phoenix table. Or do you intentionally want to duplicate the data in a Hive internal table? That way you would create two sets of data to maintain (one in Hive, the other in HBase).