Created 10-16-2015 02:49 AM
Can you please provide some recommended way to move data from Phoenix based on certain query to Hive/orc without dumping into text Format
Created 10-16-2015 03:06 AM
I can't say it's the best way, but I would try creating a Hive table with the HBase storage handler and then doing a CREATE TABLE AS SELECT or INSERT ... SELECT.
See a simple example below:
PHOENIX:
CREATE TABLE TEST (
  ID VARCHAR NOT NULL,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID)
);

HIVE:
CREATE EXTERNAL TABLE TEST (ID string, TEXT string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:TEXT")
TBLPROPERTIES ("hbase.table.name" = "TEST");

(Note the key column is declared as string on the Hive side to match the Phoenix VARCHAR.)
A more complex example, with composite key and numeric data types:
PHOENIX:
CREATE TABLE TEST2 (
  ID1 VARCHAR NOT NULL,
  ID2 VARCHAR NOT NULL,
  NUM DOUBLE,
  TEXT VARCHAR
  CONSTRAINT PK PRIMARY KEY (ID1, ID2)
);

HIVE:
CREATE EXTERNAL TABLE TEST2 (
  KEY STRUCT<ID1:STRING, ID2:STRING>,
  NUM DOUBLE,
  TEXT STRING
)
ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '\u0000'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,0:NUM,0:TEXT",
  "hbase.table.default.storage.type" = "binary"
)
TBLPROPERTIES ("hbase.table.name" = "TEST2");
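Once the external table is mapped, the actual copy into ORC is just a CREATE TABLE AS SELECT in Hive. A sketch, assuming the TEST2 external table from the example above (the ORC table name TEST2_ORC and the filter are my own placeholders):

```sql
-- One-off copy of the Phoenix-backed data into a new ORC table.
-- TEST2_ORC is a hypothetical name; TEST2 is the external table
-- mapped onto the Phoenix/HBase table above.
CREATE TABLE TEST2_ORC
STORED AS ORC
AS
SELECT KEY.ID1 AS ID1,   -- flatten the composite row key
       KEY.ID2 AS ID2,
       NUM,
       TEXT
FROM TEST2
WHERE NUM > 0;           -- whatever "certain query" filter you need
```

If the ORC table already exists, an INSERT INTO ... SELECT with the same SELECT works the same way for incremental loads.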
Created 10-16-2015 03:02 AM
Phoenix creates its tables in HBase, so essentially you are asking how to integrate HBase and Hive. Check this blog. Hope this helps.
Created 10-16-2015 06:21 PM
nasghar@hortonworks.com when you say "based on certain query", what do you mean? Are you thinking of this -> ingest data into Phoenix/HBase, then ETL it out to ORC for running deep analytics?
Created 12-14-2015 04:50 PM
@nasghar: Why do you want to move data from Phoenix to Hive/ORC? If your intention is to run Hive queries on the HBase/Phoenix table, then you can simply create Hive external tables on top of your existing HBase/Phoenix table. Or do you intentionally want to duplicate the data in a Hive internal table? That way you create two sets of data that you have to maintain (one in Hive and the other in HBase).