How do I import data from a CSV file into HBase?

1 ACCEPTED SOLUTION

@Aidan Condron You can do this in several ways, depending on your requirements.

1. If your data is already in TSV or CSV format, use the ImportTsv utility that ships with HBase, together with bulk load. See http://hbase.apache.org/book.html#arch.bulk.load for details.

2. If you are using Phoenix with HBase, you can use Phoenix's bulk data loading tools.

https://phoenix.apache.org/bulk_dataload.html

3. Another option is to use the Hive HBase storage handler. See the link below.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...
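
As a rough illustration of option 1, here is a minimal sketch of an ImportTsv run followed by a bulk load. The table name, column family, and HDFS paths are assumptions made for the example, and the `run` helper is not part of HBase; it only prints the command when `hbase` is not installed. The class names follow the HBase reference guide linked above; newer HBase releases have moved the bulk-load class, so check the guide for your version.

```shell
#!/bin/sh
# Hypothetical values: adjust to your cluster.
TABLE="MyNamesTable"            # HBase table, assumed to exist with column family "f"
INPUT="/tmp/MyData.csv"         # CSV file already on HDFS
HFILE_DIR="/tmp/MyData_hfiles"  # HDFS output directory for the generated HFiles

# Helper (not part of HBase): run a command if its tool is installed,
# otherwise print it so the script can be read as a dry run.
run() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "(dry run) $*"
  fi
}

# Step 1: parse the CSV and write HFiles rather than sending Puts.
# The first field becomes the row key; the second is stored in f:c1.
run hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:c1 \
  -Dimporttsv.bulk.output="$HFILE_DIR" \
  "$TABLE" "$INPUT" || exit 1

# Step 2: move the generated HFiles into the table's regions.
run hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  "$HFILE_DIR" "$TABLE"
```

Leaving out `-Dimporttsv.bulk.output` makes ImportTsv write to the table directly through Puts, which is simpler but slower for large files.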

3 REPLIES

Hi @Aidan Condron,

One option worth considering is Apache Phoenix (https://phoenix.apache.org/). Phoenix uses relational constructs to make working with data in HBase simpler. HDP includes a simple example of loading CSV data into HBase and querying it with Phoenix. Check it out here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP...
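
For reference, a minimal sketch of the two CSV loaders described on the Phoenix bulk data loading page. The table name, ZooKeeper quorum, file paths, and jar location are all assumptions for illustration (the real client jar name includes a version number), and the `run` helper is not part of Phoenix; it only prints the command when the tool is not on the PATH.

```shell
#!/bin/sh
# Hypothetical values: adjust to your cluster.
ZK_QUORUM="localhost"                    # ZooKeeper quorum of the HBase cluster
TABLE="EXAMPLE"                          # Phoenix table, assumed to exist already
CSV="data.csv"                           # local CSV whose columns match the table
HDFS_CSV="/tmp/data.csv"                 # the same file on HDFS, for the MapReduce loader
PHOENIX_CLIENT_JAR="phoenix-client.jar"  # placeholder; the real jar name is versioned
LOADER="psql"                            # "psql" for small files, "mapreduce" for large ones

# Helper (not part of Phoenix): run a command if its tool is installed,
# otherwise print it so the script can be read as a dry run.
run() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "(dry run) $*"
  fi
}

case "$LOADER" in
  psql)
    # Single-threaded client-side load.
    run psql.py -t "$TABLE" "$ZK_QUORUM" "$CSV"
    ;;
  mapreduce)
    # MapReduce-based bulk load for larger data volumes.
    run hadoop jar "$PHOENIX_CLIENT_JAR" \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table "$TABLE" --input "$HDFS_CSV"
    ;;
  *)
    echo "unknown loader: $LOADER" >&2
    exit 1
    ;;
esac
```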

Hi @Aidan Condron, if you're not bulk loading, you can load data into HBase through Hive. Open Hive through Ambari. First upload your .csv file to HDFS (I use the /tmp folder), then run the following in Hive:

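-- Stage 1: load each CSV line as a single string column.
CREATE TABLE MyTable (col_value STRING);

LOAD DATA INPATH '/tmp/MyData.csv' OVERWRITE INTO TABLE MyTable;

-- Stage 2: split each line into named columns.
CREATE TABLE MyHiveTable (FirstName STRING, LastName STRING);

-- {n} repeats the group n times, and group 1 keeps the last repetition,
-- so the pattern returns the n-th comma-separated field.
INSERT OVERWRITE TABLE MyHiveTable
SELECT
  regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) FirstName,
  regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) LastName
FROM MyTable;

-- Stage 3: HBase-backed table. firstname is the row key; lastname goes to f:c1.
-- Keep the mapping free of whitespace, or it becomes part of the column name.
CREATE TABLE MyHBaseTable (firstname STRING, lastname STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1')
TBLPROPERTIES ('hbase.table.name' = 'MyNamesTable');

FROM MyHiveTable INSERT INTO TABLE MyHBaseTable
SELECT MyHiveTable.*;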

It's not a fast method, but the regex and intermediary stages are useful if you need additional control over your data before it goes into HBase.
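
The upload step mentioned at the start of this reply could look roughly like the sketch below. The local file name is an assumption, and the `run` helper is only there so the script prints the commands when `hdfs` is not installed.

```shell
#!/bin/sh
# Hypothetical values: adjust to your environment.
LOCAL_CSV="MyData.csv"   # CSV file on the local machine
HDFS_DIR="/tmp"          # HDFS folder used in the Hive statements above

# Helper (not part of Hadoop): run a command if its tool is installed,
# otherwise print it so the script can be read as a dry run.
run() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "(dry run) $*"
  fi
}

# Copy the local CSV into HDFS, overwriting any earlier copy.
run hdfs dfs -put -f "$LOCAL_CSV" "$HDFS_DIR/" || exit 1

# Confirm the file is where LOAD DATA INPATH expects to find it.
run hdfs dfs -ls "$HDFS_DIR/$LOCAL_CSV"
```

Note that LOAD DATA INPATH without the LOCAL keyword moves the file out of /tmp into Hive's warehouse directory, so the file has to be uploaded again before the statements can be re-run.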