How do I import data from a CSV file into HBase?
Labels: Apache HBase
Created 12-16-2015 04:02 PM
Created 12-16-2015 04:10 PM
@Aidan Condron You can do this in several ways, depending on your requirements:
1. If your data is already in TSV or CSV format, use the ImportTsv utility included with HBase, followed by a bulk load. See http://hbase.apache.org/book.html#arch.bulk.load for details.
2. If you use Phoenix with HBase, you can use Phoenix's bulk loading instead:
https://phoenix.apache.org/bulk_dataload.html
3. Another option is to use the Hive HBase storage handler. Refer below for details.
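For option 1, here is a minimal sketch of the two-step ImportTsv bulk load, written as a small Python helper that only assembles the command lines (paste them into a shell, or pass each list to `subprocess.run(cmd, check=True)`). Assumptions: the table name `MyNamesTable`, column family `f`, and the HDFS paths are hypothetical; the table is assumed to exist already; and the `LoadIncrementalHFiles` class name is the one documented for HBase 1.x, so newer releases may expose the completion step differently.

```python
import shlex


def importtsv_commands(table, input_path, columns, hfile_dir, separator=","):
    """Build the two commands for a CSV bulk load into an existing HBase table.

    Step 1 runs ImportTsv with importtsv.bulk.output set, so it writes HFiles
    to hfile_dir instead of sending Puts to the region servers.
    Step 2 hands those HFiles over to the table (HBase 1.x class name, assumed).
    """
    prepare = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        f"-Dimporttsv.separator={separator}",          # ImportTsv defaults to tab
        "-Dimporttsv.columns=" + ",".join(columns),    # HBASE_ROW_KEY marks the key column
        f"-Dimporttsv.bulk.output={hfile_dir}",
        table, input_path,
    ]
    complete = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]
    return prepare, complete


if __name__ == "__main__":
    # Hypothetical table and paths; the CSV's first field becomes the row key.
    prepare, complete = importtsv_commands(
        table="MyNamesTable",
        input_path="/tmp/MyData.csv",
        columns=["HBASE_ROW_KEY", "f:c1"],
        hfile_dir="/tmp/MyNamesTable_hfiles",
    )
    for cmd in (prepare, complete):
        print(shlex.join(cmd))
```

Writing HFiles and then loading them skips the normal write path, which is why this route is usually the fastest for large files.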
Created 12-16-2015 04:09 PM
Hi @Aidan Condron,
One option worth considering is Apache Phoenix (https://phoenix.apache.org/). Phoenix uses relational constructs to make working with data in HBase simpler. For HDP we have a simple example of loading CSV data into HBase and querying it with Phoenix. Check it out here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP...
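A minimal sketch of the Phoenix route using `psql.py`, the client-side CSV loader described on the Phoenix bulk loading page, again as a Python helper that only assembles the commands. The table `MY_NAMES`, its columns, the ZooKeeper quorum `localhost`, and the file paths are hypothetical, and `psql.py` is assumed to live at `bin/psql.py` in the Phoenix install. For very large files, the MapReduce-based `CsvBulkLoadTool` on the same page is the better fit.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical Phoenix table for a two-column CSV (first name, last name).
# Phoenix builds the HBase row key from the PRIMARY KEY columns.
DDL = """CREATE TABLE IF NOT EXISTS MY_NAMES (
    FIRST_NAME VARCHAR NOT NULL,
    LAST_NAME VARCHAR NOT NULL
    CONSTRAINT PK PRIMARY KEY (FIRST_NAME, LAST_NAME)
);
"""


def psql_commands(zk_quorum, ddl_path, csv_path, table="MY_NAMES", psql="bin/psql.py"):
    """Return two psql.py invocations: run the DDL, then upsert the CSV rows."""
    create = [psql, zk_quorum, str(ddl_path)]
    load = [psql, "-t", table, zk_quorum, str(csv_path)]  # -t names the target table
    return create, load


if __name__ == "__main__":
    ddl_path = Path(tempfile.mkdtemp()) / "my_names.sql"
    ddl_path.write_text(DDL)
    # Placeholder quorum and local CSV path; psql.py reads the CSV on the client.
    for cmd in psql_commands("localhost", ddl_path, "/local/path/MyData.csv"):
        print(shlex.join(cmd))
```

Once loaded, the same table can be queried with ordinary SQL through Phoenix, which is the main draw of this option.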
Created 12-17-2015 10:17 AM
Hi @Aidan Condron, if you're not bulk loading, you can upload to HBase through Hive. Open Hive through Ambari. Upload your .csv files to HDFS (I use the /tmp folder), then run the following in Hive:
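-- Stage each raw CSV line as a single string column.
-- Note: LOAD DATA INPATH moves (not copies) the HDFS file into the table's location.
CREATE TABLE MyTable (col_value STRING);
LOAD DATA INPATH '/tmp/MyData.csv' OVERWRITE INTO TABLE MyTable;
-- Split each line into fields with regexp_extract.
CREATE TABLE MyHiveTable (FirstName STRING, LastName STRING);
INSERT OVERWRITE TABLE MyHiveTable SELECT regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) FirstName, regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) LastName FROM MyTable;
-- HBase-backed table: firstname becomes the row key, lastname goes to column f:c1.
-- No spaces in the mapping string; Hive would treat them as part of the column name.
CREATE TABLE MyHBaseTable (firstname STRING, lastname STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1') TBLPROPERTIES ('hbase.table.name' = 'MyNamesTable');
-- Copy the parsed rows into HBase.
FROM MyHiveTable INSERT INTO TABLE MyHBaseTable SELECT MyHiveTable.*;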
It's not a fast method, but the regex and intermediate stages are useful if you need additional control over your data before it goes into HBase.
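To see what the regex stage and the HBase mapping do to each row, here is a small Python model, a sketch rather than Hive or HBase code. `nth_field` reuses the same pattern as the Hive query: a repeated capturing group keeps only its last iteration, so after n repetitions group 1 holds field n (Python's `re` and Java's regex, which Hive uses, agree on this). `to_hbase_cells` is a hypothetical helper that models the `:key` then `f:c1` mapping used above; the sample rows are made up.

```python
import re

# Same pattern the Hive query passes to regexp_extract; %d is the field number.
# "\," is an escaped comma and matches the same character as ",".
FIELD_PATTERN = r"^(?:([^,]*)\,?){%d}"


def nth_field(line, n):
    """Return the n-th (1-based) comma-separated field, like regexp_extract(..., 1)."""
    match = re.match(FIELD_PATTERN % n, line)
    return match.group(1) if match else ""


def to_hbase_cells(rows, mapping=(":key", "f:c1")):
    """Hypothetical model of hbase.columns.mapping, not an HBase API.

    The column mapped to ':key' becomes the row key; every other column is
    written to its family:qualifier. A later row with the same key replaces
    that cell's value, much as reads return the newest cell version in HBase.
    """
    table = {}
    for row in rows:
        key, cells = None, {}
        for target, value in zip(mapping, row):
            if target == ":key":
                key = value
            else:
                cells[target] = value
        table.setdefault(key, {}).update(cells)
    return table


if __name__ == "__main__":
    lines = ["John,Smith", "Mary,Jones", "John,Doe"]  # made-up sample rows
    rows = [(nth_field(line, 1), nth_field(line, 2)) for line in lines]
    print(rows)  # [('John', 'Smith'), ('Mary', 'Jones'), ('John', 'Doe')]
    print(to_hbase_cells(rows))  # {'John': {'f:c1': 'Doe'}, 'Mary': {'f:c1': 'Jones'}}
```

Because firstname is the row key in this mapping, the second "John" replaces the first one's last name. If names can repeat, mapping a unique column (an ID, or a combination of fields) to `:key` avoids the collision.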
