Created 12-16-2015 04:02 PM
Created 12-16-2015 04:10 PM
@Aidan Condron There are several ways to do this; which one fits depends on your requirements.
1. If your data is already in TSV or CSV format, use the ImportTsv utility that ships with HBase, together with bulk load. See http://hbase.apache.org/book.html#arch.bulk.load for details.
2. If you are using Phoenix with HBase, you can use Phoenix's bulk data loading tools:
https://phoenix.apache.org/bulk_dataload.html
3. Another option is to use the Hive HBase storage handler and load the data through Hive.
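For option 1, here is a rough sketch of the two-step ImportTsv bulk load, wrapped in a small Python driver so the commands can be printed before anything runs. The table name, column mapping, and HDFS paths are made-up placeholders, and the `LoadIncrementalHFiles` class name is the one documented for HBase 1.x, so check it against your version. The helper names `import_tsv_commands` and `run_bulk_load` are hypothetical.

```python
import shlex
import subprocess


def import_tsv_commands(table, columns, input_path, hfile_dir, separator=","):
    """Build the two command lines for an ImportTsv bulk load.

    Step 1 runs ImportTsv with importtsv.bulk.output set, so it writes
    HFiles to hfile_dir. Step 2 loads those HFiles into the table.
    """
    prepare = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        f"-Dimporttsv.separator={separator}",
        f"-Dimporttsv.columns={','.join(columns)}",
        f"-Dimporttsv.bulk.output={hfile_dir}",
        table, input_path,
    ]
    load = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]
    return prepare, load


def run_bulk_load(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder values (assumptions): adjust the table, columns, and paths.
commands = import_tsv_commands(
    table="MyNamesTable",
    columns=["HBASE_ROW_KEY", "f:c1"],
    input_path="/tmp/MyData.csv",
    hfile_dir="/tmp/MyData_hfiles",
)
run_bulk_load(commands)  # dry run: only prints the two commands
```

With `dry_run=True` nothing touches the cluster, so you can review the generated commands first and then call `run_bulk_load(commands, dry_run=False)` on a node where the `hbase` launcher is on the PATH.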
Created 12-16-2015 04:09 PM
Hi @Aidan Condron,
One option worth considering is Apache Phoenix (https://phoenix.apache.org/). Phoenix uses relational constructs to make working with data in HBase simpler. With HDP we have a simple example of loading CSV data into HBase and querying it with Phoenix. Check it out here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP...
Created 12-17-2015 10:17 AM
Hi @Aidan Condron, if you're not bulk loading, you can load data into HBase through Hive. Open Hive through Ambari and upload your .csv files to HDFS (I use the /tmp folder). Then run the following in Hive:
CREATE TABLE MyTable (col_value STRING);
LOAD DATA INPATH '/tmp/MyData.csv' OVERWRITE INTO TABLE MyTable;
CREATE TABLE MyHiveTable (FirstName STRING, LastName STRING);
INSERT OVERWRITE TABLE MyHiveTable
SELECT regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) FirstName,
       regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) LastName
FROM MyTable;
CREATE TABLE MyHBaseTable (firstname STRING, lastname STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1')
TBLPROPERTIES ('hbase.table.name' = 'MyNamesTable');
FROM MyHiveTable INSERT INTO TABLE MyHBaseTable SELECT MyHiveTable.*;
It's not a fast method, but the regex and the intermediate stages are useful if you need additional control over your data before it goes into HBase.
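If it helps to see what the two regexp_extract patterns pull out of each line, here is a small Python sketch that applies the same patterns. Python's `re` module stands in for Hive's regex engine here, which is an assumption that holds for these simple patterns, and the sample rows and helper names (`extract`, `split_row`) are made up for illustration.

```python
import re

# Patterns copied from the Hive query. {n} repeats the non-capturing group
# n times, and capture group 1 keeps the text from the last repetition,
# which is the n-th comma-separated field.
FIRST_NAME = re.compile(r'^(?:([^,]*)\,?){1}')
LAST_NAME = re.compile(r'^(?:([^,]*)\,?){2}')


def extract(pattern, line):
    """Return capture group 1 of the pattern, or '' if the line does not match."""
    match = pattern.match(line)
    return match.group(1) if match else ''


def split_row(line):
    """Return (first field, second field) of one CSV line."""
    return extract(FIRST_NAME, line), extract(LAST_NAME, line)


# Hypothetical sample rows
for row in ["John,Smith", "Ada,Lovelace"]:
    print(split_row(row))
# ('John', 'Smith')
# ('Ada', 'Lovelace')
```

Changing the repetition count to `{3}`, `{4}`, and so on picks out later fields the same way, which is where the extra control over each column comes from.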