Importing a CSV file into HBase

Super Collaborator

I found this article from Hortonworks about importing TSV files. How can I apply the same approach to CSV files?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_importing_data_into_hbase_guide/content/c...

13 REPLIES

Super Guru

Note how a tab ("\t") character is provided in step #5. Change this to a comma (",") and you can read CSV files. You provide the column delimiter to match the data you want to ingest.
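For example, if you are following the Pig-based flow from that guide, the only change is the PigStorage delimiter. A minimal sketch, assuming the guide's simple_hcat_load_table example and a placeholder HDFS path:

-- same as the guide's LOAD/STORE step, but with ',' instead of '\t' as the field delimiter
A = LOAD 'hdfs:///tmp/data.csv' USING PigStorage(',') AS (id:chararray, c1:chararray, c2:chararray);
STORE A INTO 'simple_hcat_load_table' USING org.apache.hive.hcatalog.pig.HCatStorer();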

Super Collaborator

I tried the following but I am getting errors:

The fields in the file below represent the following data:
id  lowtemp  hightemp  vibration  lowpressure  highpressure
[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ more load_csv.ddl
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'temp:lowtemp,temp:hightemp,vibration,pressure:lowpressure,pressure:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ hcat -f  load_csv.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: 
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, 
column qualifier specification.)


Rising Star

You did not specify a column family for vibration. You also omitted the key field.

Try something like this:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');

Super Collaborator

I want the column families to be "TEMP", "VIBRATION", and "PRESSURE". TEMP and PRESSURE each have two columns, "high" and "low". What would the command syntax be then?

Super Collaborator

Also, if I try your command I get this error:

[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
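For reference: Hive's HBaseSerDe only recognizes the literal token ':key' as the row key, so ':id' above is not treated as the key; an implicit key is then added, which is why the mapping counts 7 elements against 6 table columns. A minimal corrected sketch, assuming id is meant to be the HBase row key:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- ':key' (not ':id') is the special token for the row key
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':key,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');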

Super Collaborator

But I am stuck even before this step... please see the error below.

Super Collaborator

OK, I created the table in Hive and HBase as follows.

How do I run those "A = " and "STORE A INTO" commands... as Hive? HBase? Pig?

create table test2(id int, lowT string, highT string, vib int, lowP string,highP string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,temperature:lowT,temperature:highT,vibration:vib, pressure:lowP, pressure:highP') 
TBLPROPERTIES('hbase.table.name'='test1') ;
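A minimal sketch of those two commands, adapted to this test2 table and the a.csv file above (the HDFS path is a placeholder for wherever a.csv is uploaded):

-- Pig Latin: load the comma-delimited file and write it through HCatalog into the Hive table test2
A = LOAD 'hdfs:///tmp/a.csv' USING PigStorage(',') AS (id:int, lowT:chararray, highT:chararray, vib:int, lowP:chararray, highP:chararray);
STORE A INTO 'test2' USING org.apache.hive.hcatalog.pig.HCatStorer();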

Rising Star

You can specify the delimiter in the LOAD script. Let's assume that in your source file the delimiter is a comma (',') instead of TAB. Then you will want to run this LOAD script:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');


A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage(',') AS (id:chararray, lowtemp:chararray,
hightemp:chararray, vibration:chararray, lowpressure:chararray, highpressure:chararray);
-- DUMP A;
STORE A INTO 'load_csv' USING org.apache.hive.hcatalog.pig.HCatStorer();

More specifically, the delimiter is the argument to PigStorage.
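Those two statements are Pig Latin, so one way to run them is to save them in a script file (hypothetically named load_csv.pig here) and launch Pig with its HCatalog integration, as a user that can reach HBase and the Hive metastore:

# -useHCatalog puts the HCatalog jars on Pig's classpath so HCatStorer can write to the Hive-managed HBase table
pig -useHCatalog load_csv.pig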

Super Collaborator

If I try the example given by Hortonworks directly, I get this error:

[hbase@hadoop1 ~]$ more a.ddl
CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'd:c1,d:c2' )
TBLPROPERTIES ( 'hbase.table.name' = 'simple_hcat_load_table'
);


[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)