Support Questions

Find answers, ask questions, and share your expertise
Announcements
Welcome to the upgraded Community! Read this blog to see What’s New!

importing csv file in hbase

avatar
Master Collaborator

I found this article from Hortonworks about importing tsv files , how can I apply the same for csv files?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_importing_data_into_hbase_guide/content/c...

13 REPLIES 13

avatar

Note how there is a tab ("\t") character provided in step #5. Change this to a comma (",") character and you can read CSV files. You provide the column delimiter to match the data you want to ingest.

avatar
Master Collaborator

I tried the following but I am getting errors :

the fields in the file below represent the following data 
id  lowtemp hightemp vibration lowpressure highpressure
[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ more load_csv.ddl
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'temp:lowtemp,temp:hightemp,vibration,pressure:lowpressure,pressure:highpress
ure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ hcat -f  load_csv.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: 
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, 
column qualifier specification.)


avatar
Contributor

You did not specify column family for vibration. You also omitted the key field.

Try something like this:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');

avatar
Master Collaborator

I want to make columns as "TEMP" "VIBRATION" "PRESSSURE" , The TEMP and PRESSURE have two columns "high" and "low" , what will be the command syntax then ?

avatar
Master Collaborator

also if I try your command I get this error :

[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))

avatar
Master Collaborator

but I am stuck even before this step. .. please see the error below

avatar
Master Collaborator

ok I created the table in hive n hbase as follows ,

how do I run those "A = " and "STORE A INTO" commands. . as hive ? hbase? pig?

create table test2(id int, lowT string, highT string, vib int, lowP string,highP string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,temperature:lowT,temperature:highT,vibration:vib, pressure:lowP, pressure:highP') 
TBLPROPERTIES('hbase.table.name'='test1') ;

avatar
Contributor

You can specify the delimiter in the LOAD script; Let's assume, that in your source file, the delimiter is semicolon (';') instead of TAB. Then you will want to run this LOAD script:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');


A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage(',') AS (id:chararray, lowtemp:chararray,
hightemp:chararray, vibration:chararray, lowpressure:chararray, highpressure:chararray);
-- DUMP A;
STORE A INTO 'load_csv' USING org.apache.hive.hcatalog.pig.HCatStorer();

More specifically, in the delimiter is the argument for the PigStorage

avatar
Master Collaborator

if I try the example directly given by Hortonworks I get the error

[hbase@hadoop1 ~]$ more a.ddl
CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'd:c1,d:c2' )
TBLPROPERTIES ( 'hbase.table.name' = 'simple_hcat_load_table'
);


[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)

avatar

Please include the full exception. I would guess that your classpath is wrong, causing this to not find your HBase instance.

avatar
Master Collaborator

hi pbarna can you try your create table command and see if it works for you? its not working for me.

thanks

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
[hbase@hadoop1 ~]$

avatar

Swap ":id" with ":key" in the hbase.columns.mapping. Just a simple typo. See https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration for documentation on configuring the HBaseStorageHandler.

avatar
Master Collaborator

I solved this problem using the following method , but I do want to know why would one want to use the SERDE method and not this one?

[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$

create table test3(Id int, lowT string, highT string,vib int, lowP string,highP string) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
load data inpath '/user/hbase/a.csv' OVERWRITE INTO TABLE test3;
Loading data to table default.test3

Table default.test3 stats: [numFiles=1, numRows=0, totalSize=63, rawDataSize=0]
OK
Time taken: 0.668 seconds

Labels