Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

importing csv file in hbase

Master Collaborator

I found this article from Hortonworks about importing tsv files , how can I apply the same for csv files?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_importing_data_into_hbase_guide/content/c...

13 REPLIES 13

Note how there is a tab ("\t") character provided in step #5. Change this to a comma (",") character and you can read CSV files. You provide the column delimiter to match the data you want to ingest.

Master Collaborator

I tried the following but I am getting errors :

the fields in the file below represent the following data 
id  lowtemp hightemp vibration lowpressure highpressure
[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ more load_csv.ddl
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'temp:lowtemp,temp:hightemp,vibration,pressure:lowpressure,pressure:highpress
ure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');
[hbase@hadoop1 ~]$
[hbase@hadoop1 ~]$ hcat -f  load_csv.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: 
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, 
column qualifier specification.)


Contributor

You did not specify column family for vibration. You also omitted the key field.

Try something like this:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');

Master Collaborator

I want to make columns as "TEMP" "VIBRATION" "PRESSSURE" , The TEMP and PRESSURE have two columns "high" and "low" , what will be the command syntax then ?

Master Collaborator

also if I try your command I get this error :

[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))

Master Collaborator

but I am stuck even before this step. .. please see the error below

Master Collaborator

ok I created the table in hive n hbase as follows ,

how do I run those "A = " and "STORE A INTO" commands. . as hive ? hbase? pig?

create table test2(id int, lowT string, highT string, vib int, lowP string,highP string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,temperature:lowT,temperature:highT,vibration:vib, pressure:lowP, pressure:highP') 
TBLPROPERTIES('hbase.table.name'='test1') ;

Contributor

You can specify the delimiter in the LOAD script; Let's assume, that in your source file, the delimiter is semicolon (';') instead of TAB. Then you will want to run this LOAD script:

CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' )
TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');


A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage(',') AS (id:chararray, lowtemp:chararray,
hightemp:chararray, vibration:chararray, lowpressure:chararray, highpressure:chararray);
-- DUMP A;
STORE A INTO 'load_csv' USING org.apache.hive.hcatalog.pig.HCatStorer();

More specifically, in the delimiter is the argument for the PigStorage

Master Collaborator

if I try the example directly given by Hortonworks I get the error

[hbase@hadoop1 ~]$ more a.ddl
CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'd:c1,d:c2' )
TBLPROPERTIES ( 'hbase.table.name' = 'simple_hcat_load_table'
);


[hbase@hadoop1 ~]$ hcat -f  a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)

Please include the full exception. I would guess that your classpath is wrong, causing this to not find your HBase instance.

Master Collaborator

hi pbarna can you try your create table command and see if it works for you? its not working for me.

thanks

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
[hbase@hadoop1 ~]$

Swap ":id" with ":key" in the hbase.columns.mapping. Just a simple typo. See https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration for documentation on configuring the HBaseStorageHandler.

Master Collaborator

I solved this problem using the following method , but I do want to know why would one want to use the SERDE method and not this one?

[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$

create table test3(Id int, lowT string, highT string,vib int, lowP string,highP string) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
load data inpath '/user/hbase/a.csv' OVERWRITE INTO TABLE test3;
Loading data to table default.test3

Table default.test3 stats: [numFiles=1, numRows=0, totalSize=63, rawDataSize=0]
OK
Time taken: 0.668 seconds

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.