Import / Load data from CSV to HBase

Explorer

I am new to Hadoop development, and I am currently exploring HBase tables.

I want to load data from my CSV file. The file has more than 10 million records, and I want to populate an HBase table with them.

But I do not know how to do it. Can anybody help me?

What are the steps to populate an HBase table from my CSV file?

Thank you very much, I need somebody's help.

1 ACCEPTED SOLUTION

Mentor
You are looking for the ImportTsv utility offered by HBase, and its bulk-load option. Read up more at http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#_importtsv to learn how to use ImportTsv to prepare bulk-loadable output, followed by http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#arch.bulk.load.complete, which shows how to load the prepared output into HBase.

There's also a slightly dated walkthrough of the same process, using a CSV example, on the Cloudera Engineering blog:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
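
A minimal sketch of that two-step flow, using the table (wordcount), column family (f) and CSV file (word_count.csv) that come up later in this thread; the HFile output directory /tmp/hfiles is a placeholder, and the separator option is needed because ImportTsv expects tab-delimited input by default:

# Step 1: parse the CSV and write HFiles instead of issuing Puts
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:count \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  wordcount word_count.csv

# Step 2: hand the generated HFiles to the live table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles wordcount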


7 REPLIES

Explorer

So do I have to download ImportTsv separately, or has it been included since I downloaded and installed HBase on my cluster?

Mentor
The ImportTsv utility comes included with your CDH installation.
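
A quick way to verify it is available, for what it's worth: invoking the class with no arguments should simply print its usage text rather than run a job:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv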

Explorer

I tried to use the command

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount word_count.csv

and I get an error like this: Permission denied: user=xxxxx, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x

What must I do? Can you help me?
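
For anyone hitting the same error: it says that the job is trying to write under /user on HDFS, which is owned by hdfs and not writable by other users (drwxr-xr-x). A common fix, sketched here with xxxxx standing in for the actual username, is to have an HDFS superuser create a home directory for that user and then re-run the import:

sudo -u hdfs hdfs dfs -mkdir /user/xxxxx
sudo -u hdfs hdfs dfs -chown xxxxx /user/xxxxx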

Explorer

I have made it work!

I can upload the CSV now. Thanks a lot for your help!

I appreciate it 😄

Explorer

Sorry to bother you again.

I uploaded it once, but then I tried to upload the same file a second time.

The MapReduce job reports success and the output folder exists,

but my table is still empty, and I don't know why, because there is no error in the log.

The only thing I changed is the name of the output folder:

the first time my output folder was "output", the next time I changed it to "output2", and so on.

Do you know why?

I don't know why this happens. Thank you very much.
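
One possibility worth checking (a guess, since the exact command isn't shown): if the job was run with -Dimporttsv.bulk.output=output2, ImportTsv only writes HFiles into that folder and does not touch the table itself; the rows only appear after the generated files are loaded with the bulk-load completion step, for example:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles output2 wordcount

Without the bulk.output option ImportTsv writes straight into the table via Puts, so a successful job, a populated output folder, and an empty table usually point to a missing completebulkload step.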

New Contributor
Hello, can you help me?
I have a problem importing data into an HBase table. I've tried to use ImportTsv, but my file has a very large number of columns (1,000). Do I have to write out all of the columns, or is there another way to build the column list automatically from the file?

Thank you.
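
Not an official answer, just a sketch: as far as I know ImportTsv cannot infer the column list from the file, but you can generate the -Dimporttsv.columns value from the CSV header with a small shell snippet. This assumes the first line of data.csv is a header row, the first column is the row key, and a single column family f (file, table, and family names are placeholders):

# build "HBASE_ROW_KEY,f:col2,f:col3,..." from the header row
COLS=$(head -1 data.csv | tr ',' '\n' | tail -n +2 | sed 's/^/f:/' | paste -s -d, -)

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns="HBASE_ROW_KEY,${COLS}" \
  mytable data.csv

Note that the header row itself would also be imported as data unless you strip it from the file first.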