
One of the first tasks we usually face with HBase is loading it with data. Most of the time we already have the data in some format such as CSV and want to load it into HBase, so let's take a quick look at what the procedure looks like:

Let's examine our example data, using the simple structure I have for an industrial sensor:

 id, temp:in,temp:out,vibration,pressure:in,pressure:out
 5842,  50,     30,       4,      240,         340
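The header row is shown only to explain the column mapping. ImportTsv treats every line of the input as data and stores each field exactly as it appears between separators, so a header line would be loaded as a row with the key id, and the padding spaces would become part of the stored values. A minimal sketch, assuming the raw export sits in a hypothetical file named sensor_raw.csv, that produces a clean hbase.csv:

```shell
#!/bin/sh
# Hypothetical raw export: header row plus a padded data row, as in the table above.
cat > sensor_raw.csv <<'EOF'
id, temp:in,temp:out,vibration,pressure:in,pressure:out
5842,  50,     30,       4,      240,         340
EOF

# Drop the header line (tail -n +2) and delete every space (tr -d ' ').
# Deleting all spaces is only safe here because every field is numeric.
tail -n +2 sensor_raw.csv | tr -d ' ' > hbase.csv

cat hbase.csv
```

The resulting file holds the single line 5842,50,30,4,240,340. If your data has text fields that legitimately contain spaces, trim only around the separators instead of deleting every space.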

First of all, make sure HBase is started on your Sandbox.

Creating the HBase Table

  • Log in as root to the HDP Sandbox and switch to the hbase user
root> su - hbase
  • Start the HBase shell by typing
hbase> hbase shell
  • Create the example table by typing
hbase(main):001:0> create 'sensor','temp','vibration','pressure'
  • Let's make sure the table was created by listing the tables, and examine its structure with describe
hbase(main):002:0> list
hbase(main):003:0> describe 'sensor'
  • Now exit the shell by typing 'exit', and let's load some data
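The same table-creation steps can also be run non-interactively, because the HBase shell accepts a file of commands as an argument. A minimal sketch, assuming a hypothetical file name create_sensor.hbase; the trailing exit makes the shell quit instead of staying in interactive mode:

```shell
#!/bin/sh
# Write the table-creation steps to an HBase shell command file.
# The file name create_sensor.hbase is only an example.
cat > create_sensor.hbase <<'EOF'
create 'sensor','temp','vibration','pressure'
list
describe 'sensor'
exit
EOF
```

As the hbase user, you would then run it with: hbase shell create_sensor.hbase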

Loading the Data

  • Let's put the hbase.csv file into HDFS. You may first copy it to the cluster with scp, using the following command
macbook-ned> scp hbase.csv root@sandbox.hortonworks.com:/home/hbase
  • Now, as the hbase user, copy it into HDFS using the following command
hbase> hdfs dfs -copyFromLocal hbase.csv /tmp
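  • Now execute ImportTsv from the Linux command line as the hbase user (not from inside the HBase shell). The id field becomes the row key through HBASE_ROW_KEY, so it is not listed again; the remaining fields map, in order, to the listed columns, each given either as a column family or as family:qualifier
hbase> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,temp:in,temp:out,vibration,pressure:in,pressure:out" sensor /tmp/hbase.csv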
  • Once the MapReduce job has completed, go back to the HBase shell and run
hbase(main):001:0> scan 'sensor'
  • You should now see the data in the table.
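ImportTsv matches fields to the entries of importtsv.columns purely by position, so the spec needs exactly one entry per field, with HBASE_ROW_KEY standing in for the id field. A quick pre-flight check before submitting the job can compare the entry count with every line of the file. This is a sketch in plain awk; the function name check_columns is hypothetical, and like ImportTsv (as far as I know) it simply splits on the separator and does not understand quoted fields.

```shell
#!/bin/sh
# Spec used for the sensor example: the id field is the row key, so it is
# covered by HBASE_ROW_KEY and not listed again.
COLUMNS_SPEC="HBASE_ROW_KEY,temp:in,temp:out,vibration,pressure:in,pressure:out"

# Sample input: the cleaned sensor row. Point this at your own file instead.
printf '%s\n' '5842,50,30,4,240,340' > hbase.csv

# check_columns SPEC FILE
# Prints every line whose comma-separated field count differs from the number
# of entries in SPEC; returns 1 if any line mismatches, 0 otherwise.
check_columns() {
  want=$(printf '%s\n' "$1" | awk -F',' '{ print NF }')
  awk -F',' -v want="$want" '
    NF != want { printf "line %d: %d fields, spec has %d\n", NR, NF, want; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$2"
}

if check_columns "$COLUMNS_SPEC" hbase.csv; then
  echo "spec matches hbase.csv"
fi

# A spec that lists id again has 7 entries for 6 fields and is reported.
check_columns "HBASE_ROW_KEY,id,temp:in,temp:out,vibration,pressure:in,pressure:out" hbase.csv \
  || echo "spec does not match hbase.csv"
```

The second call prints "line 1: 6 fields, spec has 7", which is the kind of mismatch that otherwise shows up only as shifted values or rejected lines after the job has run.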

Remarks

  • The ImportTsv job generates a large amount of logs, so make sure you have enough space in /var/log. On a real cluster it is always better to mount the log directories on a separate partition, so that logs filling up the disk don't bring operations to a halt.
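A small check along these lines can warn you before a large import when the log partition is running low. LOG_DIR and MIN_FREE_MB are illustrative defaults chosen for this sketch, not HBase or Hadoop settings; df -Pk reports available space in 1024-byte blocks in its fourth column.

```shell
#!/bin/sh
# Illustrative defaults; override them from the environment.
LOG_DIR=${LOG_DIR:-/var/log}
MIN_FREE_MB=${MIN_FREE_MB:-1024}

# df -Pk: POSIX output, one line per filesystem, sizes in 1K blocks.
# Line 2, column 4 is the space still available on LOG_DIR's filesystem.
free_kb=$(df -Pk "$LOG_DIR" | awk 'NR == 2 { print $4 }')
free_mb=$((free_kb / 1024))

if [ "$free_mb" -lt "$MIN_FREE_MB" ]; then
  echo "WARNING: only ${free_mb} MB free under $LOG_DIR" >&2
else
  echo "OK: ${free_mb} MB free under $LOG_DIR"
fi
```

For example, LOG_DIR=/var/log/hbase MIN_FREE_MB=5120 sh followed by the script name checks a specific log directory against a 5 GB threshold.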
Comments
Explorer

Hi,

I am using Apache Hbase (Version 1.1.3)

I used the same ImportTsv syntax to import the same data (with the header removed) and got this error:

syntax error, unexpected ','

Then I removed the 'ID' column from '-Dimporttsv.columns':

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator= ',' -Dimporttsv.columns="HBASE_ROW_KEY,temp:in,temp:out,vibration,pressure:in,pressure:out" sensor /user/hbase.csv

and now I get:

syntax error, unexpected tIDENTIFIER

Please help

Thanks,

Sridharan

New Contributor
Hi Ned,
I have the following CSV file:

userId,prodId,rating,Date:M,Date:D,Date:Y,Help:a,Help:b,Review:a,Review:b
AO94DHGC771SJ,528881469,5,6,2,2013,0,0,We got this GPS for my husband who is an (OTR) ove,,
AMO214LNFCEI4,528881469,1,11,25,2010,12,15,I'm a professional OTR truck driver and I bought ,,
A3N7T0DY83Y4IG,528881469,3,9,9,2010,43,45,Well what can I say. I've had this unit in my tr,,

I created a table in HBase with the following command:

create 'Producttest1','userId','prodId','rating','Date','Help','Review'

When I try to import the CSV into HBase with the following command, I get this error:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,userId,prodId,rating,Date:M,Date:D,Date,Y,Help:a,Help:b,Review:a,Review:b" Producttest1 hdfs://localhost:50070:/mayur/ProductReview/InputFiles/ElectronicsShortTemp.csv

SyntaxError: (hbase):19: syntax error, unexpected ','

Kindly help me with this. Thanks.
Rising Star

@Ned Shawa I tried to follow the example above to import a csv file named drivers.

hbase(main):001:0> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, –Dimporttsv.columns=”HBASE_ROW_KEY,driver_id,driver_name,certified,wage_plan” drivers /home/bilal/drivers.csv
SyntaxError: (hbase):1: syntax error, unexpected ','
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, 
                                                                        ^

I am getting the following SyntaxError unexpected ','.

Would you be kind enough to suggest the solution?

Thanking you in anticipation.

Contributor

I am unable to load my data into HBase with this command; I get the following error:

SyntaxError: (hbase):2: syntax error, unexpected tIDENTIFIER

Mine is a fully distributed cluster of Hadoop 2.7.3 and HBase 1.2.5. I have also tried removing the separator argument and loading a TSV file (the ',' given in the above line as the value of the argument separator gives an error anyway). It has probably got something to do with the way the tables are referenced in HBase 1.2.5. Please respond.

Cloudera Employee

This is not an HBase shell command; run it from the Unix (or Windows) command shell instead:

/usr/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','  -Dimporttsv.columns="HBASE_ROW_KEY,value" spark-defaults hdfs:///tmp/spark-defaults.prop
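Besides running ImportTsv from the OS shell instead of the HBase shell, two of the failing commands above also have quoting problems that the OS shell would expose. A space after = (as in -Dimporttsv.separator= ',') makes the separator an empty string and turns ',' into a separate positional argument, and quotes or dashes pasted from a web page (” and – instead of " and -) are passed through literally instead of being interpreted by the shell. The hypothetical helper show_args below just prints each argument it receives, which makes the difference visible:

```shell
#!/bin/sh
# show_args prints each argument on its own line in brackets, so you can see
# exactly how the shell split (and unquoted) a command line.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

echo "correct:"
show_args -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,temp:in" sensor

echo "space after '=':"
show_args -Dimporttsv.separator= ',' sensor

echo "pasted en dash and curly quotes:"
show_args –Dimporttsv.columns=”HBASE_ROW_KEY,driver_id”
```

In the first case each -D option arrives intact and the shell strips the straight quotes; in the second the separator is empty and ',' shifts the remaining positional arguments; in the third there is no -D option at all, just one word that still contains the curly quotes.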