This tutorial will show how to export data out of an HBase table into CSV format. We will use airport data from the American Statistical Association, available here. Assuming you have a sandbox up and running, let's start.

First, SSH into your sandbox and switch to the hdfs user:

  • sudo su - hdfs

Then grab the airport data by issuing a wget.
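As a rough sketch, with a placeholder standing in for the linked URL:

  • wget <url-to-airports.csv> -O /home/hdfs/airports.csv

The -O flag drops the download at the path used in the next step.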

For my example, the file is located at:

  • /home/hdfs/airports.csv
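ImportTsv (used below) reads its input from HDFS rather than the local filesystem, so copy the file into HDFS as well. A minimal sketch, assuming /tmp as the target directory to match the load command later:

  • hdfs dfs -put /home/hdfs/airports.csv /tmp/airports.csv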

Now let's create an HBase table called "airports" with the column family "info". Do this in the HBase shell.

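A minimal version of the create command, assuming no extra table options:

  • create 'airports', 'info'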

Now that the table is created, let's load it. Exit the HBase shell and, as the hdfs user, run the following to load the table:

  • hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,info:iata,info:airport,info:city,info:country,info:lat,info:long" airports hdfs://sandbox.hortonworks.com/tmp/airports.csv

That will kick off a MapReduce job to load the airports table in HBase. Once it finishes, you can do a quick verification in the HBase shell by running:

  • count 'airports'

You should see 3368 records in the table. Now let's log into the Pig (Grunt) shell.
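Assuming Pig is installed on the sandbox and on the PATH, the Grunt shell starts with:

  • pig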

We will create a relation called airport_data and load our HBase table into it. Two details about the statement: '-loadKey true' returns the row key as the first field, so the schema names it explicitly, and long is a reserved word in Pig Latin, so the longitude field is called lon:

  • airport_data = LOAD 'hbase://airports' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'info:iata,info:airport,info:city,info:country,info:lat,info:long', '-loadKey true') AS (rowkey, iata, airport, city, country, lat, lon);
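To sanity-check a few rows before exporting, one option is the following (preview is just an illustrative alias):

  • preview = LIMIT airport_data 5;
  • DUMP preview;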

Now that we have our data in a relation, let's dump it to HDFS in CSV format by issuing:

  • STORE airport_data INTO 'airportData/export' USING PigStorage(',');


So we have dumped the export into the HDFS directory airportData/export. Let's go view it.

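Assuming the job ran as the hdfs user, the relative output path resolves under that user's HDFS home directory, so the export can be inspected with:

  • hdfs dfs -ls airportData/export
  • hdfs dfs -cat airportData/export/part-*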

And there you go. We loaded data into an HBase table and exported it from the table in CSV format using Pig. Happy pigging.

