Created on 07-06-2016 10:28 PM - edited 08-17-2019 11:34 AM
This tutorial will show how to export data out of an HBase table into CSV format. We will use airport data from the American Statistical Association, available here. Assuming you have a sandbox up and running, let's start.
First, ssh into your sandbox and switch user to hdfs:
- sudo su - hdfs
Then grab the airport data by issuing a wget. For my example the file is located at:
- /home/hdfs/airports.csv
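The ImportTsv job used later in this tutorial reads from HDFS, so the local file needs to be copied up first. Assuming the /tmp target path used in the load command below, something like this does it:

- hdfs dfs -put /home/hdfs/airports.csv /tmp/airports.csv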
Now let's create an HBase table called "airports" with a column family "info". Do this in the HBase shell:
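A minimal create statement in the HBase shell looks like this (the table and column family names match those used in the load command below):

- create 'airports', 'info'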
Now that the table is created, let's load it. Exit the HBase shell; then, as user hdfs, run the following to load the table:
- hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,info:iata,info:airport,info:city,info:country,info:lat,info:long" airports hdfs://sandbox.hortonworks.com:/tmp/airports.csv
That will kick off a MapReduce job to load the airports table in HBase. Once that is done, you can do a quick verification in the HBase shell by running:
- count 'airports'
You should see 3368 rows in the table. Now let's log into the Pig shell.
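On the sandbox, the Pig (Grunt) shell can be started as the hdfs user with the command below; depending on your distribution, HBaseStorage may additionally require the HBase jars to be on Pig's classpath, so treat this as a sketch:

- pig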
We will create a relation called airport_data and load our HBase table into it by issuing:
- airport_data = LOAD 'hbase://airports' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'info:iata,info:airport,info:city,info:country,info:lat,info:long', '-loadKey true') AS (iata,airport,city,country,lat,long);
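Optionally, you can sanity-check the load inside Grunt before storing anything. Note that DUMP triggers a full job on the sandbox, so it can take a minute:

- dump airport_data;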
Now that we have our data in a relation, let's write it out to HDFS in CSV format by issuing:
- store airport_data into 'airportData/export' using PigStorage(',');
So we have written the export to the HDFS directory airportData/export. Let's go view it.
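Back at the shell, the exported part files can be listed and inspected like this (paths are relative to the hdfs user's home directory; the part file name is an assumption, hence the glob):

- hdfs dfs -ls airportData/export
- hdfs dfs -cat 'airportData/export/part-*' | head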
And there you go. We have loaded data into an HBase table and exported it from the table in CSV format using Pig. Happy pigging.