Exporting a Phoenix table into a CSV file

Is there a way to export a Phoenix table into a CSV file using the Phoenix Pherf utility? If so, can anyone provide the details? I couldn't find proper documentation for this. We need to export around 26 million rows into a CSV file.

The Phoenix Pherf utility exports performance details, such as how long each query took, not the data itself. There is no direct way to export data into CSV from Phoenix. Can't you use the Export utility provided by HBase?

It's better to go with @Ankit Singhal's suggestion of using Pig, Hive, or Spark to export.

Hi @Rajeshbabu Chintaguntla, I believe the Export utility provided by HBase generates sequence files. How can I convert them into CSV files?

I tried the Pig export, but the data types are not mapped.

I used a query similar to this:

A = load 'hbase://query/select * from TRANSACTION' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');

Do I need to specify the schema explicitly, and if so, how do I do that?

Yes, Export generates sequence files. They are useful if you want to import the exported data back into another HBase table, for example in a different cluster; otherwise, they won't help.

Pig should map the data types properly. Can you try specifying the column list instead of * and check? For example:

A = load 'hbase://query/select col1,col2.. from TRANSACTION' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');

@ARUN

Or else, can you try declaring the column types explicitly, something like this:

A = load 'hbase://query/select col1,col2.. from TRANSACTION' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost') as (rowKey:chararray,col_a:int, col_b:double, col_c:chararray);

Depending on the size of the data you want to export, you can simply run a normal query that concatenates the columns with commas:

SELECT col1 || ',' || col2 || ',' || col3 from my_table;
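Plain `||` concatenation does not quote values that themselves contain commas, and the whole result has to come back through a client anyway. Below is a minimal Python sketch of the same "just run a query" idea, assuming a DB-API 2.0 driver for Phoenix such as `phoenixdb`, which talks to the Phoenix Query Server (e.g. `phoenixdb.connect('http://localhost:8765/', autocommit=True)`; not tested against Phoenix here). It reads the result in batches with `fetchmany` and lets the standard `csv` module handle quoting. The helper name `export_query_to_csv` and the batch size are hypothetical, and the demo uses an in-memory SQLite table as a stand-in for the Phoenix table.

```python
import csv
import io
import sqlite3  # demo only; swap in a Phoenix DB-API driver (e.g. phoenixdb) for real use


def export_query_to_csv(cursor, query, out_file, batch_size=10000):
    """Run `query` on a DB-API 2.0 cursor and stream the result into `out_file` as CSV.

    Writes a header row taken from cursor.description, then the data rows.
    Returns the number of data rows written (header excluded).
    """
    cursor.execute(query)
    # Default QUOTE_MINIMAL: only fields containing commas, quotes, or line breaks get quoted.
    writer = csv.writer(out_file)
    writer.writerow([column[0] for column in cursor.description])

    written = 0
    while True:
        # Fetch in batches instead of fetchall() so this loop never holds the whole
        # result set; how much the driver itself buffers is driver-specific.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)  # csv writes None (SQL NULL) as an empty field
        written += len(rows)
    return written


if __name__ == "__main__":
    # Demo: an in-memory SQLite table standing in for the Phoenix table.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE my_table (col1 INTEGER, col2 TEXT, col3 REAL)")
    cur.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [(1, "plain", 2.5), (2, "has, comma", None)],
    )
    buf = io.StringIO()
    count = export_query_to_csv(cur, "SELECT col1, col2, col3 FROM my_table", buf)
    print(count)  # 2
    print(buf.getvalue())
```

To write a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("transaction.csv", "w", newline="") as f: export_query_to_csv(cur, query, f)`, where the file name is a placeholder.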

@ARUN, did you figure out how to export a Phoenix table into a CSV file?

This thread is old, but I wanted to throw in my $.02. You can do this by creating a Hive external table over the Phoenix table using the Phoenix Storage Handler for Hive, and then exporting the data from that Hive table directly into HDFS.

https://phoenix.apache.org/hive_storage_handler.html

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataint...
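If you go the Hive route, `INSERT OVERWRITE DIRECTORY` without a `ROW FORMAT` clause writes plain text with columns separated by `^A` (`\x01`), and with the default SerDe NULLs come out as `\N`. Hive 0.11.0 and later accept `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','`, but that still does not quote values containing commas. Below is a minimal Python sketch that turns the default output into quoted CSV. It assumes the output files have already been copied locally (for example with `hdfs dfs -getmerge`) and that no value contains an embedded newline or `^A`. `hive_text_to_csv` is a hypothetical helper name.

```python
import csv
import io

HIVE_FIELD_SEP = "\x01"  # Hive's default column separator for text output (^A)
HIVE_NULL = "\\N"        # default NULL marker written by Hive's LazySimpleSerDe


def hive_text_to_csv(lines, out_file):
    """Convert ^A-delimited Hive text output lines into quoted CSV rows.

    `lines` is any iterable of strings (e.g. an open file). Assumes no field
    contains an embedded newline or ^A. Returns the number of rows written.
    """
    writer = csv.writer(out_file)
    written = 0
    for line in lines:
        fields = line.rstrip("\r\n").split(HIVE_FIELD_SEP)
        # Map Hive's NULL marker to an empty CSV field; quoting is left to csv.writer.
        writer.writerow(["" if field == HIVE_NULL else field for field in fields])
        written += 1
    return written


if __name__ == "__main__":
    sample = ["1\x01plain\x01\\N\n", "2\x01has, comma\x012.5\n"]
    buf = io.StringIO()
    print(hive_text_to_csv(sample, buf))  # 2
    print(buf.getvalue())
```

With real files this would look like `with open(src_path) as src, open("export.csv", "w", newline="") as dst: hive_text_to_csv(src, dst)`, where both paths are placeholders.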

The Phoenix-Hive storage handler as of v4.14.0 (CDH 5.12) seems buggy. I was able to get the Hive external wrapper table working for simple queries after tweaking the column mapping to work around upper/lower-case gotchas. However, it failed when I tried the "INSERT OVERWRITE DIRECTORY ... SELECT ..." command to export to a file:

org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=<table name>

 

This is a known problem that apparently no one is looking at:

https://issues.apache.org/jira/browse/PHOENIX-4804

 
