Support Questions

Find answers, ask questions, and share your expertise
Celebrating as our community reaches 100,000 members! Thank you!

Sqoop Export from Hive table - specifying Delimiters

Expert Contributor

I have a hive table thats Gzip compresed.

Table Creation

CREATE TABLE table_text (id int, xml string)
PARTITIONED BY (year string, month string)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
TBLPROPERTIES ('textinputformat.record.delimiter'='#');

The xml column has xmls data which is pretty printed. Thats why different line delimiter is chosen.

Data Load from another table

SET hive.exec.compress.output=true;

INSERT OVERWRITE table table_text partition(year=2016,month=2)
select id, decompress(xml) from table_avro;

Validating the load

Select count(*) from table_text;

This gives one record, which is fine as the source table table_avro also has one record.

Sqoop export

I want to export this gzip compressed data from this hive table to mysql table.

sqoop export --connect jdbc:mysql://localhost/test --driver com.mysql.jdbc.Driver --username hive --password hive --table  mysql_table_export --hcatalog-table table_text --input-fields-terminated-by '|' --input-lines-terminated-by '#'

This exports 14000 odd rows to the mysql table. In fact the xml has 14000 odd lines.

How do i specify the input data line and field separators so that the data is read correctly and exported correctly? Should the output delimiters also need to be specified?



@Manikandan Kannan

appears that to use -hcatalog option in sqoop, the table needs to be first created with hcat option... eg: hcat -e "create table....

in addition: below needs to provided as well as per docs:

hcatalog-databaseSpecifies the database name for the HCatalog table. If not specified, the default database name ‘default’ is used. Providing the --hcatalog-database option without --hcatalog-table is an error. This is not a required option.

kindly let me know if this works.... else you can switch to "--export-dir <directory>.*" option.... ( " * " for exporting partitions recursively )

Expert Contributor

@Mayank Shekhar Its not about the target table not found. It exports but exports with no knowledge of the input field and the record delimiters.


@Manikandan Kannan

redefining the table with hcat option and then sqooping using hcatalog feature didnt work as well ? How about switching back to '--export-dir <directory>.*' option ?

Expert Contributor

@Mayank Shekhar, hcat option did not make any different which i think is expected as creating an hive table is same as create a table in hcat. Even the --export-dir did not work. Below is the command used,

sqoop export --connect jdbc:mysql://localhost/test --driver com.mysql.jdbc.Driver --username hive --password hive --table mysql_table_export --export-dir /apps/hive/warehouse/table_text/**/* --input-fields-terminated-by '|' --input-lines-terminated-by '#' --fields-terminated-by '|' --lines-terminated-by '#'

I get an exception in this case,

Error: Can't export data, please check failed map task logs at at at at at org.apache.hadoop.mapred.MapTask.runNewMapper( at at org.apache.hadoop.mapred.YarnChild$ at Method) at at at org.apache.hadoop.mapred.YarnChild.main(

Caused by: java.lang.RuntimeException: Can't parse input data: ' </child></parent>' at noblob_table_avro_export_1.__loadFromFields( at noblob_table_avro_export_1.parse( at ... 10 more

Caused by: java.util.NoSuchElementException at java.util.ArrayList$ at noblob_table_avro_export_1.__loadFromFields( ... 12 more

The hive table has only one record as follows


It definitely looks like that sqoop is not able to read with record delimiter as '#' although the sqoop generated java class has input delimiters as (char) 124 and (char) 35.

In the generated class,

private static final DelimiterSet __inputDelimiters = new DelimiterSet((char) 124, (char) 35, (char) 0, (char) 0, false);

Expert Contributor

I just provided an XML sample which looks like similar to the one that i use as i cannot share. The difference is that the XML that i use has the namespaces defined for the xml tags.