Support Questions

Find answers, ask questions, and share your expertise

Hive support for Line Terminator other than '\n' when creating table

avatar
Explorer

I have XML data in one of DB fields which has '\n' in it. So, I need to use an alternate Line Terminator.

I get following error when I try using a different one.

FAILED: SemanticException 13:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token '';''

1 ACCEPTED SOLUTION

avatar

@Virendra Agarwal have you considered using XML Serde? It supports line breaks natively and probably it will be easier to parse your XML data.

See this from @Neeraj Sabharwal:

https://community.hortonworks.com/articles/972/hive-and-xml-pasring.html

View solution in original post

3 REPLIES 3

avatar

@Virendra Agarwal have you considered using XML Serde? It supports line breaks natively and probably it will be easier to parse your XML data.

See this from @Neeraj Sabharwal:

https://community.hortonworks.com/articles/972/hive-and-xml-pasring.html

avatar
@Virendra Agarwal

A custom SerDe will work. While you can use custom serde like one explained in this article from @Neeraj Sabharwal:

https://community.hortonworks.com/articles/972/hive-and-xml-pasring.html

Look at below stack overflow discussion as well:

http://stackoverflow.com/questions/27583736/lines-...

But if at all possible, it would be a great idea if you could migrate away from character separated files to a modern format like ORC or avro. You will gain in performance, benefit from complex structures and have a much more future proof data format. Load raw data into external table using SerDe but try to get the final resting place a managed table with an advanced file format, you'll be much happier in the long run.

avatar
New Contributor

@Pardeep

There is a design pattern being developed at our organization wherein one of the scenarios is to load data from an Oracle database on to Hive. The Hive table is defined as ORC format with ZLIB compression.

However, there is an error occurring when the Oracle source contains data with newline '\n' characters within text columns. Also, when we attempt to change 'LINES TERMINTED BY' to anything other than '\n' it is not allowed.

Could you please elaborate on your statement - "migrate away from character separated files to a modern format like ORC or avro" ?

Any assistance or pseudocode will be much appreciated.

I could provide details on the configuration in use, if required.