PIG: Loading a file with embedded newlines in the fields.

Hi,

I have a file with embedded newlines in the fields. It uses \u0001 as the field delimiter and \u0002 as the row delimiter. How can I load this file so that Pig ignores the embedded newlines, using either PigStorage or TextLoader?

Thx! /W
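
To make the layout described above concrete, here is a minimal Python sketch of it, not a Pig solution. The sample data and the `parse_records` helper are hypothetical, made up for illustration. Only the two delimiters come from the question. It shows that splitting on \u0002 first and \u0001 second leaves the embedded newlines inside the fields.

```python
FIELD_SEP = "\u0001"   # field delimiter, as stated in the question
RECORD_SEP = "\u0002"  # row delimiter, as stated in the question


def parse_records(raw):
    """Split raw text into rows of fields, leaving embedded newlines untouched."""
    rows = []
    for record in raw.split(RECORD_SEP):
        if record == "":
            # Skip empty pieces, such as the one after a trailing row delimiter.
            continue
        rows.append(record.split(FIELD_SEP))
    return rows


# Hypothetical sample: two rows, the first with a newline inside its second field.
sample = "1\u0001line one\nline two\u0002" + "2\u0001plain\u0002"

rows = parse_records(sample)
for row in rows:
    print(row)
```

A small file built like `sample` may also be handy for testing a loader against a known expected result.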

3 REPLIES

Re: PIG: Loading a file with embedded newlines in the fields.

Hi Ward,

Try this at the top of your Pig script:

SET textinputformat.record.delimiter '\u0002';

or

SET textinputformat.record.delimiter '\0002';

Re: PIG: Loading a file with embedded newlines in the fields.

Hi Greg, I did, but it doesn't seem to ignore the newlines on load.

	SET textinputformat.record.delimiter '\u0002';
	data = load '/test/records_1000' USING PigStorage('\u0001');

Re: PIG: Loading a file with embedded newlines in the fields.

Hmm. Did it behave the same way with SET textinputformat.record.delimiter '\0002'; ?

Could you post a few records?