
Load fields with special characters from Hive table into pig script using HCatLoader


Contributor

I have Hive tables with various special characters in the field names, such as -, /, and #.

Is there a way to properly escape them when loading into a Pig script? I'm using HCatLoader to load from the tables.

Example: e-mail_address (Pig treats e as one field and mail_address as another).

4 REPLIES

Re: Load fields with special characters from Hive table into pig script using HCatLoader

@Josh Persinger

Yes, you should be able to. See the following link:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/hive-013-feature...

Summary: Use the backtick character (`) to quote the field name.

Re: Load fields with special characters from Hive table into pig script using HCatLoader

Contributor

@Michael Young

Pig doesn't accept the field name when I enclose it in backticks. I tried enclosing it in one backtick and in two (`field` and ``field``), and I get:

Syntax error, unexpected symbol at or near '`e-mail_address_smtp_addr`'

Re: Load fields with special characters from Hive table into pig script using HCatLoader

@Josh Persinger

One way to get around the Pig issue is to refer to columns by position. Pig positions are zero-based, so if e-mail_address_smtp_addr is at position 3 (the fourth column), you can use $3. It is not ideal, but it may be a viable workaround.

Re: Load fields with special characters from Hive table into pig script using HCatLoader

Guru
@Josh Persinger

Unfortunately, field names in Pig cannot contain special characters (only letters, digits, and underscores are allowed), and there is no way to escape them.

@Michael Young has the workaround. It could be extended in the following way:

If you really want to assign aliases (which comes in handy for all the joins, filters, etc. later in the script), you can assign them in a FOREACH .. GENERATE statement.

For example, take Michael's example above, where the fourth field ($3, since positions are zero-based) has the name with the bad character. Let's say there are 10 fields in total (all the others have valid names).

You could do:

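-- HCatLoader takes a Hive table name rather than a file path
data = LOAD 'escapetest' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- $3 is the fourth field; the ranges on either side keep their original names
x = FOREACH data GENERATE $0..$2, $3 AS email_address, $4..;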
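
If a table has many columns with bad characters, writing the projection by hand gets tedious. Below is a minimal Python sketch that generates a FOREACH .. GENERATE statement renaming every column by position. The column list, the relation name data, and the helper names pig_safe_alias and build_projection are all hypothetical, and the sanitizing rule (replace anything other than a letter, digit, or underscore with an underscore) is just one reasonable choice.

```python
import re


def pig_safe_alias(name):
    """Turn a Hive column name into an identifier Pig should accept."""
    # Replace every character that is not a letter, digit, or underscore.
    alias = re.sub(r'[^A-Za-z0-9_]', '_', name)
    # Pig identifiers must start with a letter, so prefix anything else.
    if not alias or not alias[0].isalpha():
        alias = 'f_' + alias
    return alias


def build_projection(relation, columns):
    """Build a FOREACH .. GENERATE statement that renames columns by position."""
    # Positions are zero-based, matching Pig's $0, $1, ... notation.
    parts = ['${} AS {}'.format(i, pig_safe_alias(c))
             for i, c in enumerate(columns)]
    return 'FOREACH {} GENERATE {};'.format(relation, ', '.join(parts))


# Hypothetical column list, in the same order as the Hive table definition.
columns = ['id', 'name', 'phone#', 'e-mail_address']
statement = 'x = ' + build_projection('data', columns)
print(statement)
```

Note that two different Hive names can collapse to the same alias (for example a-b and a_b), and an alias could clash with a Pig keyword, so check the generated statement before pasting it into a script.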