I have Hive tables with special characters in the field names, such as -, /, and #.
Is there a way to properly escape them when loading into a Pig script? I'm using HCatLoader to load from the tables.
Example: e-mail_address (Pig treats e as one field and mail_address as another).
Pig doesn't accept the field name enclosed in backticks either. I tried both one and two backticks: `field` and ``field``
Syntax error, unexpected symbol at or near '`e-mail_address_smtp_addr`'
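One way to get around the Pig issue is to refer to columns by position. Positions are zero-based, so if e-mail_address_smtp_addr were the fourth field, you would use $3. It is not ideal, but it may be a viable workaround.
Unfortunately, field names in Pig cannot contain special characters, and there is no way to escape them. A Pig identifier has to start with a letter and can contain only letters, digits, and underscores.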
@Michael Young has the workaround. It could be extended in the following way:
If you really want to assign aliases (which comes in handy for all of the joins, filters, etc. later in the script), you can assign them in a FOREACH ... GENERATE statement.
For example, take Michael's example above, where the fourth field ($3) has a name with a bad character. Let's say there are 10 fields in total and all the others have valid names.
You could do:
data = LOAD 'escapetest.txt' USING org.apache.hive.hcatalog.pig.HCatLoader();
x = FOREACH data GENERATE $0..$2, $3 AS email_address, $4..;
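If many columns have bad names, writing the FOREACH by hand gets tedious. Here is a rough Python sketch that builds the statement from a list of Hive column names. The helper names (pig_safe_alias, rename_statement), the "f_" prefix, and the sample column list are all made up for illustration, and it assumes you already have the column names in table order (for instance from DESCRIBE in Hive). It does not check for Pig reserved words.

```python
import re

# Anything other than a letter, digit, or underscore is not allowed in a Pig alias.
_INVALID = re.compile(r"[^A-Za-z0-9_]")


def pig_safe_alias(name):
    """Hypothetical helper: turn a Hive column name into a Pig-safe alias."""
    alias = _INVALID.sub("_", name)
    # A Pig alias has to start with a letter, so prefix anything that does not.
    if not alias or not alias[0].isalpha():
        alias = "f_" + alias
    return alias


def rename_statement(columns, source="data", target="renamed"):
    """Build a FOREACH ... GENERATE that aliases every column by position."""
    used = set()
    parts = []
    for position, name in enumerate(columns):
        base = pig_safe_alias(name)
        alias = base
        suffix = 1
        # Two different names can map to the same alias (a-b and a/b), so add a suffix.
        while alias in used:
            suffix += 1
            alias = f"{base}_{suffix}"
        used.add(alias)
        parts.append(f"${position} AS {alias}")
    return f"{target} = FOREACH {source} GENERATE " + ", ".join(parts) + ";"


if __name__ == "__main__":
    # Made-up column list, in the same order as the table.
    columns = ["id", "name", "phone#", "e-mail_address_smtp_addr", "city"]
    print(rename_statement(columns))
```

Paste the printed line into the Pig script right after the LOAD, then use the new aliases in the rest of the script.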