ñ character in pig error

Master Collaborator

Hi:

I need the 'ñ' character NOT to be replaced by this code:

REPLACE($0,'([^a-zA-Z\\n\\.\\-]+)',''))

Any suggestions on how to do that?

Thanks.

1 ACCEPTED SOLUTION

Guru

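In the ISO-8859-1 (Latin-1) table, which extends ASCII, n tilde (ñ) is octal 361. Pig's REPLACE takes a Java regular expression, and in Java regex an octal escape needs a leading zero, so the character is written \0361 (\\0361 inside a Pig string literal). So simply add \\0361 inside the negated character class [^...] in your expression to prevent n tilde from being replaced by ''.

This should work for you: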

REPLACE($0,'[^a-zA-Z\\0361\\n\\.\\-]+','')
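
For anyone who wants to check the expression outside Pig: as far as I know, Pig's REPLACE hands the pattern to Java's String.replaceAll, so a small Java sketch can exercise the same regex. The class name KeepEnye, the method name clean, and the sample string are made up for illustration. The Java string literal doubles the backslashes just as the Pig literal does.

```java
import java.util.regex.Pattern;

final class KeepEnye {
    // Same character class as the Pig expression above. After the string
    // literal is unescaped, the regex engine sees \0361, the octal escape
    // for U+00F1 (n tilde).
    static final Pattern NOT_ALLOWED =
            Pattern.compile("[^a-zA-Z\\0361\\n\\.\\-]+");

    // Removes every run of characters outside the allowed set
    // (ASCII letters, n tilde, newline, dot, hyphen).
    static String clean(String input) {
        return NOT_ALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00f1 keeps the source file ASCII-only; it is the same character as in the post.
        System.out.println(clean("Espa\u00f1a 2016, ni\u00f1o-1."));
    }
}
```

Spaces, digits, and commas are stripped from the sample string while both n tildes, the hyphen, and the dot are kept. Uppercase Ñ (octal 321) is not in the class, so it would still be removed unless it is added too.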

See:

http://web.cs.mun.ca/~michael/c/ascii-table.html

http://www.regular-expressions.info/refcharacters.html

@Roberto Sancho -- I have corrected the code in this answer. The above works for me. If this is what you are looking for, please accept the answer; otherwise, please let me know what gaps remain.


5 REPLIES

Master Collaborator

Hi:

I tried this but it still doesn't work. Any other suggestions? Many thanks.

REPLACE($0,'[^\\361a-zA-Z\\n\\.\\-]+)',''))
or
REPLACE($0,'([^a-zA-Z\\n\\.\\-\\361]+)',''))

Guru

@Roberto Sancho Could you share a few lines of the file you are using (including some with n tilde)? Thanks.

Guru

@Roberto Sancho I have it working. I have edited the original answer with the working code.

Note the following:

  • I changed \\361 to \\0361 (Java regex needs the leading zero for an octal escape)
  • I simplified the expression by removing the () that wrapped the []
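
The leading zero matters because Java regex only reads a backslash followed by 0 and octal digits as an octal escape. Without the zero, \361 is not treated as a character code. Here is a small sketch of the difference, assuming Pig passes the pattern to java.util.regex unchanged. The class name OctalEscapeCheck and the helper compiles are hypothetical. On the JDKs I am aware of, the \361 form inside a character class is rejected with a PatternSyntaxException, while the \0361 form compiles.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class OctalEscapeCheck {
    // Returns true if the regex compiles, false if Java rejects its syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Regex text \361 inside a character class: no leading zero.
        System.out.println(compiles("[^a-zA-Z\\n\\.\\-\\361]+"));
        // Regex text \0361: octal escape for n tilde.
        System.out.println(compiles("[^a-zA-Z\\0361\\n\\.\\-]+"));
    }
}
```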

Master Collaborator

Hi:

I tried the new code, and it's working now like this:

REPLACE($0,'[^a-zA-Z\\0361\\n\\.\\-]+','')

Many thanks.