
STRSPLIT in Pig

Master Collaborator

Hi:

Is it possible to split www.amazon.es in Pig? I'm doing this, but it doesn't work:

orders4 = FOREACH orders3 GENERATE $0 as freq, STRSPLIT($1,'.') as word;

It just prints $0. Any suggestions?


Master Mentor

@Roberto Sancho, can you clarify: are you trying to split the URL "www.amazon.es" by "."?

Master Collaborator

Hi:

Yes. With '.' it doesn't work, but it works with '-'.

Master Mentor
@Roberto Sancho

Take a look at this example. It's not exactly what you want, but it shows a working example.

https://community.hortonworks.com/questions/12302/trying-to-read-a-nested-parquet-file-using-pig-not...

Master Mentor

@Roberto Sancho, the STRSPLIT delimiter is a regular expression, so '.' is interpreted as the regex metacharacter (matching any character) rather than as a literal separator. That's why it doesn't work: I tried '#' with the example from the link and it worked, but with a dot it doesn't.
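A minimal Java sketch of why '.' fails. It assumes, as far as I know of Pig's built-in STRSPLIT, that the delimiter is passed to Java's String.split with a limit of 0; the `strsplit` helper below is a hypothetical stand-in, not Pig's API. With the regex ".", every character is a separator, every piece is empty, and split drops trailing empty strings, so nothing is left. That would explain why only $0 shows up in the output.

```java
import java.util.Arrays;

// Hypothetical stand-in for STRSPLIT(source, delim): Java's String.split
// treats the delimiter as a regular expression.
public class DotSplitDemo {

    // Assumption: mirrors STRSPLIT called without a limit (limit 0).
    public static String[] strsplit(String source, String delim) {
        return source.split(delim);
    }

    public static void main(String[] args) {
        String url = "www.amazon.es";
        // "." means "any character": all pieces are empty and the trailing
        // empty strings are removed, leaving an empty array.
        System.out.println(Arrays.toString(strsplit(url, ".")));             // []
        // "-" is not a regex metacharacter, which is why it "works".
        System.out.println(Arrays.toString(strsplit("www-amazon-es", "-"))); // [www, amazon, es]
        // Escaping the dot turns it into a literal separator.
        System.out.println(Arrays.toString(strsplit(url, "\\.")));           // [www, amazon, es]
    }
}
```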

Master Mentor

@Roberto Sancho, I got it working by following the advice from http://stackoverflow.com/questions/24981431/strsplit-in-pig-functions:

grunt> a = load 'test2' using PigStorage() as (str:chararray);
grunt> b = foreach a generate STRSPLIT($0,'\\u002E') as word;
grunt> dump b;

((www,amazon,es))
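Why '\\u002E' works, sketched in Java under the same assumption that STRSPLIT hands its delimiter to String.split: after the doubled backslash is unescaped, the regex engine receives a Unicode escape for '.', and an escaped character is matched literally. Escaping with backslash-dot, or quoting with Pattern.quote, gives the same result. By analogy, STRSPLIT($0, '\\.') should also work in Pig, though that is an inference, not something tested in this thread.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapedDotDemo {

    // Split on whatever regex is supplied (stand-in for STRSPLIT's delimiter).
    public static String[] splitWith(String source, String delimRegex) {
        return source.split(delimRegex);
    }

    public static void main(String[] args) {
        String url = "www.amazon.es";
        // The regex text here is a Unicode escape for code point 002E ('.'),
        // which the regex engine matches as a literal dot.
        System.out.println(Arrays.toString(splitWith(url, "\\u002E"))); // [www, amazon, es]
        // The more common spelling: backslash followed by a dot.
        System.out.println(Arrays.toString(splitWith(url, "\\.")));     // [www, amazon, es]
        // Pattern.quote makes every character of its argument literal.
        System.out.println(Arrays.toString(splitWith(url, Pattern.quote(".")))); // [www, amazon, es]
    }
}
```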

Master Collaborator

Hi:

The last solution was fine, but this also works:

orders4 = FOREACH orders3 GENERATE $0 as freq, (chararray) ((word matches '.*\\..*') ? SUBSTRING(word, INDEXOF(word, '.', 0) + 1, LAST_INDEX_OF(word, '.')) : $1) as word;
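The SUBSTRING/INDEXOF approach avoids the regex problem for the split itself, because INDEXOF and LAST_INDEX_OF search for a plain substring (as far as I know they map to Java's indexOf and lastIndexOf). It returns the part between the first and last dot, e.g. "amazon" for www.amazon.es. Note that `matches` does take a regular expression, so an unescaped '.*..*' accepts any non-empty string, while '.*\\..*' requires a literal dot. A Java sketch of the same logic, assuming `word` and `$1` refer to the same field; the single-dot guard is an addition of this sketch, not part of the Pig expression.

```java
public class MiddleLabelDemo {

    // Text between the first and the last dot; input returned unchanged otherwise.
    public static String middle(String word) {
        // matches() is a full-string regex match, so the dot must be escaped.
        if (!word.matches(".*\\..*")) {
            return word;
        }
        int first = word.indexOf('.');    // plain character search, no regex
        int last = word.lastIndexOf('.');
        if (first == last) {
            // Added guard: with a single dot there is no middle part,
            // and substring(first + 1, last) would throw.
            return word;
        }
        return word.substring(first + 1, last);
    }

    public static void main(String[] args) {
        // Unescaped, ".*..*" is satisfied by any non-empty string.
        System.out.println("localhost".matches(".*..*"));   // true
        System.out.println("localhost".matches(".*\\..*")); // false
        System.out.println(middle("www.amazon.es"));        // amazon
        System.out.println(middle("localhost"));            // localhost
    }
}
```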