
I am struggling to parse an LDIF file in Pig. The problem with Java is that it gives me heap memory issues. Increasing the heap size doesn't work.

New Contributor

@pigroup pigroup @priya dharshini

I tried using Pig Latin, but I am not able to parse the file. Is there an LDIF parser in the piggybank library? I wrote a regex, but with it all my data ends up in the $0 column, so I cannot filter out the information I need.

Please help. Is Pig the only way to parse it, or are there other possibilities as well? Thank you.

4 REPLIES 4

Re: I am struggling to parse the ldif file in pig. Problem with java is it gives me heap memory issue. Increasing the heap size dont work.

Could you please share more details, such as your code and how you are running it?

Re: I am struggling to parse the ldif file in pig. Problem with java is it gives me heap memory issue. Increasing the heap size dont work.

New Contributor

The commands that I am using are:

A = LOAD '/local/home/kpi_dev/pig/Pig files to be parsed/LDAP/root.ldif_05240000' AS (line:chararray);

B = FOREACH A GENERATE REGEX_EXTRACT_ALL('$0', '(.*):(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)');

and another that I used is

C = foreach A generate REGEX_EXTRACT_ALL('$0,'="ou=[^,]');

Re: I am struggling to parse the ldif file in pig. Problem with java is it gives me heap memory issue. Increasing the heap size dont work.

New Contributor

Please let me know if anything else is required. @Mukesh Kumar

Re: I am struggling to parse the ldif file in pig. Problem with java is it gives me heap memory issue. Increasing the heap size dont work.

Try the statement below and modify it as needed. With my sample file, I can see the data populate properly.

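-- The pattern has four capture groups, so four aliases are listed; the last name (rest) is only illustrative.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL($0, '(.*):(.*)=(.*),(.*)')) AS (id:chararray, name:chararray, nameid:chararray, rest:chararray);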
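
On the question of whether Pig is the only option: one possibility, not raised in the thread, is to flatten the LDIF into tab-separated rows first and then load that file in Pig with its default tab delimiter. An LDIF entry normally spans several lines (one attribute per line, with long values folded onto continuation lines that start with a space), so a regex applied to one loaded line at a time only ever sees a fragment of an entry. The sketch below is written in Python rather than Pig Latin and reads the input one line at a time, so memory use should stay proportional to a single entry rather than the whole file. The names `iter_ldif_records` and `to_tsv_row`, the sample entries, and the choice of `|` to join repeated attributes are all assumptions made for illustration. It handles plain `attr: value` lines, base64 `attr:: value` lines, folded lines, and `#` comments, and is not a complete LDIF implementation.

```python
import base64


def _finish(pairs):
    """Turn the raw [attribute, value] pairs of one entry into a dict of lists."""
    record = {}
    for key, raw in pairs:
        if raw.startswith(":"):
            # "attr:: <base64>" form: decode after any folded lines were joined.
            value = base64.b64decode(raw[1:].strip()).decode("utf-8", "replace")
        else:
            value = raw.strip()
        record.setdefault(key, []).append(value)
    return record


def iter_ldif_records(lines):
    """Yield one dict per LDIF entry from any iterable of text lines.

    Hypothetical helper: only the current entry is held in memory.
    """
    pairs = []           # [attribute, raw value] pairs of the current entry
    in_comment = False   # True while inside a (possibly folded) comment
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if not line:                      # a blank line ends the entry
            if pairs:
                yield _finish(pairs)
            pairs, in_comment = [], False
        elif line.startswith("#"):        # comment line
            in_comment = True
        elif line.startswith(" "):        # folded continuation line
            if not in_comment and pairs:
                pairs[-1][1] += line[1:]
        else:
            in_comment = False
            key, sep, rest = line.partition(":")
            if sep:
                pairs.append([key, rest])
    if pairs:                             # last entry without a trailing blank line
        yield _finish(pairs)


def to_tsv_row(record, attributes):
    """Build one tab-separated row; repeated attributes are joined with '|'."""
    return "\t".join("|".join(record.get(name, [])) for name in attributes)


# Made-up sample data, not taken from the original poster's file.
sample_lines = [
    "version: 1\n",
    "\n",
    "# first entry\n",
    "dn: uid=jdoe,ou=People,dc=exa\n",
    " mple,dc=com\n",
    "uid: jdoe\n",
    "cn:: SmFuZSBEb2U=\n",
    "\n",
    "dn: ou=People,dc=example,dc=com\n",
    "ou: People\n",
]

# Keep only real entries (the "version: 1" header has no dn).
entries = [r for r in iter_ldif_records(sample_lines) if "dn" in r]
rows = [to_tsv_row(r, ["dn", "uid", "cn", "ou"]) for r in entries]
```

To use this on a real file, pass an open file object (for example one opened with `encoding="utf-8"`) to `iter_ldif_records` instead of the sample list and write each row to an output file as it is produced. Pig can then load that file with named columns in the `AS` clause. Attribute values that themselves contain tabs would need escaping, which this sketch does not attempt.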