I am struggling to parse an LDIF file in Pig. With Java I run into a heap memory issue, and increasing the heap size doesn't help.

@pigroup pigroup @priya dharshini

I tried using Pig Latin but I am not able to parse the file. Is there an LDIF parser in the piggybank library? I wrote a regex, but with it all of my data ends up in the $0 column, so I cannot filter out the information I need.

Please help. Is Pig the only way to parse it, or are there other possibilities? Thank you.

4 REPLIES

Re: I am struggling to parse an LDIF file in Pig. With Java I run into a heap memory issue, and increasing the heap size doesn't help.

Could you please share more details, such as your code and how you are running it?

Re: I am struggling to parse an LDIF file in Pig. With Java I run into a heap memory issue, and increasing the heap size doesn't help.

The commands that I am using are:

A = LOAD '/local/home/kpi_dev/pig/Pig files to be parsed/LDAP/root.ldif_05240000' AS (line:chararray);

B = foreach A generate REGEX_EXTRACT_ALL(‘$0’,'(.*):(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)');

Another one that I tried is:

C = foreach A generate REGEX_EXTRACT_ALL('$0,'="ou=[^,]');

Re: I am struggling to parse an LDIF file in Pig. With Java I run into a heap memory issue, and increasing the heap size doesn't help.

Please let me know if anything else is required. @Mukesh Kumar

Re: I am struggling to parse an LDIF file in Pig. With Java I run into a heap memory issue, and increasing the heap size doesn't help.

Try the statement below and modify it to fit your data. With my sample file I can see the fields populate properly.

B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL($0, '(.*):(.*)=(.*),(.*)')) AS (id:chararray, name:chararray, nameid:chararray);
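
If the regex route stays awkward, one possible alternative outside Pig is to stream the LDIF file and write one tab-separated row per entry, which Pig can then load with its default loader. The following is only a minimal sketch in Python, not a full LDIF parser. It assumes plain entries separated by blank lines, `attr: value` lines, and folded lines that continue with a single leading space. Base64 (`attr:: ...`) and URL (`attr:< ...`) values are not handled. The function names and the chosen attributes (`dn`, `ou`, `cn`) are illustrative and should be adjusted to your data. Because only one entry is held at a time, memory use should not grow with the file size.

```python
import sys


def iter_entries(lines):
    """Yield one LDIF entry at a time as a dict: attribute -> list of values."""
    entry = {}
    last_attr = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current entry.
            if entry:
                yield entry
            entry, last_attr = {}, None
            continue
        if line.startswith("#"):
            # Comment line: ignore it and do not attach continuations to it.
            last_attr = None
            continue
        if line.startswith(" "):
            # Folded line: append the text (minus the leading space)
            # to the most recent value, if there is one.
            if last_attr is not None:
                entry[last_attr][-1] += line[1:]
            continue
        attr, sep, value = line.partition(":")
        if not sep:
            continue  # not an "attr: value" line; skipped in this sketch
        last_attr = attr.strip()
        entry.setdefault(last_attr, []).append(value.strip())
    if entry:
        # Last entry when the input has no trailing blank line.
        yield entry


def to_row(entry, fields=("dn", "ou", "cn")):
    """Flatten the selected attributes into one tab-separated line.

    Multiple values of the same attribute are joined with '|'.
    """
    return "\t".join("|".join(entry.get(f, [])) for f in fields)


if __name__ == "__main__":
    # Example: python ldif_to_tsv.py < root.ldif > root.tsv
    for e in iter_entries(sys.stdin):
        if "dn" in e:  # skips header blocks such as "version: 1"
            print(to_row(e))
```

The resulting file could then be loaded with something like `A = LOAD 'root.tsv' AS (dn:chararray, ou:chararray, cn:chararray);`, since Pig's default loader splits on tabs.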