Creating an iterative loop using Apache Pig

Explorer

Hi experts,

I have the following code:

SPLIT A  INTO Src01 IF (Date=='2016-07-01'),
                Src02 IF (Date=='2016-07-02'),
                Src03 IF (Date=='2016-07-03'),
                Src04 IF (Date=='2016-07-04'),
                Src05 IF (Date=='2016-07-05'),
                Src06 IF (Date=='2016-07-06'),
                Src07 IF (Date=='2016-07-07'),
                Src08 IF (Date=='2016-07-08'),
                Src09 IF (Date=='2016-07-09'),
                Src10 IF (Date=='2016-07-10'),
                Src11 IF (Date=='2016-07-11'),
                Src12 IF (Date=='2016-07-12'),
                Src13 IF (Date=='2016-07-13'),
                Src14 IF (Date=='2016-07-14'),
                Src15 IF (Date=='2016-07-15'),
                Src16 IF (Date=='2016-07-16'),
                Src17 IF (Date=='2016-07-17'),
                Src18 IF (Date=='2016-07-18'),
                Src19 IF (Date=='2016-07-19'),
                Src20 IF (Date=='2016-07-20'),
                Src21 IF (Date=='2016-07-21'),
                Src22 IF (Date=='2016-07-22'),
                Src23 IF (Date=='2016-07-23'),
                Src24 IF (Date=='2016-07-24'),
                Src25 IF (Date=='2016-07-25'),
                Src26 IF (Date=='2016-07-26'),
                Src27 IF (Date=='2016-07-27'),
                Src28 IF (Date=='2016-07-28'),
                Src29 IF (Date=='2016-07-29'),
                Src30 IF (Date=='2016-07-30'),
                Src31 IF (Date=='2016-07-31'),
                Src011 IF (Date=='2016-06-01');
STORE Src01 INTO '/path/2016-07-01' using PigStorage('\t');
STORE Src02 INTO '/path/2016-07-02' using PigStorage('\t');
STORE Src03 INTO '/path/2016-07-03' using PigStorage('\t');
STORE Src04 INTO '/path/2016-07-04' using PigStorage('\t');
STORE Src05 INTO '/path/2016-07-05' using PigStorage('\t');
STORE Src06 INTO '/path/2016-07-06' using PigStorage('\t');
STORE Src07 INTO '/path/2016-07-07' using PigStorage('\t');
STORE Src08 INTO '/path/2016-07-08' using PigStorage('\t');
STORE Src09 INTO '/path/2016-07-09' using PigStorage('\t');
STORE Src10 INTO '/path/2016-07-10' using PigStorage('\t');
STORE Src11 INTO '/path/2016-07-11' using PigStorage('\t');
STORE Src12 INTO '/path/2016-07-12' using PigStorage('\t');
STORE Src13 INTO '/path/2016-07-13' using PigStorage('\t');
STORE Src14 INTO '/path/2016-07-14' using PigStorage('\t');
STORE Src15 INTO '/path/2016-07-15' using PigStorage('\t');
STORE Src16 INTO '/path/2016-07-16' using PigStorage('\t');
STORE Src17 INTO '/path/2016-07-17' using PigStorage('\t');
STORE Src18 INTO '/path/2016-07-18' using PigStorage('\t');
STORE Src19 INTO '/path/2016-07-19' using PigStorage('\t');
STORE Src20 INTO '/path/2016-07-20' using PigStorage('\t');
STORE Src21 INTO '/path/2016-07-21' using PigStorage('\t');
STORE Src22 INTO '/path/2016-07-22' using PigStorage('\t');
STORE Src23 INTO '/path/2016-07-23' using PigStorage('\t');
STORE Src24 INTO '/path/2016-07-24' using PigStorage('\t');
STORE Src25 INTO '/path/2016-07-25' using PigStorage('\t');
STORE Src26 INTO '/path/2016-07-26' using PigStorage('\t');
STORE Src27 INTO '/path/2016-07-27' using PigStorage('\t');
STORE Src28 INTO '/path/2016-07-28' using PigStorage('\t');
STORE Src29 INTO '/path/2016-07-29' using PigStorage('\t');
STORE Src30 INTO '/path/2016-07-30' using PigStorage('\t');
STORE Src31 INTO '/path/2016-07-31' using PigStorage('\t');
STORE Src011 INTO '/path/2016-06-01' using PigStorage('\t');

Is there a way to automate this, for example with a loop or some other iterative construct?

Many thanks!

Accepted Solutions

Re: Creating an iterative loop using Apache Pig

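A better answer: see my simple example of using MultiStorage at https://martin.atlassian.net/wiki/x/AgCHB. Assuming the "Date" field from your original question is the first field (index 0) in the records of relation A, the following single statement should take care of it. The four arguments are the parent output path, the index of the field to split on, the compression setting, and the field delimiter. Pig aliases are case-sensitive, so the relation name must match the one in your script.

STORE A INTO '/path'
  USING org.apache.pig.piggybank.storage.MultiStorage(
    '/path', '0', 'none', '\\t');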

This creates one folder per distinct date, such as /path/2016-07-01, each holding one or more "part files" for that date. You can then use any of those directories as the input path for another job. Good luck!
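To make the resulting layout concrete, here is a small Python sketch of the same idea: group records by the value of one field and write each group into its own subdirectory. It is only an illustration on local files, not MultiStorage itself; the function names and the "part-0000" file name are placeholders and do not reflect MultiStorage's actual file naming.

```python
import csv
import os
import tempfile
from collections import defaultdict


def partition_rows(rows, key_index=0):
    """Group rows (lists of strings) by the value found at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def write_partitions(groups, parent_dir, delimiter="\t"):
    """Write each group to parent_dir/<key>/part-0000 and return the file paths."""
    paths = []
    for key, rows in sorted(groups.items()):
        folder = os.path.join(parent_dir, key)
        os.makedirs(folder, exist_ok=True)
        # Placeholder file name, chosen for this sketch only.
        path = os.path.join(folder, "part-0000")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
            writer.writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up sample records; the date is the first field (index 0).
    sample = [
        ["2016-07-01", "a", "1"],
        ["2016-07-02", "b", "2"],
        ["2016-07-01", "c", "3"],
    ]
    with tempfile.TemporaryDirectory() as parent:
        for path in write_partitions(partition_rows(sample), parent):
            print(os.path.relpath(path, parent))
```

Each distinct date ends up as a folder name under the parent directory, which is why no per-date SPLIT or STORE statement is needed.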

Replies

Re: Creating an iterative loop using Apache Pig

As you already know, Pig really isn't a general-purpose programming language, so it doesn't account for things like this. The "Control Structures" page at http://pig.apache.org/docs/r0.15.0/cont.html gives you the project's recommendations. Generally speaking, a custom script that fires off a generic Pig script, or maybe a Java program, might be your best friend. Good luck!
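As a rough sketch of the wrapper-script idea, the Python below builds one pig command per day and, by default, only prints them. The script name store_one_day.pig, the parameter names DATE and OUT, and the /path output root are assumptions made for illustration; -param and -f are Pig's standard options for parameter substitution and for running a script file.

```python
import subprocess
from datetime import date, timedelta

# Hypothetical parameterized Pig script (store_one_day.pig), for example:
#   B = FILTER A BY Date == '$DATE';
#   STORE B INTO '$OUT' USING PigStorage('\t');


def date_range(start, end):
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_commands(start, end, script="store_one_day.pig", out_root="/path"):
    """Return one pig command (as an argument list) per day in the range."""
    commands = []
    for day in date_range(start, end):
        iso = day.isoformat()  # formatted as YYYY-MM-DD
        commands.append([
            "pig",
            "-param", f"DATE={iso}",
            "-param", f"OUT={out_root}/{iso}",
            "-f", script,
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(date(2016, 7, 1), date(2016, 7, 31)))
```

Note that this launches a separate Pig run for every date, so it re-reads the input each time; it trades efficiency for simplicity.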

Re: Creating an iterative loop using Apache Pig

Explorer

Thanks Lester :) So in your opinion it is better to do my transformation activities in Java? During my research I read that Apache Pig is where the ETL process is done in Big Data projects. What types of jobs do you recommend doing in Pig? Many thanks! :)

Re: Creating an iterative loop using Apache Pig

Sorry, no, I wasn't suggesting you abandon Pig, just that you might need to wrap it in a script or program that calls your generalized Pig script separately for each date, since Pig does not have the general-purpose looping constructs found in other languages. That said, check out my NEW answer and related link, which should dynamically do what you want, and in ONE line of Pig code! Good luck!

Re: Creating an iterative loop using Apache Pig

Explorer

Perfect Lester :) It's exactly what I need!!! :) Many thanks!!!

Re: Creating an iterative loop using Apache Pig

@João Souza

Have you checked Lester's answer in this thread, the one based on Pig's MultiStorage? That's exactly what you need.