Created 05-27-2017 02:23 AM
Hi Friends,
I have a question about a Pig script. I need to load data from two different HDFS paths into a single Pig relation.
For example: /data/input1.csv, and the other file is in /inputdata/ (input1 or input2.csv).
Is it possible to load these two files into a single Pig relation?
Thanks,
Satish.
Created 05-27-2017 04:02 AM
If these two sources have the same schema, it is a simple matter of using the UNION operator in these three steps:
Source_1 = LOAD '/data/input1.csv' USING PigStorage(',') ...
Source_2 = LOAD '/data/input2.csv' USING PigStorage(',') ...
Source = UNION Source_1, Source_2;
See these references for elaboration:
Created 05-27-2017 09:09 AM
Thanks, Greg, but is there any way to load both files from different paths into a single relation with a single LOAD?
Created 05-27-2017 11:31 AM
@Satish Sarapuri Yes, you can GLOB the filename pattern. This will work:
Source = LOAD '/data/input{1,2}.csv' USING PigStorage(',')...
You can use other GLOB patterns. See https://books.google.com/books?id=Nff49D7vnJcC&pg=PA60&lpg=PA60&dq=hdfs+glob&source=bl&ots=IjkvXt9zU...
Created 05-27-2017 02:50 PM
@Greg Keys, the two source files are in different paths.
Created 05-30-2017 12:54 PM
@Satish Sarapuri You can use globs anywhere in the path, not just in the filename. There are quite a few glob operators (similar to those in Linux shells), as shown in the link above, so if the paths have enough in common you should be able to use globs for the parts that differ. If none of that works, you can still use globs with full paths:
Source = LOAD '/{path1,path2}' USING PigStorage(',')... where path1 and path2 can be any file paths.
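To make the brace alternation concrete, here is a small Python sketch of how a pattern like '/{path1,path2}' turns into separate paths. The helper name expand_braces is hypothetical and only approximates Hadoop's own glob handling: it supports comma-separated alternatives in one or more non-nested {...} groups and nothing else. In a real job, Pig passes the pattern to Hadoop's file-system glob matching, which also handles wildcards such as * and ? against the files that actually exist. The two paths in the second example are illustrative, loosely taken from the question.

```python
import itertools
import re
from typing import List

# Matches one non-nested brace group such as {1,2} or {data/input1.csv,inputdata/input2.csv}.
_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Hypothetical helper: expand every non-nested {a,b,...} group in a path pattern.

    Only brace alternation is handled; wildcards like * or ? are left untouched,
    and no file-system lookup is performed (unlike Hadoop's real glob matching).
    """
    # re.split with one capture group alternates: literal, group, literal, ..., literal.
    pieces = _BRACE_GROUP.split(pattern)
    literals = pieces[0::2]
    choices = [group.split(",") for group in pieces[1::2]]

    results = []
    # One output path per combination of alternatives (a single empty combo if no braces).
    for combo in itertools.product(*choices):
        parts = [literals[0]]
        for alternative, literal in zip(combo, literals[1:]):
            parts.append(alternative)
            parts.append(literal)
        results.append("".join(parts))
    return results


if __name__ == "__main__":
    # Filename-level glob from the earlier reply.
    print(expand_braces("/data/input{1,2}.csv"))
    # -> ['/data/input1.csv', '/data/input2.csv']

    # Path-level glob spanning two different directories.
    print(expand_braces("/{data/input1.csv,inputdata/input2.csv}"))
    # -> ['/data/input1.csv', '/inputdata/input2.csv']
```

In a Pig script the same pattern goes straight into LOAD, for example Source = LOAD '/{data/input1.csv,inputdata/input2.csv}' USING PigStorage(',') ... Because every matching file is read into that one relation, it ends up with the same rows as loading the files separately and combining them with UNION, assuming both files share the same schema.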
Created 04-10-2019 06:24 PM
Hi, I am new to Big Data. Is this code equivalent to UNION? @Greg Keys, you wrote two versions of the code; do they work the same way? Thank you for answering.
Created 06-04-2017 10:15 PM
Both source files are in different paths.