How can I rename Pig job output files (part files) on the fly?

I wonder whether I can store the output of a Pig job (the part files) under a specific file name instead of the default one, such as part-v000-o000-r-00000.deflate. For example, when I execute:

store final_result INTO '/data/output' USING PigStorage(',');

the output is stored on HDFS as /data/output/part-v000-o000-r-00000.deflate. I want the output to be /data/output.csv or /data/output/output.csv instead.

That is, I want to rename the "part-*" file on the fly.

How can I achieve this in Pig?

Hi @Kibrom Gebrehiwot,

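It's not possible to choose the output file name on the fly, at least not as of now. Instead, once Pig has written the part files to the output directory, use hadoop fs -getmerge <output_directory> <local_file> to merge them into a single local file with the name you want (for example, hadoop fs -getmerge /data/output output.csv), rather than naming each file, and then copy that file back to HDFS with hadoop fs -put.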
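
A minimal sketch of the getmerge route as a bash function, assuming a Hadoop 2.x or later client on the PATH and the paths from the question. merge_pig_output is a hypothetical helper name, not a Hadoop command. It stages the merged file in a temporary local directory, then uploads it under the chosen name, overwriting any existing file (-f).

```shell
#!/usr/bin/env bash
# Sketch only: merge_pig_output is a hypothetical helper, not a Hadoop command.
set -euo pipefail

# Merge every file in a Pig output directory into one HDFS file with a chosen name.
merge_pig_output() {
  local output_dir="$1"   # HDFS directory used in the Pig STORE, e.g. /data/output
  local target="$2"       # desired HDFS path, e.g. /data/output.csv
  local tmpdir
  tmpdir="$(mktemp -d)"
  # -getmerge concatenates the files under output_dir into ONE local file.
  hadoop fs -getmerge "$output_dir" "$tmpdir/merged"
  # Upload the merged file under the chosen name; -f overwrites an existing file.
  hadoop fs -put -f "$tmpdir/merged" "$target"
  rm -rf "$tmpdir"
}
```

Usage, with the question's paths: merge_pig_output /data/output /data/output.csv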

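Alternatively, you can read the files with hadoop fs -cat /data/output/part-* > output.csv and then copy the result to HDFS with hadoop fs -put. Note that -getmerge and -cat copy the bytes unchanged, so if the part files are compressed (like the .deflate file in the question), use hadoop fs -text instead, which decompresses them.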
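
A similar sketch for the -text route, again assuming a Hadoop 2.x or later client and the question's paths; merge_pig_output_text is a hypothetical name. The part-* glob is quoted so that hadoop fs expands it on HDFS rather than the local shell doing so, which also skips marker files such as _SUCCESS that may sit in the output directory.

```shell
#!/usr/bin/env bash
# Sketch only: merge_pig_output_text is a hypothetical helper, not a Hadoop command.
set -euo pipefail

# Decode the part files to text, concatenate them locally, then upload under one name.
merge_pig_output_text() {
  local output_dir="$1"   # HDFS directory used in the Pig STORE, e.g. /data/output
  local target="$2"       # desired HDFS path, e.g. /data/output.csv
  local tmpdir
  tmpdir="$(mktemp -d)"
  # Quoted glob: expanded by hadoop fs on HDFS, not by the local shell.
  # -text decompresses files whose codec it recognises (e.g. .deflate); -cat would not.
  hadoop fs -text "$output_dir/part-*" > "$tmpdir/merged.csv"
  # Upload under the chosen name; -f overwrites an existing file.
  hadoop fs -put -f "$tmpdir/merged.csv" "$target"
  rm -rf "$tmpdir"
}
```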

Note: Either approach merges all the part files into one single file.

Thank you @Bala Vignesh N V for your answer.

@Kibrom Gebrehiwot If it helped, please accept the answer. Thanks! Happy Hadooping!