I use AvroStorage to store a result set from Pig. Is there a way to store the data into one specified Avro file, e.g. OutputFileGen1? Pig stores the data into a directory named OutputFileGen1 with the structure listed below:
ls -al OutputFileGen1/
total 20
drwxr-xr-x 2 root root 4096 2016-01-18 14:35 .
drwxr-xr-x 6 root root 4096 2016-01-19 10:27 ..
-rw-r--r-- 1 root root 4083 2016-01-18 14:35 part-m-00000.avro
-rw-r--r-- 1 root root   40 2016-01-18 14:35 .part-m-00000.avro.crc
-rw-r--r-- 1 root root    0 2016-01-18 14:35 _SUCCESS
-rw-r--r-- 1 root root    8 2016-01-18 14:35 ._SUCCESS.crc
Choosing the output file name is possible in Java MapReduce, but not in Pig. A Stack Overflow answer suggests following the STORE with an HDFS command that renames the file to the desired name; Pig can run HDFS commands (e.g. fs -mv) as part of a script. @John Smith
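If the output ends up on the local file system, the same follow-up rename can also be done outside Pig. Below is a minimal Java sketch using only java.nio.file; the class and method names (CollapsePigOutput, collapsePartFile) and the target name OutputFileGen1.avro are hypothetical. It assumes the job wrote exactly one part file, as in the listing above, and refuses otherwise, because byte-wise concatenation of several Avro container files would not yield a valid Avro file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class CollapsePigOutput {

    /**
     * Hypothetical helper: moves the single part-*.avro file out of a Pig
     * output directory to the given target file. Hidden checksum files such
     * as .part-m-00000.avro.crc do not match the glob and are left behind.
     */
    static Path collapsePartFile(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // Collect only data files named like part-m-00000.avro / part-r-00000.avro.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*.avro")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Several part files cannot simply be renamed into one Avro file.
        if (parts.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one part file in " + outputDir + ", found " + parts.size());
        }
        // Files.move returns the target path.
        return Files.move(parts.get(0), target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults matching the example in the question.
        Path outputDir = Paths.get(args.length > 0 ? args[0] : "OutputFileGen1");
        Path target = Paths.get(args.length > 1 ? args[1] : "OutputFileGen1.avro");
        Path moved = collapsePartFile(outputDir, target);
        System.out.println("moved to " + moved);
    }
}
```

Run it after the Pig script finishes, e.g. java CollapsePigOutput OutputFileGen1 OutputFileGen1.avro. The _SUCCESS marker and the .crc files stay in the directory. If the file should carry the directory's own name (OutputFileGen1), move it to a temporary name first, delete the directory, and then rename it.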
Hi, what do you mean by Java MapReduce? Writing the job directly in Java MapReduce code?
I'm storing the results on the normal (local) file system. Can I use HDFS commands on files and directories stored there?
Correct: in Java you can override the output with MultipleOutputs and define the path however you wish. In Pig this is not possible; perhaps you'd like to open an enhancement JIRA? To store results on the local file system, either launch the script in local mode (pig -x local) or, in Tez/MapReduce mode, specify the full path as file:///path. Conversely, in local mode you specify hdfs:// to reach HDFS.