Created 01-20-2016 12:22 AM
I use AvroStorage to store a result set from Pig. Is there a way to store the data in one specified Avro file, e.g. OutputFileGen1? Pig stores the data in a directory named OutputFileGen1 with the structure listed below:
ls -al OutputFileGen1/
total 20
drwxr-xr-x 2 root root 4096 2016-01-18 14:35 .
drwxr-xr-x 6 root root 4096 2016-01-19 10:27 ..
-rw-r--r-- 1 root root 4083 2016-01-18 14:35 part-m-00000.avro
-rw-r--r-- 1 root root 40 2016-01-18 14:35 .part-m-00000.avro.crc
-rw-r--r-- 1 root root 0 2016-01-18 14:35 _SUCCESS
-rw-r--r-- 1 root root 8 2016-01-18 14:35 ._SUCCESS.crc
Thank you
http://stackoverflow.com/questions/34880880/avrostorage-output-file-name-definition
Created 01-20-2016 12:29 AM
That option is available in Java MapReduce, but not in Pig. As the Stack Overflow answer suggests, follow up with an hdfs command to rename the file to the desired name. Pig fully supports hdfs commands as part of scripts. @John Smith
Created 01-20-2016 08:16 AM
Hi, what do you mean by Java MapReduce? Writing the job directly in Java MapReduce code?
I'm storing results on the normal FS. Can I use hdfs commands on files/directories stored on the normal FS?
Created 01-20-2016 01:57 PM
Correct, in Java you can override the output with MultipleOutputs and define the path as you wish. In Pig it is not possible; perhaps you'd like to open an enhancement Jira? To store results on the normal FS, launch the script in local mode or specify the full path as file:///path. The reverse holds for Tez/MR mode: paths there resolve to HDFS by default, and in local mode you specify hdfs:// to reach HDFS.
Created 01-20-2016 08:48 AM
OK, it works on the local FS as well.
grunt> fs -getmerge dir file
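For output on the local FS, the rename step suggested earlier in the thread can also be done from a plain shell after the Pig job finishes. The following is only a rough sketch: promote_single_part is a hypothetical helper name, not part of Pig or Hadoop, and it assumes the job wrote exactly one part-*.avro file. It refuses to act on several parts, because each Avro container file carries its own header, so joining parts byte-for-byte is not expected to give a valid Avro file.

```shell
# Hypothetical helper (not part of Pig or Hadoop): move the single part file
# out of a Pig output directory on the local FS and give it the desired name.
# Assumption: the job produced exactly one part-*.avro file.
promote_single_part() {
    out_dir=$1   # directory written by STORE, e.g. OutputFileGen1
    target=$2    # desired file name, e.g. OutputFileGen1.avro
    count=0
    part=
    for f in "$out_dir"/part-*.avro; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
        part=$f
        count=$((count + 1))
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one part file in $out_dir, found $count" >&2
        return 1
    fi
    mv -- "$part" "$target"
}
```

Usage would look like: promote_single_part OutputFileGen1 OutputFileGen1.avro. The target name has to differ from the directory name, since the directory OutputFileGen1 still exists after the move.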