Support Questions

Josh2023 · ‎05-18-2023

Hello,

I am running a MapReduce job that writes its output to a folder. Is there any way to change the default output filename? My default filename is part-r-00000.snappy.parquet, and I want to add a date to it.

Is there a parameter to change this? Thank you.

VidyaSargur · ‎05-18-2023

@Josh2023, Welcome to our community! To help you get the best possible answer, I have tagged our MapR experts @asish @mugdha who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

Regards,

Vidya Sargur,
Community Manager

Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Community Guidelines
How to use the forum

asish · ‎05-18-2023

@Josh2023 can you try below:

    SET hive.exec.compress.output=false;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    -- Create a temporary table or select data from existing tables
    INSERT OVERWRITE DIRECTORY '/path/to/output_directory/filename_' || from_unixtime(unix_timestamp(), 'yyyyMMdd') || '.csv'
    SELECT * FROM your_table;

asish · ‎05-19-2023

@Josh2023 could you please accept this as the solution if it has fixed your issue?

Josh2023 · ‎05-19-2023

Hello @asish, thanks for the answer. Unfortunately, my output is already in Parquet format.

The example below is the usual output of my MapReduce job:

part-r-00000.snappy.parquet

I need to use the format below, for example:

part-r-00000.2023-05-19-04-09.snappy.parquet

Is this possible?

asish · ‎05-19-2023

This is possible. Please try parquet instead of csv.

DianaTorres · ‎05-29-2023

@Josh2023 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

Regards,

Diana Torres,
Community Moderator


ggangadharan · ‎06-06-2023

Since the output file is .parquet, I assume you're using ParquetOutputFormat in the MR job config. In that case, the setOutputName method (which ParquetOutputFormat inherits from FileOutputFormat) sets the base name of the output file.

Ref -
https://www.javadoc.io/doc/org.apache.parquet/parquet-hadoop/1.12.2/org/apache/parquet/hadoop/Parque...
https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.ht...

ggangadharan · ‎06-06-2023

Setting mapreduce.output.basename directly also works, since setOutputName simply assigns that same property.

Code snippet from ParquetOutputFormat:

    protected static void setOutputName(JobContext job, String name) {
        job.getConfiguration().set("mapreduce.output.basename", name);
    }

JOB CONF -

    Configuration conf = getConf();
    conf.set("mapreduce.output.basename", "parquet_output");

Output

[hive@c1757-node3 ~]$ hdfs dfs -ls /tmp/parquet-sample
Found 4 items
-rw-r--r--   2 hive supergroup          0 2023-06-06 17:08 /tmp/parquet-sample/_SUCCESS
-rw-r--r--   2 hive supergroup        271 2023-06-06 17:08 /tmp/parquet-sample/_common_metadata
-rw-r--r--   2 hive supergroup       1791 2023-06-06 17:08 /tmp/parquet-sample/_metadata
-rw-r--r--   2 hive supergroup       2508 2023-06-06 17:08 /tmp/parquet-sample/parquet_output-m-00000.parquet
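
To get a date into the filename, the base name itself can carry a timestamp. The following is only a minimal sketch of building such a value with java.time; the class name OutputBaseName, the helper timestampedBaseName, and the "part" prefix are hypothetical and chosen for illustration. It assumes the value is then passed to conf.set("mapreduce.output.basename", ...) in the job driver, as in the JOB CONF lines above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OutputBaseName {
    // Pattern matching the timestamp layout asked for in the question (2023-05-19-04-09).
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");

    // Hypothetical helper: joins a prefix and a formatted timestamp with a hyphen.
    static String timestampedBaseName(String prefix, LocalDateTime when) {
        return prefix + "-" + when.format(STAMP);
    }

    public static void main(String[] args) {
        String baseName = timestampedBaseName("part", LocalDateTime.now());
        // Assumed usage in the job driver (requires the Hadoop Configuration object):
        //   conf.set("mapreduce.output.basename", baseName);
        System.out.println(baseName);
    }
}
```

Going by the listing above, where the base name is followed by the task type and partition number, a reduce task with snappy compression would then presumably write something like part-2023-05-19-04-09-r-00000.snappy.parquet. The timestamp lands before the -r-00000 part rather than after it, so this is close to, but not exactly, the format requested.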

DianaTorres · ‎06-09-2023

@Josh2023 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

Regards,

Diana Torres,
Community Moderator


Change default output filename part-r-00000.snappy.parquet