Created 02-15-2016 09:13 AM
I need to append a daily log file to a file on HDFS. Can anyone suggest how to do this?
Created 02-15-2016 09:49 AM
Hi Rushikesh,
You can create a daily script and use the following option of the hadoop fs command to append to an existing file on HDFS.
Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Append single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and appends to destination file system.
Exit Code:
Returns 0 on success and 1 on error.
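For example, a daily cron script along these lines would do it. This is a minimal sketch; the local log path and the HDFS destination file are hypothetical placeholders for your environment.

#!/bin/bash
# Append today's local log to a single rolling file on HDFS.
# /var/log/myapp/daily.log and /logs/myapp/all.log are hypothetical paths.
LOCAL_LOG=/var/log/myapp/daily.log
HDFS_FILE=/logs/myapp/all.log

# Create the destination on the first run so the append has a target.
hdfs dfs -test -e "$HDFS_FILE" || hdfs dfs -touchz "$HDFS_FILE"

# The script's exit status follows -appendToFile (0 on success, 1 on
# error), so cron or any scheduler can check whether the append worked.
hadoop fs -appendToFile "$LOCAL_LOG" "$HDFS_FILE"

Note that depending on the Hadoop version, appends may need to be enabled on the cluster side first.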
Created 02-15-2016 09:47 AM
I have received one answer for this:
Use Flume to retrieve the logs and sink them into Hadoop (HDFS, HBase). Append is allowed in HDFS, but Flume does not use it: after a file is closed, Flume does not append any data to it. So you can use HBase instead of HDFS to append logs with Flume. Is there any other way of doing this?
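For completeness, here is a rough sketch of how such a Flume pipeline is usually wired up; the agent name (a1), the tail command, and all paths are illustrative assumptions, not settings from your cluster.

#!/bin/bash
# Write a minimal Flume agent config that tails a local log into HDFS,
# then start the agent. All names and paths here are hypothetical.
cat > /tmp/daily-log-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Tail the local log file as it grows.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/myapp/daily.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Write events into a dated HDFS directory. As noted above, Flume
# rolls to a new file instead of appending to one it already closed.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /logs/myapp/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
EOF

flume-ng agent --name a1 --conf-file /tmp/daily-log-agent.conf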
Created 02-15-2016 02:12 PM
@Rushikesh Deshmukh there are a few options
1. you can create new files as they come and then rewrite them into larger files with MapReduce or Pig
2. you can use hbase
3. you can take a look at hive streaming
4. you can try hdfs dfs -getmerge https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmer...
5. you can also take many smaller files and use Hadoop Archive (HAR) to create one large file
Now, unless you really mean append (as opposed to just trying to avoid many small files), the options are not as many; in that case I would probably look into 1. and 2. A quick sketch of options 4 and 5 follows below.
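To make options 4 and 5 concrete, here is a quick sketch with made-up paths; adjust the directories for your own layout.

#!/bin/bash
# Option 4: merge all files in an HDFS directory into one local file,
# then push the merged file back to HDFS. All paths are hypothetical.
hadoop fs -getmerge /logs/myapp/2016-02-15 /tmp/2016-02-15.log
hadoop fs -put /tmp/2016-02-15.log /logs/myapp/merged/2016-02-15.log

# Option 5: pack the many small files under /logs/myapp into a single
# Hadoop Archive; -p is the parent path, the archive lands in /logs/archive.
hadoop archive -archiveName myapp-logs.har -p /logs myapp /logs/archive

A HAR keeps the files readable in place (via the har:// scheme) without rewriting them, whereas getmerge physically concatenates them into one file.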
Created 02-20-2016 01:44 PM
@Artem Ervits, thanks for sharing this information.