Support Questions
Find answers, ask questions, and share your expertise

Append in HDFS?

Need to append the daily log file into HDFS. Please suggest how this can be done.

1 ACCEPTED SOLUTION

Contributor

Hi Rushikesh,

You can create a daily script and use the following option of the hadoop fs command to append to an existing file on HDFS.

Usage: hadoop fs -appendToFile <localsrc> ... <dst>

Append single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and appends to destination file system.

  • hadoop fs -appendToFile localfile /user/hadoop/hadoopfile
  • hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
  • hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
  • hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

Exit Code:

Returns 0 on success and 1 on error.


4 REPLIES

I have received one answer for this:

Use Flume in Hadoop to retrieve the logs and sink them into Hadoop (HDFS or HBase). Append is allowed in HDFS, but Flume does not use it: once a file is closed, Flume does not append any more data to it. So you can use HBase instead of HDFS to append logs with Flume. Is there any other way to do this?


Mentor

@Rushikesh Deshmukh, there are a few options:

1. you can create new files as they come and then rewrite them into larger files with MapReduce or Pig

2. you can use HBase

3. you can take a look at Hive streaming

4. you can try hdfs dfs -getmerge https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmer...

5. you can also take many smaller files and use Hadoop Archive (HAR) to create one large file

Now, unless you really mean append (rather than trying to avoid many small files), the options are not as many. In that case, I would probably look into 1. and 2.

@Artem Ervits, thanks for sharing this information.
