Created 02-15-2016 09:13 AM
I need to append a daily log file to a file on HDFS. Can anyone suggest how to do this?
Created 02-15-2016 09:49 AM
Hi Rushikesh,
You can create a daily script and use the following option of the hadoop fs command to append to an existing file on HDFS.
Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Append single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and appends to destination file system.
Exit Code:
Returns 0 on success and 1 on error.
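For example, a daily cron script along these lines would do it. This is a minimal sketch; the local log path and the HDFS destination file are hypothetical placeholders for your environment.

#!/bin/bash
# Append today's local log to a single rolling file on HDFS.
# /var/log/myapp/daily.log and /logs/myapp/all.log are hypothetical paths.
LOCAL_LOG=/var/log/myapp/daily.log
HDFS_FILE=/logs/myapp/all.log

# Create the destination on the first run so the append has a target.
hdfs dfs -test -e "$HDFS_FILE" || hdfs dfs -touchz "$HDFS_FILE"

# The script's exit status follows -appendToFile (0 on success, 1 on
# error), so cron or any scheduler can check whether the append worked.
hadoop fs -appendToFile "$LOCAL_LOG" "$HDFS_FILE"

Note that depending on the Hadoop version, appends may need to be enabled on the cluster side first.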
Created 02-15-2016 09:47 AM
I have received one answer for this:
Use Flume to retrieve the logs and sink them into Hadoop (HDFS, HBase). Append is allowed in HDFS, but Flume does not use it: after a file is closed, Flume does not append any data to it. So you can use HBase instead of HDFS to append logs with Flume. Is there any other way of doing this?
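For completeness, here is a rough sketch of how such a Flume pipeline is usually wired up; the agent name (a1), the tail command, and all paths are illustrative assumptions, not settings from your cluster.

#!/bin/bash
# Write a minimal Flume agent config that tails a local log into HDFS,
# then start the agent. All names and paths here are hypothetical.
cat > /tmp/daily-log-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Tail the local log file as it grows.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/myapp/daily.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Write events into a dated HDFS directory. As noted above, Flume
# rolls to a new file instead of appending to one it already closed.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /logs/myapp/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
EOF

flume-ng agent --name a1 --conf-file /tmp/daily-log-agent.conf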
Created 02-15-2016 02:12 PM
@Rushikesh Deshmukh there are a few options
1. you can create new files as they come and then rewrite them into larger files with MapReduce or Pig
2. you can use hbase
3. you can take a look at hive streaming
4. you can try hdfs dfs -getmerge https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmer...
5. you can also take many smaller files and use Hadoop Archive (HAR) to create one large file
Now, unless you really mean append (as opposed to just trying to avoid many small files), the options are not as many; in that case I would probably look into 1. and 2. A quick sketch of options 4 and 5 follows below.
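To make options 4 and 5 concrete, here is a quick sketch with made-up paths; adjust the directories for your own layout.

#!/bin/bash
# Option 4: merge all files in an HDFS directory into one local file,
# then push the merged file back to HDFS. All paths are hypothetical.
hadoop fs -getmerge /logs/myapp/2016-02-15 /tmp/2016-02-15.log
hadoop fs -put /tmp/2016-02-15.log /logs/myapp/merged/2016-02-15.log

# Option 5: pack the many small files under /logs/myapp into a single
# Hadoop Archive; -p is the parent path, the archive lands in /logs/archive.
hadoop archive -archiveName myapp-logs.har -p /logs myapp /logs/archive

A HAR keeps the files readable in place (via the har:// scheme) without rewriting them, whereas getmerge physically concatenates them into one file.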
Created 02-20-2016 01:44 PM
@Artem Ervits, thanks for sharing this information.