Append in HDFS?
Labels: Apache Hadoop
Created 02-15-2016 09:13 AM
I need to append a daily log file to a file in HDFS. Please suggest how to do this.
Created 02-15-2016 09:47 AM
I have received one answer for this:
Use Flume to collect the logs and sink them into Hadoop (HDFS or HBase). Append is allowed in HDFS, but Flume does not use it: once Flume closes a file, it does not append any more data to it. So you can use HBase instead of HDFS to append logs with Flume. Is there any other way to do this?
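For reference, if you do go the Flume route, an agent configuration along the following lines could collect a log into date-bucketed HDFS files. This is only a sketch: the agent name, source command, paths, channel capacity, and roll settings are all assumptions for illustration, not from the thread. Note that it rolls new files on an interval rather than appending to a single file, which is exactly the behavior described above.

```bash
# Hypothetical sketch: write a minimal Flume agent config and start the agent.
# Agent name, paths, channel capacity, and roll settings are assumptions.
cat > daily-logs.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the application log as an exec source.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/myapp/myapp.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink: one directory per day, roll a new file every hour.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/hadoop/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

flume-ng agent --name a1 --conf-file daily-logs.conf
```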
Created 02-15-2016 09:49 AM
Hi Rushikesh,
You can create a daily script and use the following option of the hadoop fs command to append to an existing file on HDFS.
Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Appends a single src, or multiple srcs, from the local file system to the destination file system. Also reads input from stdin and appends it to the destination file system.
- hadoop fs -appendToFile localfile /user/hadoop/hadoopfile
- hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
- hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
- hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)
Exit code: returns 0 on success and 1 on error.
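For example, a minimal daily script along these lines could be scheduled with cron. The paths and file names below are illustrative assumptions, not from the thread, and on some older HDFS releases append must be enabled (dfs.support.append) for -appendToFile to work:

```bash
#!/bin/bash
# Minimal sketch of a daily append job; all paths and names here are
# assumptions for illustration, not from the thread.
LOCAL_LOG="/var/log/myapp/myapp-$(date +%F).log"   # today's local log
HDFS_FILE="/user/hadoop/logs/myapp.log"            # growing file in HDFS

# Create the target file on the first run, then append on later runs.
if ! hadoop fs -test -e "$HDFS_FILE"; then
  hadoop fs -touchz "$HDFS_FILE"
fi

hadoop fs -appendToFile "$LOCAL_LOG" "$HDFS_FILE"
```

Scheduling it is then a single cron entry, e.g. 0 1 * * * /opt/scripts/append-daily-log.sh (the script path is hypothetical).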
Created 02-15-2016 02:12 PM
@Rushikesh Deshmukh there are a few options:
1. You can create new files as they come and then rewrite them into larger files with MapReduce or Pig.
2. You can use HBase.
3. You can take a look at Hive streaming.
4. You can try hdfs dfs -getmerge https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmer...
5. You can also take many smaller files and use Hadoop Archive (HAR) to create one large file.
Now, unless you really mean append, rather than trying to avoid many small files, the options are not as many. In that case, I would probably look into 1 and 2. Options 4 and 5 are sketched below.
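As a rough illustration of options 4 and 5, the commands could look like this; all HDFS and local paths are assumptions, not from the thread:

```bash
# Option 4: merge many small HDFS files into one local file,
# then (optionally) put the merged file back as a single file.
hdfs dfs -getmerge /user/hadoop/logs/2016-02-15 /tmp/logs-2016-02-15.txt
hdfs dfs -put /tmp/logs-2016-02-15.txt /user/hadoop/logs-merged/2016-02-15.txt

# Option 5: pack a directory of small files into a Hadoop Archive (HAR).
# Usage: hadoop archive -archiveName <name>.har -p <parent> <src>* <dest>
hadoop archive -archiveName logs-2016-02.har -p /user/hadoop logs /user/hadoop/archives

# The archive can then be read through the har:// scheme.
hdfs dfs -ls har:///user/hadoop/archives/logs-2016-02.har
```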
Created 02-20-2016 01:44 PM
@Artem Ervits, thanks for sharing this information.
