Created on 12-14-2018 10:12 AM - edited 09-16-2022 06:59 AM
Hi, I want to use Flume to send text files to HDFS. I changed the Configuration File setting of the Flume service in Cloudera Manager as follows:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

# For each source, channel, and sink, set
# standard properties.

# source details
tier1.sources.source1.type = spooldir
tier1.sources.source1.spoolDir = /data/diem
tier1.sources.source1.fileHeader = false
tier1.sources.source1.basenameHeader = true
tier1.sources.source1.fileSuffix = .COMPLETED
tier1.sources.source1.thread = 4
tier1.sources.source1.interceptors = newint
tier1.sources.source1.interceptors.newint.type = timestamp
tier1.sources.source1.channels = channel1

# channel details
tier1.channels.channel1.type = file
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 10000
tier1.channels.channel1.write-timeout = 60
tier1.channels.channel1.checkpointDir = /data
tier1.channels.channel1.dataDirs = /data

# sink details
tier1.sinks.sink1.type = HDFS
tier1.sinks.sink1.fileType = DataStream
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.filePrefix = %{basename}
tier1.sinks.sink1.threadsPoolSize = 4
tier1.sinks.sink1.hdfs.idleTimeout = 60
tier1.sinks.sink1.hdfs.batchSize = 100000
However, I don't know how to start Flume from the terminal to send the files into HDFS. Can someone help me? Also, could someone look at the configuration file and correct it if there are errors?
Created 12-14-2018 10:59 AM
Hi,
After saving the changes, you should see an icon to refresh the cluster. Clicking this icon runs the steps that apply the new values. The configuration looks good.
Check the value under CM > Flume > Configuration > Agent; this will tell you which node the tier1 agent is configured to run on.
You can check the logs on that node to confirm whether sink1 started. (By default, the logs are under /var/log/flume-nd.) If you do not see the data in HDFS, please check the logs; you should find a corresponding error message if there is any issue writing to HDFS.
Regards
Bimal
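One detail in the sink section of the question's configuration may be worth verifying, offered as a suggestion rather than a confirmed fix. The Flume User Guide documents the HDFS sink's file type and thread pool size as hdfs.fileType and hdfs.threadsPoolSize. The unprefixed fileType and threadsPoolSize keys in the question may therefore be ignored, in which case the sink would fall back to its default file type (SequenceFile) instead of DataStream. A sketch of the sink section with only those two keys prefixed, every value unchanged from the question:

```properties
# sink details (sketch; only the two keys noted below differ from the original post)
tier1.sinks.sink1.type = HDFS
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
# was: tier1.sinks.sink1.fileType
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.filePrefix = %{basename}
# was: tier1.sinks.sink1.threadsPoolSize
tier1.sinks.sink1.hdfs.threadsPoolSize = 4
tier1.sinks.sink1.hdfs.idleTimeout = 60
tier1.sinks.sink1.hdfs.batchSize = 100000
```

Whether this matters for a given deployment is an assumption to confirm in the agent log after the refresh.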
Created 12-14-2018 11:45 AM
Do you mean the log in the flume.log file in the flume-ng folder? I ask because I don't see a flume-nd folder.
Created 12-16-2018 09:39 PM
Yes, flume-ng. That was a typo in the previous comment, I guess. Please check whether there are any ERROR or otherwise suspicious messages.
Additionally, could you please check whether your source spool directory is receiving content to pass to Flume?
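The two checks suggested here could be scripted roughly as follows. This is a minimal sketch, not an official Flume tool: the function names flume_errors and pending_spool_files are hypothetical, the default paths (/var/log/flume-ng, /data/diem) and the .COMPLETED suffix are taken from this thread, and the exact log file name on your node may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of Flume.

# Print ERROR and WARN lines from every file under a Flume log directory.
# Prints nothing (and still succeeds) when no such lines exist.
flume_errors() {
  local log_dir="${1:-/var/log/flume-ng}"
  grep -rhE 'ERROR|WARN' "$log_dir" 2>/dev/null || true
}

# List files in the spool directory that have not yet been renamed with
# the completed suffix, i.e. files the source has not finished ingesting.
pending_spool_files() {
  local spool_dir="${1:-/data/diem}"
  local suffix="${2:-.COMPLETED}"
  find "$spool_dir" -maxdepth 1 -type f ! -name "*${suffix}" 2>/dev/null
}
```

After sourcing the file, running flume_errors and pending_spool_files with no arguments uses the paths from this thread. If the spool directory keeps listing the same files and the log shows no errors, the source may not be picking them up at all.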
Created 12-17-2018 07:40 AM
Thank you very much, I solved my problem
Created 12-17-2018 09:16 AM
I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Created 12-18-2018 12:49 AM
Yeah, I did, tks 😄