Support Questions

Data ingested by Flume converted into Hexdump


Explorer

Hi, 

 

I am trying to pull a log file into HDFS using Flume, and while the file seems to ingest OK, it is being converted into a hexdump within the DFS. I am trying to set this up so it can be ingested as plain text. Here is my config file:

 

# primary components
spool.sources = source1
spool.channels = channel1
spool.sinks = sink1

# source config
spool.sources.source1.type = spooldir
spool.sources.source1.spoolDir =/home/ubuntu/spool_test_dir
spool.sources.source1.channels = channel1
spool.sources.source1.interceptors = time_stamp_interceptor

# Interceptor Configuration
spool.sources.source1.interceptors.time_stamp_interceptor.type = timestamp
spool.sources.source1.interceptors.time_stamp_interceptor.preserveExisting = true



# channel config
spool.channels.channel1.type = memory
# capacity: the number of events the channel can hold
spool.channels.channel1.capacity = 50
spool.channels.channel1.keep-alive = 10
spool.channels.channel1.write-timeout = 30



# sink config
spool.sinks.sink1.type = hdfs
spool.sinks.sink1.hdfs.path = hdfs://xx.xx.xxx.xxx/user/vu/data/flume_test/spool_test/%Y/%m/%d/
spool.sinks.sink1.hdfs.rollSize = 10240
spool.sinks.sink1.hdfs.rollInterval = 0
spool.sinks.sink1.hdfs.rollCount = 0
spool.sinks.sink1.hdfs.filetype = DataStream
spool.sinks.sink1.hdfs.writeFormat = Writable
spool.sinks.sink1.channel = channel1

 

 

I have tried all the combinations of the hdfs fileType and writeFormat described in the documentation with no success. Does anyone have any ideas on what I need to change to get this to write to HDFS as plain text?
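(For reference, the sink settings the Flume User Guide documents for plain-text output are usually spelled as below; note that the property names are case-sensitive, so it is `fileType`, not `filetype`. This is only a sketch against the `spool` agent from the config above:)

```properties
# Plain-text HDFS sink settings per the Flume User Guide
# (property names are case-sensitive)
spool.sinks.sink1.hdfs.fileType = DataStream
spool.sinks.sink1.hdfs.writeFormat = Text
```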

 

Thanks  

 

 

The data coming in looks like this in the log file

 

Feb 14 06:02:35 Client 1 dhclient: DHCPACK of xx.xxx.xx.xxx from xx.xx.xx.x
Feb 14 06:02:35 Client 1 dhclient: bound to xx.xx.xxx.xx -- renewal in 1427 seconds.
Feb 14 06:17:01 Client 1 CRON[25979]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Feb 14 06:25:02 Client 1 CRON[26426]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Feb 14 06:26:22 Client 1dhclient: DHCPREQUEST of xx.xx.xxx.xxx on eth0 to xx.xx.xx.x port 67
Feb 14 06:26:22 Client 1  dhclient: DHCPACK of xx.xx.xxx.xxx from x.x.xx.x
Feb 14 06:26:22 Client 1  dhclient: bound to xx.xx.xxx.x -- renewal in 1697 seconds.

 

and looks like this within the HDFS chunk (as viewed through HttpFS):

 

0000e90: 32 30 31 34 2d 30 32 2d 30 35 20 30 34 3a 33 34  2014-02-05 04:34
0000ea0: 3a 30 32 2c 38 38 31 20 49 4e 46 4f 20 6f 72 67  :02,881 INFO org
0000eb0: 2e 61 70 61 63 68 65 2e 7a 6f 6f 6b 65 65 70 65  .apache.zookeepe
0000ec0: 72 2e 73 65 72 76 65 72 2e 4e 49 4f 53 65 72 76  r.server.NIOServ
0000ed0: 65 72 43 6e 78 6e 46 61 63 74 6f 72 79 3a 20 41  erCnxnFactory: A
0000ee0: 63 63 65 70 74 65 64 20 73 6f 63 6b 65 74 20 63  ccepted socket c
0000ef0: 6f 6e 6e 65 63 74 69 6f 6e 20 66 72 6f 6d 20 2f  onnection from /
0000f00: 31 30 2e 39 30 2e 31 30 30 2e 31 36 34 3a 35 37  10.90.100.164:57
0000f10: 30 37 34 00 00 00 98 00 00 00 08 00 00 01 44 3d  074...........D=
0000f20: ed 3e 2e 00 00 00 8c 32 30 31 34 2d 30 32 2d 30  .>.....2014-02-0
0000f30: 35 20 30 34 3a 33 34 3a 30 32 2c 38 39 30 20 49  5 04:34:02,890 I
0000f40: 4e 46 4f 20 6f 72 67 2e 61 70 61 63 68 65 2e 7a  NFO org.apache.z
0000f50: 6f 6f 6b 65 65 70 65 72 2e 73 65 72 76 65 72 2e  ookeeper.server.
0000f60: 5a 6f 6f 4b 65 65 70 65 72 53 65 72 76 65 72 3a  ZooKeeperServer:
0000f70: 20 43 6c 69 65 6e 74 20 61 74 74 65 6d 70 74 69   Client attempti
0000f80: 6e 67 20 74 6f 20 65 73 74 61 62 6c 69 73 68 20  ng to establish
0000f90: 6e 65 77 20 73 65 73 73 69 6f 6e 20 61 74 20 2f  new session at /
0000fa0: 31 30 2e 39 30 2e 31 30 30 2e 31 36 34 3a 35 37  10.90.100.164:57
0000fb0: 30 37 34 00 00 00 b9 00 00 00 08 00 00 01 44 3d  074...........D=
0000fc0: ed 3e 2e 00 00 00 ad 32 30 31 34 2d 30 32 2d 30  .>.....2014-02-0
0000fd0: 35 20 30 34 3a 33 34 3a 30 32 2c 38 39 34 20 49  5 04:34:02,894 I
0000fe0: 4e 46 4f 20 6f 72 67 2e 61 70 61 63 68 65 2e 7a  NFO org.apache.z
0000ff0: 6f 6f 6b 65 65 70 65 72 2e 73 65 72 76 65 72 2e  ookeeper.server.

1 REPLY 1

Re: Data ingested by Flume converted into Hexdump

Contributor

Your file is indeed a text file. Look at the right-hand side of the dump: that is the content of your file, which is text. However, you must have switched your view option to binary. What is on the left side is the hexadecimal representation of that text content, i.e. 0x32 is the hex value of the ASCII character '2'.
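As a quick sanity check (a minimal sketch; the hex string is copied from the first line of the dump above), you can decode the left-hand bytes yourself and see they are just ASCII text:

```python
# Decode the first 16 bytes from the hexdump and confirm that the
# right-hand column is simply their ASCII rendering.
hex_bytes = "32 30 31 34 2d 30 32 2d 30 35 20 30 34 3a 33 34"
text = bytes.fromhex(hex_bytes).decode("ascii")
print(text)       # 2014-02-05 04:34
print(chr(0x32))  # 2  (0x32 is the ASCII code for '2')
```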
