Flume folder routing based on HTTP header

Using curl and Flume, I would like to store CSV files on my local machine or in HDFS, in different locations depending on the value of an HTTP header. For example, for the HTTP header Network-Element: GGSN, I would like the files to be stored on my local machine in a folder named GGSN.

I have the following Flume configuration:

- an HTTP source
- a memory channel
- a sink that should route the event files to different locations depending on the HTTP header

I then post CSV files using curl:

find /path/files -type f -exec curl -X POST http://localhost:9043 -H "Content-Type: text/xml" -H "Network-Element: GGSN" --data-binary "@{}" -v \;


These logs are generated:

* About to connect() to localhost port 9043 (#0)
* Trying ::1... Connection refused
* Trying connected
* Connected to localhost ( port 9043 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/ zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9043
> Accept: */*
> Content-Type: text/xml
> Network-Element: GGSN
> Content-Length: 972660
> Expect: 100-continue
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26)
* Connection #0 to host localhost left intact
* Closing connection #0


Flume logs show the following:

2015-03-16 19:41:14,887 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: requestHeaders: {Expect=100-continue, Host=localhost:9043, Content-Length=972660, Network-Element=GGSN, User-Agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/ zlib/1.2.3 libidn/1.18 libssh2/1.4.2, Content-Type=text/xml, Accept=*/*}
2015-03-16 19:41:14,891 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: blobEvent: [Event headers = {Content-Type=text/xml}, body.length = 972660 ]


I use this Flume configuration:

sa.sources = httpsource1
sa.channels = memorychannel1
sa.sinks = localsink1

sa.sources.httpsource1.type = http
sa.sources.httpsource1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
sa.sources.httpsource1.port = 9043
sa.sources.httpsource1.channels = memorychannel1

sa.channels.memorychannel1.type = memory
sa.channels.memorychannel1.capacity = 10000
sa.channels.memorychannel1.transactionCapacity = 1000

sa.sinks.localsink1.type = file_roll
sa.sinks.localsink1.channel = memorychannel1
sa.sinks.localsink1.sink.directory = /path/%{Network-Element}
sa.sinks.localsink1.sink.rollInterval = 36000


For some reason, files cannot be placed under this path: /path/%{Network-Element}
It looks as if this path does not exist, even though I manually created the GGSN folder and granted all permissions on it.


Re: Flume folder routing based on HTTP header

The File Roll sink does not support escape sequences (%...) in its directory path, so %{Network-Element} is treated as a literal folder name. Use the HDFS sink for this functionality.
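
As a rough sketch of what that could look like, here is the same agent with the file_roll sink swapped for an HDFS sink. The sink name hdfssink1 is made up for illustration, the path is copied from the question, and the roll settings are only an assumption meant to mirror the original rollInterval. Treat it as a starting point, not a tested configuration.

```properties
# Hypothetical sink name; replaces localsink1 from the question.
sa.sinks = hdfssink1

sa.sinks.hdfssink1.type = hdfs
sa.sinks.hdfssink1.channel = memorychannel1

# The HDFS sink expands %{header} escapes from the event headers.
sa.sinks.hdfssink1.hdfs.path = /path/%{Network-Element}

# Write the raw event body instead of a SequenceFile.
sa.sinks.hdfssink1.hdfs.fileType = DataStream

# Assumption: roll on time only, matching the 36000 s interval in the question.
sa.sinks.hdfssink1.hdfs.rollInterval = 36000
sa.sinks.hdfssink1.hdfs.rollSize = 0
sa.sinks.hdfssink1.hdfs.rollCount = 0
```

Note that the escape is resolved from the event headers, not from the HTTP request headers. In the BlobHandler log above, the event carries only Content-Type, so Network-Element still has to reach the event for the path to resolve. How to get it there depends on the handler, so it is worth checking what BlobHandler copies into the event, or trying a different handler.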