Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
12-15-2015 08:39 AM
That DEBUG message doesn't indicate any problem. Can you look under /var/run/cloudera-scm-agent/process/*flume-AGENT/logs and see if there is any indication that Flume is hitting an OutOfMemory exception and being killed? What is your heap size set to?
12-14-2015 02:39 PM
Did you try this one: https://github.com/szaharici/Flume-Json-Interceptor ? Once you have it compiled, put the jar into Flume's /var/lib/flume-ng/plugins.d directory, in the proper subdirectory and with permissions that allow Flume to read it, following the convention here: http://flume.apache.org/FlumeUserGuide.html#the-plugins-d-directory
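As a rough sketch of that convention (the jar and plugin directory names here are hypothetical, substitute your own build):

/var/lib/flume-ng/plugins.d/json-interceptor/lib/flume-json-interceptor.jar
/var/lib/flume-ng/plugins.d/json-interceptor/libext/ (dependency jars, if any)
/var/lib/flume-ng/plugins.d/json-interceptor/native/ (native libraries, if any)

The lib, libext, and native subdirectories are the layout the Flume user guide describes; only lib is needed if your jar has no extra dependencies.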
12-11-2015 11:15 AM
Glad to hear you got it working. You can't concatenate headers together with the Flume config alone. However, the morphline interceptor gives you more complex functionality for manipulating headers: http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
It lets you arbitrarily add/update/delete headers, as well as modify the event body, before the event is passed to the channel selector. You can write a morphline that examines the body and sets any headers you wish. Here is the morphlines command reference guide to help you get started: http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html
HTH!
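A minimal wiring sketch, assuming a source named kafka-source-1 and a morphline file path of your choosing (both are placeholders, not taken from your config):

flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
flume1.sources.kafka-source-1.interceptors.i1.morphlineFile = /etc/flume-ng/conf/morphline.conf
flume1.sources.kafka-source-1.interceptors.i1.morphlineId = morphline1

The morphline referenced by morphlineId is where you'd put the commands that inspect the body and set headers.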
12-10-2015 03:11 PM
If your events coming from Kafka are in JSON format, you could put together a quick JSON interceptor; that way all the fields in your JSON would get populated as Flume headers. Here are some examples:
http://mmolimar.blogspot.com/2015/01/analyzing-tweets-from-flume-in-kibana.html
https://github.com/szaharici/Flume-Json-Interceptor
If you do stick with just the regex interceptor: you are trying to use (\\d+) to capture a string field, which will not match, since \d only matches digits. You'd need to do something like
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\w+)"
which will match any word characters: http://www.w3schools.com/jsref/jsref_regexp_wordchar.asp
I would recommend creating a JSON interceptor, though, as that will give you the most flexibility, and all your JSON fields will be populated in the headers.
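For completeness, a regex_extractor sketch along those lines (the interceptor and serializer names are illustrative):

flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = regex_extractor
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\w+)"
flume1.sources.kafka-source-1.interceptors.i1.serializers = s1
flume1.sources.kafka-source-1.interceptors.i1.serializers.s1.name = product

With that in place, each matching event would get a product header holding the captured value.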
12-09-2015 02:05 PM
Hello, can you please add a logging channel and logger sink to your Flume configuration? This would show, in the solr-cmf logs, exactly what headers are set on the events coming from your Kafka source. You would need to add something like this (multiple channels on the default selector will be replicating):
flume1.sources.kafka-source-1.selector.default = hdfs-channel-7 logChannel
flume1.channels.logChannel.type = memory
flume1.sinks.logSink.type = logger
flume1.sinks.logSink.channel = logChannel
Remember to add logChannel and logSink to the agent's channels and sinks lists as well.
10-21-2015 11:33 AM
It appears you are using the replica name for the shard value. Try this instead:
http://solrserver1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=solrserver2:8983_solr
10-19-2015 12:29 PM
If you are using CDH 5.4, you can use the Collections API ADDREPLICA command to add a replica for a given shard [1]. Pay attention to the format of the node parameter; it needs to be hostname:8983_solr.
[1] https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
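As an illustration (the hostnames and collection name below are placeholders):

http://solrhost1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=solrhost2:8983_solr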
09-08-2015 09:09 AM
Adding sinks to your configuration will parallelize the delivery of events (i.e., adding a second sink will roughly double your event drain rate, a third will triple it, and so on). You'll want to be sure to give each sink a unique hdfs.filePrefix to ensure there are no filename collisions. If you have multiple hosts, that uniqueness needs to cover hostnames as well.
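A sketch of a two-sink layout draining the same channel (the agent, channel, and prefix names are assumptions; %{host} presumes something upstream, such as a host interceptor, is setting a host header):

flume1.sinks = hdfs-sink-1 hdfs-sink-2
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = %{host}-s1
flume1.sinks.hdfs-sink-2.type = hdfs
flume1.sinks.hdfs-sink-2.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-2.hdfs.filePrefix = %{host}-s2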
09-07-2015 09:43 AM
1 Kudo
Ideally, if your sinks are delivering fast enough, your channel size should stay near zero. A growing channel size indicates that your sinks are not delivering fast enough or that there are issues downstream; you can either increase the batchSize or add more sinks. Can you post your Flume configuration? That might give a better indication of where improvements can be made. Are you seeing any errors delivering to HDFS?
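For example (sink name assumed), raising the HDFS sink batch size from its default of 100:

flume1.sinks.hdfs-sink-1.hdfs.batchSize = 1000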
08-05-2015 12:07 PM
Kafka 1.3.1 is a maintenance release, and here is the list of fixed issues: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-kafka/latest/topics/kafka_fixed_issues.html
HTH! -PD