Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
12-15-2015 08:39 AM
That DEBUG message doesn't indicate any problem. Can you look under /var/run/cloudera-scm-agent/process/*flume-AGENT/logs and see if there is any indication that Flume is hitting an OutOfMemory exception and being killed? What is your heap size set to?
12-14-2015 02:39 PM
Did you try this one: https://github.com/szaharici/Flume-Json-Interceptor ? Once you have it compiled, put the jar into Flume's /var/lib/flume-ng/plugins.d directory, in the proper subdirectory and with permissions that allow Flume to read it, following the convention here: http://flume.apache.org/FlumeUserGuide.html#the-plugins-d-directory
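As a rough sketch of that convention (the jar and plugin directory names here are hypothetical, substitute your own build):

/var/lib/flume-ng/plugins.d/json-interceptor/lib/flume-json-interceptor.jar
/var/lib/flume-ng/plugins.d/json-interceptor/libext/ (dependency jars, if any)
/var/lib/flume-ng/plugins.d/json-interceptor/native/ (native libraries, if any)

The lib, libext, and native subdirectories are the layout the Flume user guide describes; only lib is needed if your jar has no extra dependencies.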
12-11-2015 11:15 AM
Glad to hear you got it working. You can't concatenate headers together with the Flume config alone. However, the morphline interceptor gives you more complex functionality for manipulating headers: http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
It lets you arbitrarily add/update/delete headers, as well as modify the event body, before the event is passed to the channel selector. You can write a morphline that examines the body and sets any headers you wish. Here is the morphlines command reference guide to help you get started: http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html
HTH!
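A minimal wiring sketch, assuming a source named kafka-source-1 and a morphline file path of your choosing (both are placeholders, not taken from your config):

flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
flume1.sources.kafka-source-1.interceptors.i1.morphlineFile = /etc/flume-ng/conf/morphline.conf
flume1.sources.kafka-source-1.interceptors.i1.morphlineId = morphline1

The morphline referenced by morphlineId is where you'd put the commands that inspect the body and set headers.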
12-10-2015 03:11 PM
If your events coming from Kafka are in JSON format, you could put together a quick JSON interceptor; that way all the fields in your JSON would get populated as Flume headers. Here are some examples:
http://mmolimar.blogspot.com/2015/01/analyzing-tweets-from-flume-in-kibana.html
https://github.com/szaharici/Flume-Json-Interceptor
If you do stick with just the regex interceptor: you are trying to use (\\d+) to capture a string field, which will not match, since \d only matches digits. You'd need to do something like
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\w+)"
which will match any word characters: http://www.w3schools.com/jsref/jsref_regexp_wordchar.asp
I would recommend creating a JSON interceptor, though, as that will give you the most flexibility, and all your JSON fields will be populated in the headers.
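For completeness, a regex_extractor sketch along those lines (the interceptor and serializer names are illustrative):

flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = regex_extractor
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\w+)"
flume1.sources.kafka-source-1.interceptors.i1.serializers = s1
flume1.sources.kafka-source-1.interceptors.i1.serializers.s1.name = product

With that in place, each matching event would get a product header holding the captured value.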
12-09-2015 02:05 PM
Hello, can you please add a logging channel and logger sink to your Flume configuration? This would show, in the solr-cmf logs, exactly what headers are set on the events coming from your Kafka source. You would need to add something like this (multiple channels on the default selector will be replicating):
flume1.sources.kafka-source-1.selector.default = hdfs-channel-7 logChannel
flume1.channels.logChannel.type = memory
flume1.sinks.logSink.type = logger
flume1.sinks.logSink.channel = logChannel
Remember to add logChannel and logSink to the agent's channels and sinks lists as well.
10-21-2015 11:33 AM
It appears you are using the replica name for the shard value. Try this instead:
http://solrserver1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=solrserver2:8983_solr
10-19-2015 12:29 PM
If you are using CDH 5.4, you can use the Collections API ADDREPLICA command to add a replica for a given shard [1]. Pay attention to the format of the node parameter; it needs to be hostname:8983_solr.
[1] https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
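As an illustration (the hostnames and collection name below are placeholders):

http://solrhost1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=solrhost2:8983_solr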
09-08-2015 09:09 AM
Adding sinks to your configuration will parallelize the delivery of events (i.e., adding a second sink will roughly double your event drain rate, a third will triple it, and so on). You'll want to be sure to give each sink a unique hdfs.filePrefix to ensure there are no filename collisions. If you have multiple hosts, that uniqueness needs to cover hostnames as well.
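A sketch of a two-sink layout draining the same channel (the agent, channel, and prefix names are assumptions; %{host} presumes something upstream, such as a host interceptor, is setting a host header):

flume1.sinks = hdfs-sink-1 hdfs-sink-2
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = %{host}-s1
flume1.sinks.hdfs-sink-2.type = hdfs
flume1.sinks.hdfs-sink-2.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-2.hdfs.filePrefix = %{host}-s2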
09-07-2015 09:43 AM
1 Kudo
Ideally, if your sinks are delivering fast enough, your channel size should stay near zero. A growing channel size indicates that your sinks are not delivering fast enough or that there are issues downstream; you can either increase the batchSize or add more sinks. Can you post your Flume configuration? That might give a better indication of where improvements can be made. Are you seeing any errors delivering to HDFS?
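For example (sink name assumed), raising the HDFS sink batch size from its default of 100:

flume1.sinks.hdfs-sink-1.hdfs.batchSize = 1000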
08-05-2015 12:07 PM
Kafka 1.3.1 is a maintenance release, and here is the list of fixed issues: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-kafka/latest/topics/kafka_fixed_issues.html
HTH! -PD