Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
04-21-2016
02:24 PM
The javax.jms.JMSException class is found in the geronimo-jms jar, which isn't included in the Flume classpath by default since it is a third-party jar. To include it, you would need to use the Flume plugin architecture [1] to add any of the jars needed by ActiveMQ. -pd
[1] http://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
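For reference, the plugin layout described there looks like this (the jar names and versions below are illustrative, not exact):

$FLUME_HOME/plugins.d/activemq/lib/activemq-all-5.x.x.jar
$FLUME_HOME/plugins.d/activemq/libext/geronimo-jms_1.1_spec-1.1.1.jar

On startup, Flume scans plugins.d and adds each plugin's lib and libext jars to its classpath.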
04-12-2016
09:14 AM
Is this the pattern you are trying to match: ^\s*\#+|\#+$ ? Try it without the quoting or the forward slashes, like:
agent.sources.localsource.interceptors.search-replace.searchPattern = ^\s*\#+|\#+$
agent.sources.localsource.interceptors.search-replace.replaceString =
-pd
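For reference, the full interceptor definition around those two properties would look something like this (the source and interceptor names are taken from your snippet; adjust them to your agent):

agent.sources.localsource.interceptors = search-replace
agent.sources.localsource.interceptors.search-replace.type = search_replace
agent.sources.localsource.interceptors.search-replace.searchPattern = ^\s*\#+|\#+$
agent.sources.localsource.interceptors.search-replace.replaceString =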
03-31-2016
09:58 PM
The exec source is generally not recommended for production environments, as it doesn't handle things well if the spawned process gets killed unexpectedly.

With regard to the log files you are transferring: are you trying to stream them, or just transport them into HDFS? You may want to consider just using an hdfs put command in a cron job, or mounting the HDFS filesystem via NFS, especially if you want to preserve the files in HDFS as-is. Flume is designed for streaming data, not as a file transport mechanism.

If you do want to stream them, then the spooldir source is the one to use if the files are not being appended to. If they are being appended to while Flume is reading them, you would want to use the new taildir source (as of CDH 5.5) [1], as it provides more reliable handling of streaming log files. The spooldir source requires that files are not modified once they are in the spool directory, and they are removed or marked with .COMPLETED when ingestion is finished. -PD
[1] http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source
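If you do end up on the taildir source, a minimal definition looks like this (the agent, source, and path names are placeholders):

agent.sources = r1
agent.sources.r1.type = TAILDIR
agent.sources.r1.positionFile = /var/lib/flume/taildir_position.json
agent.sources.r1.filegroups = f1
agent.sources.r1.filegroups.f1 = /var/log/app/app.log
agent.sources.r1.channels = c1

The positionFile tracks how far each file has been read, so tailing resumes in the right place after an agent restart.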
03-31-2016
09:03 PM
Can you please define a little more what your use case or requirements are? Flume can replicate ingestion paths to ensure multiple copies downstream, and can also load balance between downstream Flume agents to ensure delivery of events down multiple data flow paths. -PD
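To illustrate the two approaches (all names below are placeholders, not from your setup):

# Replication: the default replicating selector copies every event to both channels
agent.sources.r1.channels = c1 c2
agent.sources.r1.selector.type = replicating

# Load balancing: spread events across two downstream sinks
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance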
03-31-2016
12:46 PM
You are using a 'tail -f' command on your (I assume) static CSV file; tail -f prints the last 10 lines by default and then keeps following the file if more data is written to it. If this file is in fact no longer being modified, and you want to index the whole file, then I would recommend using the spooldir source instead: http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source -PD
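A minimal spooldir definition would look like this (the directory path is a placeholder):

agent.sources.r1.type = spooldir
agent.sources.r1.spoolDir = /var/data/csv-inbox
agent.sources.r1.channels = c1

Keep in mind that files must be complete and immutable before they land in the spool directory.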
03-31-2016
12:41 PM
The default maxLineLength for the LINE deserializer is 2048: http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#line You can set the following to accommodate your large events:
agent.sources.axon_source.deserializer.maxLineLength = 10000
02-11-2016
04:02 PM
1 Kudo
The Exec source launches its command with Java's ProcessBuilder: https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html The spawned process inherits the environment of the running Flume process.
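A small standalone Java sketch of that default behavior (plain ProcessBuilder usage, not Flume code):

import java.io.IOException;

public class EnvInheritDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // A ProcessBuilder starts with a copy of the parent JVM's environment,
        // so the child command sees the same variables the parent was started with.
        ProcessBuilder pb = new ProcessBuilder("printenv", "PATH");
        pb.inheritIO(); // forward the child's stdout/stderr to ours
        int exit = pb.start().waitFor();
        System.out.println("child exited with " + exit);
    }
}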
01-05-2016
09:40 AM
1 Kudo
Using CloudSolrServer instead of HttpSolrServer allows the solrj client to load balance between the available Solr servers, and is recommended in a Cloudera Search environment. The "No live SolrServers available to handle this request" error indicates a problem with the collection you are trying to update. I would suggest reviewing the currently online replicas via http://solr.server:8983/#/~cloud ; there you should be able to see whether all the replicas for your collection are online. Each shard needs at least one replica online to act as leader (updates go to leaders and are then distributed to the associated replicas). -PD
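For reference, a minimal SolrJ sketch of the CloudSolrServer approach (the ZooKeeper ensemble and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexDemo {
    public static void main(String[] args) throws Exception {
        // Point the client at ZooKeeper rather than a single Solr node;
        // it discovers live replicas from cluster state and load balances.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}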
12-22-2015
08:41 AM
1 Kudo
Are you using Cloudera Manager to start the Flume agent? If so, you'll want to configure the heap size through Cloudera Manager. If you are not using Cloudera Manager, specify the following on the command line, not as a -D property value:
flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx1024m
We recommend setting the -Xms and -Xmx values to the same amount so the JVM does not have to resize the heap, which can cause performance issues.
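If you would rather not put the JVM options on the command line, the same settings can go in conf/flume-env.sh, which the flume-ng script sources at startup (the values here are examples):

export JAVA_OPTS="-Xms1024m -Xmx1024m"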
12-16-2015
06:48 AM
If you wanted to use the morphline interceptor, you could use a grok command to extract the information you need and set it as a new field that becomes a header: http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#grok
Here is a grok debugger you can use to make sure your grok pattern matches your input string: http://grokdebug.herokuapp.com/
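A minimal morphline sketch for that approach (the input field, grok pattern, and extracted field name are placeholders for your data):

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        grok {
          dictionaryFiles : [grok-dictionaries]
          expressions : {
            message : """%{IP:client_ip} %{GREEDYDATA:rest}"""
          }
        }
      }
    ]
  }
]

The extracted client_ip record field can then be picked up as a Flume event header of the same name by the morphline interceptor.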