Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
04-21-2016
02:24 PM
The javax.jms.JMSException class is found in the geronimo-jms jar, which isn't included in the Flume classpath by default since it is a third-party jar. To include it, you would need to use the Flume plugin architecture [1] to add any of the jars needed by ActiveMQ. -pd
[1] http://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
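For reference, the plugin layout described there looks like this (the jar names and versions below are illustrative, not exact):

$FLUME_HOME/plugins.d/activemq/lib/activemq-all-5.x.x.jar
$FLUME_HOME/plugins.d/activemq/libext/geronimo-jms_1.1_spec-1.1.1.jar

On startup, Flume scans plugins.d and adds each plugin's lib and libext jars to its classpath.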
04-12-2016
09:14 AM
Is this the pattern you are trying to match: ^\s*\#+|\#+$ ? Try it without the quoting or the forward slashes, like:
agent.sources.localsource.interceptors.search-replace.searchPattern = ^\s*\#+|\#+$
agent.sources.localsource.interceptors.search-replace.replaceString =
-pd
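For reference, the full interceptor definition around those two properties would look something like this (the source and interceptor names are taken from your snippet; adjust them to your agent):

agent.sources.localsource.interceptors = search-replace
agent.sources.localsource.interceptors.search-replace.type = search_replace
agent.sources.localsource.interceptors.search-replace.searchPattern = ^\s*\#+|\#+$
agent.sources.localsource.interceptors.search-replace.replaceString =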
03-31-2016
09:58 PM
The exec source is generally not recommended for production environments, as it doesn't handle things well if the spawned process gets killed unexpectedly.

With regard to the log files you are transferring: are you trying to stream them, or just transport them into HDFS? You may want to consider just using an hdfs put command in a cron job, or mounting the HDFS filesystem via NFS, especially if you want to preserve the files in HDFS as-is. Flume is designed for streaming data, not as a file transport mechanism.

If you do want to stream them, then the spooldir source is the one to use if the files are not being appended to. If they are being appended to while Flume is reading them, you would want to use the new taildir source (as of CDH 5.5) [1], as it provides more reliable handling of streaming log files. The spooldir source requires that files are not modified once they are in the spool directory, and they are removed or marked with .COMPLETED when ingestion is finished. -PD
[1] http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source
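If you do end up on the taildir source, a minimal definition looks like this (the agent, source, and path names are placeholders):

agent.sources = r1
agent.sources.r1.type = TAILDIR
agent.sources.r1.positionFile = /var/lib/flume/taildir_position.json
agent.sources.r1.filegroups = f1
agent.sources.r1.filegroups.f1 = /var/log/app/app.log
agent.sources.r1.channels = c1

The positionFile tracks how far each file has been read, so tailing resumes in the right place after an agent restart.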
03-31-2016
09:03 PM
Can you please define a little more what your use case or requirements are? Flume can replicate ingestion paths to ensure multiple copies downstream, and can also load balance between downstream Flume agents to ensure delivery of events down multiple data flow paths. -PD
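To illustrate the two approaches (all names below are placeholders, not from your setup):

# Replication: the default replicating selector copies every event to both channels
agent.sources.r1.channels = c1 c2
agent.sources.r1.selector.type = replicating

# Load balancing: spread events across two downstream sinks
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance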
03-31-2016
12:46 PM
You are using a 'tail -f' command on your (I assume) static CSV file; tail -f prints the last 10 lines by default and then keeps following the file if more data is written to it. If this file is in fact no longer being modified, and you want to index the whole file, then I would recommend using the spooldir source instead: http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source -PD
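A minimal spooldir definition would look like this (the directory path is a placeholder):

agent.sources.r1.type = spooldir
agent.sources.r1.spoolDir = /var/data/csv-inbox
agent.sources.r1.channels = c1

Keep in mind that files must be complete and immutable before they land in the spool directory.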
03-31-2016
12:41 PM
The default maxLineLength for the LINE deserializer is 2048: http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#line You can set the following to accommodate your large events:
agent.sources.axon_source.deserializer.maxLineLength = 10000
02-11-2016
04:02 PM
1 Kudo
The Exec source launches its command with Java's ProcessBuilder: https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html The spawned process inherits the environment of the running Flume process.
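A small standalone Java sketch of that default behavior (plain ProcessBuilder usage, not Flume code):

import java.io.IOException;

public class EnvInheritDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // A ProcessBuilder starts with a copy of the parent JVM's environment,
        // so the child command sees the same variables the parent was started with.
        ProcessBuilder pb = new ProcessBuilder("printenv", "PATH");
        pb.inheritIO(); // forward the child's stdout/stderr to ours
        int exit = pb.start().waitFor();
        System.out.println("child exited with " + exit);
    }
}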
01-05-2016
09:40 AM
1 Kudo
Using CloudSolrServer instead of HttpSolrServer allows the solrj client to load balance between the available Solr servers, and is recommended in a Cloudera Search environment. The "No live SolrServers available to handle this request" error indicates a problem with the collection you are trying to update. I would suggest reviewing the currently online replicas via http://solr.server:8983/#/~cloud ; there you should be able to see whether all the replicas for your collection are online. Each shard needs at least one replica online to act as leader (updates go to leaders and are then distributed to the associated replicas). -PD
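For reference, a minimal SolrJ sketch of the CloudSolrServer approach (the ZooKeeper ensemble and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexDemo {
    public static void main(String[] args) throws Exception {
        // Point the client at ZooKeeper rather than a single Solr node;
        // it discovers live replicas from cluster state and load balances.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}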
12-22-2015
08:41 AM
1 Kudo
Are you using Cloudera Manager to start the Flume agent? If so, you'll want to configure the heap size through Cloudera Manager. If you are not using Cloudera Manager, specify the following on the command line, not as a -D property value:
flume-ng agent -n a1 -f flume_config.conf -Xms1024m -Xmx1024m
We recommend setting the -Xms and -Xmx values to the same amount so the JVM does not have to resize the heap, which can cause performance issues.
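If you would rather not put the JVM options on the command line, the same settings can go in conf/flume-env.sh, which the flume-ng script sources at startup (the values here are examples):

export JAVA_OPTS="-Xms1024m -Xmx1024m"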
12-16-2015
06:48 AM
If you wanted to use the morphline interceptor, you could use a grok command to extract the information you need and set it as a new field that becomes a header: http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#grok
Here is a grok debugger you can use to make sure your grok pattern matches your input string: http://grokdebug.herokuapp.com/
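A minimal morphline sketch for that approach (the input field, grok pattern, and extracted field name are placeholders for your data):

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        grok {
          dictionaryFiles : [grok-dictionaries]
          expressions : {
            message : """%{IP:client_ip} %{GREEDYDATA:rest}"""
          }
        }
      }
    ]
  }
]

The extracted client_ip record field can then be picked up as a Flume event header of the same name by the morphline interceptor.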