Member since 08-21-2013
Posts: 146
Kudos Received: 25
Solutions: 34
08-20-2014
11:20 AM
See http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/Search/Cloudera-Search-User-Guide/csug_metadata.html
07-16-2014
11:23 PM
1 Kudo
The data is streamed, so the file size should be irrelevant. Perhaps there are too many reducer slots configured, i.e. the machine doesn't have enough RAM to handle that many reducers concurrently (there's also a --reducers CLI option, which always rounds up to a multiple of the number of Solr shards).

To see if the JVM -Xmx settings are applied correctly, you could, for example, add the registerJVMMetrics and startReportingMetricsToSLF4J commands to your morphline:
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/registerJVMMetrics
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#startReportingMetricsToSLF4J

Wolfgang.
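For illustration, the two metrics commands mentioned above could be placed at the start of the morphline roughly as follows. This is a minimal sketch, not a tested config: the morphline id, the logger name and the reporting frequency are assumptions, and the rest of your command chain is elided.

```
morphlines : [
  {
    id : morphline1   # assumed id; keep whatever your job already uses
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # Register JVM gauges (memory, GC, threads) with the morphline's metric registry
      { registerJVMMetrics {} }

      # Periodically write all registered metrics to an SLF4J logger
      {
        startReportingMetricsToSLF4J {
          logger : "org.kitesdk.morphline.jvmcheck"   # hypothetical logger name
          frequency : "10 seconds"                    # assumed reporting interval
        }
      }

      # ... your existing commands follow here ...
    ]
  }
]
```

The memory gauges in the reducer task logs should then show the maximum heap actually in effect, which can be compared against the -Xmx value you intended to set.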
05-29-2014
05:31 AM
1 Kudo
See the answer here: https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/4PkFFmG59vk Wolfgang.
05-27-2014
02:06 AM
1 Kudo
To keep Solr and the XML parser happy, consider removing invalid characters from the input strings. Perhaps plug some sanity fixup logic into a custom morphline command, along similar lines as this: https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71 Wolfgang.
05-01-2014
02:28 PM
Great! Looking forward to meeting up at Berlin Buzzwords. Wolfgang.
05-01-2014
01:59 PM
If the data in HBase does indeed contain the XML tags, then it sounds like your tokenizer/analyzer chain in Solr's schema.xml is stripping the info away, i.e. schema.xml isn't configured to do what you want it to do. You could confirm that the morphline is doing what it's supposed to do by adding a debug log message like this to your morphline:

logInfo { format : "my record: {}", args : ["@{}"] }

Also see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

Wolfgang.
05-01-2014
09:37 AM
1 Kudo
Interceptors are executed prior to Sinks. If the UUIDInterceptor does nothing it's probably misconfigured or attached to the wrong channel in flume.conf, or similar. Alternatively, consider replacing the UUIDInterceptor with a MorphlineInterceptor that uses the generateUUID command, or move the generateUUID command into the morphline config of the MorphlineSolrSink. Also see http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/generateUUID Wolfgang.
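As a rough sketch of the second alternative, the generateUUID command could sit at the top of the morphline used by the MorphlineSolrSink. The morphline id and the field name "id" are assumptions here; use the uniqueKey field of your own Solr schema, and keep your existing commands where indicated.

```
morphlines : [
  {
    id : morphline1   # assumed id; must match the morphlineId configured for the sink
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # Assign a generated UUID to the record's "id" field (assumed field name)
      { generateUUID { field : id } }

      # ... your existing parsing and loadSolr commands follow here ...
    ]
  }
]
```

Doing this inside the morphline keeps the ID generation in the same config file as the rest of the indexing logic, so there is no separate interceptor wiring in flume.conf to get wrong.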
05-01-2014
06:41 AM
An OOM would explain it. The Flume default setting for JVM memory is very low. Try something like -Xmx512m -XX:MaxPermSize=256m
05-01-2014
03:33 AM
Weird. Which version (Solr, CDH, Cloudera Manager) is this with? To automatically print diagnostic information, such as the content of records as they pass through the morphline commands, consider enabling the TRACE log level, for example by adding the following line to your log4j.properties file (e.g. via Cloudera Manager), per http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/logTrace:

log4j.logger.org.kitesdk.morphline=TRACE
04-30-2014
01:14 PM
Try calling it dictionaryFiles : [grok-dictionary.conf], per https://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Managing-Clusters/cmmc_adding_search_solr.html Wolfgang.
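In context, the setting would sit inside the grok command roughly like this. It is only a sketch: the expression and the output field name "text" are hypothetical, and any pattern name used (here GREEDYDATA) has to be defined in the dictionary file itself.

```
{
  grok {
    # File name as suggested above
    dictionaryFiles : [grok-dictionary.conf]
    expressions : {
      # Hypothetical expression: copy the whole message into a field named "text"
      message : """%{GREEDYDATA:text}"""
    }
  }
}
```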