About whosch

whosch · ‎08-20-2014

See http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/Search/Cloudera-Search-User-Guide/csug_metadata.html

whosch · ‎07-16-2014

The data is streamed, so the file size should be irrelevant. Perhaps there are too many reducer slots configured, i.e. the machine doesn't have enough RAM to handle that many reducers concurrently (there's also a --reducers CLI option, which always rounds up to a multiple of the number of Solr shards). To see if the JVM -Xmx settings are applied correctly, you could, for example, add the registerJVMMetrics and startReportingMetricsToSLF4J commands to your morphline: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/registerJVMMetrics http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#startReportingMetricsToSLF4J Wolfgang.
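As a rough sketch, the two metrics commands named in this reply could be placed at the start of the morphline's command list, something like the following. The morphline id, the importCommands pattern, the logger name and the reporting frequency are illustrative assumptions, not values from the original thread; check the parameter names against the reference guide for your Kite version.

```hocon
morphlines : [
  {
    id : morphline1                      # assumed id, use your own
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      # Register JVM-level metrics (heap usage, GC, threads) with the morphline's metric registry
      { registerJVMMetrics {} }

      # Periodically write all registered metrics to an SLF4J logger
      {
        startReportingMetricsToSLF4J {
          logger : "org.kitesdk.morphline.domain1"   # assumed logger name
          frequency : "10 seconds"                   # assumed interval
        }
      }

      # ... your existing commands follow here
    ]
  }
]
```

The reported heap figures in the task logs can then be compared with the -Xmx value you expect the mapper and reducer JVMs to run with.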

whosch · ‎05-29-2014

See the answer here: https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/4PkFFmG59vk Wolfgang.

whosch · ‎05-27-2014

To make Solr and the XML parser happy, consider removing invalid characters from input strings. Perhaps plug some sanity fixup logic into a custom morphline command, along similar lines as these: https://github.com/kite-sdk/kite/blob/master/kite-morphlines/kite-morphlines-solr-cell/src/main/java/org/kitesdk/morphline/solrcell/StripNonCharSolrContentHandlerFactory.java#L56-71 Wolfgang.
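For illustration, the kind of fixup logic meant here could look roughly like the helper below. This is a hypothetical sketch, not the code of the linked class: the class and method names are made up, and it assumes that "invalid" means any code point outside the XML 1.0 Char production. A custom morphline command could call such a helper on each string field before the record reaches Solr.

```java
/** Hypothetical helper: drops code points that are not legal in XML 1.0 documents. */
final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input without invalid XML characters; null stays null. */
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt returns the lone surrogate value for unpaired surrogates,
            // which falls outside the valid ranges and is therefore dropped
            int cp = input.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Iterating by code point rather than by char keeps valid surrogate pairs (e.g. emoji) intact while still removing control characters and unpaired surrogates.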

whosch · ‎05-01-2014

Great! Looking forward to meeting up at Berlin Buzzwords. Wolfgang.

whosch · ‎05-01-2014

If indeed the data in HBase contains the XML tags, then it sounds like your tokenizer/analyzer chain in Solr schema.xml is stripping info away, i.e. schema.xml isn't configured to do what you want it to do. You could confirm that the morphline is doing what it's supposed to do by adding some debug log message like this to your morphline: logInfo { format : "my record: {}", args : ["@{}"] } Also see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr Wolfgang.

whosch · ‎05-01-2014

Interceptors are executed prior to Sinks. If the UUIDInterceptor does nothing it's probably misconfigured or attached to the wrong channel in flume.conf, or similar. Alternatively, consider replacing the UUIDInterceptor with a MorphlineInterceptor that uses the generateUUID command, or move the generateUUID command into the morphline config of the MorphlineSolrSink. Also see http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/generateUUID Wolfgang.
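A minimal sketch of the second alternative, with generateUUID added to the morphline used by the MorphlineSolrSink. It assumes the Solr uniqueKey field is called id; the placement within the command list is illustrative and not taken from the original thread.

```hocon
commands : [
  # ... commands that parse the Flume event come first

  # Assign a UUID to the (assumed) uniqueKey field "id" of each record
  { generateUUID { field : id } }

  # ... followed by the commands that sanitize and load the record into Solr
]
```

With this in place the UUIDInterceptor can be removed from flume.conf, so the id is generated in one place only.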

whosch · ‎05-01-2014

OOM would explain it. The Flume default setting for JVM memory is very low. Try something like -Xmx512m -XX:MaxPermSize=256m

whosch · ‎05-01-2014

Weird. Which version (Solr, CDH, Cloudera Manager) is this with? To automatically print diagnostic information such as the content of records as they pass through the morphline commands, consider enabling TRACE log level, for example by adding the following line to your log4j.properties file, e.g. via Cloudera Manager: log4j.logger.org.kitesdk.morphline=TRACE Also see http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/logTrace

whosch · ‎04-30-2014

Try to call it dictionaryFiles : [grok-dictionary.conf] per https://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Managing-Clusters/cmmc_adding_search_solr.html Wolfgang.

Last Visited	‎06-08-2016 05:03 PM

Member Since	‎08-21-2013 12:00 PM
Posts	146
Kudos received	25

Cloudera Community

Re: Lily Indexer: ensuring Hbase and Solr record c...

Re: readJsonTweets Morphline using the MapReduceIn...

Re: Indexing data to Solr with a MapReduce and Mor...

Re: Configure the Lily HBase indexer to index cert...

Re: Key-Value Store Indexer for NRT with number of...

Re: how to add source filename to cloudera search ...

Re: MapReduceIndexerTool OutOfMemoryError in Mappe...

Re: Ignoring a morphline error

Re: How to avoid CharConversionException in HttpSo...

Re: need help with flume - morphline - solr pipeli...

Re: Extracthbase cell command does not retain xml ...

Re: need help with flume - morphline - solr pipeli...

Re: need help with flume - morphline - solr pipeli...

Re: need help with flume - morphline - solr pipeli...

Re: need help with flume - morphline - solr pipeli...