Member since: 08-21-2013
Posts: 146
Kudos Received: 25
Solutions: 34
My Accepted Solutions
Views | Posted
---|---
1653 | 10-24-2016 10:43 AM
4211 | 03-13-2016 02:15 PM
2404 | 12-11-2015 01:48 AM
1778 | 11-23-2015 12:11 PM
1715 | 07-06-2015 10:40 AM
05-23-2017
04:06 PM
Custom morphline commands can maintain state if you need to, so, in principle, it is possible.
03-09-2017
01:58 PM
1 Kudo
Consider using the CrunchIndexerTool - https://www.cloudera.com/documentation/enterprise/latest/topics/search_spark_index_ref.html#xd_583c10bfdbd326ba-7dae4aa6-147c30d0933--7d63 Wolfgang
02-13-2017
09:26 AM
The log file will be on the remote hosts that ran the map tasks, not on the host that started the map reduce driver. Wolfgang
10-24-2016
10:43 AM
Here is a useful related read: http://www.ngdata.com/the-hbase-side-effect-processor-and-hbase-replication-monitoring/
07-19-2016
09:10 AM
The extractJsonPaths command expects a Jackson JSON object as input rather than a java.util.Map. Try removing the "outputClass : java.util.Map" option from the readJson command.
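A minimal sketch of that combination; the JSON paths are hypothetical and would be adapted to your own structure:

commands : [
  { readJson {} }   # no outputClass option, so Jackson JsonNode objects are emitted
  {
    extractJsonPaths {
      flatten : false
      paths : {
        id : /id
        user_name : /user/name
      }
    }
  }
]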
05-31-2016
10:13 AM
The morphline conf file must be present on the local file system of each mapper task. For example, you can submit it there via the --files CLI option of the MR job, in which case the file will end up in the CWD of the mapper task process, where you can point to it via a relative file path.
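A rough sketch of such a submission; the jar, driver class, and paths are hypothetical, and -files assumes a ToolRunner-based driver:

# -files ships the local file into the CWD of each task
hadoop jar my-indexer-job.jar com.example.MyIndexerDriver \
  -files /local/path/morphline.conf \
  hdfs:///user/me/input hdfs:///user/me/output

# inside the mapper, the file is then reachable via the relative path "morphline.conf"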
05-17-2016
09:00 AM
When using "--input-file-format avro", your morphline should start with an extractAvroPaths or extractAvroTree command rather than with a readAvroContainer or readAvro command. This is because --input-file-format already takes care of parsing the input stream into Avro objects in Java main memory.
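For example, a minimal sketch (the field paths are assumptions):

commands : [
  # no readAvroContainer/readAvro needed: --input-file-format avro has already
  # parsed the stream into in-memory Avro objects
  {
    extractAvroPaths {
      flatten : true
      paths : {
        id : /id
        text : /text
      }
    }
  }
  { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
]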
05-16-2016
06:37 PM
Try setting the number of Spark executors, cores, etc. - e.g. see http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
03-13-2016
02:15 PM
1 Kudo
Looks like you are missing a loadSolr command in your morphline, for example as shown here: http://www.cloudera.com/documentation/enterprise/latest/topics/search_batch_index_use_mapreduce.html?scroll=csug_topic_4_3 (FYI, with MapReduceIndexerTool the SOLR_LOCATOR is substituted from whatever is specified on the CLI with the --zk-host option.)
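A minimal sketch of the missing piece, with a placeholder collection name and zkHost:

SOLR_LOCATOR : {
  collection : collection1
  zkHost : "127.0.0.1:2181/solr"   # overridden by --zk-host with MapReduceIndexerTool
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # ... your existing parsing/transformation commands ...
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]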
03-01-2016
04:17 PM
Try using something like this CLI option: -D mapreduce.map.java.opts="-Xmx2000m"
01-08-2016
03:56 PM
Configuring Saxon won't help because the Woodstox XML parser is invoked before Saxon even comes into play. Maybe there's a Java system property for Woodstox that enables XML 1.1? Or maybe a more recent version of Woodstox understands XML 1.1 out of the box? (As an aside, you'd use "http://saxon.sf.net/feature/xml-version" instead of "XML_VERSION" because public final static String XML_VERSION = "http://saxon.sf.net/feature/xml-version"; ) Wolfgang
12-20-2015
11:58 AM
1 Kudo
You'd need to write some Java code to express this part: (severity >= 0 && severity <= 299), for example using the "java" morphline command. Wolfgang.
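A sketch of what that java command could look like, assuming the value lives in a field named "severity" and records outside the range should be silently dropped:

{
  java {
    imports : "import java.util.List;"
    code : """
      List severities = record.get("severity");
      if (!severities.isEmpty()) {
        int severity = Integer.parseInt(severities.get(0).toString());
        if (severity >= 0 && severity <= 299) {
          return child.process(record);   // keep records in range
        }
      }
      return true;   // drop everything else without raising an error
    """
  }
}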
12-11-2015
01:48 AM
On YARN the params are called mapreduce.map.java.opts and mapreduce.reduce.java.opts. Wolfgang.
11-23-2015
11:00 PM
Custom morphline commands are deployed by adding the jar with the custom code to the hbase-indexer Java classpath. The morphline runs inside the hbase-indexer processes, which are separate from the HBase processes, so it has no impact on the stability of the HBase service.
11-23-2015
12:11 PM
1 Kudo
You can plug a morphline into hbase-indexer to do some mini ETL on the fly during indexing from HBase into Solr. See the docs: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_hbase_batch_indexer.html and http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_etl_morphlines.html
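A minimal sketch along the lines of those docs; the column family, qualifier, and output field are assumptions:

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "info:name"
              outputField : "name"
              type : string
              source : value
            }
          ]
        }
      }
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]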
10-08-2015
09:13 AM
To automatically print diagnostic information such as the content of records as they pass through the morphline commands, consider enabling TRACE log level, for example by adding the following line to your log4j.properties file: log4j.logger.org.kitesdk.morphline=TRACE
10-02-2015
02:14 PM
Yes, try loading a different log4j.properties file with custom log level directives, e.g. via the --log4j CLI option in MapReduceIndexerTool, or similar. Wolfgang.
08-24-2015
02:19 PM
The HBase TTL feature isn't supported with hbase-indexer (because HBase doesn't send delete events via HBase replication for TTL deletes). Wolfgang.
08-24-2015
01:49 PM
readJson copies the input record once per parsed JSON object and attaches each parsed object to the corresponding new record copy.
08-03-2015
07:27 AM
If you want that, you'd need to replace the readLine command with your own custom command.
07-06-2015
10:40 AM
1 Kudo
The SOLR_LOCATOR is a variable that works via simple text substitution (à la Unix shell scripts). You can define as many variables as you like within the same morphline config file, for example along these lines:

SOLR_LOCATOR_1 : { collection : collection1, zkHost : ${ZK_HOST} }
SOLR_LOCATOR_2 : { collection : collection2, zkHost : ${ZK_HOST} }

morphlines : [
  { id : morphline1 ... { loadSolr { solrLocator : ${SOLR_LOCATOR_1} } } }
  { id : morphline2 ... { loadSolr { solrLocator : ${SOLR_LOCATOR_2} } } }
]

Wolfgang
07-02-2015
10:58 AM
1 Kudo
The Flume spool source emits one Flume event per input line, so the morphline never receives an event that contains multiple lines, and thus readMultiLine can never emit more than a single line per Flume event. Maybe you can work around this by configuring the Flume spool source to use the BlobDeserializer, which emits the entire input file in a single event (not applicable to large files due to RAM pressure). Wolfgang.
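A sketch of that source configuration; the agent and source names are hypothetical:

agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /var/spool/flume
# emit the whole file as a single event instead of one event per line
agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spoolSrc.deserializer.maxBlobLength = 100000000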
06-12-2015
05:54 AM
Try using the sanitizeUnknownSolrFields command per http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#sanitizeUnknownSolrFields Wolfgang.
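It is typically placed right before loadSolr, for example:

{ sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
{ loadSolr { solrLocator : ${SOLR_LOCATOR} } }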
06-12-2015
03:50 AM
Maybe readAvroContainer fails because your Avro data isn't contained in an Avro container, in which case use the readAvro command instead of readAvroContainer. In any case, to automatically print diagnostic information such as the content of records as they pass through the morphline commands, consider enabling TRACE log level, for example by adding the following line to your log4j.properties file: log4j.logger.org.kitesdk.morphline=TRACE (see http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#logTrace). This will also print which command failed where. BTW, questions specific to Cloudera Search are best directed to search-user@cloudera.org via http://groups.google.com/a/cloudera.org/group/search-user Wolfgang
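A sketch of the readAvro variant, assuming a hypothetical path to the writer schema of your data:

commands : [
  {
    readAvro {
      # needed because, unlike a container file, a bare Avro stream
      # does not embed the writer schema
      writerSchemaFile : /path/to/schema.avsc
    }
  }
]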
05-24-2015
09:16 AM
I'd recommend using four separate grok commands: one that extracts the data for "Event" irrespective of position and ignores everything else, one for "White", one for "Black", and one for "Date".
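A sketch of the first two of those grok commands, assuming PGN-style input lines such as [Event "..."] arriving in a field named "message" (the commands for "Black" and "Date" follow the same pattern):

{
  grok {
    expressions : { message : """\[Event "%{DATA:Event}"\]""" }
    findSubstrings : true   # match anywhere in the value, ignore everything else
  }
}
{
  grok {
    expressions : { message : """\[White "%{DATA:White}"\]""" }
    findSubstrings : true
  }
}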
05-20-2015
06:17 AM
The MorphlineInterceptor expects a byte[] or java.io.InputStream in the _attachment_body field of the morphline output record; this becomes the body of the Flume output event. In your case the _attachment_body field instead contains a Jackson JsonNode object - hence it complains.
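One possible workaround, as a rough sketch: use a java command to serialize the JsonNode back into a byte[] before the interceptor consumes the record:

{
  java {
    imports : "import java.nio.charset.StandardCharsets;"
    code : """
      Object body = record.getFirstValue("_attachment_body");
      if (body != null) {
        // JsonNode.toString() renders the parsed tree back into JSON text
        byte[] bytes = body.toString().getBytes(StandardCharsets.UTF_8);
        record.replaceValues("_attachment_body", bytes);
      }
      return child.process(record);
    """
  }
}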
05-20-2015
06:12 AM
See the README at https://github.com/kite-sdk/kite/tree/master/kite-morphlines and https://github.com/kite-sdk/kite-examples/tree/master/kite-examples-morphlines
04-25-2015
12:02 AM
Yes, it needs a compiler and the JDK includes a compiler.