Member since: 11-03-2014
Posts: 18
Kudos Received: 0
Solutions: 2

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3736 | 11-27-2014 01:34 AM |
| | 2580 | 11-21-2014 08:00 AM |
11-27-2014
04:08 AM
tryRules will do too, and it is more elegant. Thanks.
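For reference, a minimal sketch of what the tryRules variant might look like. It assumes the Kite `tryRules` command with `catchExceptions` enabled, so that a parse failure in the first rule falls through to the second. The shortened XQuery, the use of `setValues` and the log text in the second rule are illustrative assumptions, not the configuration that was actually deployed.

```
{
  tryRules {
    # treat an exception inside a rule as "rule failed" and try the next rule
    catchExceptions : true
    throwExceptionIfAllRulesFailed : true
    rules : [
      # Rule 1: well-formed XML, run the XQuery as usual
      {
        commands : [
          {
            xquery {
              fragments : [
                {
                  fragmentPath : "/"
                  queryString : """
                    let $transaId := /message/messageHeader/transactionId/text()
                    return
                      <fieldsToIndex>
                        <transactionId>{$transaId}</transactionId>
                      </fieldsToIndex>
                  """
                }
              ]
            }
          }
        ]
      }
      # Rule 2: reached only if rule 1 failed, for example on malformed XML
      {
        commands : [
          { setValues { _attachment_body : "Malformed XML detected" } }
          { logInfo { format : "Ignoring malformed XML: {}", args : ["@{}"] } }
        ]
      }
    ]
  }
}
```

One caveat with this layout: a rule also counts as failed when one of its commands returns false, so the second rule may fire for reasons other than a parse error.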
11-27-2014
01:34 AM
Hi,

After investigating this further, I found that the exception is thrown in the Saxon command code when it attempts to parse the malformed XML. I don't think it is right to throw an exception that results in aborting the indexer service.

A workaround is presented below. It uses the java command to parse the XML, catch the SAXParseException, log the error, and set the '_attachment_body' field to indicate that malformed XML was detected, so that the other fields present in the same record are still indexed.

    {
      if {
        conditions: [
          {
            java {
              imports: "import java.util.*; import org.xml.sax.*; import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParserFactory; import java.io.StringReader; import org.xml.sax.helpers.DefaultHandler; import org.xml.sax.InputSource;"
              code: """
                List payload = record.get("_attachment_body");
                try {
                  // parse the content of the record field
                  SAXParserFactory factory = SAXParserFactory.newInstance();
                  SAXParser saxParser = factory.newSAXParser();
                  InputSource is = new InputSource(new StringReader(new String((byte[]) payload.get(0))));
                  saxParser.parse(is, new DefaultHandler());
                } catch (Exception e) {
                  // condition succeeds (malformed XML), so the 'then' branch runs
                  logger.error("Malformed XML detected");
                  logger.error("payload: {} for record: {}", payload, record);
                  return true;
                }
                // XML is well formed, so the 'else' branch runs
                return false;
              """
            }
          }
        ]
        then: [
          {
            translate {
              field : _attachment_body
              dictionary : { 0 : dummy }
              fallback : "Malformed XML detected" # if no fallback is defined and no match is found then the command fails
            }
          }
          { logInfo { format: "Ignoring non-well-formed XML..." } }
        ]
        else: [
          {
            xquery {
              fragments: [
                {
                  fragmentPath: "/"
                  queryString: """
                    (: All namespace declarations go here :)

                    (: Extracting all the fields that need indexing :)
                    let $source := /message/messageData/LegacyMessage/LegacyHeader/SenderDetails/SendingApplication/text() | /message/messageHeader/sendingApplication/text()
                    let $orgId := /message/messageData/LegacyMessage/LegacyHeader/SenderDetails/ODSID/text() | /message/securityData/organisationId/text()
                    let $prescriptionId := /message/messageData/*/prescriptionId/text() | /message/messageData/*/UPN/text()
                    let $transaId := /message/messageHeader/transactionId/text()

                    (: Returning the list of the fields that need to be indexed. These fields are defined in the Solr schema.xml file. :)
                    return
                      <fieldsToIndex>
                        <source>{$source}</source>
                        <organisationNationalId>{$orgId}</organisationNationalId>
                        <prescriptionId>{$prescriptionId}</prescriptionId>
                        <transactionId>{$transaId}</transactionId>
                      </fieldsToIndex>
                  """
                }
              ]
            }
          }
        ]
      }
    }
11-21-2014
08:00 AM
It turned out that the Hadoop job below is only required for batch indexing, so it is not needed for now. Everything seems to be working fine now.
11-21-2014
07:58 AM
Hi,

I have written a morphline xquery command (see the snippet below) which expects XML as input and indexes a list of fields plus two other fields, "a:name" and "a:version". Note that the data is coming from HBase.

When the input is not well-formed XML, the indexer throws an exception, which is the correct behaviour. The problem is that the other fields, "a:name" and "a:version", do not get indexed because the field "p:in" contains malformed XML. Also, subsequent inputs into HBase that are all valid XML do not get indexed either; somehow the indexer gets stuck on that exception and never recovers until the bad XML is removed from HBase.

I have tried a try/catch in the XQuery, only to find that the error happens well before the XQuery engine parses the XML (see the stack trace below). It looks like the SaxonCommand class isn't handling the exception well. Any ideas on how to solve this use case are very welcome.

You may be wondering why we are processing malformed XML in the first place. The business use case mandates that we audit these bad XMLs.

Regards,
Ayache

MORPHLINE CONF

    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                { inputColumn : "a:name" outputField : "serviceName" type : string source : value }
                { inputColumn : "a:version" outputField : "serviceVersion" type : string source : value }
                { inputColumn : "p:in" outputField : "_attachment_body" type : "byte[]" source : value }
              ]
            }
          }
          {
            if {
              conditions : [ { equals { _attachment_body : [] } } ]
              then : [
                { logInfo { format : "no payload..." } }
              ]
              else : [
                {
                  xquery {
                    languageVersion : "3.0"
                    fragments : [
                      {
                        fragmentPath : "/"
                        queryString : """
                          (: All namespace declarations go here :)

                          (: Extracting all the fields that need indexing :)
                          try {
                            let $source := /message/messageData/LegacyMessage/LegacyHeader/SenderDetails/SendingApplication/text() | /message/messageHeader/sendingApplication/text()
                            let $orgId := /message/messageData/LegacyMessage/LegacyHeader/SenderDetails/ODSID/text() | /message/securityData/organisationId/text()
                            let $prescriptionId := /message/messageData/*/prescriptionId/text() | /message/messageData/*/UPN/text()
                            let $transaId := /message/messageHeader/transactionId/text()

                            (: Returning the list of the fields that need to be indexed. These fields are defined in the Solr schema.xml file. :)
                            return
                              <fieldsToIndex>
                                <source>{$source}</source>
                                <organisationNationalId>{$orgId}</organisationNationalId>
                                <prescriptionId>{$prescriptionId}</prescriptionId>
                                <transactionId>{$transaId}</transactionId>
                              </fieldsToIndex>
                          } catch * {
                            $err:code, $err:value, " module: ", $err:module, "(", $err:line-number, ",", $err:column-number, ")"
                          }
                        """
                      }
                    ]
                  }
                }
                { logInfo { format : "Finished processing..." } }
              ]
            }
          }
        ]
      }
    ]

STACK TRACE

14/11/21 15:37:16 WARN impl.SepConsumer: Error processing a batch of SEP events, the error will be forwarded to HBase for retry java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>.
at [row,col {unknown-source}]: [87,27] at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at com.ngdata.sep.impl.SepConsumer.waitOnSepEventCompletion(SepConsumer.java:294) at com.ngdata.sep.impl.SepConsumer.replicateWALEntry(SepConsumer.java:275) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20176) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:102) at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97) ... 5 more Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. 
at [row,col {unknown-source}]: [87,27] at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73) at com.ngdata.hbaseindexer.morphline.LocalMorphlineResultToSolrMapper.map(LocalMorphlineResultToSolrMapper.java:245) at com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper.map(MorphlineResultToSolrMapper.java:145) at com.ngdata.hbaseindexer.indexer.Indexer$RowBasedIndexer.calculateIndexUpdates(Indexer.java:289) at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:144) at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:99) ... 6 more Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at org.kitesdk.morphline.saxon.SaxonCommand.doProcess(SaxonCommand.java:78) at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.stdlib.IfThenElseBuilder$IfThenElse.doProcess(IfThenElseBuilder.java:110) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at 
org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at com.ngdata.hbaseindexer.morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells.doProcess(ExtractHBaseCellsBuilder.java:86) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at com.ngdata.hbaseindexer.morphline.LocalMorphlineResultToSolrMapper.map(LocalMorphlineResultToSolrMapper.java:239) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461) at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3258) at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3200) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2832) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.kitesdk.morphline.saxon.XMLStreamCopier.copy(XMLStreamCopier.java:169) at org.kitesdk.morphline.saxon.SaxonCommand.parseXmlDocument(SaxonCommand.java:99) at org.kitesdk.morphline.saxon.XQueryBuilder$XQuery.doProcess2(XQueryBuilder.java:158) at org.kitesdk.morphline.saxon.SaxonCommand.doProcess(SaxonCommand.java:74) ... 
29 more 14/11/21 15:37:16 ERROR impl.SepConsumer: Encountered exceptions on 2 batches (out of 2 total batches) 14/11/21 15:37:16 ERROR ipc.RpcServer: Unexpected throwable object java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at com.ngdata.sep.impl.SepConsumer.waitOnSepEventCompletion(SepConsumer.java:309) at com.ngdata.sep.impl.SepConsumer.replicateWALEntry(SepConsumer.java:275) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20176) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at com.ngdata.sep.impl.SepConsumer.waitOnSepEventCompletion(SepConsumer.java:294) ... 
10 more Caused by: java.lang.RuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:102) at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97) ... 5 more Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. at [row,col {unknown-source}]: [87,27] at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73) at com.ngdata.hbaseindexer.morphline.LocalMorphlineResultToSolrMapper.map(LocalMorphlineResultToSolrMapper.java:245) at com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper.map(MorphlineResultToSolrMapper.java:145) at com.ngdata.hbaseindexer.indexer.Indexer$RowBasedIndexer.calculateIndexUpdates(Indexer.java:289) at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:144) at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:99) ... 6 more Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. 
at [row,col {unknown-source}]: [87,27] at org.kitesdk.morphline.saxon.SaxonCommand.doProcess(SaxonCommand.java:78) at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.stdlib.IfThenElseBuilder$IfThenElse.doProcess(IfThenElseBuilder.java:110) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at com.ngdata.hbaseindexer.morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells.doProcess(ExtractHBaseCellsBuilder.java:86) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at com.ngdata.hbaseindexer.morphline.LocalMorphlineResultToSolrMapper.map(LocalMorphlineResultToSolrMapper.java:239) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </dispenserDetails>; expected </organisation>. 
at [row,col {unknown-source}]: [87,27] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461) at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3258) at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3200) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2832) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.kitesdk.morphline.saxon.XMLStreamCopier.copy(XMLStreamCopier.java:169) at org.kitesdk.morphline.saxon.SaxonCommand.parseXmlDocument(SaxonCommand.java:99) at org.kitesdk.morphline.saxon.XQueryBuilder$XQuery.doProcess2(XQueryBuilder.java:158) at org.kitesdk.morphline.saxon.SaxonCommand.doProcess(SaxonCommand.java:74) ... 29 more
Labels:
- Apache Hadoop
- Apache HBase
11-19-2014
02:15 PM
Hi,

Last week I was experimenting with the Lily HBase Indexer service. I downloaded the Cloudera QuickStart VM and configured the morphline, and the indexing seems to work well. However, when I manually install HBase, SolrCloud and the Lily Indexer service (bear in mind that all of these are downloaded from the Cloudera download page), I get the error below.

I have two VMs set up as follows:

VM1: hbase1, hbase2 & hbase3 running ZooKeeper, Hadoop, HBase, MapReduce & YARN
VM2: running Solr & the Lily Indexer

The command to add the MapReduce job:

    hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.2.0-job.jar \
      --conf /etc/hbase/conf/hbase-site.xml \
      -Dmapred.child.java.opts=-Xmx500m \
      --log4j /etc/hbase-solr/conf/log4j.properties \
      --hbase-indexer-zk hbase1:2181,hbase2:2181,hbase3:2181 \
      --hbase-indexer-file /etc/hbase-solr/conf/morphline-indexer-mapper.xml \
      --hbase-indexer-name portalaudit \
      --zk-host hbase1:2181,hbase2:2181,hbase3:2181/solr \
      --collection portal-audit \
      --go-live

This spits out lots of content, but when it gets to fiddling with the JARs, Lily looks under hdfs:// for them. There are a handful of posts on the Internet with the same problem, but none of them have decent answers. The only one with a possible answer suggests uploading the JARs into HDFS, but that feels wrong and is a complete workaround that will probably break at some point.

The exception when adding the MapReduce job:

    14/11/19 16:12:49 INFO zookeeper.ClientCnxn: EventThread shut down
    14/11/19 16:12:49 INFO hadoop.ForkedMapReduceIndexerTool: Indexing data into 1 reducers
    14/11/19 16:12:49 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    14/11/19 16:12:50 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root850902700/.staging/job_local850902700_0001
    Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://3xNodeHA/usr/lib/hadoop/lib/guava-11.0.2.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1083)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)

Ignoring the above error results in a failure to index the relevant fields in Solr. Any help is very much appreciated.

Thanks,
Ayache
11-10-2014
07:48 AM
Got you, all working perfectly now. Thank you so much for your help, I now understand more about how morphline / hbase-indexer work.
11-10-2014
06:20 AM
Hi,

I tried equals earlier, but I am afraid it's not working for me. Here is the morphline snippet:

    if {
      conditions : [ { equals { p : [] } } ]
      then : [
        # handling headers
        { logInfo { format : "processing headers..." } }
      ]
      else : [
        # handling payload
        { logInfo { format : "processing payload..." } }
      ]
    }

So with the following entry in HBase:

    put 'payload', 'row1', 'p', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

it still goes into the 'handling headers' clause.

My use case mandates that the field 'p' is qualified with the qualifier 'in', so I am not sure how to reference the qualifier in the 'equals' command. I've tried equals { "p:in" : [] }. That is really the next step, though; it's not matching without the qualifier anyway. Any idea what I have missed?

Regards,
Ayache
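One likely explanation, offered as an assumption rather than a confirmed diagnosis: morphline conditions such as `equals` look at record fields, that is, the `outputField` names produced by `extractHBaseCells`, not at HBase column names. A later configuration in this feed tests `_attachment_body`, the output field, in exactly this way. A minimal sketch under that assumption, where the field name `payload` is hypothetical:

```
commands : [
  {
    extractHBaseCells {
      mappings : [
        {
          inputColumn : "p:in"      # HBase family:qualifier
          outputField : "payload"   # hypothetical record field name
          type : string
          source : value
        }
      ]
    }
  }
  {
    if {
      # an absent field reads as an empty list, so this matches "no p:in cell"
      conditions : [ { equals { payload : [] } } ]
      then : [ { logInfo { format : "processing headers..." } } ]
      else : [ { logInfo { format : "processing payload..." } } ]
    }
  }
]
```

With this layout the condition is expressed against `payload`, so the qualifier only needs to appear once, in `inputColumn`.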
11-10-2014
03:23 AM
Thanks again, Wolfgang, for your prompt response; this is very helpful. I've looked at if/else and tryRules. Both use the 'contains' command for conditions. However, 'contains' only matches the value of the field, whereas my use case is about asserting the presence of the field regardless of its value. So if the field is 'context' I want to apply the xquery step; if not, just a one-to-one mapping from the HBase cell. Is there a command to check for the presence of a field? Thanks
11-09-2014
01:51 AM
Hi,

Just one last question regarding the above implementation. How do I make the xquery step get triggered only for a certain column family name? Here is the scenario:

    put 'record', 'row1', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

The above input into HBase will be handled by the xquery step.

    put 'record', 'row2', 'context', 'business'

For this input I don't want to call the xquery step; rather, the following mapping (extracting the cell value) will suffice:

    { inputColumn : "context" outputField : "context" type : string source : value }

I thought about declaring two morphlines, one handling fields that require the xquery step and the other just a one-to-one mapping. I tried something like this:

    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                { inputColumn : "data" outputField : "_attachment_body" type : "byte[]" source : value }
              ]
            }
          }
          {
            xquery {
              fragments : [
                {
                  fragmentPath : "/"
                  queryString : """
                    (: All namespace declarations go here :)
                    declare namespace inps = "http://inps.co.uk/";

                    (: Extracting all the fields that need indexing :)
                    let $name := /employees/employee/name/text()

                    (: Returning the list of the fields that need to be indexed. These fields are defined in the Solr schema.xml file. :)
                    return
                      <fieldsToIndex>
                        <name>{$name}</name>
                      </fieldsToIndex>
                  """
                }
              ]
            }
          }
          { logTrace { format : "output record: {}", args : ["@{}"] } }
        ]
      }
      {
        id : morphline2
        importCommands : ["org.kitesdk.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                { inputColumn : "context" outputField : "context" type : string source : value }
              ]
            }
          }
          { logTrace { format : "output record: {}", args : ["@{}"] } }
        ]
      }
    ]

It looks like the second morphline is ignored. First, is this a sensible solution? If so, how do I make the second morphline known?

Kind regards,
Ayache
11-06-2014
04:00 AM
Many thanks for your help. Ayache