Member since 07-29-2013
162 Posts
8 Kudos Received
7 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7330 | 05-06-2015 06:52 AM
 | 3185 | 06-09-2014 10:51 PM
 | 5331 | 01-30-2014 10:40 PM
 | 3834 | 08-22-2013 12:28 AM
 | 5284 | 08-18-2013 11:23 PM
11-25-2014 06:15 AM
The result is the same: I do get the contents of the expression {$entry/../../body}, but all tags are removed. Imagine I had <body> <inner>inner text </inner> </body>. I get "inner text" as a result. I want to get <inner>inner text </inner>, with the tag kept.
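To make the difference concrete, here is a small sketch in Python (standard library only, not morphline or Saxon, so it only illustrates the two behaviours and is not a fix). Taking the string value of an element drops the markup, while serializing its children keeps it. The variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The toy document from the post above.
doc = ET.fromstring("<body><inner>inner text </inner></body>")

# String value: only the text nodes are concatenated, so the markup is gone.
text_only = "".join(doc.itertext())

# Serialized subtree: each child element is written back out as markup.
markup = "".join(ET.tostring(child, encoding="unicode") for child in doc)

print(repr(text_only))
print(markup)
```

The first value is what I currently get; the second is what I would like to end up with in the Solr field.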
11-24-2014 02:14 PM
Hi, I've tried this: return
<entry>
  {$entry/attr:ssoId}
  {$entry/attr:applicationId}
  <body>{$entry/../../ecol:body}</body>
</entry> And that: return
<record>
  <entry>
    {$entry/attr:ssoId}
    {$entry/attr:applicationId}
    <body>{$entry/../../ecol:body}</body>
  </entry>
</record> Nothing helps. It looks like I don't understand the idea and how it works. <body>{$entry/../../ecol:body}</body> is extracted, but ALL tags under <ecol:body/> are still removed. What am I doing wrong?
11-18-2014 01:46 AM
Cool, thanks! I'll try this evening.
11-15-2014 07:16 AM
Hi, I use morphline to parse incoming XML and store it in Solr. The problem is that morphline removes all tags. I need to store a subtree of the incoming XML in Solr. Example: <ecol:body>
<out:StatusMessage xmlns:out="http://lol.ru/coordinate/v5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <out:ResponseDate xsi:nil="true"/>
  <out:PlanDate xsi:nil="true"/>
  <out:StatusCode>1040</out:StatusCode>
  <out:Responsible>
    <out:LastName>XXXX</out:LastName>
    <out:FirstName>YY</out:FirstName>
  </out:Responsible>
  <out:Note/>
  <out:ServiceNumber>123123123</out:ServiceNumber>
</out:StatusMessage>
</ecol:body> A part of my morphline config: return
<entry>
  {$entry/attr:ssoId}
  {$entry/attr:applicationId}
  {$entry/../../ecol:body}
</entry> The value for <ecol:body> is: 1040 \n XXXX \n YY 12312312123 and ALL tags are removed. I want to keep the tags. Is there any possibility to do that?
Labels: Apache Solr
11-04-2014 04:25 AM
Hi, I'm trying to switch to CDH5 🙂 I have several nodes, each with 32 GB of RAM. Here is my confusion:
1. How do I control task concurrency? I don't need any pools/schedulers right now. I need one huge pool. I don't need any complexity right now, since there is no reason for it. How can I force all my MR jobs to run in one huge pool?
2. How do I control memory allocation? I used mapred.child.java.opts in MR1; in MR2 it doesn't work.
3. I see a huge number of settings related to *.memory.mb and *.memory.max. I've read the descriptions and don't see how they influence my MR jobs.
4. The Resource Manager shows 9 node managers (NM). Each NM has 8 vcores (that's OK, there are 8 HT cores on each node) and 8 GB RAM. Why 8 GB RAM when I have 32 GB per node? How can I change it?
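For what it's worth, here is a rough sketch of the arithmetic behind point 4, in Python. As far as I can tell, the 8 GB per NodeManager is simply the default of yarn.nodemanager.resource.memory-mb (8192 MB), and the per-task container sizes come from mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. The 24 GB and 2 GB figures below are assumptions for illustration only, not recommendations, and the helper name is made up.

```python
def containers_per_node(nm_memory_mb: int, container_mb: int) -> int:
    """Upper bound on concurrent containers per node, by memory alone.

    Deliberately ignores vcores and any rounding the scheduler applies
    to container requests, so treat the result as an estimate.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return nm_memory_mb // container_mb


DEFAULT_NM_MEMORY_MB = 8192        # default yarn.nodemanager.resource.memory-mb
ASSUMED_NM_MEMORY_MB = 24 * 1024   # assumption: 24 GB of the 32 GB handed to YARN
ASSUMED_CONTAINER_MB = 2048        # assumption: mapreduce.map.memory.mb = 2048

with_default = containers_per_node(DEFAULT_NM_MEMORY_MB, ASSUMED_CONTAINER_MB)
with_more_ram = containers_per_node(ASSUMED_NM_MEMORY_MB, ASSUMED_CONTAINER_MB)

print(with_default, with_more_ram)
```

Under these assumptions, raising the NodeManager memory setting is what lifts the per-node concurrency; the task heap itself would be set separately via mapreduce.map.java.opts and mapreduce.reduce.java.opts.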
Labels: Apache YARN
10-20-2014 04:04 PM
Hi, what kind of source do you use to get attachment_body? Is it possible to use HttpSource as a source and MorphlineSolrSink to process the accepted payload (XML data)?
10-15-2014 08:18 AM
Hi, it looks like a bug: your reply created a new, separate forum thread. I see the difference. The example from the CDK tests returns: text="sample tweet one" sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one" And mine doesn't return someName=someValue, only someValue: 431 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value:someValueIWantToGet Does the Twitter-based example take the name "text" because it's the attribute name from the XML? I've refactored my xq: queryString : """
for $entry in /collectorEvent/attributes/etpEventCollectorAttributes
return
<entry>
  {$entry/ssoId}
  {$entry/applicationId}
</entry>
""" Now it returns: {applicationId=[123], ssoId=[someSSO_id]} Thanks, it works!
10-14-2014 02:45 PM
Hi, I've run into a strange thing. Here is the good config: # Copyright 2013 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]
    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/tweets/tweet/@text" # each item in result sequence becomes a morphline record
            }
          ]
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
] Here is its partial output: 1008 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one"
1015 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - beforeProcess: {id=[123], text=[sample tweet one]}
1015 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record: [{id=[123], text=[sample tweet one]}]
38023 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #2 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet two"
38025 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - beforeProcess: {id=[123], text=[sample tweet two]}
38025 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record: [{id=[123], text=[sample tweet two]}] Here is the "bad" config, where only the xquery command is executed: morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]
    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId/text()"
            }
          ]
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
] I clearly see in the logs that the first config runs without any problems; this config is taken from SaxonMorphlineTest. The second config also executes without any problems, but logDebug is not working. Here is the output: 431 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value:someValueIWantToGet
collector.getRecords()[] Here is what I see in IDEA. Why does IDEA highlight "xquery" in the good config? I'm on Linux, and I don't see any "bad chars" in a text editor.
10-14-2014 11:30 AM
Thanks for your patience,
10-14-2014 11:14 AM
The reason for the NPE is a wrong test initialization order. I've found the problem.