Member since: 07-29-2013
Posts: 162
Kudos Received: 8
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5674 | 05-06-2015 06:52 AM
 | 2377 | 06-09-2014 10:51 PM
 | 2855 | 01-30-2014 10:40 PM
 | 2791 | 08-22-2013 12:28 AM
 | 3839 | 08-18-2013 11:23 PM
05-26-2015
01:54 AM
Hi, thanks for the reply. I found these properties in the HBase service configuration: Enable HBase Canary=true and HBase Region Health Canary=true (in the Service Monitor). I didn't find anything there related to HBase
05-20-2015
01:35 AM
Hi, hbase-master says that I have 1300-1500 requests per second for the whole cluster. Cloudera Manager 5.2.1 says that I have 4500 read requests per second and 500 write requests per second; I see that graph on the HBase service page. Who is lying?
Labels:
- Labels:
-
Apache HBase
05-06-2015
06:52 AM
Ha, looks like Camus runs the local job runner; that is the problem... I need to inform Camus that we have YARN here.
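For reference, this is the kind of fix I mean (a sketch; assuming CamusJob goes through ToolRunner so the generic -D option applies, otherwise the property belongs in the client mapred-site.xml):

# point the client at the cluster configs and force the YARN framework
export HADOOP_CONF_DIR=/etc/hadoop/conf
hadoop jar camus-tool.jar com.linkedin.camus.etl.kafka.CamusJob \
    -D mapreduce.framework.name=yarn \
    -P camus.properties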
05-06-2015
05:59 AM
Thanks for the reply. There are several mystical problems:

1. Here is what the ResourceManager conf says (http://my.resource.manager.ru:8088/conf):

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
    <source>mapred-default.xml</source>
</property>

I can't find any mapred-default.xml anywhere, only the one inside hadoop-core.jar, which sits in the Cloudera parcels.

2. Here is a running app. The job configuration on the NodeManager UI says:

mapred.child.java.opts = -Xmx200m -Djava.net.preferIPv4Stack=true -Xmx9448718336

but ps -ef | grep java says:

yarn 54070 53908 99 15:55 ? 00:08:20 /usr/java/jdk1.7.0_55/bin/java -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar camus-tool.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties

Now we get an -Xmx of 1000m, which is still not enough, but we don't have any such property...
05-06-2015
04:05 AM
Hi, I'm running a mapreduce job using the hadoop jar command. The problem is that hadoop-core.jar contains a mapred-default.xml with -Xmx200m for mapreduce. I have the correct client conf in /etc/hadoop/conf/mapred-site.xml; the -Xmx is big enough there. When the job starts, the property is merged:

mapred.child.java.opts = -Xmx200m -Djava.net.preferIPv4Stack=true -Xmx9448718336

-Xmx200m comes from the bundled mapred-default.xml, and -Djava.net.preferIPv4Stack=true -Xmx9448718336 comes from my config. The job uses -Xmx200m for mappers and fails. What is the right way to exclude -Xmx200m and leave only the -Xmx9448718336 from mapred-site.xml?
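For reference, the override I'm experimenting with (a sketch for /etc/hadoop/conf/mapred-site.xml; as far as I understand, in MR2 the per-task opts below take precedence over the deprecated mapred.child.java.opts, so the bundled -Xmx200m should stop merging in):

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx9448718336</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx9448718336</value>
</property>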
Tags:
- mapred-default.xml
04-29-2015
12:50 PM
Thanks for the reply. I read this note: http://archive.cloudera.com/cdh5/cdh/5/spark-1.3.0-cdh5.4.0.releasenotes.html. We are using jobserver to submit jobs every 5 minutes, and that issue states: "This is a blocker for 24/7 running applications like Spark Streaming apps." Jobserver holds a job context to share the context among other jobs. Am I right?
04-29-2015
03:44 AM
Hi, does some Cloudera distribution provide a backport of https://issues.apache.org/jira/browse/SPARK-5967? I can't find it, even in CDH 5.4.
Labels:
- Apache Spark
02-15-2015
05:54 AM
Hi, sorry, no luck. Still suffering from MR2/YARN; I have no idea how it works. Right now I'm getting a deadlock several times a day. I have a single user which submits jobs. It has a huge pool (32*8 memory and 4*CPU) and a limit of 8 applications at once. Suddenly everything stops. What does it mean? How can I get an idea of what went wrong?
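For context, the pool looks roughly like this (a sketch of the Fair Scheduler allocation file; the queue name and exact limits are placeholders for my real ones):

<allocations>
    <queue name="huge">
        <!-- placeholders: 8 nodes * 32 GB memory, 8 nodes * 4 cores -->
        <maxResources>262144 mb, 32 vcores</maxResources>
        <maxRunningApps>8</maxRunningApps>
    </queue>
</allocations>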
11-27-2014
02:09 AM
I took org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder as an example. My custom interceptor takes the event body and stores it in an event header. Then SolrSink picks up this header by default and sends it to Solr for indexing. It works. NB: the Solr schema.xml should have a matching field declaration.
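A minimal sketch of such an interceptor (the class and header names here are hypothetical, not the exact ones I used; the Interceptor API is the standard Flume 1.x one):

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class BodyToHeaderInterceptor implements Interceptor {

    @Override
    public void initialize() {}

    @Override
    public Event intercept(Event event) {
        // copy the event body into a header; the Solr sink picks the header up later
        event.getHeaders().put("body", new String(event.getBody(), StandardCharsets.UTF_8));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override
    public void close() {}

    // Flume instantiates interceptors through a nested Builder, like UUIDInterceptor$Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new BodyToHeaderInterceptor();
        }

        @Override
        public void configure(Context context) {}
    }
}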
11-25-2014
06:15 AM
The result is the same: I do get the contents of the expression {$entry/../../body}, but all tags are removed. Imagine I had:

<body> <inner>inner text </inner> </body>

I get "inner text" as a result. I want to get <inner>inner text </inner> with the tag intact.
11-24-2014
02:14 PM
Hi, I've tried this:

return
<entry>
{$entry/attr:ssoId}
{$entry/attr:applicationId}
<body>{$entry/../../ecol:body}</body>
</entry>

And that:

return
<record>
<entry>
{$entry/attr:ssoId}
{$entry/attr:applicationId}
<body>{$entry/../../ecol:body}</body>
</entry>
</record>

Nothing helps. Looks like I don't understand the idea and how it works. <body>{$entry/../../ecol:body}</body> is extracted, but still ALL tags under <ecol:body/> are removed. What am I doing wrong?
11-18-2014
01:46 AM
Cool, thanks! I'll try it this evening.
11-15-2014
07:16 AM
Hi, I use a morphline to parse incoming XML and store it to Solr. The problem is that the morphline removes all tags. I need to store into Solr a subtree of the incoming XML. Example:

<ecol:body>
<out:StatusMessage xmlns:out="http://lol.ru/coordinate/v5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<out:ResponseDate xsi:nil="true"/>
<out:PlanDate xsi:nil="true"/>
<out:StatusCode>1040</out:StatusCode>
<out:Responsible>
<out:LastName>XXXX</out:LastName>
<out:FirstName>YY</out:FirstName>
</out:Responsible>
<out:Note/>
<out:ServiceNumber>123123123</out:ServiceNumber>
</out:StatusMessage>
</ecol:body>

A part of my morphline config:

return
<entry>
{$entry/attr:ssoId}
{$entry/attr:applicationId}
{$entry/../../ecol:body}
</entry>

The value for <ecol:body> contains: 1040 \n XXXX \n YY 12312312123, and ALL tags are removed. I want to keep the tags. Is there any possibility to do that?
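One idea I want to try, assuming the bundled Saxon supports the XQuery 3.0 serialize() function: serialize the subtree into a string, so the markup survives as plain field text:

return
<entry>
    {$entry/attr:ssoId}
    {$entry/attr:applicationId}
    <body>{serialize($entry/../../ecol:body)}</body>
</entry>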
Labels:
- Apache Solr
11-04-2014
04:25 AM
Hi, I'm trying to switch to CDH5 🙂 I have several nodes, each with 32GB RAM. Here is my confusion:

1. How do I control task concurrency? Really, I don't need any pools/schedulers right now. I need one huge pool. I don't need any complexity right now, since there is no reason for it. How can I force all my MR jobs to run in one huge pool?
2. How do I control memory allocation? I've used mapred.child.java.opts in MR1; in MR2 it doesn't work.
3. I see a ton of settings related to *.memory.mb and *.memory.max. I've read the descriptions and don't see how they influence my MR jobs.
4. The ResourceManager shows 9 NodeManagers (NM). Each NM has 8 vcores (that's OK, there are 8 HT cores on each node) and 8GB RAM. Why 8GB RAM? I have 32GB per node. How can I change it? (See the sketch below.)
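For question 4, this is the kind of change I have in mind (a sketch; the property names are the standard YARN/MR2 ones, the values are just guesses for a 32GB node):

<!-- yarn-site.xml: how much of the node YARN may hand out to containers -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>28672</value> <!-- ~28 GB, leaving the rest for the OS and daemons -->
</property>

<!-- mapred-site.xml: per-task container size and the JVM heap inside it -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value> <!-- roughly 80% of the container -->
</property>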
Labels:
- Apache YARN
10-20-2014
04:04 PM
Hi, what kind of source do you use to get attachment_body? Is it possible to use HTTPSource as a source and MorphlineSolrSink to process the accepted payload (XML data)?
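Something like this is what I imagine (a sketch; the agent/channel names are placeholders, and note HTTPSource's default handler expects JSON, so a custom HTTPSourceHandler might be needed for raw XML):

agent.sources = http
agent.channels = mem
agent.sinks = solr

agent.sources.http.type = http
agent.sources.http.port = 8081
agent.sources.http.channels = mem

agent.channels.mem.type = memory

agent.sinks.solr.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solr.morphlineFile = /etc/flume-ng/conf/morphlines.conf
agent.sinks.solr.channel = mem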
10-15-2014
08:18 AM
Hi, looks like it's a bug: your reply created a new separate forum thread. I see the difference now. The example from the CDK tests returns text="sample tweet one":

sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one"

And mine doesn't return someName=someValue, only someValue:

431 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value: someValueIWantToGet

The Twitter-based example gets the name "text" because it's the attribute name from the XML? I've refactored my xq:

queryString : """
for $entry in /collectorEvent/attributes/etpEventCollectorAttributes
return
<entry>
{$entry/ssoId}
{$entry/applicationId}
</entry>
""" now it returns: {applicationId=[123], ssoId=[someSSO_id} Thanks, it works!
10-14-2014
02:45 PM
Hi, I've met a strange thing. Here is the good config:

# Copyright 2013 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
morphlines : [
{
id : morphline1
importCommands : ["com.cloudera.**"]
commands : [
{
xquery {
fragments : [
{
fragmentPath : "/"
queryString : "/tweets/tweet/@text" # each item in result sequence becomes a morphline record
}
]
}
}
{ logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

Here is its partial output:

1008 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one"
1015 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - beforeProcess: {id=[123], text=[sample tweet one]}
1015 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record: [{id=[123], text=[sample tweet one]}]
38023 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #2 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet two"
38025 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - beforeProcess: {id=[123], text=[sample tweet two]}
38025 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record: [{id=[123], text=[sample tweet two]}]

Here is the "bad" config, where only the xquery command is executed:

morphlines : [
{
id : morphline1
importCommands : ["com.cloudera.**"]
commands : [
{
xquery {
fragments : [
{
fragmentPath : "/"
queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId/text()"
}
]
}
}
{ logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

I clearly see in the logs that the first config runs w/o any problems; this config is taken from SaxonMorphlineTest. The second config also executes w/o any problems, but logDebug is not working. Here is the output:

431 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value: someValueIWantToGet
collector.getRecords() returns []

Here is what I see in IDEA. Why does IDEA "highlight" xquery in the good config? I'm on Linux, and I don't see any bad chars in a text editor.
10-14-2014
11:30 AM
Thanks for your patience,
10-14-2014
11:14 AM
The NPE reason was a wrong test initialization order. I've found the problem.
10-14-2014
11:09 AM
OK, so if I migrate to CDH5 I have to refactor my morphlines.conf? This config works for CDK:

morphlines : [
{
id : morphline1
importCommands : ["com.cloudera.**"]
commands : [
{
xquery {
fragments : [
{
fragmentPath : "/"
queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId"
}
]
}
}
{ logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

and fails for Kite with an exception:

org.kitesdk.morphline.api.MorphlineCompilationException: No command builder registered for name: xquery near: {
# target/test-classes/morphlines/dummy-xml.conf: 8
"xquery" : {
# target/test-classes/morphlines/dummy-xml.conf: 9
"fragments" : [
# target/test-classes/morphlines/dummy-xml.conf: 10
{
# target/test-classes/morphlines/dummy-xml.conf: 12
"queryString" : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId",
# target/test-classes/morphlines/dummy-xml.conf: 11
"fragmentPath" : "/"
}
]
}
}

Here is my Kite-based test:

import org.junit.Test
import org.kitesdk.morphline.api.AbstractMorphlineTest
import org.kitesdk.morphline.api.Record
import org.kitesdk.morphline.base.Fields
/**
* User: sergey.sheypak
* Date: 13.10.14
* Time: 20:30
*/
class ParseDummyXmlTest extends AbstractMorphlineTest {
@Test
void testParseDummyXml(){
morphline = createMorphline('morphlines/dummy-xml');
def record = new Record()
record.put(Fields.ATTACHMENT_BODY, readDummyXml());
processAndVerifySuccess(record, null);
}
InputStream readDummyXml(){
this.class.classLoader.getResourceAsStream('dummy.xml')
}
private void processAndVerifySuccess(Record input, Record expected) {
collector.reset();
startSession();
morphline.process(input)
collector.getFirstRecord()
}
My Kite dependencies are:

<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-morphlines-all</artifactId>
<version>0.17.0</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-morphlines-core</artifactId>
<type>test-jar</type>
<scope>test</scope>
<version>0.17.0</version>
</dependency>
<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-morphlines-saxon</artifactId>
<version>0.17.0</version>
</dependency>
10-14-2014
10:28 AM
Hi! I've passed this guide: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_deploy_solr_sink_flume_agent.html

I've spent some time making it work from Cloudera Manager, and it really works. What confuses me: here are the imports from the tutorial:

# Import all morphline commands in these java packages and their subpackages.
# Other commands that may be present on the classpath are not visible to this morphline.
importCommands : ["com.cloudera.**", "org.apache.solr.**"]

and Kite has a different configuration in its examples... It looks even more complicated than the CDK example tutorial.
10-14-2014
09:53 AM
Hi, I want to catch an XML payload using Flume and use morphlines to put the parsed data into Solr. Now I have a deep misunderstanding: what do I have to use, cdk-morphlines or Kite? I have a config:

morphlines : [
{
id : morphline1
importCommands : ["com.cloudera.**"]
commands : [
{
xquery {
fragments : [
{
fragmentPath : "/"
queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId"
}
]
}
}
{ logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

It runs for CDK and doesn't work in the Kite environment. I get an exception while trying to run the test:

org.kitesdk.morphline.api.MorphlineCompilationException: No command builder registered for name: xquery near: {
    # target/test-classes/morphlines/dummy-xml.conf: 8
    "xquery" : {
        # target/test-classes/morphlines/dummy-xml.conf: 9
        "fragments" : [
            # target/test-classes/morphlines/dummy-xml.conf: 10
            {
                # target/test-classes/morphlines/dummy-xml.conf: 12
                "queryString" : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId",
                # target/test-classes/morphlines/dummy-xml.conf: 11
                "fragmentPath" : "/"
            }
        ]
    }
}

Why? And what would work with Flume?
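My current guess (an assumption, based on the CDK-to-Kite package rename): the command builders now live under the org.kitesdk namespace, so the old import doesn't match the xquery builder anymore. Something like:

importCommands : ["org.kitesdk.**"]   # instead of ["com.cloudera.**"]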
Labels:
- Apache Flume
- Apache Solr
10-14-2014
09:37 AM
Is there any possibility to contribute to the project? It would be great to decouple the "test basement". I see these major problems:

1. It is tightly coupled with JUnit.
2. I have to download dozens of deps to make it run.
3. protected static final java.lang.String RESOURCES_DIR = "target/test-classes"; forces me to put configs under test resources. What is the reason to hardcode it? I get java.io.FileNotFoundException: File not found: target/test-classes/dummy-xml.conf while trying to run my test 😞

I did put the config in the desired place, and then it just throws an NPE:

java.lang.NullPointerException: null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
    at org.kitesdk.morphline.base.AbstractCommand.<init>(AbstractCommand.java:71)
    at org.kitesdk.morphline.stdlib.Pipe.<init>(Pipe.java:38)
    at org.kitesdk.morphline.stdlib.PipeBuilder.build(PipeBuilder.java:40)

checkNotNull what? Not too much info to make it work 😞
10-14-2014
01:04 AM
Oh, I've used the wrong artifact; here is the right one, with <type>test-jar</type>. Thanks!

<dependency>
    <groupId>org.kitesdk</groupId>
    <artifactId>kite-morphlines-core</artifactId>
    <type>test-jar</type>
    <scope>test</scope>
    <version>${kite-version}</version>
</dependency>
10-13-2014
01:26 PM
Hi, thanks for the reply. It really looks more like "debug" than "test". I would expect something like:

//groovy-like pseudocode using hamcrest
@Test
void testParseSmthUsingMorphline(){
    def aResult = doSomeTrickyStuff('a_path_to_morphline_config', 'a_path_to_input_dataset')
    assertThat(aResult, hasSize(3))
    assertThat(aResult.get(0).get('myProperty'), equalTo('some cool value'))
}

P.S. Please add code highlighting!
10-13-2014
10:38 AM
Hi, I've seen this: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/java/com/cloudera/cdk/morphline/api/MorphlineDemo.java and I have no idea how to get access to the parsed records. I've also seen this: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java and I can't use it, because:

1. It uses JUnit.
2. I can't get access to com.cloudera.cdk.morphline.api.Collector; I don't see where the artifact with the "test" classifier is published.

What are the right approaches?
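For reference, this is the kind of access I'm after, sketched with the Kite names (an assumption on my side; the CDK equivalents live under com.cloudera.cdk.morphline.*, and Collector ships in the morphlines-core test-jar):

import java.io.File;
import java.io.FileInputStream;
import org.kitesdk.morphline.api.Collector;
import org.kitesdk.morphline.api.Command;
import org.kitesdk.morphline.api.MorphlineContext;
import org.kitesdk.morphline.api.Record;
import org.kitesdk.morphline.base.Compiler;
import org.kitesdk.morphline.base.Fields;

public class MorphlineAccessDemo {
    public static void main(String[] args) throws Exception {
        // Collector is a trivial Command that stores every record reaching the end of the pipe
        Collector collector = new Collector();
        MorphlineContext context = new MorphlineContext.Builder().build();
        Command morphline = new Compiler().compile(
            new File("src/test/resources/morphlines/dummy-xml.conf"),
            "morphline1", context, collector);

        Record record = new Record();
        record.put(Fields.ATTACHMENT_BODY, new FileInputStream("dummy.xml"));
        morphline.process(record);

        // the parsed records are now accessible:
        for (Record r : collector.getRecords()) {
            System.out.println(r);
        }
    }
}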
06-09-2014
10:51 PM
2 Kudos
Vikram Srivastava helped me in the Google groups. Here is the explanation: the alternatives priority for HDFS is by default configured lower than for MapReduce, so deploying HDFS client configs only will not update what /etc/hadoop/conf points to. An internal issue has been filed for this, to warn users that they need to deploy cluster client configs rather than individual services'. Hope it helps other hadoopers 🙂
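A quick way to verify this on a node (assuming a RHEL-style alternatives setup, which is what CDH uses):

# shows which client config /etc/hadoop/conf resolves to, with priorities
alternatives --display hadoop-conf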
06-09-2014
04:08 AM
This problem is related only to the HDFS service. I did deploy the client conf of the MapReduce service; it updates the client conf mapred-site.xml and hdfs-site.xml, and I do see the updated hdfs-site.xml. The other problem with the HDFS service is that I can't DELETE any role (DN, Gateway, Journal node). Cloudera Manager just starts to consume 100% CPU, and jstack reports a thread deadlock...
06-09-2014
03:31 AM
Hi, we have NN HA on a quorum journal. We got a failed namenode recently and replaced it with a new one. HDFS works; it's possible to read/write data. I click 'download client configuration' and see that hdfs-site.xml has the right settings for the NN HA service configuration. When I click 'deploy client configuration', nothing happens: /etc/hadoop/conf/hdfs-site.xml still has the old configuration. It references the deleted NN role, and the last modified time is not changed either. Looks like it's not updated by CM... How can we fix it?
Labels:
- Apache Hadoop
- Cloudera Manager
- HDFS