Member since: 01-05-2015
Posts: 38
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3964 | 09-29-2016 03:25 AM
 | 1906 | 06-29-2016 04:34 AM
 | 1654 | 06-29-2016 04:28 AM
 | 9067 | 04-15-2016 12:32 AM
09-29-2016
09:14 AM
Dear Colleagues, I've written a Spark Streaming job which gets messages from Kafka and puts them into HBase. The job runs fine in local mode but throws a NullPointerException in yarn-cluster mode:

```
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, cloudera01.mlslocal): java.lang.NullPointerException
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase.saveToHBase(MlsHadoopJobSparkStreamNewKafkaToHBase.java:163)
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:140)
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:113)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
```

The code snippet is the following:

```java
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(final JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
```
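For context: a frequent cause of an NPE at exactly this point is a connection or table field that was initialized on the driver but is null (or unusable) inside the executor-side closure once the job runs in yarn-cluster mode. A hedged sketch of the usual workaround, not the original job's code (the table name, column family, and qualifier are placeholders): create the connection per partition, on the executor.

```java
import java.util.Iterator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// Inside foreachRDD: open the HBase connection on the executor, once per
// partition, instead of referencing a field deserialized from the driver.
rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, byte[]>>>() {
    public void call(Iterator<Tuple2<String, byte[]>> records) throws Exception {
        try (Connection connection =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("messages"))) { // placeholder table
            while (records.hasNext()) {
                Tuple2<String, byte[]> record = records.next();
                Put put = new Put(Bytes.toBytes(record._1()));                    // key as row key
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), record._2());
                table.put(put);
            }
        }
    }
});
```

Whether this matches the actual cause depends on what line 163 of saveToHBase dereferences.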
09-29-2016
06:12 AM
There is:

```ini
[desktop]
app_blacklist=

[liboozie]
oozie_url=http://<hostname>:11000/oozie
```

which I added to get it working.
09-29-2016
06:07 AM
Hi, it works after applying the above configuration. But now I get a NullPointerException in my Spark code (rdd.foreach):

```java
...
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
```

In local mode it works but not in yarn-cluster. Do you have any ideas on how to get it running? Best Regards, Butkiz
09-29-2016
03:25 AM
Hi, it works. I've added the following to "Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini":

```ini
[liboozie]
oozie_url=http://<hostname>:11000/oozie
```
09-28-2016
06:21 AM
Dear Colleagues, I'm not able to use Oozie in Hue because the oozie_url (liboozie) is not correct (http:///oozie). The Oozie server is running and I've also restarted Hue, but Hue doesn't pick up the Oozie server (Oozie Editor/Dashboard: "The app won't work without a running Oozie server"). If I set the URL in "/run/cloudera-scm-agent/process/1122-hue-HUE_SERVER" and restart Hue, a new file is created without my changes. Do you have any solution to set oozie_url or get this working? Best regards, Sebastian
09-18-2016
01:03 AM
Dear Colleagues, I want to store files of up to 150 MB in HBase (about 10% of the file set; the other 90% of the files are <10 MB). Is this possible, or should I take another approach? Thanks in advance and best regards, Butkiz
07-12-2016
11:54 PM
Dear Colleagues, I submitted a Spark Streaming job via Oozie and get the following error messages:

```
Warning: Skip remote jar hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160405235854/oozie/oozie-sharelib-oozie.jar.
Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LeaseRenewer:hdfs@quickstart.cloudera:8020"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Executor task launch worker-2"
```

Do you have an idea or a solution to prevent these errors? Thanks in advance and best regards, butkiz
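For what it's worth, a common first step when an OutOfMemoryError appears in an "Executor task launch worker" thread is to raise executor memory before the streaming context is created. A minimal sketch with illustrative values only; whether it applies here depends on where the OOM actually occurs, and the Oozie launcher's own heap may also need raising:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch only: app name and memory values are placeholders, not tuned for this job.
SparkConf conf = new SparkConf()
        .setAppName("KafkaToHBase")                          // placeholder app name
        .set("spark.executor.memory", "2g")                  // default is often 1g
        .set("spark.yarn.executor.memoryOverhead", "512");   // off-heap headroom on YARN (MB)
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
```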
06-29-2016
04:34 AM
Solved: I've created an additional static JavaSparkContext, converted the String object to a JavaRDD (jsc.parallelize()), and inserted it into HBase using "saveAsNewAPIHadoopDataset(conf)".
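A rough sketch of that approach, assuming the standard HBase TableOutputFormat and the HBase 1.x client API; the table name, column family, qualifier, and row key below are placeholders, not taken from the original job:

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class HBasePutSketch {
    public static void saveString(JavaSparkContext jsc, String value) throws Exception {
        // Configure the HBase output format once, on the driver.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "messages");  // placeholder table
        Job job = Job.getInstance(hbaseConf);
        job.setOutputFormatClass(TableOutputFormat.class);

        // Wrap the single String in an RDD and map it to (ignored key, Put).
        jsc.parallelize(Collections.singletonList(value))
           .mapToPair(s -> {
               Put put = new Put(Bytes.toBytes(s));  // placeholder row key
               put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(s));
               return new Tuple2<>(new ImmutableBytesWritable(), put);
           })
           .saveAsNewAPIHadoopDataset(job.getConfiguration());
    }
}
```

TableOutputFormat ignores the key and writes only the Put, which is why an empty ImmutableBytesWritable suffices.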
06-29-2016
04:28 AM
I've solved this by allocating more CPU cores (local[*]) and/or running the job on the cluster (with enough CPU cores).
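The likely reason this helps (a general Spark Streaming point, not confirmed for this job): a receiver-based Kafka stream permanently occupies one core, so with local[1] no core is left to process the received batches, and the job consumes messages without ever writing output. A minimal sketch:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// local[*] uses all available cores; a receiver-based stream needs at
// least two (one for the receiver, one for batch processing).
SparkConf conf = new SparkConf()
        .setAppName("KafkaToHBase")   // placeholder app name
        .setMaster("local[*]");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
```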
06-29-2016
12:20 AM
Dear Colleagues, I want to insert a simple String into HBase using Spark Streaming. The insert should be done inside an rdd.foreach function ("public void call(Tuple2<String, byte[]> avroRecord)"). Currently I have no idea how. I'm able to print out the string values but not to insert them into HBase. Thanks in advance and best regards, Sebastian
06-27-2016
09:21 AM
Dear Colleagues, I want to stream Kafka messages into HBase using Spark Streaming. I set up a Java job and run it with "spark-submit ... yarn-client". It seems that Kafka messages are consumed but no output is written. There are no error messages, only the following:

```
INFO storage.BlockManagerInfo: Added input-0-1467044417600 in memory on quickstart.cloudera:46453 (size: 99.0 B, free: 530.3 MB)
16/06/27 09:20:18 INFO scheduler.JobScheduler: Added jobs for time 1467044418000 ms
```

Do you have any suggestions on this topic? Many thanks in advance, Butkiz
04-15-2016
12:40 AM
1 Kudo
Dear Colleagues, I'm not able to change my community account email address. Can you tell me how it works, please? Thanks in advance, Butkiz
04-15-2016
12:32 AM
Hi, the JavaScript was hanging. After restarting the browser, all results and messages are displayed. Thanks a lot
04-14-2016
10:04 AM
I have updated from 5.4 to 5.5 and then to 5.6. I'll try 'Show tables'. Creating tables works fine, with the known message.
04-14-2016
09:39 AM
Dear Colleagues, after upgrading to CDH 5.6, no Hive query result is displayed when I run a Hive query in Hue. Also, no log of the MapReduce job is displayed, although the job is marked as succeeded in the Job Browser. Do you have an idea what the reason might be? Thanks in advance, Butkiz
09-17-2015
01:38 AM
Thanks! I'm trying to figure out which terms are related to one topic. Should I first multiply the V and S matrices and then compute the distance between the "new" vectors? What's your understanding? Thanks and regards, butkiz
09-17-2015
01:01 AM
Dear Colleagues, in order to run an SSVD in Mahout, the documents were represented as a tf-idf matrix using seq2sparse (the row index is the doc-id and the column index is the dict-id (word-id)). The input for SSVD is this tf-idf matrix; the output of the SSVD job is the matrices U, S, and V (transposed). How can I interpret this output with regard to the original tf-idf matrix? Should I multiply the original one with U, S, or V? What is the conclusion? Thanks in advance and best regards, butkiz
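For reference, the standard LSA reading of these factors (a general interpretation, not specific to Mahout's output layout):

$$A \approx U \Sigma V^\top$$

where $A$ is the $\text{docs} \times \text{terms}$ tf-idf matrix, the rows of $U\Sigma$ give document coordinates in the latent topic space, and the rows of $V\Sigma$ give term coordinates in the same space. Cosine similarity between rows of $V\Sigma$ is one common way to find terms that belong to the same topic, which matches the "multiply V and S first, then compute distances" idea in the follow-up above.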
Labels:
- Mahout
09-02-2015
11:31 PM
Thanks a lot! The jar "mahout-1.0-collections.jar" was present on the classpath. I've removed this jar and the job works without an error message; apparently the stale jar supplied a version of DoubleFunction as an interface, while LanczosSolver$Scale was compiled against it as a class. Best regards, butkiz!
09-02-2015
08:23 AM
Hello Colleagues, when I run a Mahout DistributedLanczosSolver job, the following error is thrown from the LanczosSolver:

```
Exception in thread "main" java.lang.IncompatibleClassChangeError: class org.apache.mahout.math.decomposer.lanczos.LanczosSolver$Scale has interface org.apache.mahout.math.function.DoubleFunction as super class
```

I'm using CDH 5.3.2 and the Mahout 0.9 packages for CDH 5.3.2. When I start the job, a MapReduce job finishes and the following log message is shown:

```
INFO lanczos.LanczosSolver: 1 passes through the corpus so far...
```

After this, the error message above occurs. Do you have any ideas or a solution regarding this issue? Thanks and regards, butkiz
01-08-2015
03:34 AM
Hello, when installing a cluster using the CM wizard with the private-key method, I get the following error message: "No provider available for Unknown key file". I used a custom user and selected the respective .ppk file. If I log in via SSH from a remote machine with the same credentials, it works. Do you have any ideas or a solution? Thanks and Regards, butkiz
01-05-2015
04:57 AM
Hello, our operating system is Debian 8.0 and we would like to install Cloudera Manager using the installer. Currently the installer throws an error because Debian 8.0 is not supported. Is there a way to run the installer on Debian 8? Best Regards, Sebastian