Member since: 01-05-2015
Posts: 38
Kudos Received: 2
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5545 | 09-29-2016 03:25 AM |
| | 2619 | 06-29-2016 04:34 AM |
| | 2191 | 06-29-2016 04:28 AM |
| | 11531 | 04-15-2016 12:32 AM |
09-29-2016
09:14 AM
Dear Colleagues, I've written a Spark Streaming job which gets messages from Kafka and puts them into HBase. The job runs fine in local mode but throws a NullPointerException in yarn-cluster mode:

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, cloudera01.mlslocal): java.lang.NullPointerException
    at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase.saveToHBase(MlsHadoopJobSparkStreamNewKafkaToHBase.java:163)
    at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:140)
    at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:113)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

The code snippet is the following:

kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(final JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
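The stack trace points at saveToHBase inside the foreach closure, which runs on the executors, not on the driver. A frequent cause of this "works in local mode, NPE in yarn-cluster" pattern is an HBase connection or table handle created once on the driver and captured by the closure: it is not shipped to the executors and arrives there as null. Below is a minimal sketch of the usual workaround, opening the HBase connection per partition on the executor side. It assumes the HBase 1.x client API; the table name "messages", column family "cf", and qualifier "payload" are hypothetical placeholders, not taken from the original job.

import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

// kafkaStream is the JavaPairDStream<String, byte[]> from the post above.
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, byte[]>>>() {
            public void call(Iterator<Tuple2<String, byte[]>> records) throws Exception {
                // Create the HBase resources here, on the executor. Anything built
                // on the driver and captured by this closure is not serialized across.
                Configuration conf = HBaseConfiguration.create();
                Connection connection = ConnectionFactory.createConnection(conf);
                Table table = connection.getTable(TableName.valueOf("messages")); // hypothetical table
                try {
                    while (records.hasNext()) {
                        Tuple2<String, byte[]> record = records.next();
                        // Kafka message keys can be null; guard before using them as row keys.
                        if (record._1() == null) {
                            continue;
                        }
                        Put put = new Put(Bytes.toBytes(record._1())); // row key = Kafka key
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), record._2());
                        table.put(put);
                    }
                } finally {
                    table.close();
                    connection.close();
                }
            }
        });
    }
});

Opening the connection per partition (rather than per record) also keeps the number of HBase connections bounded by the number of tasks instead of the number of messages.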
09-29-2016
06:12 AM
There is:

[desktop]
app_blacklist=

[liboozie]
oozie_url=http://<hostname>:11000/oozie

which I added to get it working.
09-29-2016
06:07 AM
Hi, it works after applying the above configuration. But now I get a NullPointerException in my Spark code (rdd.foreach):

...
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {

In local mode it works, but not in yarn-cluster. Do you have any ideas on how to get it running? Best Regards, Butkiz
09-29-2016
03:25 AM
Hi, it works. I've added the following to "Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini":

[liboozie]
oozie_url=http://<hostname>:11000/oozie
09-28-2016
06:21 AM
Dear Colleagues, I'm not able to use Oozie in Hue because oozie_url (liboozie) is not correct (http:///oozie). The Oozie server is running and I've also restarted Hue, but Hue doesn't pick up the Oozie server (Oozie Editor/Dashboard: "The app won't work without a running Oozie server"). If I set the URL in "/run/cloudera-scm-agent/process/1122-hue-HUE_SERVER" and restart Hue, a new file is created without my changes. Is there a way to set oozie_url or otherwise get this working? Best regards, Sebastian
Labels:
- Apache Oozie
- Cloudera Hue
09-18-2016
01:03 AM
Dear Colleagues, I want to store files of up to 150 MB in HBase (these make up about 10% of the file set; the other 90% of the files are <10 MB). Is this possible, or should I take another approach? Thanks in advance and best regards, Butkiz
Labels:
- Apache HBase
07-12-2016
11:54 PM
Dear Colleagues, I submitted a Spark Streaming job via Oozie and get the following error messages:

Warning: Skip remote jar hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160405235854/oozie/oozie-sharelib-oozie.jar.
Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LeaseRenewer:hdfs@quickstart.cloudera:8020"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Executor task launch worker-2"

Do you have an idea or a solution to prevent these errors? Thanks in advance and best regards, butkiz
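One common mitigation for OutOfMemoryError in Oozie-launched Spark jobs is to raise the driver and executor memory. A hedged sketch of how that might look in the workflow's spark action follows; the class, jar path, and memory values are illustrative assumptions, not tested settings:

<spark xmlns="uri:oozie:spark-action:0.1">
    <master>yarn-cluster</master>
    <name>SparkStreamingJob</name>
    <class>com.example.Main</class>
    <jar>${nameNode}/path/to/app.jar</jar>
    <!-- illustrative memory settings; tune for the actual cluster -->
    <spark-opts>--driver-memory 2g --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=512</spark-opts>
</spark>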
Labels:
- Apache Oozie
- Apache Spark
- HDFS
06-29-2016
04:34 AM
Solved: I've created an additional static JavaSparkContext, converted the (String) object to a JavaRDD (jsc.parallelize()), and inserted it into HBase using "saveAsNewAPIHadoopDataset(conf)".
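For reference, a minimal sketch of that approach, assuming the HBase 1.x client and the mapreduce TableOutputFormat; the table name "messages", column family "cf", qualifier "payload", and the row key are hypothetical placeholders:

import java.io.IOException;
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class SaveStringToHBase {
    public static void save(JavaSparkContext jsc, final String rowKey, String value) throws IOException {
        // Point the MapReduce output format at the target table.
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "messages"); // hypothetical table name
        Job job = Job.getInstance(conf);
        job.setOutputFormatClass(TableOutputFormat.class);

        // Wrap the single String in an RDD of (ImmutableBytesWritable, Put) pairs.
        JavaPairRDD<ImmutableBytesWritable, Put> puts = jsc
            .parallelize(Collections.singletonList(value))
            .mapToPair(new PairFunction<String, ImmutableBytesWritable, Put>() {
                public Tuple2<ImmutableBytesWritable, Put> call(String s) {
                    Put put = new Put(Bytes.toBytes(rowKey));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(s));
                    return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
                }
            });

        puts.saveAsNewAPIHadoopDataset(job.getConfiguration());
    }
}

Note that a SparkContext is only usable on the driver, so a helper like this has to be invoked driver-side (e.g. from within foreachRDD), not from inside an executor-side rdd.foreach closure.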
06-29-2016
04:28 AM
I've solved this by giving the job more CPU cores (local[*]) and/or running it on the cluster (with enough CPU cores).
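For illustration, a hedged sketch of where the core count is set when building the streaming context (app name and batch interval are placeholders). Receiver-based Kafka streams occupy one core per receiver, so local[1] leaves no core free to process the received batches:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// local[*] uses all available local cores: one goes to the Kafka receiver,
// the rest are free to process the received batches.
SparkConf conf = new SparkConf().setAppName("kafka-to-hbase").setMaster("local[*]");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));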
06-29-2016
12:20 AM
Dear Colleagues, I want to insert a simple String into HBase using Spark Streaming. The insert should be done inside an rdd.foreach function, "public void call(Tuple2<String, byte[]> avroRecord)". Currently I have no idea how: I'm able to print out the string values, but not to insert them into HBase. Thanks in advance and best regards, Sebastian
Labels:
- Apache HBase
- Apache Spark