Member since: 01-05-2015
Posts: 38
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3964 | 09-29-2016 03:25 AM
 | 1906 | 06-29-2016 04:34 AM
 | 1654 | 06-29-2016 04:28 AM
 | 9067 | 04-15-2016 12:32 AM
09-29-2016
09:14 AM
Dear Colleagues, I've written a Spark Streaming job which gets messages from Kafka and puts them into HBase. The job runs fine in local mode but throws a NullPointerException in yarn-cluster mode:

```
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, cloudera01.mlslocal): java.lang.NullPointerException
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase.saveToHBase(MlsHadoopJobSparkStreamNewKafkaToHBase.java:163)
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:140)
	at com.mls.hadoop.messages.kafka.consumer.MlsHadoopJobSparkStreamNewKafkaToHBase$1$1.call(MlsHadoopJobSparkStreamNewKafkaToHBase.java:113)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
```

The code snippet is the following:

```java
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(final JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
```
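For context: a frequent cause of an NPE at exactly this point is a connection or table field that was initialized on the driver but is null (or unusable) inside the executor-side closure once the job runs in yarn-cluster mode. A hedged sketch of the usual workaround, not the original job's code (the table name, column family, and qualifier are placeholders): create the connection per partition, on the executor.

```java
import java.util.Iterator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// Inside foreachRDD: open the HBase connection on the executor, once per
// partition, instead of referencing a field deserialized from the driver.
rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, byte[]>>>() {
    public void call(Iterator<Tuple2<String, byte[]>> records) throws Exception {
        try (Connection connection =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("messages"))) { // placeholder table
            while (records.hasNext()) {
                Tuple2<String, byte[]> record = records.next();
                Put put = new Put(Bytes.toBytes(record._1()));                    // key as row key
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), record._2());
                table.put(put);
            }
        }
    }
});
```

Whether this matches the actual cause depends on what line 163 of saveToHBase dereferences.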
09-29-2016
06:12 AM
There is:

```ini
[desktop]
app_blacklist=

[liboozie]
oozie_url=http://<hostname>:11000/oozie
```

which I added to get it working.
09-29-2016
06:07 AM
Hi, it works after applying the above configuration. But now I get a NullPointerException in my Spark code (rdd.foreach):

```java
...
kafkaStream.foreachRDD(new VoidFunction<JavaPairRDD<String, byte[]>>() {
    public void call(JavaPairRDD<String, byte[]> rdd) throws Exception {
        rdd.foreach(new VoidFunction<Tuple2<String, byte[]>>() {
            public void call(Tuple2<String, byte[]> avroRecord) throws Exception {
```

In local mode it works but not in yarn-cluster. Do you have any ideas on how to get it running? Best Regards, Butkiz
09-29-2016
03:25 AM
Hi, it works. I've added the following to "Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini":

```ini
[liboozie]
oozie_url=http://<hostname>:11000/oozie
```
09-28-2016
06:21 AM
Dear Colleagues, I'm not able to use Oozie in Hue because the oozie_url (liboozie) is not correct (http:///oozie). The Oozie server is running and I've also restarted Hue, but Hue doesn't pick up the Oozie server (Oozie Editor/Dashboard: "The app won't work without a running Oozie server"). If I set the URL in "/run/cloudera-scm-agent/process/1122-hue-HUE_SERVER" and restart Hue, a new file is created without my changes. Do you have any solution to set oozie_url or get this working? Best regards, Sebastian
09-18-2016
01:03 AM
Dear Colleagues, I want to store files of up to 150 MB in HBase (about 10% of the file set; the other 90% of the files are <10 MB). Is this possible, or should I take another approach? Thanks in advance and best regards, Butkiz
07-12-2016
11:54 PM
Dear Colleagues, I submitted a Spark Streaming job via Oozie and get the following error messages:

```
Warning: Skip remote jar hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160405235854/oozie/oozie-sharelib-oozie.jar.
Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LeaseRenewer:hdfs@quickstart.cloudera:8020"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Executor task launch worker-2"
```

Do you have an idea or a solution to prevent these errors? Thanks in advance and best regards, butkiz
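For what it's worth, a common first step when an OutOfMemoryError appears in an "Executor task launch worker" thread is to raise executor memory before the streaming context is created. A minimal sketch with illustrative values only; whether it applies here depends on where the OOM actually occurs, and the Oozie launcher's own heap may also need raising:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch only: app name and memory values are placeholders, not tuned for this job.
SparkConf conf = new SparkConf()
        .setAppName("KafkaToHBase")                          // placeholder app name
        .set("spark.executor.memory", "2g")                  // default is often 1g
        .set("spark.yarn.executor.memoryOverhead", "512");   // off-heap headroom on YARN (MB)
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
```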
06-29-2016
04:34 AM
Solved: I've created an additional static JavaSparkContext, converted the String object to a JavaRDD (jsc.parallelize()), and inserted it into HBase using "saveAsNewAPIHadoopDataset(conf)".
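A rough sketch of that approach, assuming the standard HBase TableOutputFormat and the HBase 1.x client API; the table name, column family, qualifier, and row key below are placeholders, not taken from the original job:

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class HBasePutSketch {
    public static void saveString(JavaSparkContext jsc, String value) throws Exception {
        // Configure the HBase output format once, on the driver.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "messages");  // placeholder table
        Job job = Job.getInstance(hbaseConf);
        job.setOutputFormatClass(TableOutputFormat.class);

        // Wrap the single String in an RDD and map it to (ignored key, Put).
        jsc.parallelize(Collections.singletonList(value))
           .mapToPair(s -> {
               Put put = new Put(Bytes.toBytes(s));  // placeholder row key
               put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(s));
               return new Tuple2<>(new ImmutableBytesWritable(), put);
           })
           .saveAsNewAPIHadoopDataset(job.getConfiguration());
    }
}
```

TableOutputFormat ignores the key and writes only the Put, which is why an empty ImmutableBytesWritable suffices.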
06-29-2016
04:28 AM
I've solved this by allocating more CPU cores (local[*]) and/or running the job on the cluster (with enough CPU cores).
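The likely reason this helps (a general Spark Streaming point, not confirmed for this job): a receiver-based Kafka stream permanently occupies one core, so with local[1] no core is left to process the received batches, and the job consumes messages without ever writing output. A minimal sketch:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// local[*] uses all available cores; a receiver-based stream needs at
// least two (one for the receiver, one for batch processing).
SparkConf conf = new SparkConf()
        .setAppName("KafkaToHBase")   // placeholder app name
        .setMaster("local[*]");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
```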
06-29-2016
12:20 AM
Dear Colleagues, I want to insert a simple String into HBase using Spark Streaming. The insert should be done inside an rdd.foreach function ("public void call(Tuple2<String, byte[]> avroRecord)"). Currently I have no idea how. I'm able to print out the string values but not to insert them into HBase. Thanks in advance and best regards, Sebastian
06-27-2016
09:21 AM
Dear Colleagues, I want to stream Kafka messages into HBase using Spark Streaming. I set up a Java job and run it with "spark-submit ... yarn-client". It seems that Kafka messages are consumed but no output is written. There are no error messages, only the following:

```
INFO storage.BlockManagerInfo: Added input-0-1467044417600 in memory on quickstart.cloudera:46453 (size: 99.0 B, free: 530.3 MB)
16/06/27 09:20:18 INFO scheduler.JobScheduler: Added jobs for time 1467044418000 ms
```

Do you have any suggestions on this topic? Many thanks in advance, Butkiz
04-15-2016
12:40 AM
1 Kudo
Dear Colleagues, I'm not able to change my community account email address. Can you tell me how it works, please? Thanks in advance, Butkiz
04-15-2016
12:32 AM
Hi, the JavaScript was hanging. After restarting the browser, all results and messages are displayed. Thanks a lot
04-14-2016
10:04 AM
I have updated from 5.4 to 5.5 and then to 5.6. I'll try 'Show tables'. Creating tables works fine, with the known message.
04-14-2016
09:39 AM
Dear Colleagues, after upgrading to CDH 5.6, no Hive query result is displayed when I run a Hive query in Hue. Also, no log of the MapReduce job is displayed, although the job is marked as succeeded in the Job Browser. Do you have an idea what the reason might be? Thanks in advance, Butkiz
09-17-2015
01:38 AM
Thanks! I'm trying to figure out which terms are related to one topic. Should I first multiply the V and S matrices and then compute the distance between the "new" vectors? What's your understanding? Thanks and regards, butkiz
09-17-2015
01:01 AM
Dear Colleagues, in order to run an SSVD in Mahout, the documents were represented as a tf-idf matrix using seq2sparse (the row index is the doc-id and the column index is the dict-id (word-id)). The input for SSVD is this tf-idf matrix; the output of the SSVD job is the matrices U, S, and V (transposed). How can I interpret this output with regard to the original tf-idf matrix? Should I multiply the original one with U, S, or V? What is the conclusion? Thanks in advance and best regards, butkiz
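For reference, the standard LSA reading of these factors (a general interpretation, not specific to Mahout's output layout):

$$A \approx U \Sigma V^\top$$

where $A$ is the $\text{docs} \times \text{terms}$ tf-idf matrix, the rows of $U\Sigma$ give document coordinates in the latent topic space, and the rows of $V\Sigma$ give term coordinates in the same space. Cosine similarity between rows of $V\Sigma$ is one common way to find terms that belong to the same topic, which matches the "multiply V and S first, then compute distances" idea in the follow-up above.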
Labels:
- Mahout
09-02-2015
11:31 PM
Thanks a lot! The jar "mahout-1.0-collections.jar" was present on the classpath. I've removed this jar and the job works without an error message; apparently the stale jar supplied a version of DoubleFunction as an interface, while LanczosSolver$Scale was compiled against it as a class. Best regards, butkiz!
09-02-2015
08:23 AM
Hello Colleagues, when I run a Mahout DistributedLanczosSolver job, the following error is thrown from the LanczosSolver:

```
Exception in thread "main" java.lang.IncompatibleClassChangeError: class org.apache.mahout.math.decomposer.lanczos.LanczosSolver$Scale has interface org.apache.mahout.math.function.DoubleFunction as super class
```

I'm using CDH 5.3.2 and the Mahout 0.9 packages for CDH 5.3.2. When I start the job, a MapReduce job finishes and the following log message is shown:

```
INFO lanczos.LanczosSolver: 1 passes through the corpus so far...
```

After this, the error message above occurs. Do you have any ideas or a solution regarding this issue? Thanks and regards, butkiz
01-08-2015
03:34 AM
Hello, when installing a cluster using the CM wizard with the private-key method, I get the following error message: "No provider available for Unknown key file". I used a custom user and selected the respective .ppk file. If I log in via SSH from a remote machine with the same credentials, it works. Do you have any ideas or a solution? Thanks and Regards, butkiz
01-05-2015
04:57 AM
Hello, our operating system is Debian 8.0 and we would like to install Cloudera Manager using the installer. Currently the installer throws an error because Debian 8.0 is not supported. Is there a way to run the installer on Debian 8? Best Regards, Sebastian