Member since: 02-16-2016
Posts: 45
Kudos Received: 24
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6336 | 07-28-2016 03:37 PM
 | 9179 | 02-20-2016 11:34 PM
04-07-2016
07:59 PM
1 Kudo
I decided to use multiple streams instead; life is easier that way. Thank you.
04-07-2016
05:21 PM
Yes, the data is in the same stream. For example, one string will have 6 columns and the next one will have 8. Thank you, I will try this and see if it works.
04-07-2016
05:07 PM
Thanks, it is really helpful for beginners like me.
04-07-2016
05:00 PM
In my data I have 8 different schemas. I want to create 8 different DataFrames for them and save them into 8 different Hive tables. So far I have created a super bean class which holds the shared attributes, and each bean class extends it. Based on the type attribute I create different objects. The problem is that I am unable to save them into different DataFrames. Is there any way I can do that? Here is my code so far, which works fine for one schema:

xmlData.foreachRDD(
    new Function2<JavaRDD<String>, Time, Void>() {
        public Void call(JavaRDD<String> rdd, Time time) {
            HiveContext hiveContext = JavaHiveContext.getInstance(rdd.context());
            // Convert RDD[String] to RDD[bean class] to DataFrame
            JavaRDD<JavaRow> rowRDD = rdd.map(new Function<String, JavaRow>() {
                public JavaRow call(String line) throws Exception {
                    String[] fields = line.split("\\|");
                    // JavaRow is the super class shared by all bean classes
                    JavaRow record = null;
                    if (fields[2].trim().equalsIgnoreCase("CDR")) {
                        record = new GPRSClass(fields[0], fields[1]);
                    }
                    if (fields[2].trim().equalsIgnoreCase("Activation")) {
                        record = new GbPdpContextActivation(fields[0], fields[1], fields[2], fields[3]);
                    }
                    return record;
                }
            });

            DataFrame df = hiveContext.createDataFrame(rowRDD, JavaRow.class);
            df.registerTempTable("Consumer");
            System.out.println(df.count() + " ************ Records Received ************");

            df = hiveContext.createDataFrame(rowRDD, GPRSClass.class);
            hiveContext.sql("CREATE TABLE IF NOT EXISTS gprs_data (processor string, fileName string, type string, version string, id string) STORED AS ORC");
            df.save("/apps/hive/warehouse/data", "org.apache.spark.sql.hive.orc", SaveMode.Append);
            return null;
        }
    });
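
For reference, a minimal sketch of one way this could be split per schema: filter the raw lines by the type field, then build a strongly typed DataFrame for each bean class. This assumes it runs inside the same foreachRDD call above and reuses the class names from that code; it is an illustration, not a confirmed solution:

// Sketch only: one filter/map pair per schema type, built from the
// same rdd and hiveContext that the foreachRDD call above provides.
JavaRDD<GPRSClass> gprsRDD = rdd
    .filter(new Function<String, Boolean>() {
        public Boolean call(String line) {
            return line.split("\\|")[2].trim().equalsIgnoreCase("CDR");
        }
    })
    .map(new Function<String, GPRSClass>() {
        public GPRSClass call(String line) {
            String[] f = line.split("\\|");
            return new GPRSClass(f[0], f[1]);
        }
    });
DataFrame gprsDF = hiveContext.createDataFrame(gprsRDD, GPRSClass.class);
// Target path is a placeholder; each schema would get its own table/path.
gprsDF.save("/apps/hive/warehouse/gprs_data", "org.apache.spark.sql.hive.orc", SaveMode.Append);
// Repeat the same filter/map pair for each of the other seven bean classes.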
Labels:
- Apache Spark
04-01-2016
08:48 PM
That Hue version has spark-submit. So there is no way to do it in Hue 2.6? @Divakar Annapureddy
04-01-2016
08:25 PM
1 Kudo
I am new to Oozie. I am using Hue 2.6.1-2950 and Oozie 4.2. I developed a Spark program in Java which gets the data from a Kafka topic and saves it into a Hive table. I pass my arguments to my .ksh script to submit the job. It works perfectly; however, I have no idea how to schedule it with Oozie and Hue to run every 5 minutes. I have a jar file which is my Java code, and a consumer.ksh which gets the arguments from my configuration file and runs my jar file using the spark-submit command. Please give me a suggestion on how to do this.
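
For illustration, a minimal sketch of what an Oozie coordinator running a workflow every 5 minutes might look like. The app name, start/end dates, and HDFS path below are placeholder assumptions, and the workflow.xml it points to would wrap the consumer.ksh call in a shell action:

<!-- coordinator.xml: runs the wrapped workflow every 5 minutes.
     All names, dates, and paths are placeholders, not values from this post. -->
<coordinator-app name="consumer-coord"
                 frequency="${coord:minutes(5)}"
                 start="2016-04-01T00:00Z" end="2017-04-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- workflow.xml here contains a shell action invoking consumer.ksh -->
      <app-path>hdfs://sandbox.hortonworks.com:8020/user/hue/oozie/consumer-wf</app-path>
    </workflow>
  </action>
</coordinator-app>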
Labels:
- Apache Oozie
- Apache Spark
04-01-2016
06:13 PM
2 Kudos
I am new to Oozie. I have a Java program which produces data into a Kafka topic (it is not a MapReduce job). I am trying to schedule it with Oozie. However, I am getting this error:

JA009: Could not load history file hdfs://sandbox.hortonworks.com:8020/mr-history/tmp/hue/job_1459358290769_0012-1459533575025-hue-oozie%3Alauncher%3AT%3Djava%3AW%3DData+Producer%3AA%3DproduceDat-1459533591693-1-0-SUCCEEDED-default-1459533581542.jhist
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:349)
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.

I read that it can be a permission or ownership problem, so I changed the owner to mapred and gave 777 permissions, but I still get the same error. I am using a Java action to schedule my jar file.
Labels:
- Apache Oozie
- Apache Spark
03-25-2016
03:45 PM
@Brandon Wilson I tried your suggestion; it creates the Hive table, but I get this error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. It does not load data into my table. Do you have any idea how to solve this?
03-23-2016
02:30 PM
@Benjamin Leonhardi Thank you for your response. Based on your suggestion, I have to apply the mapPartitions method on my JavaDStream. That method will return another JavaDStream to me. I cannot use saveAsTextFile() on the JavaDStream, so I have to use foreachRDD to be able to call saveAsTextFile(). Therefore, I will have the same problem again. Correct me if I am wrong, because I am new to Spark.
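
For context, a minimal sketch of the mapPartitions shape being discussed, assuming a JavaDStream<String> named lines and the Spark 1.x Java API; the transform and output path are placeholders, not code from this thread:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.streaming.api.java.JavaDStream;

// Sketch: mapPartitions processes one whole partition per call, so any
// per-partition setup (e.g. a connection) happens once per partition.
JavaDStream<String> processed = lines.mapPartitions(
    new FlatMapFunction<Iterator<String>, String>() {
        public Iterable<String> call(Iterator<String> partition) {
            List<String> out = new ArrayList<String>();
            while (partition.hasNext()) {
                out.add(partition.next().toUpperCase()); // placeholder transform
            }
            return out;
        }
    });
// The resulting DStream can still be written out directly, one directory
// per batch, via the underlying Scala DStream:
processed.dstream().saveAsTextFiles("/tmp/output/batch", "txt");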
03-21-2016
07:59 PM
2 Kudos
@Benjamin Leonhardi Do you have any sample code for Java which uses mapPartitions instead of foreachRDD?