Member since: 02-16-2016
Posts: 45
Kudos Received: 24
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6336 | 07-28-2016 03:37 PM
 | 9179 | 02-20-2016 11:34 PM
04-07-2016
07:59 PM
1 Kudo
I decided to use multiple streams instead; life is easier that way. Thank you.
04-07-2016
05:21 PM
Yes, the data is in the same stream. For example, one string will have 6 columns and the next one will have 8. Thank you, I will try this and see if it works.
04-07-2016
05:07 PM
Thanks, it is really helpful for beginners like me.
04-07-2016
05:00 PM
In my data I have 8 different schemas. I want to create 8 different DataFrames for them and save them into 8 different Hive tables. So far I have created a super bean class which holds the shared attributes, and each bean class extends it. Based on the type attribute I create different objects. The problem is that I am unable to save them into different DataFrames. Is there any way I can do that? Here is my code so far, which works fine for one schema:

xmlData.foreachRDD(
    new Function2<JavaRDD<String>, Time, Void>() {
        public Void call(JavaRDD<String> rdd, Time time) {
            HiveContext hiveContext = JavaHiveContext.getInstance(rdd.context());
            // Convert RDD[String] to RDD[bean class] to DataFrame
            JavaRDD<JavaRow> rowRDD = rdd.map(new Function<String, JavaRow>() {
                public JavaRow call(String line) throws Exception {
                    String[] fields = line.split("\\|");
                    // JavaRow is the super class shared by all bean classes
                    JavaRow record = null;
                    if (fields[2].trim().equalsIgnoreCase("CDR")) {
                        record = new GPRSClass(fields[0], fields[1]);
                    }
                    if (fields[2].trim().equalsIgnoreCase("Activation")) {
                        record = new GbPdpContextActivation(fields[0], fields[1], fields[2], fields[3]);
                    }
                    return record;
                }
            });

            DataFrame df = hiveContext.createDataFrame(rowRDD, JavaRow.class);
            df.registerTempTable("Consumer");
            System.out.println(df.count() + " ************ Records Received ************");

            df = hiveContext.createDataFrame(rowRDD, GPRSClass.class);
            hiveContext.sql("CREATE TABLE IF NOT EXISTS gprs_data (processor string, fileName string, type string, version string, id string) STORED AS ORC");
            df.save("/apps/hive/warehouse/data", "org.apache.spark.sql.hive.orc", SaveMode.Append);
            return null;
        }
    });
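
For reference, a minimal sketch of one way this could be split per schema: filter the raw lines by the type field, then build a strongly typed DataFrame for each bean class. This assumes it runs inside the same foreachRDD call above and reuses the class names from that code; it is an illustration, not a confirmed solution:

// Sketch only: one filter/map pair per schema type, built from the
// same rdd and hiveContext that the foreachRDD call above provides.
JavaRDD<GPRSClass> gprsRDD = rdd
    .filter(new Function<String, Boolean>() {
        public Boolean call(String line) {
            return line.split("\\|")[2].trim().equalsIgnoreCase("CDR");
        }
    })
    .map(new Function<String, GPRSClass>() {
        public GPRSClass call(String line) {
            String[] f = line.split("\\|");
            return new GPRSClass(f[0], f[1]);
        }
    });
DataFrame gprsDF = hiveContext.createDataFrame(gprsRDD, GPRSClass.class);
// Target path is a placeholder; each schema would get its own table/path.
gprsDF.save("/apps/hive/warehouse/gprs_data", "org.apache.spark.sql.hive.orc", SaveMode.Append);
// Repeat the same filter/map pair for each of the other seven bean classes.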
Labels:
- Apache Spark
04-01-2016
08:48 PM
That Hue version has spark-submit. So there is no way to do it in Hue 2.6? @Divakar Annapureddy
04-01-2016
08:25 PM
1 Kudo
I am new to Oozie. I am using Hue 2.6.1-2950 and Oozie 4.2. I developed a Spark program in Java which gets the data from a Kafka topic and saves it into a Hive table. I pass my arguments to my .ksh script to submit the job. It works perfectly; however, I have no idea how to schedule it with Oozie and Hue to run every 5 minutes. I have a jar file which is my Java code, and a consumer.ksh which gets the arguments from my configuration file and runs my jar file using the spark-submit command. Please give me a suggestion on how to do this.
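
For illustration, a minimal sketch of what an Oozie coordinator running a workflow every 5 minutes might look like. The app name, start/end dates, and HDFS path below are placeholder assumptions, and the workflow.xml it points to would wrap the consumer.ksh call in a shell action:

<!-- coordinator.xml: runs the wrapped workflow every 5 minutes.
     All names, dates, and paths are placeholders, not values from this post. -->
<coordinator-app name="consumer-coord"
                 frequency="${coord:minutes(5)}"
                 start="2016-04-01T00:00Z" end="2017-04-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- workflow.xml here contains a shell action invoking consumer.ksh -->
      <app-path>hdfs://sandbox.hortonworks.com:8020/user/hue/oozie/consumer-wf</app-path>
    </workflow>
  </action>
</coordinator-app>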
Labels:
- Apache Oozie
- Apache Spark
04-01-2016
06:13 PM
2 Kudos
I am new to Oozie. I have a Java program which produces data into a Kafka topic (it is not a MapReduce job). I am trying to schedule it with Oozie. However, I am getting this error:

JA009: Could not load history file hdfs://sandbox.hortonworks.com:8020/mr-history/tmp/hue/job_1459358290769_0012-1459533575025-hue-oozie%3Alauncher%3AT%3Djava%3AW%3DData+Producer%3AA%3DproduceDat-1459533591693-1-0-SUCCEEDED-default-1459533581542.jhist
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:349)
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.

I read that it can be a permission or ownership problem, so I changed the owner to mapred and gave 777 permissions, but I still get the same error. I am using a Java action to schedule my jar file.
Labels:
- Apache Oozie
- Apache Spark
03-25-2016
03:45 PM
@Brandon Wilson I tried your suggestion; it creates the Hive table, but I get this error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. It does not load data into my table. Do you have any idea how to solve this?
03-23-2016
02:30 PM
@Benjamin Leonhardi Thank you for your response. Based on your suggestion, I have to apply the mapPartitions method on my JavaDStream. That method will return another JavaDStream to me. I cannot use saveAsTextFile() on the JavaDStream, so I have to use foreachRDD to be able to call saveAsTextFile(). Therefore, I will have the same problem again. Correct me if I am wrong, because I am new to Spark.
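
For context, a minimal sketch of the mapPartitions shape being discussed, assuming a JavaDStream<String> named lines and the Spark 1.x Java API; the transform and output path are placeholders, not code from this thread:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.streaming.api.java.JavaDStream;

// Sketch: mapPartitions processes one whole partition per call, so any
// per-partition setup (e.g. a connection) happens once per partition.
JavaDStream<String> processed = lines.mapPartitions(
    new FlatMapFunction<Iterator<String>, String>() {
        public Iterable<String> call(Iterator<String> partition) {
            List<String> out = new ArrayList<String>();
            while (partition.hasNext()) {
                out.add(partition.next().toUpperCase()); // placeholder transform
            }
            return out;
        }
    });
// The resulting DStream can still be written out directly, one directory
// per batch, via the underlying Scala DStream:
processed.dstream().saveAsTextFiles("/tmp/output/batch", "txt");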
03-21-2016
07:59 PM
2 Kudos
@Benjamin Leonhardi Do you have any sample code for Java which uses mapPartitions instead of foreachRDD?