Member since: 09-05-2016
Posts: 24
Kudos Received: 2
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 898 | 10-01-2019 09:45 AM |
|  | 1976 | 11-01-2016 06:47 PM |
|  | 911 | 10-19-2016 05:07 PM |
10-01-2019
09:45 AM
Tried to delete this since no one seems to answer here. But if someone has a similar issue: use UpdateAttribute to set the attributes you want on a new flow file, then pass it to the AttributesToJSON processor.
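For illustration, a hedged sketch of what AttributesToJSON would emit with its Attributes List property set to the three attributes from the question below; the values are made-up placeholders, and note that AttributesToJSON writes every attribute value as a JSON string by default:

```json
{
  "name": "example_name",
  "date": "1569922500000",
  "json_content": "{\"key\": \"value\"}"
}
```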
10-01-2019
05:12 AM
I have a complex flow where I created 3 attributes, [name, date, json_content], extracted from other flow file data, that need to go into a database. How can I take these 3 attributes and convert them into a new flow file with those columns? The schema I will use has those names.
Schema:
{
  "type": "record",
  "name": "mytable",
  "fields": [
    {"name": "name", "type": ["string"]},
    {"name": "date", "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}], "default": null},
    {"name": "json_content", "type": ["string"]}
  ]
}
Labels:
- Apache Hadoop
09-25-2019
10:10 AM
Can anyone at Cloudera/Hortonworks answer this? Having the same issue.
09-07-2017
08:05 PM
I am running Spark version 1.6.3 under HDP 2.5.6. Which version of magellan should I use with this Spark version?
05-03-2017
06:15 PM
I need to upgrade a cluster from 2.4.2 to 2.5. Shouldn't we be following this link: http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-upgrade/bk_ambari-upgrade.pdf? We are going TO 2.5.
11-01-2016
06:47 PM
At the moment I made it past this... unfortunately I had added the extra options to the end of the command line, and if you notice, the options "--master ... --driver-memory 5g" were actually being fed into my jar. So I just moved "Main.jar ..." to the end and it works now. Corrected command line:

spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class Main --master yarn --executor-memory 5g --driver-memory 5g Main.jar --avro-file file_1.avro
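A minimal sketch of why the ordering matters: spark-submit stops parsing its own options at the application JAR, so anything after Main.jar is handed to the application's main method as plain arguments. The Main object name and --avro-file flag below match the command above; the body is illustrative only:

```scala
object Main {
  def main(args: Array[String]): Unit = {
    // With the corrected ordering, only the application's own flags land here,
    // e.g. Array("--avro-file", "file_1.avro"). With the original ordering,
    // --master, --executor-memory and --driver-memory ended up in this array
    // instead of configuring Spark, so the job ran with default memory settings.
    println(args.mkString(" "))
  }
}
```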
10-31-2016
02:30 PM
I am executing the following command:

spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class Main Main.jar --avro-file file_1.avro --master yarn --executor-memory 5g --driver-memory 5g

The file_1.avro file is about 1.5 GB, but it fails with files around 300 MB as well. I have tried running this on HDP with both Spark 1.4.1 and Spark 1.6.1 and I get an OOM error. Running from spark-shell works fine. Part of the huge stack trace:

java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:3236)
  at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
  at org.apache.avro.file.DeflateCodec.decompress(DeflateCodec.java:84)
  at org.apache.avro.file.DataFileStream$DataBlock.decompressUsing(DataFileStream.java:352)
  at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:199)
  at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:64)
  at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:32)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:250)
  ...

I have compiled this with Scala 2.10.6 with the following lines:

sqlContext.read.format("com.databricks.spark.avro")
  .option("header", "true").load(aPath + iAvroFile)
sqlContext.sql("CREATE TEMPORARY TABLE " + tempTable +
  " USING com.databricks.spark.avro OPTIONS (path '" + aPath + iAvroFile + "')")
val counts_query = "SELECT id ID, count(id) HitCount, '" + fileDate +
  "' DateHour FROM " + tempTable +
  " WHERE Format LIKE CONCAT('%','BEST','%') GROUP BY id"
val flight_counts = sqlContext.sql(counts_query)
flight_counts.show() // OOM here

I have tried many options and cannot get past this, e.g.: --master yarn-client --executor-memory 10g --driver-memory 10g --num-executors 4 --executor-cores 4. Any ideas would help to get past this...
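As an aside, a hedged sketch of the same aggregation through the DataFrame API instead of SQL string concatenation; it assumes the same columns (id, Format), reuses the variables aPath, iAvroFile and fileDate from the post above, and is illustrative rather than the poster's code:

```scala
import org.apache.spark.sql.functions.{count, lit}

// Same read as above; the "header" option comes from spark-csv and is not
// used by spark-avro, so it is omitted here.
val df = sqlContext.read.format("com.databricks.spark.avro")
  .load(aPath + iAvroFile)

val flight_counts = df
  .filter(df("Format").contains("BEST"))   // WHERE Format LIKE '%BEST%'
  .groupBy("id")
  .agg(count("id").as("HitCount"))
  .withColumn("DateHour", lit(fileDate))

flight_counts.show()
```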
Labels:
- Apache Spark
10-19-2016
05:07 PM
We eventually got past this problem by matching the Lucene library version to the one our Elasticsearch jar was built against.
10-19-2016
03:25 PM
We are using HDP with Flume 1.5.2.2.4 and attempting to get the Elasticsearch connector working. We installed elasticsearch-2.4.1.jar along with lucene-core-5.5.2.jar at first. When restarting Flume we get java.lang.NoSuchFieldError: LUCENE_4_10_2. We get these NoSuchFieldErrors no matter which version of Lucene we use: LUCENE_5_2_1, LUCENE_4_0_0... Can anyone shed some light on how to get the Elasticsearch libs to work with Flume? Thanks, Mike
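For context, a hedged sketch of the Flume 1.5-era ElasticSearchSink configuration this setup assumes, with placeholder agent, channel and host names; the sink type and property keys are from the Flume user guide, and as the resolution above notes, the Elasticsearch and Lucene jars on the classpath still have to be version-matched:

```
# flume.conf fragment; "agent", "es" and "ch1" are hypothetical names.
agent.sinks.es.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.es.hostNames = es-host-1:9300
agent.sinks.es.indexName = flume
agent.sinks.es.indexType = logs
agent.sinks.es.clusterName = elasticsearch
agent.sinks.es.channel = ch1
```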
Labels:
- Apache Flume
09-20-2016
02:20 AM
Almost forgot about this... I access my Avro files like so. First, as Tim said, include the proper Avro lib, in my case the Databricks one:

spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class MyMain MyMain.jar

val df = sqlContext.read.format("com.databricks.spark.avro").option("header", "true").load("/user/user1/writer_test.avro")
df.select("time").show()

... Thanks all