Member since: 09-05-2016
Posts: 24
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 154 | 10-01-2019 09:45 AM |
 | 519 | 11-01-2016 06:47 PM |
 | 230 | 10-19-2016 05:07 PM |
12-16-2019
09:17 AM
There is no solution. The FTP server has a single location to grab files from. There is currently no easy way to do what I was asking.
10-01-2019
09:45 AM
I tried to delete this as no one seems to answer here. But if someone has a similar issue: just use UpdateAttribute on the attributes you want in the new FlowFile and then pass them to the AttributesToJSON processor.
10-01-2019
05:12 AM
I have a complex flow where I created 3 attributes, [name, date, json_content], extracted from other flow file data, that need to go into a database. How can I take these 3 attributes and convert them into a new flow file with those columns? The schema I will use has those names.
Schema:
{
  "type": "record",
  "name": "mytable",
  "fields": [
    { "name": "name", "type": [ "string" ] },
    { "name": "date", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ], "default": null },
    { "name": "json_content", "type": [ "string" ] }
  ]
}
Labels:
09-25-2019
10:10 AM
Can anyone at Cloudera/Hortonworks answer this? I am having the same issue.
09-18-2019
05:13 AM
I have an FTP location where I must grab specific files, *.tar, from within named subdirectories and only those subdirectories. The layout is like so:
path/conf1/conf1.tar
path/conf2/conf2.tar
path/conf3/conf3.tar
path/support/support.tar
I only want tar files from path/conf*/. Is this possible using Path Filter Regex or some combination of properties? I do not want to look into support/ at all. In fact, there are some directories I do not have permission to read, so I get permission exceptions for those. How can I limit the listing to only the conf*/ folders?
Thanks
Labels:
08-12-2019
08:28 PM
I am trying to simply take the content from a file I break up with SplitProcessor, send the line of text out as a text FlowFile, and also convert that line to JSON and send it out as another FlowFile, all in onTrigger(). I have tried cloning/creating a new session but I am missing something.

// Snippets
onTrigger() {
    ...
    // read the flow file's text line and send it over to the "TEXT" relationship (works fine)
    String content = new String(buffer, ...);
    ff = session.write(ff, new OutputStream...);
    session.transfer(ff, TEXT);
    session.remove(ff);

    // now convert that string to JSON and send it to "JSON"
    ff2 = session.create();
    String jsonString = toJson(content);
    // write to ff2
    ff2 = session.write(ff2, ...
    session.transfer(ff2, JSON);
    session.remove(ff2);

I do get errors depending on whether I call remove() or not (NullPointerException, "already marked for transfer", ...). Is there an example somewhere that shows how to do what I want to do? Thanks, MK
Labels:
02-28-2019
07:17 PM
Is there documentation on how to specify time durations, for example for Polling Interval, as a human-readable string? E.g. 60 sec, 1 hour, 10 hours. Does NiFi understand both "1sec" and "1 sec", and which unit words are available (sec, hour, hours, ...)? The "Allowed Values" column on the docs pages doesn't say.
- Tags:
- nifi-processor
Labels:
09-08-2017
02:36 AM
Are there instructions on how to install Spark 2 on an HDP 2.5.6 cluster? We are currently running Spark 1.6.3 and we need to use the Magellan spatial libs, but they do not function under Spark 1.6.3. Can you point me to Spark 2 installation instructions?
Labels:
09-07-2017
08:05 PM
I am running Spark version 1.6.3 under HDP 2.5.6. What version of Magellan should I use with this version?
07-19-2017
06:31 PM
The first run of the teragen/sort/validate test went OK at a 100 GB test size. After re-running the test, teragen fails and the log says the following. I see only 26 maps are processed out of the total of 92. Any ideas? Stack below.
2017-07-19 14:01:12,719 INFO [Thread-145] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /mr-history/tmp/user1/job_1499796473254_0009.summary_tmp could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1620)
at ... org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2211)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2207)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2205)
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /mr-history/tmp/user1/job_1499796473254_0009.summary_tmp could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1620)

Other info: hdfs dfsadmin -report
Configured Capacity: 1074290753536 (1000.51 GB)
Present Capacity: 601786240099 (560.46 GB)
DFS Remaining: 172010277124 (160.20 GB)
DFS Used: 429775962975 (400.26 GB)
DFS Used%: 71.42%
Under replicated blocks: 307
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
06-23-2017
01:54 AM
At the moment I have not worked on this issue since, but I will resurrect it and try some things out. First, you could start with the Databricks libraries. I did try one library off Git but it was too difficult to work with; the schema I am using is quite complex. What schema do you have for your data? Some ideas I learned (but have not tried) include pre-converting the XML to CSV or Avro before consuming it into Spark, and using the Databricks CSV or another lib to process it in the stream portion. Let me know how you are ingesting the XML. I still need to do this at some point.
05-03-2017
06:15 PM
I need to upgrade a cluster from 2.4.2 to 2.5. Shouldn't we be following this link: http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-upgrade/bk_ambari-upgrade.pdf? We are going TO 2.5.
04-13-2017
11:47 AM
1 Kudo
I run HDP Spark 1.4.1 and 1.6.1. I have to process rapidly arriving XML from a Kafka topic with Spark Streaming. I am able to call the .print function and see that my data is indeed coming into Spark; I have it batched now at 10 s. Now I need to know: 1) Is there a way to delimit each XML message? 2) How can I apply a JAXB-like schema function to each message? I have a process already doing this in plain Java and it works fine using the standard Kafka APIs and JAXB. Sample output where I write the data with saveAsTextFiles() shows broken messages; they seem to be split on spaces, and large XML messages are spread across more than one file. Thanks, M
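To make the goal concrete, here is a minimal sketch of the kind of app I have in mind (the broker list, topic name, and the JAXB-annotated MyMessage class are placeholders, and it assumes the spark-streaming-kafka_2.10 artifact matching the Spark version is on the classpath). Each direct-stream record value should already be one complete XML message, so the apparent splitting with saveAsTextFiles() would just be one line per output record and one part-file per partition per batch:

import java.io.StringReader
import javax.xml.bind.JAXBContext
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholders: broker list, topic name, and MyMessage (a hypothetical JAXB-annotated class).
val ssc = new StreamingContext(new SparkConf().setAppName("xml-stream"), Seconds(10))
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("xml-topic"))

// Each record value is a whole XML document; build the JAXB unmarshaller once per partition.
val parsed = stream.map(_._2).mapPartitions { msgs =>
  val unmarshaller = JAXBContext.newInstance(classOf[MyMessage]).createUnmarshaller()
  msgs.map(xml => unmarshaller.unmarshal(new StringReader(xml)).asInstanceOf[MyMessage])
}
parsed.print()
ssc.start()
ssc.awaitTermination()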
Labels:
11-10-2016
03:20 PM
In this case, yes, I am in the shell testing before I create my app. That 127.0.0.1 is not real; it is actually the IP of a server that has ES running on it, so think of it as "esserver", and it is outside the cluster. Command line:
spark-shell --packages com.databricks:spark-avro_2.10:2.0.1 --jars elasticsearch-spark_2.10-2.4.0.jar --master yarn
The elasticsearch-spark_2.10-2.4.0.jar file is located locally in the directory where I am running spark-shell.
11-09-2016
04:38 AM
Trying to get some data written over to an Elasticsearch node from spark-shell before creating the Scala app. Running Horton with Spark 1.4.1. Whenever I run the following:

val conf = new SparkConf()
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "127.0.0.1")
conf.set("es.port", "9200")
sc.makeRDD(Seq(numbers, airports)).saveToEs("my_index")

I get:

<console>:27: error: value saveToEs is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[String,Any]]
       sc.makeRDD(Seq(numbers, airports)).saveToEs("my_index")

Where is the saveToEs() method defined? I have tried this other way:

import org.elasticsearch.spark.rdd.EsSpark
EsSpark.saveToEs(sc.makeRDD(Seq(numbers, airports)), "my_index")

and get:

Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Any ideas from those who have tried this? MAK
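For context, the minimal usage I am after looks roughly like this (a sketch: the index/type name, the "esserver" host, and the sample maps are placeholders). My understanding is that saveToEs is added to RDDs by the implicits in org.elasticsearch.spark._, and that in the shell, where sc already exists, the ES settings can be passed per call instead of on SparkConf:

import org.elasticsearch.spark._

// Hypothetical sample documents, mirroring the ES-Hadoop docs.
val numbers = Map("one" -> 1, "two" -> 2)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

// Connection settings passed per call; "esserver" and "my_index/docs" are placeholders.
val cfg = Map(
  "es.nodes" -> "esserver",
  "es.port" -> "9200",
  "es.index.auto.create" -> "true",
  "es.nodes.wan.only" -> "true")

sc.makeRDD(Seq(numbers, airports)).saveToEs("my_index/docs", cfg)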
Labels:
11-02-2016
03:36 PM
Running a spark-submit job with Spark 1.4.1 on HDP 2.3.2. No logs show when I click on the Executors tab for my Spark job; the Jobs and Stages tabs are also empty. Where can I find the stderr/stdout logs for my job? Also, the job still says "Incomplete" after it has finished. The same app works fine in Spark 1.6.1: I can see everything and can refresh the page with updates as it runs.
Labels:
11-01-2016
06:47 PM
At the moment I made it past this... Unfortunately I had added the extra options to the end of the command line, and if you notice, the options "--master...memory 5g" were actually being fed into my jar. So I just moved the "Main.jar..." part to the end and it works now. Corrected command line:
spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class Main --master yarn --executor-memory 5g --driver-memory 5g Main.jar --avro-file file_1.avro
10-31-2016
02:30 PM
I am executing the following command:
spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class Main Main.jar --avro-file file_1.avro --master yarn --executor-memory 5g --driver-memory 5g
The file_1.avro file is about 1.5 GB, but it fails with files of about 300 MB as well. I have tried running this on HDP with both Spark 1.4.1 and Spark 1.6.1 and I get an OOM error; running from spark-shell works fine. Part of the huge stack trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
at org.apache.avro.file.DeflateCodec.decompress(DeflateCodec.java:84)
at org.apache.avro.file.DataFileStream$DataBlock.decompressUsing(DataFileStream.java:352)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:199)
at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:64)
at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:32)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:250)
...

I have compiled this with Scala 2.10.6, with the following lines:

sqlContext.read.format("com.databricks.spark.avro").
  option("header", "true").load(aPath + iAvroFile);

sqlContext.sql("CREATE TEMPORARY TABLE " + tempTable + " USING com.databricks.spark.avro OPTIONS " +
  "(path '" + aPath + iAvroFile + "')");

val counts_query = "SELECT id ID,count(id) " +
  "HitCount,'" + fileDate + "' DateHour FROM " + tempTable +
  " WHERE Format LIKE CONCAT('%','BEST','%') GROUP BY id";
val flight_counts = sqlContext.sql(counts_query);

flight_counts.show()  // OOM here

I have tried many options and cannot get past this, e.g.:
--master yarn-client --executor-memory 10g --driver-memory 10g --num-executors 4 --executor-cores 4
Any ideas would help to get past this...
Labels:
10-19-2016
05:07 PM
We got past this problem by eventually just matching up one of the versions of the Lucene library.
10-19-2016
03:25 PM
We are using HDP with Flume 1.5.2.2.4 and attempting to get the Elasticsearch connector working. We installed elasticsearch-2.4.1.jar along with lucene-core-5.5.2.jar at first. When restarting Flume we get java.lang.NoSuchFieldError: LUCENE_4_10_2.
We get these NoSuchFieldErrors no matter which version of Lucene we use: LUCENE_5_2_1, LUCENE_4_0_0... Can anyone shed some light on how to get the Elasticsearch libs to work with Flume? Thanks, Mike
Labels:
10-14-2016
04:30 PM
I have a few external jars, such as elasticsearch-spark_2.10-2.4.0.jar. Currently I use the --jars option to load it for spark-shell. Is there a way to get this or other jars to load with Spark for my cluster? I see through Ambari there is spark-defaults, but I was wondering if I could just copy X.jar to /usr/hdp/<Ver>/spark/lib and have it picked up. A somewhat related side question (it is in the same command line): I use "--packages com.databricks:spark-avro_2.10:2.0.1". I notice that the first time this is used, Spark goes out and grabs the jars like Maven would, but I could not find them afterwards and wonder if I need this argument every time, or whether I can get the Databricks libs installed permanently as with the Elasticsearch one. Thanks, Mike
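For illustration, a hedged sketch of the spark-defaults entries I am imagining (the property names are standard Spark configuration properties; the jar path is a placeholder, and spark.jars.packages may only be honored by newer Spark releases):

# Ship a local jar with every application (comma-separated list); placeholder path.
spark.jars            /usr/hdp/current/spark-client/lib/elasticsearch-spark_2.10-2.4.0.jar
# Resolve Maven coordinates once and cache them (same mechanism as --packages, typically under ~/.ivy2).
spark.jars.packages   com.databricks:spark-avro_2.10:2.0.1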
Labels:
09-20-2016
02:20 AM
Almost forgot about this... I access my Avro files like so. First, as Tim said, include the proper Avro lib, in my case the Databricks one:

spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 --class MyMain MyMain.jar

val df = sqlContext.read.format("com.databricks.spark.avro").
  option("header", "true").load("/user/user1/writer_test.avro")
df.select("time").show()
...

Thanks all
09-18-2016
09:37 PM
Thanks, I will give these answers a try... I will report back...
09-18-2016
01:07 AM
1 Kudo
Running HDP-2.4.2, Spark 1.6.1, Scala 2.10.5. I am trying to read Avro files on HDFS from the Spark shell or from code, starting by pulling in the schema file. If I use:

val schema = sc.textFile("/user/test/ciws.avsc")

the file loads and I can do schema.take(100).foreach(println) and see the contents. If I do (using the Avro Schema parser):

val schema = new Schema.Parser().parse(new File("/home/test/ciws.avsc"))

or

val schema = scala.io.Source.fromFile("/user/test/ciws.avsc").mkString

I get:

java.io.FileNotFoundException: /user/test/ciws.avsc (No such file or directory)

My core-site.xml specifies defaultFS as our namenode. I have tried adding "hdfs:/" to the file path, and "hdfs://<defaultFS>/...", and still no dice. Any ideas how to reference a file in HDFS with the Schema.Parser class or the scala.io.Source class? Mike
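For reference, a minimal sketch of the direction I am thinking of (assuming the .avsc really does live on HDFS): Schema.Parser().parse(new File(...)) and scala.io.Source.fromFile only read the local filesystem, so the idea is to open the file through the Hadoop FileSystem API and hand the stream to the parser.

import org.apache.avro.Schema
import org.apache.hadoop.fs.{FileSystem, Path}

// Open the schema file via HDFS (uses fs.defaultFS from the cluster config) rather than the local filesystem.
val fs = FileSystem.get(sc.hadoopConfiguration)
val in = fs.open(new Path("/user/test/ciws.avsc"))
val schema = try { new Schema.Parser().parse(in) } finally { in.close() }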
Labels: