Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 772 | 04-03-2024 06:39 AM
 | 1424 | 01-12-2024 08:19 AM
 | 771 | 12-07-2023 01:49 PM
 | 1325 | 08-02-2023 07:30 AM
 | 1920 | 03-29-2023 01:22 PM
04-29-2016 02:32 PM
This is awesome. Very nice combination of tools. Do you have the notebook and NiFi file on GitHub?
04-28-2016 03:16 AM
Hmmm, I will talk to my friends at RedisLabs and see if they want to collaborate on it.
04-27-2016 10:17 PM
1 Kudo
Has anyone done anything using Redis for aggregates and sums in a flow, or used Redis as a source for NiFi?
Labels:
- Apache NiFi
04-27-2016 07:59 PM
A local master is not using the YARN version of Spark; it's running a local version. Is that running? Is the green connected light on in the upper right corner?
04-27-2016 07:24 PM
Is Spark running in the cluster? Is it on the default port? Can you access Spark? Can you get to the Spark history UI?
04-27-2016 03:47 PM
Great sample code. In most of my Spark apps that work with Parquet, a few SparkConf settings help:

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val sparkConf = new SparkConf()
// Parquet handling
sparkConf.set("spark.sql.parquet.compression.codec", "snappy")
sparkConf.set("spark.sql.parquet.mergeSchema", "true")
sparkConf.set("spark.sql.parquet.binaryAsString", "true")
// Serialization and execution
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
sparkConf.set("spark.sql.tungsten.enabled", "true")
sparkConf.set("spark.eventLog.enabled", "true")
// Compression and streaming
sparkConf.set("spark.io.compression.codec", "snappy")
sparkConf.set("spark.rdd.compress", "true")
sparkConf.set("spark.streaming.backpressure.enabled", "true")
Some of the compression settings are really important, and you often need to turn them off or switch codecs depending on the use case (batch, streaming, SQL, large files, small files, many partitions, ...). With the event log enabled you can see how those Parquet files are worked with in the DAGs and metrics. Before you write some Spark SQL against that file, make sure you register a table name. If you don't want a write to fail when the directory/file already exists, you can choose Append mode to add to it; it depends on your use case.

df1.registerTempTable("MyTableName")
val results = sqlContext.sql("SELECT name FROM MyTableName")
df1.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).parquet("data.parquet")

If you want to look at the data from the command line after you write it, you can build the Parquet tools. This requires the Java JDK, Git, and Maven installed.

git clone -b apache-parquet-1.8.0 https://github.com/apache/parquet-mr.git
cd parquet-mr
cd parquet-tools
mvn clean package -Plocal
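Once the build finishes you can inspect the file from the command line; treat this as a sketch, since the exact jar path and version depend on your build:

# Print the schema and the first few records (jar location may differ)
java -jar target/parquet-tools-1.8.0.jar schema data.parquet
java -jar target/parquet-tools-1.8.0.jar head -n 5 data.parquet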
04-27-2016 12:07 PM
1 Kudo
To get the current Avro tools:

wget http://apache.claz.org/avro/avro-1.8.0/java/avro-tools-1.8.0.jar

There's some good documentation here: https://avro.apache.org/docs/1.8.0/gettingstartedjava.html#Compiling+the+schema

This article helped me: http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
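A couple of quick examples of what the jar can do (the .avro and .avsc file names below are just placeholders):

# Dump an Avro data file as JSON
java -jar avro-tools-1.8.0.jar tojson twitter.avro

# Generate Java classes from a schema definition
java -jar avro-tools-1.8.0.jar compile schema user.avsc .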
04-27-2016 06:18 AM
Cool, thanks. We have tried Kafka mirroring and it has had a lot of issues. I am thinking NiFi can solve a lot of these problems. I think it's a matter of budget: how many NiFi nodes, and extra nodes to help process the data migrating over. A few people were thinking dual ingest, but that is usually hard to keep in sync. With NiFi, that should not be a problem. I wonder if someone has a DR example in NiFi worked up already?