Posts: 1973
Kudos Received: 1225
Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1913 | 04-03-2024 06:39 AM |
| | 3010 | 01-12-2024 08:19 AM |
| | 1642 | 12-07-2023 01:49 PM |
| | 2419 | 08-02-2023 07:30 AM |
| | 3355 | 03-29-2023 01:22 PM |
03-15-2018
01:30 PM
References:
- https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/spark-dataframe-api.html
- http://spark.apache.org/docs/2.2.0/
- http://spark.apache.org/docs/2.2.0/rdd-programming-guide.html
- http://spark.apache.org/docs/2.2.0/quick-start.html
- https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html

Submit the batch through Livy with a local jar path:

```
curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "/apps/example.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches
```

Or with the jar on HDFS:

```
curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "hdfs://server:8020/apps/example_2.11-1.0.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches
```

FYI, you may see this in the logs:

```
18/03/14 11:54:54 INFO LineBufferedStream: stdout: 18/03/14 11:54:54 INFO Client: Source and destination file systems are the same. Not copying hdfs://server:8020/opt/demo/example.jar
```
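The two curl calls differ only in where the jar lives, so it can help to parameterize the payload once. This is a sketch only: the host, port, jar path, and class name are the same placeholder values as in the curl examples, not real endpoints, and the actual submit/poll calls are left commented so it runs anywhere.

```shell
# Sketch: build the JSON payload for Livy's POST /batches before sending it.
# LIVY_URL, APP_JAR, and APP_CLASS are placeholder values; adjust per cluster.
LIVY_URL="http://server:8999"
APP_JAR="hdfs://server:8020/apps/example_2.11-1.0.jar"
APP_CLASS="com.dataflowdeveloper.example.Links"

PAYLOAD="{\"file\": \"${APP_JAR}\", \"className\": \"${APP_CLASS}\"}"
echo "${PAYLOAD}"

# With a running Livy server you would then submit and poll the batch:
# curl -H "Content-Type: application/json" -H "X-Requested-By: admin" \
#      -X POST -d "${PAYLOAD}" "${LIVY_URL}/batches"
# curl "${LIVY_URL}/batches/0"    # 0 = the batch id returned by the POST
```

Polling `GET /batches/<id>` is how you watch the batch move through its states instead of tailing logs.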
03-15-2018
12:16 PM
That would definitely work, but they are not open source and not free. Any suggestions for open source tools?
03-15-2018
01:07 AM
It is easy to run this code from the Spark shell as well, without a connection to NiFi (runshell.sh):

```
/usr/hdp/current/spark2-client/bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /opt/demo/example.jar
```
03-14-2018
11:33 PM
1 Kudo
1. A special utility?
2. NiFi: load the table to ORC with PutHDFS, then PutHiveQL to merge with the ACID table
3. Sqoop?
4. NiFi: PutHiveStreaming https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
5. NiFi to Druid, then insert into a Hive ACID table from a table on top of Druid
6. NiFi to HBase, a Hive table on HBase, then insert into a Hive ACID table
7. Some SnappyData in-memory pattern?
8. IBM BigSQL?
9. Attunity?
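For option 4, the linked Hive MERGE documentation boils down to a statement along these lines. The table and column names here are hypothetical, and the target table needs Hive transactions (ACID) enabled; the statement is only echoed so the sketch runs without a cluster.

```shell
# Hypothetical MERGE into a Hive ACID table (option 4 above); target_acid,
# staging, id, and val are made-up names for illustration.
MERGE_SQL="MERGE INTO target_acid AS t
USING staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val)"
echo "${MERGE_SQL}"

# On a real cluster, run it through beeline against HiveServer2:
# beeline -u "jdbc:hive2://hiveserver:10000" -e "${MERGE_SQL}"
```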
Labels:
- Apache Hive
- Apache NiFi
03-14-2018
09:20 PM
3 Kudos
What: executing Scala Apache Spark code in JARs from Apache NiFi.

Why: you don't want all of your Scala code in a continuous block like Apache Zeppelin.

Tools: Apache NiFi, Apache Livy, Apache Spark, Scala.

Flows (screenshots):
- Option 1: Inline Scala Code
- Apache Zeppelin running the same Scala job (you have to add the jar to the Spark interpreter and restart)
- Grafana charts of the Apache NiFi run
- Log search helps you find errors
- Run code for your Spark class
- Setting up your ExecuteSparkInteractive processor
- Setting up your Spark service for Scala
- Tracking the job in the Livy UI
- Tracking the job in the Spark UI

I was looking at doing this: pull code from Git, put it into a NiFi attribute, and run it directly. For bigger projects, you will have many classes and dependencies that may require a full IDE and an SBT build cycle. Once I build a Scala jar, I want to run against that.

Example code:

```scala
package com.dataflowdeveloper.example

import org.apache.spark.sql.SparkSession

class Example {
  def run(spark: SparkSession): Unit = {
    try {
      println("Started")
      val shdf = spark.read.json("hdfs://princeton0.field.hortonworks.com:8020/smartPlugRaw")
      shdf.printSchema()
      shdf.createOrReplaceTempView("smartplug")
      val stuffdf = spark.sql("SELECT * FROM smartplug")
      stuffdf.count()
      println("Complete.")
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
}
```
Run that with:

```scala
import com.dataflowdeveloper.example.Example

println("Before run")
val job = new Example()
job.run(spark)
println("After run")
```

After the run, Livy returns:

```
{"text/plain":"After run"}
```

Import tip: you need to put your jar in Session.jars on the Session control and in the same directory on HDFS. So I did /opt/demo/example.jar in Linux and hdfs:// /opt/demo/example.jar. Make sure Livy and NiFi have read permissions on both.

Github: https://github.com/tspannhw/livysparkjob

Github release: https://github.com/tspannhw/livysparkjob/releases/tag/v1.1

Apache NiFi flow example: spark-it-up-scala.xml
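The import tip can be scripted. This is a dry-run sketch that only prints the staging commands (it assumes an `hdfs` client that may not be on your machine); the point is that the jar sits at the same path locally and on HDFS, readable by Livy and NiFi.

```shell
# Dry-run sketch of staging the jar in both places with matching paths.
# The echo loop only prints each command; drop it on a real edge node.
JAR=/opt/demo/example.jar
for CMD in \
    "hdfs dfs -mkdir -p /opt/demo" \
    "hdfs dfs -put -f ${JAR} ${JAR}" \
    "hdfs dfs -chmod 644 ${JAR}"; do
  echo "${CMD}"
done
```

The chmod is there because Livy runs as its own user and must be able to read the jar, per the permissions note above.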
03-14-2018
05:58 PM
1 Kudo
You have to make sure you install the libraries on the server for the correct Python version. I always run with ExecuteProcess or ExecuteStreamCommand and wrap my Python in a shell script.
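A minimal sketch of that wrapper pattern: pin the interpreter so the libraries you installed are the ones that get used. The script path and interpreter location are assumptions, and the real command is left commented so the sketch runs anywhere.

```shell
# run_python.sh - hypothetical wrapper for NiFi's ExecuteStreamCommand so
# the Python that actually has the libraries installed is always used.
PYTHON_BIN="${PYTHON_BIN:-/usr/bin/python}"   # assumption: adjust per server
SCRIPT="${1:-/opt/demo/my_script.py}"         # hypothetical script path
echo "would run: ${PYTHON_BIN} ${SCRIPT}"
# exec "${PYTHON_BIN}" "${SCRIPT}"            # uncomment for real use
```

Point ExecuteStreamCommand at the wrapper instead of at `python` directly, and the version mismatch goes away.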
03-14-2018
11:52 AM
Yes, much safer to have another instance you can use for reporting and such. Even if it's just one node.
03-14-2018
01:29 AM
NiFi 1.1 is ancient; I would not recommend running that old stuff. These are all working: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_release-notes/content/ch_hdf_relnotes.html#repo-location

You can download NiFi from:
- http://public-repo-1.hortonworks.com/HDF/3.1.1.0/nifi-1.5.0.3.1.1.0-35-bin.zip
- http://public-repo-1.hortonworks.com/HDF/3.1.1.0/nifi-1.5.0.3.1.1.0-35-bin.tar.gz
03-13-2018
06:28 PM
Sure, lots of people do that one; it takes something like four processors. You can cluster it for lots of files. Make sure you have fast disks and a fast network, since those are your bottlenecks.
03-13-2018
05:37 PM
What version of Apache Hive? What version of HDP?