Posts: 1973
Kudos Received: 1225
Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1913 | 04-03-2024 06:39 AM |
| | 3010 | 01-12-2024 08:19 AM |
| | 1642 | 12-07-2023 01:49 PM |
| | 2419 | 08-02-2023 07:30 AM |
| | 3355 | 03-29-2023 01:22 PM |
03-15-2018
01:30 PM
References:
- https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/spark-dataframe-api.html
- http://spark.apache.org/docs/2.2.0/
- http://spark.apache.org/docs/2.2.0/rdd-programming-guide.html
- http://spark.apache.org/docs/2.2.0/quick-start.html
- https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html

Submit the batch through Livy with a local jar path:

```
curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "/apps/example.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches
```

Or with the jar on HDFS:

```
curl -H "Content-Type: application/json" -H "X-Requested-By: admin" -X POST -d '{"file": "hdfs://server:8020/apps/example_2.11-1.0.jar","className": "com.dataflowdeveloper.example.Links"}' http://server:8999/batches
```

FYI, you may see this in the logs:

```
18/03/14 11:54:54 INFO LineBufferedStream: stdout: 18/03/14 11:54:54 INFO Client: Source and destination file systems are the same. Not copying hdfs://server:8020/opt/demo/example.jar
```
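The two curl calls differ only in where the jar lives, so it can help to parameterize the payload once. This is a sketch only: the host, port, jar path, and class name are the same placeholder values as in the curl examples, not real endpoints, and the actual submit/poll calls are left commented so it runs anywhere.

```shell
# Sketch: build the JSON payload for Livy's POST /batches before sending it.
# LIVY_URL, APP_JAR, and APP_CLASS are placeholder values; adjust per cluster.
LIVY_URL="http://server:8999"
APP_JAR="hdfs://server:8020/apps/example_2.11-1.0.jar"
APP_CLASS="com.dataflowdeveloper.example.Links"

PAYLOAD="{\"file\": \"${APP_JAR}\", \"className\": \"${APP_CLASS}\"}"
echo "${PAYLOAD}"

# With a running Livy server you would then submit and poll the batch:
# curl -H "Content-Type: application/json" -H "X-Requested-By: admin" \
#      -X POST -d "${PAYLOAD}" "${LIVY_URL}/batches"
# curl "${LIVY_URL}/batches/0"    # 0 = the batch id returned by the POST
```

Polling `GET /batches/<id>` is how you watch the batch move through its states instead of tailing logs.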
03-15-2018
12:16 PM
That would definitely work, but they are not open source and not free. Any suggestions for open source tools?
03-15-2018
01:07 AM
It is easy to run this code from the Spark shell as well, without a connection to NiFi (runshell.sh):

```
/usr/hdp/current/spark2-client/bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /opt/demo/example.jar
```
03-14-2018
11:33 PM
1 Kudo
1. A special utility?
2. NiFi: load the table to ORC with PutHDFS, then PutHiveQL to merge with the ACID table
3. Sqoop?
4. NiFi: PutHiveStreaming https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
5. NiFi to Druid, then insert into a Hive ACID table from a table on top of Druid
6. NiFi to HBase, a Hive table on HBase, then insert into a Hive ACID table
7. Some SnappyData in-memory pattern?
8. IBM BigSQL?
9. Attunity?
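For option 4, the linked Hive MERGE documentation boils down to a statement along these lines. The table and column names here are hypothetical, and the target table needs Hive transactions (ACID) enabled; the statement is only echoed so the sketch runs without a cluster.

```shell
# Hypothetical MERGE into a Hive ACID table (option 4 above); target_acid,
# staging, id, and val are made-up names for illustration.
MERGE_SQL="MERGE INTO target_acid AS t
USING staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val)"
echo "${MERGE_SQL}"

# On a real cluster, run it through beeline against HiveServer2:
# beeline -u "jdbc:hive2://hiveserver:10000" -e "${MERGE_SQL}"
```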
Labels:
- Apache Hive
- Apache NiFi
03-14-2018
09:20 PM
3 Kudos
What: executing Scala Apache Spark code in JARs from Apache NiFi.

Why: you don't want all of your Scala code in a continuous block like Apache Zeppelin.

Tools: Apache NiFi, Apache Livy, Apache Spark, Scala.

Flows (screenshots):
- Option 1: Inline Scala Code
- Apache Zeppelin running the same Scala job (you have to add the jar to the Spark interpreter and restart)
- Grafana charts of the Apache NiFi run
- Log search helps you find errors
- Run code for your Spark class
- Setting up your ExecuteSparkInteractive processor
- Setting up your Spark service for Scala
- Tracking the job in the Livy UI
- Tracking the job in the Spark UI

I was looking at doing this: pull code from Git, put it into a NiFi attribute, and run it directly. For bigger projects, you will have many classes and dependencies that may require a full IDE and an SBT build cycle. Once I build a Scala jar, I want to run against that.

Example code:

```scala
package com.dataflowdeveloper.example

import org.apache.spark.sql.SparkSession

class Example {
  def run(spark: SparkSession): Unit = {
    try {
      println("Started")
      val shdf = spark.read.json("hdfs://princeton0.field.hortonworks.com:8020/smartPlugRaw")
      shdf.printSchema()
      shdf.createOrReplaceTempView("smartplug")
      val stuffdf = spark.sql("SELECT * FROM smartplug")
      stuffdf.count()
      println("Complete.")
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
}
```
Run that with:

```scala
import com.dataflowdeveloper.example.Example

println("Before run")
val job = new Example()
job.run(spark)
println("After run")
```

After the run, Livy returns:

```
{"text/plain":"After run"}
```

Import tip: you need to put your jar in Session.jars on the Session control and in the same directory on HDFS. So I did /opt/demo/example.jar in Linux and hdfs:// /opt/demo/example.jar. Make sure Livy and NiFi have read permissions on both.

Github: https://github.com/tspannhw/livysparkjob

Github release: https://github.com/tspannhw/livysparkjob/releases/tag/v1.1

Apache NiFi flow example: spark-it-up-scala.xml
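The import tip can be scripted. This is a dry-run sketch that only prints the staging commands (it assumes an `hdfs` client that may not be on your machine); the point is that the jar sits at the same path locally and on HDFS, readable by Livy and NiFi.

```shell
# Dry-run sketch of staging the jar in both places with matching paths.
# The echo loop only prints each command; drop it on a real edge node.
JAR=/opt/demo/example.jar
for CMD in \
    "hdfs dfs -mkdir -p /opt/demo" \
    "hdfs dfs -put -f ${JAR} ${JAR}" \
    "hdfs dfs -chmod 644 ${JAR}"; do
  echo "${CMD}"
done
```

The chmod is there because Livy runs as its own user and must be able to read the jar, per the permissions note above.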
03-14-2018
05:58 PM
1 Kudo
You have to make sure you install the libraries on the server for the correct Python version. I always run with ExecuteProcess or ExecuteStreamCommand and wrap my Python in a shell script.
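A minimal sketch of that wrapper pattern: pin the interpreter so the libraries you installed are the ones that get used. The script path and interpreter location are assumptions, and the real command is left commented so the sketch runs anywhere.

```shell
# run_python.sh - hypothetical wrapper for NiFi's ExecuteStreamCommand so
# the Python that actually has the libraries installed is always used.
PYTHON_BIN="${PYTHON_BIN:-/usr/bin/python}"   # assumption: adjust per server
SCRIPT="${1:-/opt/demo/my_script.py}"         # hypothetical script path
echo "would run: ${PYTHON_BIN} ${SCRIPT}"
# exec "${PYTHON_BIN}" "${SCRIPT}"            # uncomment for real use
```

Point ExecuteStreamCommand at the wrapper instead of at `python` directly, and the version mismatch goes away.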
03-14-2018
11:52 AM
Yes, much safer to have another instance you can use for reporting and such. Even if it's just one node.
03-14-2018
01:29 AM
NiFi 1.1 is ancient; I would not recommend running that old stuff. These are all working: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_release-notes/content/ch_hdf_relnotes.html#repo-location

You can download NiFi from:
- http://public-repo-1.hortonworks.com/HDF/3.1.1.0/nifi-1.5.0.3.1.1.0-35-bin.zip
- http://public-repo-1.hortonworks.com/HDF/3.1.1.0/nifi-1.5.0.3.1.1.0-35-bin.tar.gz
03-13-2018
06:28 PM
Sure, lots of people do that one; it takes something like four processors. You can cluster it for lots of files. Make sure you have fast disks and a fast network, since those are your bottlenecks.
03-13-2018
05:37 PM
What version of Apache Hive? What version of HDP?