
How to include Spark packages from NiFi when executing through Livy


Hi, I am running the NiFi ExecuteSparkInteractive processor and I want to read an Avro file.

NiFi = Version 1.5.0.3.1.2.0-7, single-node standalone install

Spark = Version 2.3.0

Livy = whatever ships with HDP 2.6.5 (it does not show up in the version list)

Flow is [do light work] > [write to HDFS] > [call Spark & do heavy work]

I used this post to get started: https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte... (thanks @Timothy Spann, awesome post).

The code works fine when the file is written as JSON and read back with the JSON reader.

When I try to write the same data in Avro format, Spark can't find the Avro libraries:

org.apache.spark.sql.AnalysisException: Failed to find data source: avro.

I grabbed the Databricks JAR, com.databricks:spark-avro_2.11:4.0.0 (the same version I pass to --packages with spark-submit when testing the Spark code), put it in a directory NiFi can read, and set the property on my LivySessionController:

Session JARs > /opt/spark/jars/spark-avro_2.11-4.0.0.jar
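
To check whether the JAR actually reaches the Spark session, here is a quick sanity check I can run through the same ExecuteSparkInteractive processor (a sketch; listJars() is a standard SparkContext method, and com.databricks.spark.avro.DefaultSource is the data source class shipped in that JAR):

// List every JAR registered with the session; spark-avro should show up
// here if the Session JARs property was actually picked up by Livy.
spark.sparkContext.listJars().foreach(println)

// If this throws ClassNotFoundException, the JAR never reached the driver
// classpath, so format("avro") has nothing to resolve against.
Class.forName("com.databricks.spark.avro.DefaultSource")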

This code works:

import org.apache.spark.sql.{Dataset, Row}

val records: Dataset[Row] = spark.read.json("/user/somedeveloper/stage/data.json")

This code doesn't:

val records: Dataset[Row] = spark.read.format("avro").load("/user/somedeveloper/stage/data.avro")
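
One variant still on my list to try (untested; the Databricks package registers its data source under the fully qualified name com.databricks.spark.avro, so this bypasses the "avro" short-name lookup):

// Sketch: name the Databricks data source by its fully qualified package
// name instead of relying on short-name registration.
val records: Dataset[Row] = spark.read.format("com.databricks.spark.avro").load("/user/somedeveloper/stage/data.avro")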

I'm running out of ideas, and JSON is too big and slow when I run at full scale.
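
If per-session JARs never work out, one fallback might be to pin the package cluster-wide so every Livy session pulls it at startup (a sketch; the conf file location depends on the install, and spark.jars.packages is the conf equivalent of --packages):

# In spark-defaults.conf for the Spark 2 service (path is install-specific)
spark.jars.packages com.databricks:spark-avro_2.11:4.0.0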

1 REPLY


Hi @Eric Richardson, can you help me configure ExecuteSparkInteractive and the LivySessionController correctly?

I followed the same post, but it does not work.