How to include spark packages from nifi when executing through livy

New Contributor

Hi, I am executing a NiFi ExecuteSparkInteractive processor and want to read an Avro file.

NiFi = version 1.5.0.3.1.2.0-7, single-node standalone install

Spark = version 2.3.0

Livy = whatever ships with HDP 2.6.5 (it doesn't show up in the version list)

Flow is [do light work] > [write to HDFS] > [call Spark & do heavy work]

I used this to get started: https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte... (thanks @Timothy Spann, awesome post).

The code works fine when the file is written as JSON and read back with the JSON reader.

When I try to write the same data in Avro format, Spark can't find the data source:

org.apache.spark.sql.AnalysisException: Failed to find data source: avro.

I grabbed the Databricks jar com.databricks:spark-avro_2.11:4.0.0 (the same version I use with --packages when testing the Spark code via spark-submit),

put it in a directory where NiFi can read it, and set the property on my LivySessionController:

Session JARs > /opt/spark/jars/spark-avro_2.11-4.0.0.jar
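As an alternative to the LivySessionController's Session JARs property, Livy itself can pull the package from Maven when the session is created. This is a hedged sketch of a direct POST to Livy's /sessions endpoint (outside NiFi), assuming your Livy host and port; the spark.jars.packages conf is the Livy-side equivalent of spark-submit --packages:

```json
{
  "kind": "spark",
  "conf": {
    "spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0"
  }
}
```

Posting this body to http://your-livy-host:8999/sessions (assumed host/port) makes the session resolve and distribute the jar to the driver and executors, which sidesteps questions about whether a local filesystem path on the NiFi node is visible to Livy.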

this code works:

val records:Dataset[Row] = spark.read.json("/user/somedeveloper/stage/data.json")

this code doesn't:

val records:Dataset[Row] = spark.read.format("avro").load("/user/somedeveloper/stage/data.avro")
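One thing worth trying: Spark 2.3 has no built-in "avro" source (that short name only became built in with Spark 2.4's org.apache.spark:spark-avro), so with the Databricks package the short name resolves only if the jar actually made it onto the session's classpath. A minimal sketch, assuming the jar did reach the Livy session and using the fully qualified source name to avoid depending on short-name registration:

```scala
import org.apache.spark.sql.{Dataset, Row}

// Sketch, assuming spark-avro_2.11-4.0.0.jar is on the Livy session classpath
// and `spark` is the ambient SparkSession provided by the Livy session.
// The fully qualified source name points straight at the Databricks package:
val records: Dataset[Row] =
  spark.read
    .format("com.databricks.spark.avro")
    .load("/user/somedeveloper/stage/data.avro")
```

If this still fails with the same AnalysisException, the jar never reached the Spark driver, which points back at how the Session JARs path is being delivered to Livy rather than at the read code itself.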

I'm running out of ideas, and JSON is too big and slow when I run at full scale.


Re: How to include spark packages from nifi when executing through livy

New Contributor

Hi @Eric Richardson, can you help me configure the ExecuteSparkInteractive and LivySessionController correctly?

I followed the same post, but it does not work.