Hi, I am executing the NiFi processor ExecuteSparkInteractive and I want to read an Avro file.
NiFi = version 126.96.36.199.1.2.0-7, single-node standalone install
Spark = version 2.3.0
Livy = whatever ships with HDP 2.6.5 (it doesn't show up on the version list)
Flow is [do light work] > [write to HDFS] > [call Spark & do heavy work]
I used this to get started: https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte... (thanks @Timothy Spann, awesome post).
The code works fine when the file is written as JSON and read back with the JSON reader. When I try to write the same data in Avro format, Spark can't find the libraries:
org.apache.spark.sql.AnalysisException: Failed to find data source: avro.
I grabbed the Databricks jar com.databricks:spark-avro_2.11:4.0.0 (the same version I use with --packages in spark-submit when testing the Spark code), put it in a directory where NiFi can get at it, and set the property on my LivySessionController:
Session JARs > /opt/spark/jars/spark-avro_2.11-4.0.0.jar
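For reference, my understanding is that the LivySessionController ultimately passes those jars in the "jars" field of the session-creation request to Livy's REST API, something like the sketch below (the exact payload NiFi sends is an assumption on my part; the path here is my local one, and Livy may require the jar to be reachable from the cluster, e.g. on HDFS, rather than a node-local path):

```json
{
  "kind": "spark",
  "jars": ["/opt/spark/jars/spark-avro_2.11-4.0.0.jar"]
}
```

So one thing I'm wondering is whether the session actually gets the jar on its classpath, or whether the path needs to be an HDFS location instead.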
This code works:
val records:Dataset[Row] = spark.read.json("/user/somedeveloper/stage/data.json")
This code doesn't:
val records:Dataset[Row] = spark.read.format("avro").load("/user/somedeveloper/stage/data.avro")
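I also tried the fully qualified source name, since with Spark 2.3 the short name "avro" only resolves if the Databricks jar's DataSourceRegister is actually on the session's classpath. A sketch of the variants (assumes the spark-avro_2.11-4.0.0 jar is on the Livy session classpath; `spark` is the session's SparkSession):

```scala
import org.apache.spark.sql.{Dataset, Row}

// Fully qualified Databricks source name, bypassing short-name lookup
val records: Dataset[Row] = spark.read
  .format("com.databricks.spark.avro")
  .load("/user/somedeveloper/stage/data.avro")

// Equivalent convenience method provided by the Databricks package's implicits
import com.databricks.spark.avro._
val records2: Dataset[Row] = spark.read.avro("/user/somedeveloper/stage/data.avro")
```

If the fully qualified name also fails with a ClassNotFoundException, that would point at the jar never reaching the session rather than a short-name registration problem.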
I'm running out of ideas, and JSON is too big and slow when I run at full scale.