Hi, I am executing the NiFi processor ExecuteSparkInteractive and I want to read an Avro file.

NiFi: Version 126.96.36.199.1.2.0-7, single-node standalone install
Spark: Version 2.3.0
Livy: whatever comes with HDP 2.6.5 (not showing up on the version list)

Flow: [do light work] > [write to HDFS] > [call Spark & do heavy work]

I used this article to get started: https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html (thanks @Timothy Spann, awesome post).

The code works fine when the file is written as JSON and read with the JSON reader. When I try to write the same data in Avro format, Spark can't find the libraries:

org.apache.spark.sql.AnalysisException: Failed to find data source: avro.

I grabbed the Databricks jar com.databricks:spark-avro_2.11:4.0.0 (the same version I use in --packages with spark-submit when testing the Spark code), put it into a directory where NiFi can get it, and set the property on my LivySessionController: Session JARs > /opt/spark/jars/spark-avro_2.11-4.0.0.jar.

This code works:

val records: Dataset[Row] = spark.read.json("/user/somedeveloper/stage/data.json")

This code doesn't:

val records: Dataset[Row] = spark.read.format("avro").load("/user/somedeveloper/stage/data.avro")

I'm running out of ideas, and JSON is too big and slow when I am running at full scale.
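For reference, here is a minimal sketch of the read I would expect to work under one assumption: on Spark 2.3 the short format name "avro" is not built in (Avro only became a built-in data source in Spark 2.4), so the Databricks package has to be referenced by its full source name instead.

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

// Assumes com.databricks:spark-avro_2.11:4.0.0 is on the session classpath
// (e.g. via the LivySessionController Session JARs property).
val spark = SparkSession.builder().appName("avro-read").getOrCreate()

// On Spark 2.3 the short name "avro" does not resolve; the Databricks
// source is registered under its full name:
val records: Dataset[Row] = spark.read
  .format("com.databricks.spark.avro")
  .load("/user/somedeveloper/stage/data.avro")
```

This is only a sketch of what I have been trying, not a confirmed fix.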
I want to convert some field names, not values or data types. I already have the schemas registered; how do I use the Schema Registry so I don't have to paste the literal schemas into the input and output schema properties?