Cloudera Employee
Created on 02-20-2024 11:24 PM - edited on 03-24-2025 10:33 PM by VidyaSargur
What is DataFlint
DataFlint is an open-source D-APM (Data-Application Performance Monitoring) tool for Apache Spark, built for big data engineers. For more information, see dataflint.gitbook.io/dataflint-for-spark or the Git repository itself.
How to Integrate DataFlint in CDP
DataFlint supports Spark 3.2+, so you will need a Spark 3 parcel.
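Because of the 3.2+ requirement, it can be worth checking the Spark version before going further. The snippet below is a minimal sketch, not part of DataFlint: `version_ge` is a hypothetical helper, it assumes GNU `sort -V` is available on the host, and the version string is a placeholder for whatever your Spark 3 client reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value: replace with the version reported by your Spark 3 client.
spark_version="3.3.2"

if version_ge "$spark_version" "3.2"; then
  echo "Spark $spark_version: OK for DataFlint"
else
  echo "Spark $spark_version: too old, DataFlint needs 3.2+"
fi
```

Version sort is used here rather than plain string comparison so that a version such as 3.10 is treated as newer than 3.2.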
For spark-submit, see the DataFlint docs:
spark-submit \
--packages io.dataflint:spark_2.12:0.3.1 \
--conf spark.plugins=io.dataflint.spark.SparkDataflintPlugin \
...
To install on the Spark History Server:
- Download the DataFlint jar for Scala 2.12.
- Copy it into the Spark 3 parcel jars directory on every server that hosts a Spark History Server role instance:
/opt/cloudera/parcels/SPARK3/lib/spark3/jars
Beware: /opt/cloudera/parcels/SPARK3 is a symlink to the currently active SPARK3 parcel and is controlled by Cloudera Manager, so the jar will likely need to be copied again whenever a different parcel version is activated.
- Go to CM > Cluster > Spark 3 > Configuration > `History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf` (spark3-conf/spark-history-server.conf_role_safety_valve) and add:
spark.history.provider=org.apache.spark.deploy.history.FsDataflintHistoryProvider
- Restart the Spark History Server.
- If you see:
java.lang.ClassNotFoundException: org.apache.spark.deploy.history.FsDataflintHistoryProvider
- then the jar is misplaced. Check that it was copied into the Spark 3 jars directory on every History Server host.
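If the ClassNotFoundException shows up, one way to narrow it down is to check whether any jar in the Spark 3 jars directory actually contains the provider class. This is a rough sketch under stated assumptions: `find_provider_jar` is a hypothetical helper, `unzip` is assumed to be installed on the History Server host, and the directory defaults to the path from the steps above.

```shell
#!/usr/bin/env bash
# Directory from the steps above; pass another path as $1 to override.
JARS_DIR="${1:-/opt/cloudera/parcels/SPARK3/lib/spark3/jars}"
# The class the History Server failed to load, written as a jar entry path.
CLASS_ENTRY="org/apache/spark/deploy/history/FsDataflintHistoryProvider.class"

# Hypothetical helper: prints each jar in directory $1 that contains
# $CLASS_ENTRY and returns 1 when no jar does. Assumes `unzip` is installed.
find_provider_jar() {
  local dir="$1" found=1 jar
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue   # skip the unexpanded glob when no jars exist
    if unzip -l "$jar" 2>/dev/null | grep -qF "$CLASS_ENTRY"; then
      echo "$jar"
      found=0
    fi
  done
  return "$found"
}

if find_provider_jar "$JARS_DIR"; then
  echo "Provider class found in the jar(s) listed above."
else
  echo "No jar in $JARS_DIR contains $CLASS_ENTRY; copy the DataFlint jar there."
fi
```

Run it on each host that has a Spark History Server role instance. A missing match on any one host is enough to reproduce the error there after a restart.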
See the DataFlint docs for the download and more information.
That's it!