02-20-2024
11:24 PM
What is DataFlint
DataFlint is an open-source D-APM (Data-Application Performance Monitoring) tool for Apache Spark, built for big data engineers. For more information, see dataflint.gitbook.io/dataflint-for-spark or the Git repository itself.
How to Integrate DataFlint in CDP
DataFlint supports Spark 3.2+, so you will need a Spark 3 parcel.
For spark-submit, add the DataFlint package and plugin (see the DataFlint docs):
spark-submit \
--packages io.dataflint:spark_2.12:0.1.4 \
--conf spark.plugins=io.dataflint.spark.SparkDataflintPlugin \
...
To install on the Spark History Server:
Download the DataFlint jar for Scala 2.12.
Copy it to every server that runs a Spark History Server role instance, into the Spark 3 parcel directory: /opt/cloudera/parcels/SPARK3/lib/spark3/jars. Beware: /opt/cloudera/parcels/SPARK3 is a symlink to the currently active SPARK3 parcel and is controlled by Cloudera Manager, so the jar will likely need to be copied again when the active parcel changes.
Go to CM > Cluster > Spark 3 > Configuration > `History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf` (spark3-conf/spark-history-server.conf_role_safety_valve).
Add: spark.history.provider=org.apache.spark.deploy.history.FsDataflintHistoryProvider
Restart the Spark History Server.
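The copy step can be sketched as a dry run that only prints the command for each History Server host. The host names and the local jar file name below are placeholders, not values from DataFlint or Cloudera; substitute your own and run the printed commands once they look right.

```shell
# Dry-run sketch: print the copy command for each History Server host.
# JAR is a hypothetical local file name; the host names are placeholders.
JAR="${JAR:-./dataflint-spark.jar}"
DEST="/opt/cloudera/parcels/SPARK3/lib/spark3/jars"

print_copy_commands() {
  for host in "$@"; do
    echo "scp $JAR $host:$DEST/"
  done
}

print_copy_commands shs-host-1.example.com shs-host-2.example.com
```

Printing instead of executing keeps the sketch safe to try on any machine; the destination is the parcel path named in the steps above.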
If you see java.lang.ClassNotFoundException: org.apache.spark.deploy.history.FsDataflintHistoryProvider
then the jar is misplaced: it is not in the jars directory that the History Server loads from.
See the DataFlint docs for the download and more info.
That's it!
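One way to check for a misplaced jar on a History Server host is to look for the provider class inside the jars in the parcel directory. This is a minimal sketch, assuming `unzip` is installed; `has_dataflint_provider` is a hypothetical helper name, and the class-file path is derived from the class name in the exception.

```shell
# Hypothetical helper: print the first jar in a directory that contains
# the DataFlint history provider class; return non-zero if none does.
has_dataflint_provider() {
  jars_dir="$1"
  class_path='org/apache/spark/deploy/history/FsDataflintHistoryProvider.class'
  for jar in "$jars_dir"/*.jar; do
    [ -f "$jar" ] || continue   # skips the unexpanded glob in an empty dir
    if unzip -l "$jar" 2>/dev/null | grep -q "$class_path"; then
      echo "$jar"
      return 0
    fi
  done
  return 1
}

JARS_DIR="${JARS_DIR:-/opt/cloudera/parcels/SPARK3/lib/spark3/jars}"
if has_dataflint_provider "$JARS_DIR"; then
  echo "Provider class found in $JARS_DIR"
else
  echo "Provider class not found in $JARS_DIR"
fi
```

If the class is not found, the jar is probably in a different directory or on a different host than the one running the History Server role.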