02-08-2026 12:10 AM
@MarlinGomez For that CCA175 streaming scenario (inconsistent input formats, cleanse and transform, land in HDFS), I'd go with Spark Structured Streaming plus schema evolution as the most exam-realistic pick. It ingests in near real time via micro-batches, copes with evolving JSON/Avro records, and lets you apply transformations like filter/map before writing Parquet to HDFS. One caveat: streaming sources generally want a schema up front rather than inferring one on the fly, so with Kafka you parse the message value against an explicit schema, and mergeSchema is an option for reading Parquet, not a setting on the stream writer.

Separate ETL pipelines per format add too much complexity and overhead for exam constraints, and pure schema-on-read skips the proactive cleansing the scenario asks for.

Quick start: read from a Kafka source, cleanse, then writeStream... to Parquet on HDFS, and read the output back with .option("mergeSchema", "true") so files written under older and newer schemas are merged. This lines up well with the "perform ETL on data using Spark API" objective. Good luck with your prep.
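For anyone who wants to see the shape of it, here is a rough PySpark sketch of the single-pipeline idea, not a definitive solution. Everything specific in it is my assumption: the broker address, topic, HDFS paths, the two-field record (id, amount), and the idea that each message is either JSON or an "id,amount" CSV line. It also assumes the spark-sql-kafka connector package is on the classpath.

```python
import json
from typing import Optional, Tuple

# Hypothetical names, for illustration only.
KAFKA_SERVERS = "broker:9092"
TOPIC = "events"
OUT_PATH = "hdfs:///data/events_clean"
CHECKPOINT = "hdfs:///checkpoints/events_clean"


def normalize(raw: Optional[str]) -> Optional[Tuple[str, float]]:
    """Parse one record that is either JSON or an 'id,amount' CSV line.

    Returns (id, amount), or None when the record cannot be cleansed.
    """
    if raw is None:
        return None
    text = raw.strip()
    try:
        if text.startswith("{"):
            obj = json.loads(text)
            rec_id, amount = obj["id"], obj["amount"]
        else:
            rec_id, amount = text.split(",")
        rec_id = str(rec_id).strip()
        amount = float(amount)
    except (ValueError, KeyError, TypeError):
        return None  # malformed record: drop it
    if not rec_id:
        return None
    return (rec_id, amount)


def start_stream():
    # Imported here so normalize() can be tried without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import (DoubleType, StringType,
                                   StructField, StructType)

    # Explicit target schema (assumed fields).
    schema = StructType([
        StructField("id", StringType()),
        StructField("amount", DoubleType()),
    ])
    spark = SparkSession.builder.appName("cleanse-to-hdfs").getOrCreate()

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", KAFKA_SERVERS)
           .option("subscribe", TOPIC)
           .load())

    # One UDF handles both formats; unparseable rows come back as null.
    parse = udf(normalize, schema)
    clean = (raw.select(parse(col("value").cast("string")).alias("rec"))
             .where(col("rec").isNotNull())
             .select("rec.*"))

    # The Parquet file sink needs a checkpoint location and append mode.
    return (clean.writeStream.format("parquet")
            .option("path", OUT_PATH)
            .option("checkpointLocation", CHECKPOINT)
            .outputMode("append")
            .start())
```

Calling start_stream() returns the running query, and you would normally follow it with awaitTermination(). A batch job can later read the result with spark.read.option("mergeSchema", "true").parquet(OUT_PATH). A Python UDF is the simplest way to show mixed formats in one pipeline; built-in functions are usually faster if the formats can be told apart by a column expression.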