About MarlinGomez

RAGHUY · ‎02-08-2026

@MarlinGomez For that CCA175 streaming scenario (inconsistent formats, cleansing/transforming, landing in HDFS), I'd go with Spark Structured Streaming + schema evolution as the most exam-realistic pick. It handles near-real-time ingestion via micro-batches and lets you apply transformations like filter/map before writing Parquet to HDFS. The schema side is more hands-on than "on the fly": a Kafka source hands you key/value as binary, so you parse JSON/Avro payloads against a schema you supply, and file sources only infer a schema when spark.sql.streaming.schemaInference is enabled. Separate ETL pipelines per format add too much complexity/overhead for exam constraints, and pure schema-on-read skips proactive cleansing. Quick start: Kafka source, parse and cleanse, then .writeStream... Parquet to HDFS; .option("mergeSchema", "true") is a Parquet read option, so set it on the job that reads those files back if their columns drift over time. This nails the "perform ETL on data using Spark API" objective perfectly. Good luck on your prep.
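If it helps, here is a rough PySpark sketch of that flow, not a definitive solution. The field names (id, event, ts), the JSON-or-CSV input mix, and every argument passed to start_stream are assumptions made up for illustration; the scenario doesn't specify them. The Kafka source also needs the spark-sql-kafka connector package on the classpath.

```python
import json
from typing import Optional

# Hypothetical target columns; the real scenario does not name them.
FIELDS = ("id", "event", "ts")


def normalize(raw: Optional[str]) -> Optional[str]:
    """Map one raw JSON or CSV record to a canonical JSON string, or None if unusable."""
    raw = (raw or "").strip()
    if not raw:
        return None
    if raw.startswith("{"):
        # JSON-looking record: drop it if it does not parse to an object.
        try:
            rec = json.loads(raw)
        except ValueError:
            return None
        if not isinstance(rec, dict):
            return None
    else:
        # Otherwise assume comma-separated values in FIELDS order.
        parts = [p.strip() for p in raw.split(",")]
        if len(parts) != len(FIELDS):
            return None
        rec = dict(zip(FIELDS, parts))
    # Cleansing rule (an assumption): a record without an id is rejected.
    if rec.get("id") in (None, ""):
        return None
    clean = {f: None if rec.get(f) is None else str(rec.get(f)) for f in FIELDS}
    return json.dumps(clean, sort_keys=True)


def start_stream(spark, servers, topic, out_path, checkpoint):
    """Kafka -> normalize -> Parquet on HDFS. All arguments are placeholders."""
    # Imported here so normalize() can be tried without a Spark install.
    from pyspark.sql.functions import col, from_json, udf
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType([StructField(f, StringType()) for f in FIELDS])
    normalize_udf = udf(normalize, StringType())

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", servers)
           .option("subscribe", topic)
           .load())

    # Kafka delivers value as binary: cast, normalize, then parse with the explicit schema.
    parsed = (raw.select(normalize_udf(col("value").cast("string")).alias("j"))
              .where(col("j").isNotNull())
              .select(from_json(col("j"), schema).alias("r"))
              .select("r.*"))

    # The file sink supports append mode and requires a checkpoint location.
    return (parsed.writeStream.format("parquet")
            .option("path", out_path)
            .option("checkpointLocation", checkpoint)
            .outputMode("append")
            .start())
```

Funnelling every format through one normalize step is what keeps this a single pipeline instead of one per format. A Python UDF is the simplest way to show that, though built-in column functions would generally be faster if the formats allow it.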

Last Visited	‎12-08-2025 01:49 PM

Member Since	‎12-08-2025 01:49 PM
Posts	1

Cloudera Community

Re: Need Help Clarifying a Real CCA175 Scenario