How can I process an XML file with Spark without using the Databricks spark-xml package? Is there a standard approach that can be used in live production projects?
AFAIK, yes: with the Databricks spark-xml package we can parse the XML file and create a DataFrame on top of the XML data.
Once the DataFrame is created, we can analyze the data with the DataFrame API functions.
I want to parse the files using PySpark without using the Databricks package. Is there a way to do it? If so, please give me some sample code.
Spark is well suited to XML processing because it is built on a massively parallel, distributed compute model. I think you can find some useful info in these examples:
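For what it's worth, here is a minimal sketch of one way to do it in PySpark without spark-xml: read each file as a single string with `wholeTextFiles`, parse it with Python's built-in `xml.etree.ElementTree`, and turn the parsed records into a DataFrame. The `<book>` element, its `id` attribute, the `title`/`author` children, and the function names `parse_books` and `xml_to_dataframe` are all assumptions made up for illustration, so adjust them to your own XML layout.

```python
import xml.etree.ElementTree as ET


def parse_books(xml_text):
    """Parse one XML document and return a list of (id, title, author) tuples.

    Assumes a hypothetical layout: <book id="..."> elements with
    <title> and <author> children. Missing children come back as None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.iter("book"):
        rows.append((
            book.get("id"),
            book.findtext("title"),
            book.findtext("author"),
        ))
    return rows


def xml_to_dataframe(spark, path):
    """Build a DataFrame from the XML files matching `path`.

    `spark` is an existing SparkSession; `path` may be a glob.
    """
    # wholeTextFiles yields (file_path, file_content) pairs, one per file
    files = spark.sparkContext.wholeTextFiles(path)
    # Each file can contain many <book> records, hence flatMap
    rows = files.flatMap(lambda pair: parse_books(pair[1]))
    # Explicit schema so that None values do not break type inference
    return spark.createDataFrame(rows, "id string, title string, author string")
```

With an existing `SparkSession` named `spark`, this could be called as `xml_to_dataframe(spark, "books/*.xml").show()`, where the path is again only a placeholder. Keep in mind that `wholeTextFiles` loads each file entirely into memory on one executor, so this approach fits many small or medium files better than a single very large one.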