Created 01-19-2019 10:01 PM
How can I process an XML file using Spark without the Databricks spark-xml package? Is there a standard approach that can be used in live production projects?
Created 01-20-2019 12:10 AM
@Shu Can you help me with this? I need to run a POC on my system.
Created 01-21-2019 02:31 PM
AFAIK, yes: with the Databricks spark-xml package we can parse the XML file and create a DataFrame on top of the XML data.
Once the DataFrame is created, we can analyze the data with the DataFrame API functions.
Refer to this and this link for more details on the usage and source code of the Spark XML package.
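As a rough illustration of the usage described above, here is a minimal PySpark sketch. It assumes the spark-xml package is already available to the Spark session; the helper name `read_xml`, the file name, and the row tag are placeholders I made up for the example, not something from the links.

```python
def read_xml(spark, path, row_tag):
    """Load an XML file into a DataFrame through the spark-xml data source.

    `spark` is an existing SparkSession that has spark-xml on its classpath.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", row_tag)  # XML element that maps to one DataFrame row
        .load(path)
    )

# Example call (hypothetical file and tag names):
# df = read_xml(spark, "books.xml", "book")
# df.printSchema()
```

After that, `df` can be queried like any other DataFrame (select, filter, groupBy and so on).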
Created 02-12-2019 05:37 PM
I want to parse them using PySpark without using the Databricks package. Is there a way to do it? If yes, please give me some sample code.
Thank you.
Created 02-13-2019 07:11 PM
Spark is great for XML processing. It is based on a massively parallel distributed compute paradigm. I think you can find some useful info in these examples:
https://stackoverflow.com/questions/33078221/xml-processing-in-spark
https://community.hortonworks.com/questions/71538/parsing-xml-in-spark-rdd.html
Also, check https://anonymous-essay.com/ on XSD/XML complexity. Finally, you can view this thread to find out how to do it without the Databricks package.
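For what it's worth, one way to do it without the Databricks package is to read each file whole and parse it with Python's built-in `xml.etree.ElementTree`. The sketch below is only an illustration under some assumptions: the XML is flat (each row element has simple child elements), every file fits in an executor's memory, and the names `parse_records`, `xml_files_to_dataframe`, and the `record` tag are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_records(xml_text, row_tag="record"):
    """Return one dict per <row_tag> element, mapping child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(row_tag):
        rows.append({child.tag: (child.text or "").strip() for child in elem})
    return rows


def xml_files_to_dataframe(spark, path, row_tag="record"):
    """Build a DataFrame from XML files using only core Spark and the stdlib."""
    from pyspark.sql import Row  # imported here so parse_records works without Spark

    # wholeTextFiles yields (file_path, file_content) pairs, one per file
    files = spark.sparkContext.wholeTextFiles(path)
    rows = files.flatMap(lambda kv: parse_records(kv[1], row_tag))
    return spark.createDataFrame(rows.map(lambda d: Row(**d)))

# Example call (hypothetical path):
# df = xml_files_to_dataframe(spark, "hdfs:///data/xml/*.xml", "record")
```

Because `wholeTextFiles` loads each file as a single string, this suits many small or medium files better than one very large file. Nested elements and attributes would need extra handling in `parse_records`.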
Created 02-23-2019 11:01 AM
If you would like to use NiFi instead, you can try this Groovy script
Created 09-25-2020 11:45 PM
Hello,
For XML processing on Apache Spark you can use the spark-xml library.
For Apache Spark 3.0, use version spark-xml_2.12-0.10.0.jar
For Apache Spark 2.4, use version spark-xml_2.11-0.6.0.jar
Regards.
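A small sketch of how the version pairing above could be wired into a PySpark session, in case it helps. It assumes the jars are pulled from Maven through `spark.jars.packages` using the coordinates that correspond to the two jar names above; the function names are hypothetical and only the two Spark versions mentioned in this post are covered.

```python
# Maven coordinates matching the jar names above, keyed by (major, minor) Spark version
SPARK_XML_COORDINATES = {
    (3, 0): "com.databricks:spark-xml_2.12:0.10.0",
    (2, 4): "com.databricks:spark-xml_2.11:0.6.0",
}


def spark_xml_coordinate(spark_version):
    """Pick the spark-xml coordinate for a version string such as '3.0.1'."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    try:
        return SPARK_XML_COORDINATES[(major, minor)]
    except KeyError:
        raise ValueError("no spark-xml version listed for Spark " + spark_version)


def build_session(spark_version, app_name="xml-poc"):
    """Create a SparkSession that downloads the matching spark-xml package."""
    from pyspark.sql import SparkSession  # requires pyspark to be installed

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", spark_xml_coordinate(spark_version))
        .getOrCreate()
    )
```

If the jar is already downloaded, passing it with `--jars` to `spark-submit` is an alternative to `spark.jars.packages`.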
Created 09-26-2020 02:37 AM