Apply a logic for a particular column in dataframe in spark

I have a DataFrame that was imported from MySQL:

dataframe_mysql.show()
+----+---------+---------------------------------------------------+
|  id|accountid|                                            xmldata|
+----+---------+---------------------------------------------------+
|1001|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1002|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1003|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1004|    12347|<AccountSetup xmlns:xsi="test"><Customers test="...|
+----+---------+---------------------------------------------------+

The xmldata column contains XML. I need to parse it into structured data in a separate DataFrame.

Previously I had the same XML on its own in a text file and loaded it into a Spark DataFrame using "com.databricks.spark.xml":

spark-shell --packages com.databricks:spark-xml_2.10:0.4.1,com.databricks:spark-csv_2.10:1.5.0
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "Account").load("../Account.xml")

The final output was a structured DataFrame:

df.show()
+----------+--------------------+--------------------+--------------+--------------------+-------+....
|   AcctNbr|         AddlParties|           Addresses|ApplicationInd|       Beneficiaries|ClassCd|....
+----------+--------------------+--------------------+--------------+--------------------+-------+....
|AAAAAAAAAA|[[Securzxcdd cxcs...|[WrappedArray([D,...|             T|[WrappedArray([11...|     35|....
+----------+--------------------+--------------------+--------------+--------------------+-------+....

Please advise how to achieve the same result when the XML content is inside a DataFrame column. I am new to Spark and Scala.
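
One possible direction, as a rough sketch only and not a confirmed solution: spark-xml ships an XmlReader class whose xmlRdd method, as far as I recall for the 0.4.x line, accepts an RDD[String] and treats each string as one XML record. Assuming that holds, the xmldata column could be pulled out as an RDD of strings and handed to it. The helper name parseXmlColumn is hypothetical, and the snippet assumes a spark-shell session started with the spark-xml package as above.

```scala
import com.databricks.spark.xml.XmlReader
import org.apache.spark.sql.DataFrame

// Hypothetical helper: parse the XML strings held in one column of `df`.
// Assumption: XmlReader.xmlRdd treats each RDD element as a single record
// and infers the schema from the data.
def parseXmlColumn(df: DataFrame, column: String): DataFrame = {
  // Keep only the XML column, drop null rows, and extract the raw strings.
  val xmlStrings = df.select(column).na.drop().rdd.map(_.getString(0))
  // Let spark-xml infer a schema and build a structured DataFrame.
  new XmlReader().xmlRdd(df.sqlContext, xmlStrings)
}

// Usage with the DataFrame from the question:
// val parsed = parseXmlColumn(dataframe_mysql, "xmldata")
// parsed.printSchema()
// parsed.show()
```

Note that the parsed DataFrame built this way no longer carries the id and accountid columns, so matching rows back to the source table would need an extra step.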