
How to parse nested JSON in a Spark 2 DataFrame

Dear Forum Folks,

I need help parsing nested JSON into a Spark DataFrame. I am pasting a sample of the JSON file below.
Any ideas on how to parse this file would be appreciated.

{
“meta” : {
“view” : {
“id” : “4mse-ku6q”,
“name” : “Traffic Violations”,
“averageRating” : 0,
“category” : “Public Safety”,
“createdAt” : 1403103517,
“description” : “This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or the officer issuing the violation will not be published.\r\n\r\nUpdate Frequency: Daily”,
“displayType” : “table”,
“downloadCount” : 85018,
“hideFromCatalog” : false,
“hideFromDataJson” : false,
“iconUrl” : “fileId:r41tDc239M1FL75LFwXFKzFCWqr8mzMeMTYXiA24USM”,
“indexUpdatedAt” : 1519815131,
“newBackend” : false,
“numberOfComments” : 0,
“oid” : 8890705,
“provenance” : “official”,
“publicationAppendEnabled” : false,
“publicationDate” : 1411040702,
“publicationGroup” : 1620779,
“publicationStage” : “published”,
“rowClass” : “”,
“rowsUpdatedAt” : 1519813971,
“rowsUpdatedBy” : “ajn4-zy65”,
“tableId” : 1722160,
“totalTimesRated” : 0,
“viewCount” : 26908,
“viewLastModified” : 1494270268,
“viewType” : “tabular”,
“columns” : [ {
“id” : -1,
“name” : “sid”,
“dataTypeName” : “meta_data”,
“fieldName” : “:sid”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “id”,
“dataTypeName” : “meta_data”,
“fieldName” : “:id”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “position”,
“dataTypeName” : “meta_data”,
“fieldName” : “:position”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “created_at”,
“dataTypeName” : “meta_data”,
“fieldName” : “:created_at”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “created_meta”,
“dataTypeName” : “meta_data”,
“fieldName” : “:created_meta”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “updated_at”,
“dataTypeName” : “meta_data”,
“fieldName” : “:updated_at”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “updated_meta”,
“dataTypeName” : “meta_data”,
“fieldName” : “:updated_meta”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : -1,
“name” : “meta”,
“dataTypeName” : “meta_data”,
“fieldName” : “:meta”,
“position” : 0,
“renderTypeName” : “meta_data”,
“format” : { },
“flags” : [ “hidden” ]
}, {
“id” : 167291006,
“name” : “Date Of Stop”,
“dataTypeName” : “calendar_date”,
“description” : “Date of the traffic violation.”,
“fieldName” : “date_of_stop”,
“position” : 2,
“renderTypeName” : “calendar_date”,
“tableColumnId” : 20333361,
“width” : 97,
“cachedContents” : {
“largest” : “2018-02-27T00:00:00”,
“non_null” : 1251972,
“null” : 0,
“top” : [ {
“item” : “2018-01-15T00:00:00”,
“count” : 20
}, {
“item” : “2018-01-16T00:00:00”,
“count” : 19
}, {
“item” : “2018-01-18T00:00:00”,
“count” : 18
}, {
“item” : “2018-01-19T00:00:00”,
“count” : 17
}, {
“item” : “2018-01-21T00:00:00”,
“count” : 16
}, {
“item” : “2018-01-22T00:00:00”,
“count” : 15
}, {
“item” : “2018-01-23T00:00:00”,
“count” : 14
}, {
“item” : “2018-01-24T00:00:00”,
“count” : 13
}, {
“item” : “2018-01-25T00:00:00”,
“count” : 12
}, {
“item” : “2018-01-26T00:00:00”,
“count” : 11
}, {
“item” : “2018-01-27T00:00:00”,
“count” : 10
}, {
“item” : “2018-01-29T00:00:00”,
“count” : 9
}, {
“item” : “2018-01-30T00:00:00”,
“count” : 8
}, {
“item” : “2018-01-31T00:00:00”,
“count” : 7
}, {
“item” : “2018-02-01T00:00:00”,
“count” : 6
}, {
“item” : “2018-02-02T00:00:00”,
“count” : 5
}, {
“item” : “2018-02-03T00:00:00”,
“count” : 4
}, {
“item” : “2018-02-05T00:00:00”,
“count” : 3
}, {
“item” : “2018-02-06T00:00:00”,
“count” : 2
}, {
“item” : “2018-02-07T00:00:00”,
“count” : 1
} ],
“smallest” : “2012-01-01T00:00:00”
},
“format” : {
“view” : “date”,
“align” : “left”
}
}, {
“id” : 167291007,
“name” : “Time Of Stop”,
“dataTypeName” : “text”,
“description” : “Time of the traffic violation.”,
“fieldName” : “time_of_stop”,
“position” : 3,
“renderTypeName” : “text”,
“tableColumnId” : 20333363,
“width” : 89,
“cachedContents” : {
“largest” : “23:59:00”,
“non_null” : 1251972,
“null” : 0,
“top” : [ {
“item” : “07:28:00”,
“count” : 20
}, {
“item” : “00:21:00”,
“count” : 19
}, {
“item” : “01:56:00”,
“count” : 18
}, {
“item” : “01:41:00”,
“count” : 17
}, {
“item” : “23:54:00”,
“count” : 16
}, {
“item” : “02:48:00”,
“count” : 15
}, {
“item” : “23:30:00”,
“count” : 14
}, {
“item” : “06:58:00”,
“count” : 13
}, {
“item” : “07:10:00”,
“count” : 12
}, {
“item” : “22:52:00”,
“count” : 11
}, {
“item” : “01:55:00”,
“count” : 10
}, {
“item” : “01:10:00”,
“count” : 9
}, {
“item” : “23:26:00”,
“count” : 8
}, {
“item” : “23:06:00”,
“count” : 7
}, {
“item” : “23:25:00”,
“count” : 6
}, {
“item” : “23:35:00”,
“count” : 5
}, {
“item” : “00:32:00”,
“count” : 4
}, {
“item” : “23:43:00”,
“count” : 3
}, {
“item” : “23:49:00”,
“count” : 2
}, {
“item” : “07:07:00”,
“count” : 1
} ],
“smallest” : “00:00:00”
},
“format” : {
“align” : “left”
}
},
{
“id” : 167291040,
“name” : “Geolocation”,
“dataTypeName” : “location”,
“description” : “Geo-coded location information.”,
“fieldName” : “geolocation”,
“position” : 36,
“renderTypeName” : “location”,
“tableColumnId” : 22014969,
“width” : 100,
“cachedContents” : {
“largest” : {
“latitude” : “39.0835433333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.152665”
},
“non_null” : 1158178,
“null” : 93794,
“top” : [ {
“item” : {
“latitude” : “39.008325”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.049165”
},
“count” : 20
}, {
“item” : {
“latitude” : “39.1400066666667”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.2062733333333”
},
“count” : 19
}, {
“item” : {
“latitude” : “39.1453866666667”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.151475”
},
“count” : 18
}, {
“item” : {
“latitude” : “39.0584283333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.0480016666667”
},
“count” : 17
}, {
“item” : {
“latitude” : “39.064025”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.0948166666667”
},
“count” : 16
}, {
“item” : {
“latitude” : “38.9868028333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.105421”
},
“count” : 15
}, {
“item” : {
“latitude” : “39.041725”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.058525”
},
“count” : 14
}, {
“item” : {
“latitude” : “39.1450033333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.2026783333333”
},
“count” : 13
}, {
“item” : {
“latitude” : “39.15924”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.2197733333333”
},
“count” : 12
}, {
“item” : {
“latitude” : “39.1497083333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.2087416666667”
},
“count” : 11
}, {
“item” : {
“latitude” : “39.066465”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.062715”
},
“count” : 10
}, {
“item” : {
“latitude” : “39.00684”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.04998”
},
“count” : 9
}, {
“item” : {
“latitude” : “39.0072766666667”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.0492216666667”
},
“count” : 8
}, {
“item” : {
“latitude” : “39.148245”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.2371266666667”
},
“count” : 7
}, {
“item” : {
“latitude” : “39.12278”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.1640333333333”
},
“count” : 6
}, {
“item” : {
“latitude” : “39.004515”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-76.980765”
},
“count” : 5
}, {
“item” : {
“latitude” : “39.0051116666667”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-76.9967766666667”
},
“count” : 4
}, {
“item” : {
“latitude” : “39.0934533333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.13202”
},
“count” : 3
}, {
“item” : {
“latitude” : “39.09237”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.131295”
},
“count” : 2
}, {
“item” : {
“latitude” : “39.1007”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.1412233333333”
},
“count” : 1
} ],
“smallest” : {
“latitude” : “39.0835433333333”,
“human_address” : “{“address”:”",“city”:"",“state”:"",“zip”:""}",
“longitude” : “-77.152665”
}
},
“format” : {
“view” : “address_coords”,
“align” : “left”
},
“subColumnTypes” : [ “human_address”, “latitude”, “longitude”, “machine_address”, “needs_recoding” ]
} ],
“disabledFeatureFlags” : [ “allow_comments” ],
“grants” : [ {
“inherited” : false,
“type” : “viewer”,
“flags” : [ “public” ]
} ],
“metadata” : {
“jsonQuery” : {
“order” : [ {
“ascending” : false,
“columnFieldName” : “date_of_stop”
} ]
},
“rdfSubject” : “0”,
“rdfClass” : “”,
“custom_fields” : {
“Dataset Information” : {
“Departments” : “Police, Department of”,
“Update Frequency” : “Daily”
}
},
“rowIdentifier” : 167291005,
“availableDisplayTypes” : [ “table”, “fatrow”, “page” ],
“rowLabel” : “”,
“renderTypeConfig” : {
“visible” : {
“table” : true
}
}
},
“owner” : {
“id” : “ajn4-zy65”,
“displayName” : “MCG ESB Service”,
“screenName” : “MCG ESB Service”,
“type” : “interactive”,
“flags” : [ “organizationMember” ]
},
“query” : {
“orderBys” : [ {
“ascending” : false,
“expression” : {
“columnId” : 167291006,
“type” : “column”
}
} ]
},
“rights” : [ “read” ],
“tableAuthor” : {
“id” : “ajn4-zy65”,
“displayName” : “MCG ESB Service”,
“screenName” : “MCG ESB Service”,
“type” : “interactive”,
“flags” : [ “organizationMember” ]
},
“tags” : [ “traffic”, “stop”, “violations”, “electronic issued.” ],
“flags” : [ “default”, “restorable”, “restorePossibleForType” ]
}
},
“data” : [ [ 2118167, “EE8BC302-660F-48C4-B422-17427ECE821F”, 2118167, 1482239054, “498050”, 1482239054, “498050”, null, “2013-09-24T00:00:00”, “17:11:00”, “MCP”, “3rd district, Silver Spring”, “DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION”, “8804 FLOWER AVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2008”, “FORD”, “4S”, “BLACK”, “Citation”, “13-401(h)”, “Transportation Article”, “No”, “BLACK”, “M”, “TAKOMA PARK”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 3064529, “9763533B-48A3-480A-BB54-1EF8C60849AF”, 3064529, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-29T00:00:00”, “10:19:00”, “MCP”, “2nd district, Bethesda”, “DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC CONTROL DEVICE INSTRUCTIONS”, “WISCONSIN AVE@ ELM ST”, “38.981725”, “-77.0927566666667”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “VA”, “02 - Automobile”, “2001”, “TOYOTA”, “COROLLA”, “GREEN”, “Citation”, “21-201(a1)”, “Transportation Article”, “No”, “WHITE”, “F”, “FAIRFAX STATION”, “VA”, “VA”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “38.981725”, “-77.0927566666667”, null, false ] ]
, [ 2118171, “938F1DC7-DE76-45E8-AF25-361CBCC8507C”, 2118171, 1482239054, “498050”, 1482239054, “498050”, null, “2014-12-01T00:00:00”, “12:52:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE STOP AND YIELD AT THRU HWY”, “CHRISTOPHER AVE/MONTGOMERY VILLAGE AVE”, “39.1628883333333”, “-77.2290883333333”, “No”, “No”, “No”, “Yes”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2001”, “HONDA”, “ACCORD”, “SILVER”, “Citation”, “21-403(b)”, “Transportation Article”, “No”, “BLACK”, “F”, “UPPER MARLBORO”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.1628883333333”, “-77.2290883333333”, null, false ] ]
, [ 3064530, “75858866-32C1-4E6B-8C01-322CD15F33EB”, 3064530, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-29T00:00:00”, “09:22:00”, “MCP”, “3rd district, Silver Spring”, “FAILURE YIELD RIGHT OF WAY ON U TURN”, “CHERRY HILL RD./CALVERTON BLVD.”, “39.056975”, “-76.9546333333333”, “No”, “No”, “No”, “Yes”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “1998”, “DODG”, “DAKOTA”, “WHITE”, “Citation”, “21-402(b)”, “Transportation Article”, “No”, “BLACK”, “M”, “FORT WASHINGTON”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.056975”, “-76.9546333333333”, null, false ] ]
, [ 3064531, “077389FD-023F-420B-909C-85034959FEF5”, 3064531, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE OF DR. TO MAKE LANE CHANGE TO AVAIL. LANE NOT IMMED. ADJ. TO STOPPED EMERG. VEH,”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “21-405(e1)”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 2118177, “730662D9-7CC1-4815-BA0B-C28CD61BEC56”, 2118177, 1482239054, “498050”, 1482239054, “498050”, null, “2013-08-27T00:00:00”, “00:55:00”, “MCP”, “2nd district, Bethesda”, “NEGLIGENT DRIVING VEHICLE IN CARELESS AND IMPRUDENT MANNER ENDANGERING PROPERTY, LIFE AND PERSON”, “CONNECTICUT/CHEVY CHASE LAKE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2013”, “HYUNDAI”, “ELANTRA”, “GRAY”, “Citation”, “21-901.1(b)”, “Transportation Article”, “No”, “WHITE”, “F”, “SILVER SPRING”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 2118178, “82062BF4-BFCB-4A0D-8416-22D262569EBE”, 2118178, 1482239054, “498050”, 1482239054, “498050”, null, “2013-10-08T00:00:00”, “13:23:00”, “MCP”, “4th district, Wheaton”, “DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION”, “GEORGIA AVE / BEL PRE RD”, “39.0933833333333”, “-77.0795516666667”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “1993”, “FORD”, “PICKUP”, “BLACK”, “Citation”, “13-401(h)”, “Transportation Article”, “No”, “HISPANIC”, “M”, “BELTSVILLE”, “MD”, “MD”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.0933833333333”, “-77.0795516666667”, null, false ] ]
, [ 2118179, “E3BB6016-5938-41B4-9C3B-7A1DEE7FFBB6”, 2118179, 1482239054, “498050”, 1482239054, “498050”, null, “2015-04-24T00:00:00”, “00:38:00”, “MCP”, “1st district, Rockville”, “DRIVER FAIL TO STOP AT FLASHING RED TRAFFIC SIGNAL STOP LINE”, “EB MONTROSE PKWY/EAST JEFFERSON ST”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “VA”, “02 - Automobile”, “2003”, “DODGE”, “SPRINTER”, “WHITE”, “Citation”, “21-204(b)”, “Transportation Article”, “No”, “HISPANIC”, “M”, “SILVER SPRING”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 3064532, “0D05BB47-567E-4521-B85C-1BD77C0ADDED”, 3064532, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “FAILURE OF INDIVIDUAL DRIVING ON HIGHWAY TO DISPLAY LICENSE TO UNIFORMED POLICE ON DEMAND”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “16-112(c)”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 3064533, “39689BC8-0E7E-4525-AC40-C469AE08F4B0”, 3064533, 1504085194, “498050”, 1504085194, “498050”, null, “2017-08-28T00:00:00”, “23:41:00”, “MCP”, “6th district, Gaithersburg / Montgomery Village”, “DRIVING VEHICLE ON HIGHWAY WITH AN EXPIRED LICENSE”, “355 @ SOUTH WESTLAND DRIVE”, null, null, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2015”, “MINI COOPER”, “2S”, “WHITE”, “Citation”, “16-115(g)”, “Transportation Article”, “No”, “WHITE”, “M”, “GAITHERSBURG”, “MD”, “MD”, “A - Marked Patrol”, [ null, null, null, null, null ] ]
, [ 2118181, “7980162D-D02E-4288-943B-52150C154DF1”, 2118181, 1482239054, “498050”, 1502443108, “498050”, null, “2014-02-14T00:00:00”, “20:10:00”, “MCP”, “1st district, Rockville”, “FAILURE TO DRIVE ON RIGHT HAND ROADWAY OF DIVIDED HWY”, “GATEWAY CENTER DR @ CLARKSBURG RD”, “39.2348434333333”, “-77.28153995”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “No”, “MD”, “02 - Automobile”, “2005”, “CADI”, “STS”, “BLACK”, “Citation”, “21-311(1)”, “Transportation Article”, “No”, “WHITE”, “M”, “POINT OF ROCK”, “MD”, “WV”, “A - Marked Patrol”, [ “{“address”:”",“city”:"",“state”:"",“zip”:""}", “39.2348434333333”, “-77.28153995”, null, false ] ]
}

Re: How to parse nested Json in spark2 Dataframe

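You can read the file with the json method of the DataFrameReader, which in Spark 2 is reached through spark.read (the older sqlContext.read.json still works as well). I will use a small sample file for the example rather than the behemoth of yours.

Let's say our file contains a single JSON object on one line, like this: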

{"col1":{"col2":"val2","col3":["arr1","arr2"]}}

Assuming you are using Scala and running this example in the shell: when you start spark-shell, you get an instance of SparkSession called spark. You can use it to access the methods that will help you solve your problem.

val myDF = spark.read.json("/myFilePath")

The above statement will create a DataFrame for you. You can see the schema using the following statement.

myDF.printSchema

root
 |-- col1: struct (nullable = true)
 |    |-- col2: string (nullable = true)
 |    |-- col3: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

You can also see the content of the DataFrame using the show method.

myDF.show
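One caveat about the read step: by default spark.read.json expects JSON Lines input, that is, one complete JSON object per line. A pretty-printed file like the one in the question spans many lines, so a plain read typically gives you only a _corrupt_record column. From Spark 2.2 onwards the multiLine option handles this case. Below is a minimal sketch, assuming Spark 2.2 or later; the temp file, its contents and the name multiDF are made up for illustration, and the SparkSession builder line is only needed outside spark-shell.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell `spark` already exists; getOrCreate simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("multiline-json").getOrCreate()

// Hypothetical pretty-printed sample: the same object as above, spread over several lines.
val pretty =
  """{
    |  "col1": {
    |    "col2": "val2",
    |    "col3": ["arr1", "arr2"]
    |  }
    |}""".stripMargin

// Write the sample to a local temp file so there is something to read.
val path = Files.createTempFile("sample", ".json")
Files.write(path, pretty.getBytes(StandardCharsets.UTF_8))

// multiLine tells the reader that one JSON document may span many lines (Spark 2.2+).
val multiDF = spark.read.option("multiLine", true).json(path.toUri.toString)
multiDF.printSchema()
```

For the file in the question you would pass its real path instead of the temp file.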

Now, I have taken a nested column and an array in my file to cover the two most common "complex datatypes" that you will get in your JSON documents. You can access them specifically as shown below.

//Accessing the nested doc

myDF.select("col1.col2").show
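If you want one row per array element instead of the whole array in a single cell, the explode function can be combined with the dotted path syntax. This is a rough sketch, assuming Spark 2.2 or later (for reading JSON from a Dataset[String]); the names sampleDF, flatDF and col3_item are only illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// In spark-shell `spark` already exists; getOrCreate simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("nested-json").getOrCreate()
import spark.implicits._

// Same sample document as above, read from an in-memory Dataset[String] instead of a file.
val sampleDF = spark.read.json(Seq("""{"col1":{"col2":"val2","col3":["arr1","arr2"]}}""").toDS())

// Pull the nested string up to the top level and turn each array element into its own row.
val flatDF = sampleDF.select(
  col("col1.col2").as("col2"),
  explode(col("col1.col3")).as("col3_item")
)
flatDF.show()
```

Each element of col3 ends up in its own row, with the col2 value repeated alongside it.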

Another way to process the data is with SQL. Here is a quick example.

//Accessing the array elements

myDF.createOrReplaceTempView("dummyTable")

spark.sql("select col1.col3[0] from dummyTable").show
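The same ideas should carry over to the file in the question, where the column definitions sit in an array of structs under meta.view.columns. Here is a sketch of listing them with SQL, assuming Spark 2.2 or later. The inline document is a heavily trimmed, hypothetical stand-in for the posted file, and the view name violations and the output aliases are my own choices.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell `spark` already exists; getOrCreate simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("meta-columns").getOrCreate()
import spark.implicits._

// Trimmed stand-in that keeps only a few fields from the posted structure.
val doc = """{"meta":{"view":{"name":"Traffic Violations","columns":[{"id":-1,"name":"sid","fieldName":":sid"},{"id":167291006,"name":"Date Of Stop","fieldName":"date_of_stop"}]}}}"""
val violationsDF = spark.read.json(Seq(doc).toDS())
violationsDF.createOrReplaceTempView("violations")

// LATERAL VIEW explode yields one row per entry of the columns array.
// The backticks guard against view, columns and name being parsed as SQL keywords.
val columnsDF = spark.sql(
  """SELECT meta.`view`.`name` AS dataset,
    |       c.`name`           AS column_name,
    |       c.fieldName        AS field_name
    |FROM violations
    |LATERAL VIEW explode(meta.`view`.`columns`) t AS c""".stripMargin)
columnsDF.show(false)
```

For the real file you would read it from disk (with the multiLine option if it is pretty-printed) instead of the inline string.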

Hope that helps!
