StructType schema spark on JSON

Master Collaborator

Hi.

How can I create a schema with two levels of nesting for a JSON file in Spark?

>>> df1.schema
StructType(List(StructField(CAMPO1,StringType,true),StructField(CAMPO2,StringType,true),StructType(List(StructField(VARIABLE,StringType,true),StructField(V1,StringType,true))),true))

This code doesn't work:

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE.V1", StringType(), True)
])

The JSON I have is:

{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}

Could you please help me?

Many thanks

3 REPLIES


@Roberto Sancho

Your schema structure is close, but the nested field needs its own StructType, like this:

import org.apache.spark.sql.types._ 

val data = sc.parallelize("""{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}""" :: Nil)

val schema = (new StructType)
    .add("CAMPO1", StringType)
    .add("CAMPO2", StringType)
    .add("VARIABLE", (new StructType)
        .add("V1", StringType))

sqlContext.read.schema(schema).json(data).select("VARIABLE.V1").show()

Please let me know if this works for you. Thanks!
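
For reference, the reason the dotted name in the original attempt cannot match is visible in the JSON itself: VARIABLE is a top-level key whose value is an object, and V1 lives inside that object. A minimal sketch using only Python's standard json module (not Spark; the variable names are illustrative) shows the shape that the nested StructType mirrors:

```python
import json

# The sample record from the question.
line = '{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}'
record = json.loads(line)

# Top-level keys: there is no key literally named "VARIABLE.V1".
top_level_keys = sorted(record.keys())
has_dotted_key = "VARIABLE.V1" in record

# "VARIABLE" holds a nested object, which is what the inner StructType describes.
nested = record["VARIABLE"]
v1 = nested["V1"]

print(top_level_keys)
print(has_dotted_key)
print(v1)
```

The dotted form is still useful when selecting a column, as in select("VARIABLE.V1") above, but not as a field name in the schema definition.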

Master Collaborator

I am in a Python environment. I translated the Scala code to Python as follows, but it doesn't work. Any suggestions, please?

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE", StructType([
        StructField("V1", StringType(), True),
        StructField("V2", DoubleType(), True),
        StructField("V3", StringType(), True)]))
])

df1 = sqlContext.read.json("xxxx.json",schema).select('VARIABLE.V2').show()
+-----------------+
|V2               |
+-----------------+
|             null|
+-----------------+

Master Collaborator

Hi:

I have resolved the problem, but I think there is a bug or something. Let me explain:

With V1 = 11.88, declaring the field as DoubleType or DecimalType doesn't work, but declaring it as StringType does. Could you please confirm that my test is correct?

{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"11.88"}}

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE", StructType([
        StructField("V1", StringType(), True),
        StructField("V2", StringType(), True),
        StructField("V3", StringType(), True)]))
])

Thanks
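
A likely explanation, rather than a Spark bug: in the sample record the value is written as "11.88" with quotes, which makes it a JSON string and not a JSON number. Spark's JSON reader generally returns null when a value does not match the declared type, so this reading assumes the real file also quotes the value. The sketch below uses only Python's standard json module (not Spark) to show the difference; the second record, with an unquoted number, is a hypothetical variant added for comparison.

```python
import json

# Quoted value, as in the record posted above: parsed as a string.
quoted = json.loads('{"VARIABLE":{"V1":"11.88"}}')
# Hypothetical variant with an unquoted value: parsed as a number.
unquoted = json.loads('{"VARIABLE":{"V1":11.88}}')

quoted_type = type(quoted["VARIABLE"]["V1"]).__name__
unquoted_type = type(unquoted["VARIABLE"]["V1"]).__name__

# A quoted value can still be converted explicitly once it is read as a string.
as_number = float(quoted["VARIABLE"]["V1"])

print(quoted_type, unquoted_type, as_number)
```

If that is the cause, one option is to keep StringType in the schema and convert afterwards, for example with col("VARIABLE.V1").cast("double") on the resulting DataFrame.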