Created 10-27-2016 01:06 PM
Hi.
How can I create a schema with two levels of nesting for a JSON file in Spark?
>>> df1.schema
StructType(List(StructField(CAMPO1,StringType,true),StructField(CAMPO2,StringType,true),StructType(List(StructField(VARIABLE,StringType,true),StructField(V1,StringType,true))),true))
This code doesn't work:
schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE.V1", StringType(), True)
])
The JSON I have is:
{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}
Could you please help me?
Many thanks
Created 10-27-2016 02:29 PM
Your schema structure is close, but you need to make a few modifications, like this:
import org.apache.spark.sql.types._

val data = sc.parallelize("""{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}""" :: Nil)

val schema = (new StructType)
  .add("CAMPO1", StringType)
  .add("CAMPO2", StringType)
  .add("VARIABLE", (new StructType)
    .add("V1", StringType))

sqlContext.read.schema(schema).json(data).select("VARIABLE.V1").show()
Please let me know if this works for you. Thanks!
Created 10-27-2016 05:37 PM
I am in a Python environment. I translated the Scala code to Python as shown here, but it doesn't work. Any suggestions, please?
schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE", StructType([
        StructField("V1", StringType(), True),
        StructField("V2", DoubleType(), True),
        StructField("V3", StringType(), True)]))
])
df1 = sqlContext.read.json("xxxx.json",schema).select('VARIABLE.V2').show()
+-----------------+
|V2               |
+-----------------+
|             null|
+-----------------
Created 10-27-2016 07:08 PM
Hi:
I have resolved the problem, but I think there is a bug or something. Let me explain:
With V1 = 11.88, declaring the field as DoubleType or DecimalType doesn't work, but declaring it as StringType does. Could you please confirm that my test is correct?
{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"11.88"}}

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE", StructType([
        StructField("V1", StringType(), True),
        StructField("V2", StringType(), True),
        StructField("V3", StringType(), True)]))
])
thanks
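A possible explanation, not confirmed anywhere in this thread: in the sample record the value of V1 is written as "11.88", with quotes, so it is a JSON string rather than a JSON number. As far as I know, Spark's JSON reader does not convert a quoted string into DoubleType or DecimalType and yields null instead, which would match the behaviour described above. The sketch below uses only Python's standard json module (not Spark) to check what type the value really has; the variable names are my own.

```python
import json

# The sample record from the post; note that 11.88 is wrapped in quotes.
record = '{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"11.88"}}'
parsed = json.loads(record)

# The quoted value is parsed as a string, not as a number.
v1 = parsed["VARIABLE"]["V1"]
print(type(v1).__name__)

# Without the quotes, the same field is a JSON number.
unquoted = json.loads('{"VARIABLE":{"V1":11.88}}')
print(type(unquoted["VARIABLE"]["V1"]).__name__)

# An explicit conversion works once the value has been read as a string.
v1_as_float = float(v1)
```

If that is what is happening, one option is to keep StringType in the schema, as in the working version above, and convert afterwards in Spark, for example with col("VARIABLE.V1").cast("double"). Another is to write the value without quotes in the JSON file so that it is a real number.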