Created 10-27-2016 01:06 PM
Hi.
How can I create a schema with two levels (a nested object) for JSON data in Spark?
>>> df1.schema
StructType(List(StructField(CAMPO1,StringType,true),StructField(CAMPO2,StringType,true),StructField(VARIABLE,StructType(List(StructField(V1,StringType,true))),true)))
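This code doesn't work:
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    # A flat field literally named "VARIABLE.V1"; it does not describe the nested VARIABLE object
    StructField("VARIABLE.V1", StringType(), True)
])

The JSON I have is:
{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}
Could you please help me?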
Many thanks
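A quick way to see why the flat field does not match is to look at the record outside Spark. The sketch below uses only Python's standard json module (no Spark involved) on the sample line from the question. VARIABLE is a single top-level key whose value is another object, and no key is literally named "VARIABLE.V1", so a StructField with that name has nothing to bind to. This is an analogy for how field names line up with JSON keys, not Spark's own code.

```python
import json

# Sample record copied from the question
line = '{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}'
record = json.loads(line)

# Top-level keys: VARIABLE is one key whose value is itself an object
print(sorted(record))            # ['CAMPO1', 'CAMPO2', 'VARIABLE']

# There is no top-level key literally named "VARIABLE.V1" ...
print("VARIABLE.V1" in record)   # False

# ... V1 only exists one level down, inside VARIABLE
print(record["VARIABLE"]["V1"])  # xxx
```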
Created 10-27-2016 02:29 PM
Your schema structure is close, but VARIABLE has to be declared as a nested StructType that contains V1, rather than as a flat field named "VARIABLE.V1". Something like this (Scala):
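import org.apache.spark.sql.types._

// One JSON document per element; sc and sqlContext are assumed to be predefined, as in spark-shell
val data = sc.parallelize("""{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}""" :: Nil)

// VARIABLE is itself a StructType, so V1 is declared one level down
val schema = (new StructType)
    .add("CAMPO1", StringType)
    .add("CAMPO2", StringType)
    .add("VARIABLE", (new StructType)
        .add("V1", StringType))

// Nested fields are selected with a dotted path
sqlContext.read.schema(schema).json(data).select("VARIABLE.V1").show()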
Please let me know if this works for you. Thanks!
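The dotted path in select("VARIABLE.V1") walks one level into the struct. As a rough analogy only, the hypothetical helper select_path below (plain Python, not a Spark API) resolves the same kind of path against the parsed JSON record. It returns None when a step is missing, similar to how a nullable field that is absent from the data comes back as null.

```python
import json


def select_path(record, path):
    """Follow a dotted path such as "VARIABLE.V1" through nested dicts.

    Hypothetical helper for illustration; it is not part of Spark.
    Returns None if any step of the path is missing or is not an object.
    """
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


record = json.loads('{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}')
print(select_path(record, "VARIABLE.V1"))  # xxx
print(select_path(record, "VARIABLE.V2"))  # None (V2 is not in this sample)
```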
Created 10-27-2016 05:37 PM
I am working in Python. I translated the Scala code to Python like this, but it doesn't work. Any suggestions?
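from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    # VARIABLE is declared as a nested StructType, as in the Scala answer
    StructField("VARIABLE", StructType([
        StructField("V1", StringType(), True),
        StructField("V2", DoubleType(), True),
        StructField("V3", StringType(), True)]))
])

# show() prints the result and returns None, so keep the DataFrame and the display separate
df1 = sqlContext.read.json("xxxx.json", schema)
df1.select("VARIABLE.V2").show()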
+-----------------+
|V2               |
+-----------------+
|             null|
+-----------------+
Created 10-27-2016 07:08 PM
Hi,
I have resolved the problem, but I think there is a bug or something. Let me explain:
When V1 is 11.88 and I declare it as DoubleType or DecimalType, it doesn't work, but if I declare it as StringType, it works. Could you please confirm whether my test is correct?
{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"11.88"}}
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("CAMPO1", StringType(), True),
    StructField("CAMPO2", StringType(), True),
    StructField("VARIABLE", StructType([
        # V1 works as StringType; with DoubleType or DecimalType it did not
        StructField("V1", StringType(), True),
        StructField("V2", StringType(), True),
        StructField("V3", StringType(), True)]))
])
Thanks
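This looks like expected behavior rather than a bug. In the JSON above, "11.88" is wrapped in quotes, so it is a JSON string, not a JSON number. As far as I know, Spark's JSON reader does not convert a quoted numeric string into DoubleType or DecimalType, so you get null instead of a number. Reading V1 as StringType, as in the schema above, and converting afterwards is a reasonable workaround. The sketch below uses plain Python's json module (not Spark) to show the difference between the quoted and unquoted forms.

```python
import json

quoted = json.loads('{"VARIABLE":{"V1":"11.88"}}')
unquoted = json.loads('{"VARIABLE":{"V1":11.88}}')

# With quotes, the JSON value is a string, even though it looks numeric
print(type(quoted["VARIABLE"]["V1"]).__name__)    # str

# Without quotes, it is a JSON number and parses as a float
print(type(unquoted["VARIABLE"]["V1"]).__name__)  # float

# Reading the value as a string first and converting it afterwards recovers the number
v1 = float(quoted["VARIABLE"]["V1"])
print(v1)  # 11.88
```

In PySpark, the analogous conversion would be a column cast after reading, for example df.select(col("VARIABLE.V1").cast("double")), with col imported from pyspark.sql.functions and df being a DataFrame read with the StringType schema above. If you control the file, writing the value without quotes ("V1":11.88) should let DoubleType parse it directly.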
