custom schema in spark-csv throwing error

Hi,

I am trying to process a CSV file using the spark-csv package in spark-shell, and building a custom schema fails with the error shown at the end of this session:

scala> import org.apache.spark.sql.hive.HiveContext                                                                                                  
import org.apache.spark.sql.hive.HiveContext                                                                                                         
                                                                                                                                                     
scala> import org.apache.spark.sql.hive.orc._                                                                                                        
import org.apache.spark.sql.hive.orc._                                                                                                               
                                                                                                                                                     
scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};                                                         
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}                                                                 
                                                                                                                                                     
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)                                                                               
15/12/21 02:06:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/21 02:06:24 INFO HiveContext: Initializing execution hive, version 0.13.1                                                                      
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@74cba4b                                                   
                                                                                                                                                     

scala> val customSchema = StructType(Seq(StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true)))
customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,IntegerType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))
                                                                                                                                                     
scala> val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType, true).add("comment", StringType, true).add("blank", StringType, true)
<console>:24: error: not enough arguments for constructor StructType: (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType.
Unspecified value parameter fields.
       val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType, true).add("comment", StringType, true).add("blank", StringType, true)
Re: custom schema in spark-csv throwing error

@Neeraj Sabharwal

I am using HDP 2.3.2 sandbox.

Do I need to set up some configuration? Why is it not working for me?

Thanks

Re: custom schema in spark-csv throwing error

@Divya Gehlot I am using Spark 1.5.2. Testing in the sandbox now.
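
For reference, the error in the question says that the StructType constructor in that shell requires a `fields` array, so `new StructType` with no arguments does not compile there. That suggests the Spark build on the sandbox does not offer the no-argument, builder-style form; this is an inference from the error message and is not confirmed anywhere in the thread. A minimal sketch that avoids the no-argument constructor, assuming only the `StructType(fields: Array[StructField])` signature quoted in the error, and meant to be pasted into spark-shell (or run with spark-sql on the classpath):

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Pass the fields directly, matching the constructor signature reported
// in the error message: StructType(fields: Array[StructField]).
// Field names and types are copied from the question.
val customSchema = StructType(Array(
  StructField("year", IntegerType, true),
  StructField("make", StringType, true),
  StructField("model", StringType, true),
  StructField("comment", StringType, true),
  StructField("blank", StringType, true)
))

// Sanity check: print the field names in declaration order.
println(customSchema.fieldNames.mkString(","))
```

This is equivalent to the `StructType(Seq(...))` form that already succeeded earlier in the same session, so either of those two should be usable as the custom schema.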

Re: custom schema in spark-csv throwing error

@Divya Gehlot Did that solve your problem? If you found a different solution, please post it; otherwise, please accept the answer.
