<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193874#M155934</link>
    <description>&lt;P&gt;input data &lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/64927-input.txt"&gt;input.txt&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 29 Mar 2018 11:44:05 GMT</pubDate>
    <dc:creator>Former Member</dc:creator>
    <dc:date>2018-03-29T11:44:05Z</dc:date>
    <item>
      <title>how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193863#M155923</link>
<description>&lt;P&gt;&lt;STRONG&gt;I have a CSV file with a schema, for example:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;test.csv&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;name,age,state&lt;/P&gt;&lt;P&gt;swathi,23,us&lt;/P&gt;&lt;P&gt;srivani,24,UK&lt;/P&gt;&lt;P&gt;ram,25,London&lt;/P&gt;&lt;P&gt;sravan,30,UK&lt;/P&gt;&lt;P&gt;We need to split it into different files according to the state column; for example, the US rows should be loaded into their own file (with the schema included):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;output&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/user/data/US.txt&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;name,age,state&lt;/P&gt;&lt;P&gt;swathi,23,us&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/user/data/UK&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;name,age,state&lt;/P&gt;&lt;P&gt;srivani,24,UK&lt;/P&gt;&lt;P&gt;sravan,30,UK&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/user/data/London&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;name,age,state&lt;/P&gt;&lt;P&gt;ram,25,London&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 15:11:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193863#M155923</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-27T15:11:04Z</dc:date>
    </item>
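The split described in the question can be sketched in plain Scala (standard collections standing in for Spark) to show the core idea: group the data rows by the state column and keep the header for each output group. The column layout and sample rows come from the question; everything else here is illustrative.

```scala
// Minimal sketch of the requested split, assuming the 3-column
// layout (name,age,state) from the question.
val lines = List(
  "name,age,state",
  "swathi,23,us",
  "srivani,24,UK",
  "ram,25,London",
  "sravan,30,UK"
)
val header = lines.head
// group data rows by the value of the third (state) column
val byState: Map[String, List[String]] =
  lines.tail.groupBy(row => row.split(",")(2))
// each output "file" keeps the header followed by its rows
val outputs: Map[String, List[String]] =
  byState.map { case (state, rows) => (state, header :: rows) }
```

Writing each value of `outputs` to its own path would then produce the US/UK/London files asked for.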
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193864#M155924</link>
<description>&lt;P&gt;Please help me work out how to solve this using Spark and Scala; this task has been assigned to me.&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;swathi.T&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 15:14:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193864#M155924</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-27T15:14:05Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193865#M155925</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12747/swathidataengineer.html" nodeid="12747"&gt;@swathi thukkaraju&lt;/A&gt;&lt;P&gt;By using Csv package we can do this use case easily &lt;/P&gt;&lt;P&gt;here is what i tried&lt;/P&gt;&lt;P&gt;i had a csv file in hdfs directory called test.csv&lt;/P&gt;&lt;PRE&gt;name,age,state
swathi,23,us
srivani,24,UK
ram,25,London
sravan,30,UK&lt;/PRE&gt;&lt;P&gt;Initialize the Spark shell with the CSV package:&lt;/P&gt;&lt;PRE&gt;spark-shell --master local --packages com.databricks:spark-csv_2.10:1.3.0&lt;/PRE&gt;&lt;P&gt;Load the &lt;STRONG&gt;HDFS file&lt;/STRONG&gt; into a Spark DataFrame using the CSV format; since the file has a header, I have included the header option while loading:&lt;/P&gt;&lt;PRE&gt;val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/user/test/test.csv")&lt;/PRE&gt;&lt;P&gt;If your &lt;STRONG&gt;file is local&lt;/STRONG&gt;, then use:&lt;/P&gt;&lt;PRE&gt;val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("file:///&amp;lt;local-path&amp;gt;/test.csv")&lt;/PRE&gt;&lt;P&gt;Once the loading completes, view the schema:&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; df.printSchema()
root
 |-- name: string (nullable = true)
 |-- age: string (nullable = true)
 |-- state: string (nullable = true)&lt;/PRE&gt;&lt;P&gt;Now that we have the df DataFrame with its schema, we can apply filter operations on it.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Filtering and storing rows where state is us, UK, or London:-&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;val df2=df.filter($"state"==="us") &lt;/PRE&gt;&lt;P&gt;(or)&lt;/P&gt;&lt;PRE&gt;val df2=df.filter(col("state")==="us")&lt;/PRE&gt;&lt;PRE&gt;scala&amp;gt; df2.show()
+------+---+-----+
|  name|age|state|
+------+---+-----+
|swathi| 23|   us|
+------+---+-----+&lt;/PRE&gt;&lt;P&gt;As we can see above, only the rows whose state is us are in the df2 DataFrame.&lt;/P&gt;&lt;P&gt;In the same way we need to filter and create new DataFrames for the states UK and London:&lt;/P&gt;&lt;PRE&gt;val df3=df.filter(col("state")==="UK")&lt;/PRE&gt;&lt;PRE&gt;val df4=df.filter(col("state")==="London")&lt;/PRE&gt;&lt;P&gt;Once the filtering is done and the new DataFrames are created, we need to write df2, df3, and df4 to HDFS with the headers included.&lt;/P&gt;&lt;P&gt;As we cannot create specifically named files while writing the data back to HDFS, the command below creates a us directory in HDFS and then loads the df2 data into it:&lt;/P&gt;&lt;PRE&gt;df2.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/us")&lt;/PRE&gt;&lt;P&gt;In the same way,&lt;/P&gt;&lt;P&gt;we store df3 and df4 into different directories in HDFS:&lt;/P&gt;&lt;PRE&gt;df3.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/UK")&lt;/PRE&gt;&lt;PRE&gt;df4.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/London")&lt;/PRE&gt;&lt;P&gt;Now when you run&lt;/P&gt;&lt;PRE&gt;hadoop fs -ls /user/test/&lt;/PRE&gt;&lt;P&gt;you are going to have 3 directories (us, UK, London) with the corresponding part-00000 files in them.&lt;/P&gt;&lt;P&gt;In addition,&lt;/P&gt;&lt;P&gt;we can &lt;STRONG&gt;register a temp table&lt;/STRONG&gt; once the data is loaded into the &lt;STRONG&gt;df DataFrame&lt;/STRONG&gt;, and then run SQL queries on top of the temp table using sqlContext:&lt;/P&gt;&lt;PRE&gt;val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/user/test/test.csv")
df.registerTempTable("temp")
val df2=sqlContext.sql("select * from temp where state = 'us'")
val df3=sqlContext.sql("select * from temp where state = 'UK'")
val df4=sqlContext.sql("select * from temp where state = 'London'")
df2.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/us")
df3.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/UK")
df4.write.format("com.databricks.spark.csv").option("header", "true").save("/user/test/London")
&lt;/PRE&gt;&lt;P&gt;In both ways (using filter and using a registered temp table) the results will be the same.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 18:45:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193865#M155925</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-03-27T18:45:43Z</dc:date>
    </item>
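The filter-per-state approach in the answer above can be checked without a cluster; here a case class stands in for the DataFrame rows (the `Person` name is illustrative, not from the thread), and `List.filter` plays the role of `df.filter(col("state") === ...)`.

```scala
// Plain-Scala analogue of filtering the same DataFrame three times,
// once per state value, as in the answer above.
case class Person(name: String, age: Int, state: String)

val people = List(
  Person("swathi", 23, "us"),
  Person("srivani", 24, "UK"),
  Person("ram", 25, "London"),
  Person("sravan", 30, "UK")
)
val us = people.filter(_.state == "us")
val uk = people.filter(_.state == "UK")
val london = people.filter(_.state == "London")
```

Each filtered collection corresponds to one output directory in the answer.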
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193866#M155926</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12747/swathidataengineer.html" nodeid="12747"&gt;@swathi thukkaraju&lt;/A&gt; &lt;/P&gt;&lt;P&gt;The pipe is a special character for splits, please use single quotes to split pipe-delimited strings:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;val df1 = sc.textFile("testfile.txt").Map(_.split('|')).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Alternatively, you can use commas or another separator.&lt;/P&gt;&lt;P&gt;See the following StackOverflow post for more detail:&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/11284771/scala-string-split-does-not-work" target="_blank"&gt;https://stackoverflow.com/questions/11284771/scala-string-split-does-not-work&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 21:34:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193866#M155926</guid>
      <dc:creator>anarasimham</dc:creator>
      <dc:date>2018-03-27T21:34:43Z</dc:date>
    </item>
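The difference between `split('|')` and `split("|")` from the reply above can be verified in any Scala REPL: the Char overload splits on the literal character, while the String overload is a regex, in which a bare pipe is alternation and must be escaped as `\\|`.

```scala
// Char overload: '|' is a literal delimiter
val chars = "swathi|23|us".split('|')
// String overload with escaping: "\\|" matches the literal pipe
val escaped = "swathi|23|us".split("\\|")
// String overload without escaping: "|" is an empty alternation,
// so the string is split between every character (12 one-char tokens)
val unescaped = "swathi|23|us".split("|")
```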
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193867#M155927</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12747/swathidataengineer.html" nodeid="12747"&gt;@swathi thukkaraju&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;Your file has the header as its first line, so we need to skip that header and then apply your case class to the file. Also use an escape for the &lt;STRONG&gt;special split character&lt;/STRONG&gt;: split treats its argument as a &lt;STRONG&gt;regex&lt;/STRONG&gt;, and the regex-escaped form of &lt;STRONG&gt;|&lt;/STRONG&gt; is &lt;STRONG&gt;\\|&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;my input file:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;name,age,state
swathi|23|us
srivani|24|UK
ram|25|London&lt;/PRE&gt;
&lt;PRE&gt; case class schema(name:String,age:Int,brand_code:String)
val rdd = sc.textFile("file://&amp;lt;local-file-path&amp;gt;/test.csv") (or) val rdd = sc.textFile("/test.csv") //for hadoop file
 val header = rdd.first() 
 val data = rdd.filter(row =&amp;gt; row != header)
 val df1 = data.map(_.split("\\|")).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;/PRE&gt;&lt;P&gt;(or)&lt;/P&gt;&lt;PRE&gt;case class schema(name:String,age:Int,brand_code:String)
val rdd = sc.textFile("file://&amp;lt;local-file&amp;gt;/test.csv") // local file; for an HDFS file use: val rdd = sc.textFile("/test.csv")
 val rdd1= rdd.mapPartitionsWithIndex { (idx, iter) =&amp;gt; if (idx == 0) iter.drop(1) else iter }
 val df1 = rdd1.map(_.split("\\|")).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;/PRE&gt;&lt;P&gt;in both ways we are skipping header line then applying our case class schema to the file once we apply case class and to df then we are going to have dataframe.&lt;/P&gt;&lt;P&gt;If you &lt;STRONG&gt;don't have header&lt;/STRONG&gt; then just load the file then apply split and case class and convert as dataframe&lt;/P&gt;&lt;PRE&gt;case class schema(name:String,age:Int,brand_code:String)
val rdd = sc.textFile("file://&amp;lt;local-file-path&amp;gt;/test.csv") (or) val rdd = sc.textFile("/test.csv") //for hadoop file
 val df1 = rdd.map(_.split("\\|")).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Output:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; df1.show()
+-------+---+----------+
|   name|age|brand_code|
+-------+---+----------+
| swathi| 23|        us|
|srivani| 24|        UK|
|    ram| 25|    London|
+-------+---+----------+&lt;/PRE&gt;&lt;P&gt;However, I have used the CSV package with Spark 1.6.2 and it works fine; using the package is a simpler method than defining a case class. You can choose either of those methods as per your requirements.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 21:40:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193867#M155927</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-03-27T21:40:33Z</dc:date>
    </item>
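The header-skip-then-parse steps in the answer above have a direct plain-Scala analogue (the RDD calls map onto ordinary collection methods); the case class mirrors the one in the answer, while the sample lines are illustrative.

```scala
// Plain-Scala analogue of: skip the header, then split and apply the case class.
case class schema(name: String, age: Int, brand_code: String)

val rawLines = List("name|age|state", "swathi|23|us", "srivani|24|UK", "ram|25|London")
val headerLine = rawLines.head
// same idea as rdd.filter(row => row != header)
val dataLines = rawLines.filter(row => row != headerLine)
// the pipe must be escaped, since String.split takes a regex
val parsed = dataLines.map(_.split("\\|")).map(x => schema(x(0), x(1).toInt, x(2)))
```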
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193868#M155928</link>
<description>&lt;P&gt;val rdd2 = rdd1.map(_.split("^"))&lt;/P&gt;&lt;P&gt;rdd2.collect&lt;/P&gt;&lt;P&gt;res16: Array[Array[String]] = Array(Array(OAP^US^xxahggv), Array(MNY^US^sfskdgsjkg), Array(ESS^US^fxjshgg))&lt;/P&gt;&lt;P&gt;The issue is that it is not splitting properly, and I cannot work out why.&lt;/P&gt;&lt;P&gt;Can you show me the right syntax? I am not able to find it.&lt;/P&gt;&lt;P&gt;thanks in advance&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 16:47:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193868#M155928</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-28T16:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193869#M155929</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/178742/how-to-read-schema-of-csv-file-and-according-to-co.html?childToView=182909#"&gt;@swathi thukkaraju&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;I didn't fully get the question, but if your file uses the caret (^) as a delimiter then we need to escape it with two backslashes, as the caret is a special character in regex (the line-start anchor).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Input file:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;name^age^state
swathi^23^us
srivani^24^UK
ram^25^London&lt;/PRE&gt;&lt;PRE&gt;scala&amp;gt; case class schema(name:String,age:Int,brand_code:String)
scala&amp;gt; val rdd = sc.textFile("file://&amp;lt;file-path&amp;gt;/test1.csv")
scala&amp;gt; val rdd1= rdd.mapPartitionsWithIndex { (idx, iter) =&amp;gt; if (idx == 0) iter.drop(1) else iter }
scala&amp;gt; val df1 = rdd1.map(_.split("\\^")).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;BR /&gt;(or)&lt;BR /&gt;scala&amp;gt; val df1 = rdd1.map(_.split('^')).map(x=&amp;gt; schema(x(0).toString,x(1).toInt,x(2).toString)).toDF()&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Output:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; df1.show()&lt;BR /&gt;+-------+---+----------+
|   name|age|brand_code|
+-------+---+----------+
| swathi| 23|        us|
|srivani| 24|        UK|
|    ram| 25|    London|
+-------+---+----------+&lt;/PRE&gt;&lt;P&gt;If you are still facing issues, please share your input data, the script you have prepared, and the expected output, so that the root cause of the issue is easier to understand.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 20:22:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193869#M155929</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-03-28T20:22:44Z</dc:date>
    </item>
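The caret-escaping point from the answer above can also be checked in plain Scala: `"^"` alone is the start-of-input anchor and splits nothing, while `"\\^"` (or the Char overload `'^'`) splits on the literal character.

```scala
val row = "swathi^23^us"
// "^" is the start-of-input anchor, so nothing is actually split
val unescaped = row.split("^")
// escaping the caret (or using the Char overload) splits on the literal character
val good = row.split("\\^")
val alsoGood = row.split('^')
```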
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193870#M155930</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12747/swathidataengineer.html" nodeid="12747"&gt;@swathi thukkaraju&lt;/A&gt;&lt;P&gt;You can do it without using CSV package. Use the following code.&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType,StringType,StructField,StructType}


val schema =new StructType().add(StructField("name",StringType,true)).add(StructField("age",IntegerType,true)).add(StructField("state",StringType,true))


val data = sc.textFile("/user/206571870/sample.csv")
val header = data.first()
val rdd = data.filter(row =&amp;gt; row != header) 
val rowsRDD = rdd.map(x =&amp;gt; x.split(",")).map(x =&amp;gt; Row(x(0),x(1).toInt,x(2)))
val df = sqlContext.createDataFrame(rowsRDD,schema)&lt;/PRE&gt;&lt;P&gt;After this, run&lt;/P&gt;&lt;PRE&gt;df.show&lt;/PRE&gt;&lt;P&gt;and you will be able to see your data in a relational format.&lt;/P&gt;&lt;P&gt;Now you can fire whatever queries you want at your "&lt;EM&gt;DataFrame&lt;/EM&gt;"; for example, filtering based on state and saving to HDFS.&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;PS - If you want to persist your DataFrame as a CSV file, Spark 1.6 does NOT support it out of the box; you either need to convert it to an RDD and then save, or use the CSV package from Databricks.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Let me know if that helps!&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 22:26:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193870#M155930</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-03-28T22:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193871#M155931</link>
<description>&lt;P&gt;I did, but my data has null values; while loading it into an RDD I get an ArrayIndexOutOfBoundsException: 88.&lt;/P&gt;&lt;P&gt;The data has 142 fields, with some null values inside the file. How can I handle this?&lt;/P&gt;</description>
      <pubDate>Thu, 29 Mar 2018 09:28:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193871#M155931</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-29T09:28:09Z</dc:date>
    </item>
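The ArrayIndexOutOfBoundsException reported above with empty trailing fields is the classic behaviour of `String.split`, which drops trailing empty strings by default; passing a negative limit keeps them, so all 142 positions stay addressable. A minimal pipe-delimited check (the record content is illustrative):

```scala
val record = "id1|TEST_F1|||"
// default split drops trailing empty fields, so indexes past 1 would fail
val dropped = record.split("\\|")
// limit -1 keeps every field, empty or not
val kept = record.split("\\|", -1)
```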
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193872#M155932</link>
<description>&lt;P&gt;val rdd2 = rddwithheader.mapPartitionsWithIndex { (idx, iter) =&amp;gt; if (idx == 0) iter.drop(1) else iter }&lt;/P&gt;&lt;P&gt;val rowrdd3 = rdd2.map(_.split("\\|")).map(p=&amp;gt;schema(p(0),p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10), p(11), p(12), p(13), p(14), p(15), p(16), p(17), p(18), p(19), p(20), p(21), p(22), p(23), p(24), p(25), p(26), p(27), p(28), p(29), p(30), p(31), p(32), p(33), p(34), p(35), p(36), p(37), p(38), p(39), p(40), p(41), p(42), p(43), p(44), p(45), p(46), p(47), p(48), p(49), p(50), p(51), p(52), p(53), p(54), p(55), p(56), p(57), p(58), p(59), p(60), p(61), p(62), p(63), p(64), p(65), p(66), p(67), p(68), p(69), p(70), p(71), p(72), p(73), p(74), p(75), p(76), p(77), p(78), p(79), p(80), p(81), p(82), p(83), p(84), p(85), p(86), p(87), p(88), p(89), p(90), p(91), p(92), p(93), p(94), p(95), p(96), p(97), p(98), p(99), p(100), p(101), p(102), p(103), p(104), p(105), p(106), p(107), p(108), p(109), p(110), p(111), p(112), p(113), p(114), p(115), p(116), p(117), p(118), p(119), p(120), p(121), p(122), p(123), p(124), p(125), p(126), p(127), p(128), p(129), p(130), p(131), p(132), p(133), p(134), p(135), p(136), p(137), p(138), p(139), p(140), p(141), p(142)))&lt;/P&gt;&lt;P&gt;error: overloaded method value apply with alternatives: (fieldIndex: Int)org.apache.spark.sql.types.StructField &amp;lt;and&amp;gt; (names: Set[String])org.apache.spark.sql.types.StructType &amp;lt;and&amp;gt; (name: String)org.apache.spark.sql.types.StructField&lt;/P&gt;&lt;P&gt;The other code you gave, Shu, also throws the "overloaded method value apply with alternatives" exception. How do I handle this?&lt;/P&gt;&lt;P&gt;kindly help on this&lt;/P&gt;&lt;P&gt;thanks in advance&lt;/P&gt;</description>
      <pubDate>Thu, 29 Mar 2018 11:34:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193872#M155932</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-29T11:34:52Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193873#M155933</link>
      <description>&lt;P&gt;input.data&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;ef47cd52f7ed4044148ab7b1cc897f55|TEST_F1|TEST_L1|7109 Romford Way||North Richland &lt;A href="mailto:Hills%7CTX%7C76182-5027%7C5027%7Ctest1498@yahoo.com"&gt;Hills|TX|76182-5027|5027|test1498@yahoo.com&lt;/A&gt;|||||MNY|USA|1989||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||N|N|N|N|N|N||||||||||||||||||||||||||||||||||||||||||||||||||||||| 556510f9cea2e32260eb913e976b7ef0|TEST_F2|TEST_L2|11 South &lt;A href="mailto:Rd%7C%7CChester%7CNJ%7C07930-2739%7C2739%7Ctest@embarqmail.com"&gt;Rd||Chester|NJ|07930-2739|2739|test@embarqmail.com&lt;/A&gt;|||||OAP|USA|1964|||||Female||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 91daac14d56047c45cb27227b46b8074|TEST_F3|TEST_L3|1724 Holly Ln||Pampa|TX|79065-&lt;/STRONG&gt;&lt;/EM&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;A href="mailto:4536%7C4536%7Ctest@sbcglobal.net"&gt;4536|4536|test@sbcglobal.net&lt;/A&gt;|||||OAP|USA|1941|||||Female|||||SKINTONE_LIGHT|||||||||||||||||||||EYECOLOR_BLUE|||||HAIRCOLOR_AUBURN|||||||||||||||||||||||||||EN|||N|Y|N|N|N||||INT_HAIR_GREY_COVERAGE,INT_HAIR_TRENDS||||||||||||||||||||||||||||||||||||||||||||||||||||&lt;/STRONG&gt;&lt;/EM&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;A href="mailto:4536%7C4536%7Ctest@sbcglobal.net"&gt;4536|4536|test@sbcglobal.net&lt;/A&gt;|||||OAP|USA|1941|||||Female|||||SKINTONE_LIGHT|||||||||||||||||||||EYECOLOR_BLUE|||||HAIRCOLOR_AUBURN|||||||||||||||||||||||||||EN|||N|Y|N|N|N||||INT_HAIR_GREY_COVERAGE,INT_HAIR_TRENDS||||||||||||||||||||||||||||||||||||||||||||||||||||&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Mar 2018 11:36:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193873#M155933</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-29T11:36:03Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193874#M155934</link>
      <description>&lt;P&gt;input data &lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/64927-input.txt"&gt;input.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Mar 2018 11:44:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193874#M155934</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2018-03-29T11:44:05Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193875#M155935</link>
<description>&lt;P&gt;Have you seen the filter condition in my answer above?&lt;/P&gt;&lt;PRE&gt;val rdd = data.filter(row =&amp;gt; row != header)&lt;/PRE&gt;&lt;P&gt;Now use a similar filter condition to filter out your null records, if there are any, according to your use case.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Mar 2018 12:41:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193875#M155935</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-03-29T12:41:47Z</dc:date>
    </item>
    <item>
      <title>Re: how to read schema of csv file and according to column values and  we need to split the data into multiple file using scala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193876#M155936</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12747/swathidataengineer.html" nodeid="12747"&gt;@swathi thukkaraju&lt;/A&gt;&lt;P&gt;Did the answer help in the resolution of your query? Please close the thread by marking the answer as Accepted!&lt;/P&gt;</description>
      <pubDate>Sun, 01 Apr 2018 23:09:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/how-to-read-schema-of-csv-file-and-according-to-column/m-p/193876#M155936</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-04-01T23:09:16Z</dc:date>
    </item>
  </channel>
</rss>

