<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: how to read fixed length files in Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/315040#M36978</link>
    <description>&lt;P&gt;Sorry, it's a max of 8060 characters.&lt;/P&gt;</description>
    <pubDate>Wed, 21 Apr 2021 10:45:50 GMT</pubDate>
    <dc:creator>RameshMishra</dc:creator>
    <dc:date>2021-04-21T10:45:50Z</dc:date>
    <item>
      <title>how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165370#M36967</link>
      <description>&lt;P&gt;I have a fixed-length file (a sample is shown below) and I want to read this file using the DataFrames API in Spark (1.6.0).&lt;/P&gt;&lt;PRE&gt;56 apple     TRUE 0.56
45 pear      FALSE1.34
34 raspberry TRUE 2.43
34 plum      TRUE 1.31
53 cherry    TRUE 1.4 
23 orange    FALSE2.34
56 persimmon FALSE23.2&lt;/PRE&gt;&lt;P&gt;The fixed widths of the columns are 3, 10, 5, and 4.&lt;/P&gt;&lt;P&gt;Please suggest how to approach this.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 23:51:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165370#M36967</guid>
      <dc:creator>Alexraj84</dc:creator>
      <dc:date>2016-08-04T23:51:23Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165371#M36968</link>
      <description>&lt;P&gt;Under the assumption that the file is text and each line represents one record, you could read the file line by line and map each line to a Row. Then you can create a DataFrame from the RDD[Row].&lt;/P&gt;&lt;P&gt;Something like:&lt;/P&gt;&lt;PRE&gt;sqlContext.createDataFrame(sc.textFile("&amp;lt;file path&amp;gt;").map { x =&amp;gt; getRow(x) }, schema)&lt;/PRE&gt;&lt;P&gt;Below is a basic definition for creating the Row from your line using substring, but you can use your own implementation.&lt;/P&gt;&lt;PRE&gt;def getRow(x: String): Row = {
  // The column widths are 3, 10, 5 and 4, so slice the line at those offsets.
  val columnArray = new Array[String](4)
  columnArray(0) = x.substring(0, 3)
  columnArray(1) = x.substring(3, 13)
  columnArray(2) = x.substring(13, 18)
  columnArray(3) = x.substring(18, 22)
  Row.fromSeq(columnArray)
}
&lt;/PRE&gt;&lt;P&gt;If the records are not delimited by a new line, you may need to use a FixedLengthInputFormat, read the records one at a time, and apply similar logic as above. The fixedlengthinputformat.record.length in that case will be your total record length, 22 in this example. Instead of textFile, you may need to read via sc.newAPIHadoopRDD.&lt;/P&gt;
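&lt;P&gt;A rough, untested sketch of that variant, assuming Hadoop's org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat is available and using the closely related sc.newAPIHadoopFile helper (which takes the path directly), reusing getRow and the schema from above:&lt;/P&gt;&lt;PRE&gt;import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat

// Each record is exactly 3 + 10 + 5 + 4 = 22 bytes.
val conf = new Configuration(sc.hadoopConfiguration)
FixedLengthInputFormat.setRecordLength(conf, 22)

// Records arrive as (byte offset, raw bytes); decode each one and slice it with getRow.
val rows = sc.newAPIHadoopFile(
    "&amp;lt;file path&amp;gt;",
    classOf[FixedLengthInputFormat],
    classOf[LongWritable],
    classOf[BytesWritable],
    conf)
  .map { case (_, bytes) =&amp;gt; getRow(new String(bytes.getBytes, 0, bytes.getLength)) }

val df = sqlContext.createDataFrame(rows, schema)&lt;/PRE&gt;</description>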
      <pubDate>Fri, 05 Aug 2016 01:54:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165371#M36968</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-05T01:54:16Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165372#M36969</link>
      <description>&lt;P&gt;Thanks Arun, however I have a problem while creating the getRow function. I am not sure what exactly it refers to.&lt;/P&gt;&lt;P&gt;Here is the error:&lt;/P&gt;&lt;PRE&gt;&amp;lt;console&amp;gt;:26: error: not found: type Row
         def getRow(x : String) : Row={
                                  ^
&amp;lt;console&amp;gt;:32: error: not found: value Row
       Row.fromSeq(columnArray)&lt;/PRE&gt;</description>
      <pubDate>Sat, 06 Aug 2016 00:33:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165372#M36969</guid>
      <dc:creator>Alexraj84</dc:creator>
      <dc:date>2016-08-06T00:33:49Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165373#M36970</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/1505/alexraj84.html" nodeid="1505"&gt;@Alex Raj&lt;/A&gt; 
Row is org.apache.spark.sql.Row. You need to add the import statement:&lt;/P&gt;
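&lt;PRE&gt;import org.apache.spark.sql.Row&lt;/PRE&gt;</description>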
      <pubDate>Sat, 06 Aug 2016 01:22:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165373#M36970</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-06T01:22:46Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165374#M36971</link>
      <description>&lt;P&gt;Great, that fixes the problem, but another one arises.&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x =&amp;gt; getRow(x) }, schema)
&amp;lt;console&amp;gt;:31: error: package schema is not a value
              sqlContext.createDataFrame(sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x =&amp;gt; getRow(x) }, schema)
                                                                                                                         ^&lt;/PRE&gt;&lt;P&gt; I am really getting excited now. What is the schema all about in this context?&lt;/P&gt;</description>
      <pubDate>Sat, 06 Aug 2016 01:49:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165374#M36971</guid>
      <dc:creator>Alexraj84</dc:creator>
      <dc:date>2016-08-06T01:49:15Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165375#M36972</link>
      <description>&lt;P&gt;Hi Alex, can you clarify which version of Spark you are using?&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 16:10:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165375#M36972</guid>
      <dc:creator>anandi</dc:creator>
      <dc:date>2016-08-08T16:10:53Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165376#M36973</link>
      <description>&lt;P&gt;Hi Amit, I am using 1.6.0, which is installed in the QuickStart VM from CDH 5.5.7.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 16:42:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165376#M36973</guid>
      <dc:creator>Alexraj84</dc:creator>
      <dc:date>2016-08-08T16:42:17Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165377#M36974</link>
      <description>&lt;P&gt;Well, the schema is somewhat like the header: say id, fruitName, isAvailable, unitPrice in your case. You can specify the schema programmatically. Have a quick reference &lt;A target="_blank" href="http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 19:26:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165377#M36974</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-08T19:26:59Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165378#M36975</link>
      <description>&lt;P&gt;You can do something like:&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schemaString = "id,fruitName,isAvailable,unitPrice"
val fields = schemaString.split(",")
  .map(fieldName =&amp;gt; StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)&lt;/PRE&gt;
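&lt;P&gt;With the schema defined, the createDataFrame call from earlier should compile. A quick sketch, reusing the file path from your shell output:&lt;/P&gt;&lt;PRE&gt;val df = sqlContext.createDataFrame(
  sc.textFile("/user/cloudera/data/fruit_fixedwidth.txt").map { x =&amp;gt; getRow(x) },
  schema)

// All columns are read as strings; cast them afterwards if you need numeric or boolean types.
df.show()&lt;/PRE&gt;</description>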
      <pubDate>Mon, 08 Aug 2016 22:44:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165378#M36975</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-08T22:44:54Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165379#M36976</link>
      <description>&lt;P&gt;I was so fed up with the fact that there is no proper library for fixed length format that I have created my own. You can check it out here: &lt;A href="https://github.com/atais/Fixed-Length"&gt;https://github.com/atais/Fixed-Length&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2017 17:12:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/165379#M36976</guid>
      <dc:creator>atais_jr</dc:creator>
      <dc:date>2017-07-12T17:12:55Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/315038#M36977</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;In a Scala DataFrame, I want to read each row's record only up to a maximum of 1060 bytes, since the SQL table also has a maximum record length of 1060. Do we have a function we can apply to a Scala DataFrame to read each record only up to 1060 characters and skip the extra? Please suggest.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Apr 2021 10:31:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/315038#M36977</guid>
      <dc:creator>RameshMishra</dc:creator>
      <dc:date>2021-04-21T10:31:14Z</dc:date>
    </item>
    <item>
      <title>Re: how to read fixed length files in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/315040#M36978</link>
      <description>&lt;P&gt;Sorry, it's a max of 8060 characters.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Apr 2021 10:45:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-read-fixed-length-files-in-Spark/m-p/315040#M36978</guid>
      <dc:creator>RameshMishra</dc:creator>
      <dc:date>2021-04-21T10:45:50Z</dc:date>
    </item>
  </channel>
</rss>

