How to read fixed-length files in Spark
Labels: Apache Spark
Created 08-04-2016 04:51 PM
I have a fixed-length file (a sample is shown below) and I want to read it using the DataFrame API in Spark (1.6.0).
    56 apple     TRUE 0.56
    45 pear      FALSE1.34
    34 raspberry TRUE 2.43
    34 plum      TRUE 1.31
    53 cherry    TRUE 1.4
    23 orange    FALSE2.34
    56 persimmon FALSE23.2
The column widths are 3, 10, 5 and 4 characters.
Any suggestions would be appreciated.
Created 08-04-2016 06:54 PM
Assuming the file is text and each line represents one record, you can read the file line by line, map each line to a Row, and then create a DataFrame from the resulting RDD[Row].
Something like:

    sqlContext.createDataFrame(sc.textFile("<file path>").map { x => getRow(x) }, schema)
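The `schema` argument in that call is not defined in this answer. Because `getRow` returns four raw substrings, a matching schema is four string columns. A minimal sketch for Spark 1.6; the column names `id`, `fruit`, `flag` and `price` are made up, since the question only gives the widths:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical column names; only the widths (3, 10, 5, 4) come from the question.
// Every field is a StringType because getRow stores plain substrings in the Row.
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("fruit", StringType, nullable = true),
  StructField("flag", StringType, nullable = true),
  StructField("price", StringType, nullable = true)
))
```

With `schema` in scope, the `createDataFrame` call works as written in spark-shell; numeric or boolean columns can be cast afterwards if you need typed data.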
Below is a basic definition that builds the Row from a line using substring, but you can use your own implementation.
    import org.apache.spark.sql.Row

    def getRow(x: String): Row = {
      val columnArray = new Array[String](4)
      columnArray(0) = x.substring(0, 3)   // width 3
      columnArray(1) = x.substring(3, 13)  // width 10
      columnArray(2) = x.substring(13, 18) // width 5
      columnArray(3) = x.substring(18, 22) // width 4
      Row.fromSeq(columnArray)
    }
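One thing to watch with hard-coded offsets: if an editor strips the trailing space after a short last field (like `1.4` in the sample), that line is only 21 characters and `substring(18, 22)` throws a StringIndexOutOfBoundsException. A possible variant, sketched below, derives the offsets from the widths, pads short lines, and trims the padding spaces from each field (getRow keeps them). The name `splitFixedWidth` is hypothetical:

```scala
// Hypothetical generalization of getRow: widths are passed in instead of
// hard-coding the substring offsets.
def splitFixedWidth(line: String, widths: Seq[Int]): Seq[String] = {
  // Running sums give the start offsets: for 3, 10, 5, 4 that is 0, 3, 13, 18, 22.
  val starts = widths.scanLeft(0)(_ + _)
  // Right-pad short lines to the full record length so substring stays in range.
  val padded = line.padTo(starts.last, ' ')
  // Cut each (begin, end) slice and drop the padding spaces around the value.
  starts.zip(starts.tail).map { case (begin, end) =>
    padded.substring(begin, end).trim
  }
}

val widths = Seq(3, 10, 5, 4)
val fields = splitFixedWidth("53 cherry    TRUE 1.4", widths)
```

In Spark it plugs into the same place as getRow, for example `sc.textFile("<file path>").map(line => Row.fromSeq(splitFixedWidth(line, widths)))`.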
If the records are not delimited by newlines, you may need to use FixedLengthInputFormat, read one record at a time, and apply the same logic as above. In that case fixedlengthinputformat.record.length is the total record length, 22 in this example, and instead of textFile you would read the file with sc.newAPIHadoopRDD.
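A minimal sketch of that route, assuming the Hadoop 2 `mapreduce` API (`org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat`) is on the classpath and the file contains only complete 22-byte records. It uses `sc.newAPIHadoopFile`, which takes the path directly; `sc.newAPIHadoopRDD` works as well if you set the input path in the `Configuration` yourself. The function name `readFixedLengthRecords` is hypothetical:

```scala
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Reads a file of back-to-back fixed-length records (no newlines) and
// returns each record as a String.
def readFixedLengthRecords(sc: SparkContext, path: String, recordLength: Int): RDD[String] = {
  // Copy the context's Hadoop configuration so the setting stays local to this read.
  val conf = new Configuration(sc.hadoopConfiguration)
  // Sets fixedlengthinputformat.record.length.
  FixedLengthInputFormat.setRecordLength(conf, recordLength)
  sc.newAPIHadoopFile(path, classOf[FixedLengthInputFormat],
      classOf[LongWritable], classOf[BytesWritable], conf)
    // Hadoop reuses Writable objects, so copy the bytes out right away.
    .map { case (_, bytes) => new String(bytes.copyBytes(), StandardCharsets.US_ASCII) }
}
```

Each returned string can then go through the same substring logic, e.g. `readFixedLengthRecords(sc, "<file path>", 22).map(getRow)`.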
Created 08-08-2016 09:42 AM
Hi Amit, I am using Spark 1.6.0, which is installed in the QuickStart VM from CDH 5.5.7.
Created 07-12-2017 10:12 AM
I was so fed up with the lack of a proper library for the fixed-length format that I created my own. You can check it out here: https://github.com/atais/Fixed-Length
