How to read a multiple-delimiter CSV file in Spark Scala 1.6

I have data like the sample below in a ";"-delimited file, and I need to read it without using the CSV/Databricks packages:

Lghhjj^country:US;name:swathi;age:;Dept:engineer

Ghahjah^country:India;name:shshsh;age:48;Dept:management

How can I read this type of file? I also need to find the distinct countries, collect the rows for each country, and save each country's rows into a different folder:

US country rows into one folder or file

India country rows into one folder or file

The output should be saved in the same format (";"-delimited key:value pairs) without any modification to the data.

Please help me out with how to do this.

2 REPLIES

Re: How to read a multiple-delimiter CSV file in Spark Scala 1.6

I'd consider setting the delimiter to the main one, ";", then using String.indexOf/String.substring to split the field values up and emitting the values into some structure which isolates each one, preferably ORC. Once you've saved it as ORC, you can run queries over that, again into a structured format. For the final conversion into your own chosen standard, well, unless you are going to implement your own format (worth considering, actually), just do a query which selects the columns you want and then use String.format() to build up the strings. You'll need a story for null there, though.

Finally, while ORC is a great format for querying, you might want to think about Avro as a simple data exchange format as it includes schemas and is straightforward to parse.
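A minimal sketch of that approach for Spark 1.6 follows. The paths, the Record case class, and the field names are assumptions based on the sample rows in the question, and the ORC write goes through a HiveContext because the ORC data source in Spark 1.x lives in the Hive module:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type; field names follow the sample rows.
case class Record(id: String, country: String, name: String, age: String, dept: String)

object MultiDelimiterParse {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-delimiter-parse"))
    val hiveContext = new HiveContext(sc)   // ORC in Spark 1.x needs Hive support
    import hiveContext.implicits._

    // Input path is an assumption; each line looks like
    // Lghhjj^country:US;name:swathi;age:;Dept:engineer
    val raw = sc.textFile("/data/input/multi_delim")

    val parsed = raw.map { line =>
      val caret = line.indexOf("^")
      val id    = line.substring(0, caret)
      // Split the remainder on ";" and each field on its first ":"
      // (every field in the sample rows contains a ":").
      val kv = line.substring(caret + 1).split(";", -1).map { field =>
        val colon = field.indexOf(":")
        field.substring(0, colon) -> field.substring(colon + 1)
      }.toMap
      Record(id,
             kv.getOrElse("country", ""),
             kv.getOrElse("name", ""),
             kv.getOrElse("age", ""),   // empty string stands in for the missing age
             kv.getOrElse("Dept", ""))
    }.toDF()

    // One sub-folder per country (country=US, country=India, ...), stored as ORC.
    parsed.write.partitionBy("country").format("orc").save("/data/output/by_country")

    sc.stop()
  }
}

partitionBy("country") gives one output folder per country value, which covers the per-country split the question asks for, but in ORC rather than the original text layout.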


Re: How to read a multiple-delimiter CSV file in Spark Scala 1.6

Can you provide example code for this? Just to repeat: I need to store the same data in the same form, split country-wise, with each country's rows going into a separate country folder, using Spark Scala.
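A minimal Spark 1.6 sketch of that raw-format split, assuming the sample layout above (the input/output paths and the "unknown" fallback are illustrative); each line is written out exactly as it was read:

import org.apache.spark.{SparkConf, SparkContext}

object SplitByCountry {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-by-country"))

    // Path is an assumption; lines keep the original layout, e.g.
    // Ghahjah^country:India;name:shshsh;age:48;Dept:management
    val raw = sc.textFile("/data/input/multi_delim").cache()

    // Pull out only the country value; the line itself is never modified.
    val countryOf: String => String = line =>
      line.split("\\^", 2)(1)
        .split(";")
        .find(_.startsWith("country:"))
        .map(_.stripPrefix("country:"))
        .getOrElse("unknown")

    // The distinct countries are expected to be a small set, so collecting
    // them to the driver and filtering once per country is acceptable here.
    val countries = raw.map(countryOf).distinct().collect()
    countries.foreach { c =>
      raw.filter(line => countryOf(line) == c)
         .saveAsTextFile(s"/data/output/country=$c")
    }

    sc.stop()
  }
}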
