
Handling exceptions in Scala ETL with fixed-length files if the format changes

I would like to know how people handle exceptions in ETL (extract, transform, load), specifically in Spark and Scala code. I want to simply ignore data that is in an improper format, so that after reading the file into my RDD those rows are never written out. The RDD is then converted to a DataFrame, which is written to a Hive table.

I would like to skip such rows inside the code, either with exception handling or with loops.

I know I could filter out rows that don't have the right number of spaces or columns, but I would like something more robust.

[Attachment: 14260-sparkcode.jpg — screenshot of the poster's Spark code]
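For concreteness, here is a minimal sketch of the kind of pipeline described above. The field layout, input path, and table name are hypothetical; the point is that parsing each line inside a Try turns any malformed row into a None, which flatMap then drops before the RDD is converted to a DataFrame and written to Hive:

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Hypothetical fixed-width layout: cols 0-9 = id, 10-29 = name, 30-39 = amount.
case class Record(id: String, name: String, amount: Double)

object FixedWidthEtl {
  // Any exception (line too short, unparseable number, ...) turns the
  // row into None, so malformed rows are skipped instead of failing the job.
  def parseLine(line: String): Option[Record] = Try {
    Record(
      line.substring(0, 10).trim,
      line.substring(10, 30).trim,
      line.substring(30, 40).trim.toDouble)
  }.toOption

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fixed-width-etl")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val records = spark.sparkContext
      .textFile("/data/input/fixed_width.txt") // hypothetical path
      .flatMap(parseLine)                      // None rows vanish here

    // RDD -> DataFrame -> Hive, as described in the question.
    records.toDF().write.mode("append").saveAsTable("mydb.mytable") // hypothetical table
  }
}
```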

1 REPLY

Re: Handling exceptions in Scala ETL with fixed-length files if the format changes

@elliot gimple

I think any good suggestion would require knowing your ETL process, but a couple of general notes:

- Use a filter somewhere (c.filter(...), for (i <- c; if something) yield {}, etc.); see the sketch after this list.

- Consider using monads (*gulp*, I know). Maybe something as simple as Option/Either.map would work in your case; the sketch below shows one way.
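To make both suggestions concrete, here is a hedged sketch that combines them: each line is parsed into an Either, the Rights are kept (which is the filter step in disguise), and the Lefts carry a reason so rejected rows can be audited instead of silently lost. The layout, paths, and table name are hypothetical:

```scala
import scala.util.control.NonFatal
import org.apache.spark.sql.SparkSession

object EitherEtl {
  // Hypothetical two-field layout: cols 0-9 = id, 10-19 = amount.
  def parseLine(line: String): Either[String, (String, Double)] =
    try {
      if (line.length < 20) Left(s"too short: $line")
      else Right((line.substring(0, 10).trim, line.substring(10, 20).trim.toDouble))
    } catch {
      case NonFatal(e) => Left(s"${e.getMessage}: $line")
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("either-etl")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val parsed = spark.sparkContext
      .textFile("/data/input/fixed_width.txt") // hypothetical path
      .map(parseLine)
      .cache()                                 // reused twice below

    // Keep the well-formed rows (this is the "filter")...
    val good = parsed.collect { case Right(rec) => rec }
    // ...and keep the rejects with their reasons for auditing.
    parsed.collect { case Left(err) => err }
      .saveAsTextFile("/data/rejects")         // hypothetical reject path

    good.toDF("id", "amount").write.mode("append").saveAsTable("mydb.mytable")
  }
}
```

The advantage over a plain column-count filter is that every rejected row records why it failed, which makes format drift in the source file much easier to diagnose.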