I would like to know how people handle exceptions in ETL (extract, transform, load) jobs, specifically in Spark with Scala. I want to simply ignore records that are in an improper format, so that after loading a file into an RDD, those records are never written out. The RDD is then converted to a DataFrame, which is written to a Hive table.
I would like to skip the bad rows inside the code, either with exception handling or with loops.

I know I could filter out rows that don't have the right number of delimiters or columns, but I would like something more robust.
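For context, here is a rough sketch of the kind of thing I have in mind: wrap the per-line parse in `scala.util.Try` and use `flatMap` so that any row that fails to parse (wrong column count, unparseable number, etc.) is dropped rather than crashing the job. The `Record` case class and its three fields are made up for illustration; the pattern is shown on a plain `Seq` so it runs without Spark, but `rawRdd.flatMap(parseLine)` would work the same way on an RDD.

```scala
import scala.util.Try

// Hypothetical record type; the real schema and field names are placeholders.
case class Record(id: Int, name: String, amount: Double)

// Parse one raw line. Any exception (wrong column count, bad number)
// is caught by Try and becomes None, so malformed rows are dropped.
def parseLine(line: String): Option[Record] =
  Try {
    val cols = line.split(",").map(_.trim)
    require(cols.length == 3, s"expected 3 columns, got ${cols.length}")
    Record(cols(0).toInt, cols(1), cols(2).toDouble)
  }.toOption

// On an RDD the call is identical: rawRdd.flatMap(parseLine)
val raw = Seq("1,alice,10.5", "garbage row", "2,bob,not-a-number", "3,carol,7.0")
val clean = raw.flatMap(parseLine)
// clean keeps only the well-formed records (ids 1 and 3)
```

If the file were read through the DataFrame API instead of a raw RDD, I believe the built-in `spark.read.option("mode", "DROPMALFORMED")` on the CSV/JSON readers does something similar, but I would still like a robust in-code pattern for the RDD case.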