I have a data file like this:
The columns and their associated data elements can vary from file to file. Hence, I want to store each record as key-value pairs, something like this:
If you notice, I've skipped the first three lines of the file. I've also dropped the "/pqr" marker before reading the column names, the units, and the actual data.
Any directions or thoughts on how I could achieve this using PySpark?
My idea is that if I can convert the incoming data file into this shape using PySpark, then I can put a Hive layer on top of it and read it as a string.
I can't define the columns statically in Hive because the number of columns and their order can vary with each file.
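The parsing I have in mind could be sketched as a plain-Python helper that is later applied per partition, e.g. via `spark.sparkContext.textFile(path).mapPartitions(parse_lines)`. Note this is only a sketch: the exact header layout, the whitespace delimiter, and the nested key-value shape below are my assumptions, since the real file format varies.

```python
def parse_lines(lines):
    """Skip the 3-line preamble and the '/pqr' marker, then pair each
    data value with its column name and unit (assumed layout)."""
    lines = [ln.strip() for ln in lines][3:]             # ignore first 3 lines
    lines = [ln for ln in lines if ln and ln != "/pqr"]  # ignore the /pqr marker
    columns = lines[0].split()                           # column names
    units = lines[1].split()                             # units per column
    for row in lines[2:]:                                # actual data rows
        values = row.split()
        # emit one key-value record per row; shape is an assumption
        yield {col: {"unit": u, "value": v}
               for col, u, v in zip(columns, units, values)}
```

Each yielded dict could then be serialized (e.g. to JSON) so the Hive layer can read it as a single string column regardless of how many columns each file carries.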