07-16-2020 11:40 PM
Hello, here is a way to do it using pyspark; it may not be optimal. I used this csv to test my code (note the header is comma-delimited while the data rows are pipe-delimited):

column1,column2,column3
row1-1|row1-2|row1-3
row2-1|row2-2|row2-3
row3-1|row3-2|row3-3

Load the header only, giving the dataframe its structure:

header_dataframe = spark.read.format("csv").option("header", "true").load('/tmp/test.csv').limit(0)

+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
+-------+-------+-------+

Load the data as an RDD, remove the first line, and convert it to a dataframe:

data_rdd = sc.textFile('/tmp/test.csv')
header_row = data_rdd.first()
data_rdd = data_rdd.filter(lambda row: row != header_row)
data_dataframe = data_rdd.map(lambda x: x.split("|")).toDF()

+------+------+------+
| _1| _2| _3|
+------+------+------+
|row1-1|row1-2|row1-3|
|row2-1|row2-2|row2-3|
|row3-1|row3-2|row3-3|
+------+------+------+
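As a side note, toDF() also accepts a list of column names, so a shorter variant (my own suggestion, not part of the approach above) would reuse the header line we already captured to name the columns directly, since every column is read as a string either way:

# Hypothetical shortcut: split the comma-delimited header line into names
# and pass them to toDF(); skips the empty header_dataframe entirely.
named_dataframe = data_rdd.map(lambda x: x.split("|")).toDF(header_row.split(","))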

Append the dataframe containing the data to the dataframe holding the structure. union matches columns by position, so the result keeps the column names from header_dataframe:

dataframe = header_dataframe.union(data_dataframe)

+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
| row1-1| row1-2| row1-3|
| row2-1| row2-2| row2-3|
| row3-1| row3-2| row3-3|
+-------+-------+-------+
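Putting it all together, here is a minimal self-contained sketch of the same steps. Assumptions: Spark 2.x or later and the test file at /tmp/test.csv as above; the explicit SparkSession and the app name are mine, since in the pyspark shell spark and sc already exist:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipe-delimited-csv").getOrCreate()
sc = spark.sparkContext

# Empty dataframe that only carries the comma-delimited header's column names
header_dataframe = spark.read.format("csv").option("header", "true").load('/tmp/test.csv').limit(0)

# Raw lines, minus the header, split on the pipe delimiter
data_rdd = sc.textFile('/tmp/test.csv')
header_row = data_rdd.first()
data_rdd = data_rdd.filter(lambda row: row != header_row)
data_dataframe = data_rdd.map(lambda x: x.split("|")).toDF()

# union is positional, so the result takes its column names from header_dataframe
dataframe = header_dataframe.union(data_dataframe)
dataframe.show()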