Created 11-08-2018 02:03 PM
Created 11-13-2018 06:06 AM
Here is sample code to convert CSV to JSON using PySpark:
df = spark.read.format("csv").option("header", "true").load("file:///tmp/sample.csv")
df.repartition(1).toJSON(use_unicode=True).saveAsTextFile("file:///tmp/sample_out")
Hope this helps.
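For reference, the output directory will contain plain text files with one JSON object per line, since toJSON() turns each row into a JSON string. A minimal plain-Python illustration of that record shape (no Spark needed; the column names and values here are made up for the example):

```python
import csv
import io
import json

# A tiny in-memory CSV standing in for /tmp/sample.csv (hypothetical columns)
sample_csv = "id,name\n1,alice\n2,bob\n"

# Parse each CSV row into a dict, then serialize it as one JSON object per line,
# which is the same line-delimited shape df.toJSON() produces
rows = list(csv.DictReader(io.StringIO(sample_csv)))
json_lines = [json.dumps(r) for r in rows]

print(json_lines[0])  # -> {"id": "1", "name": "alice"}
```

Note that, as in the Spark snippet above, all values come out as strings unless you add schema inference or explicit casts.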
Created 12-30-2018 06:38 AM
@hema moger, please accept this answer and close the thread if it addressed your query.
Created 11-13-2018 06:30 PM
In Spark 1.6, you can use the Databricks spark-csv package to load a CSV file into a DataFrame and write it out as JSON. You can follow this README to achieve that:
https://github.com/databricks/spark-csv#csv-data-source-for-apache-spark-1x
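A minimal sketch of that approach, assuming a Spark 1.6 shell started with the spark-csv package on the classpath (the package coordinates and paths below are illustrative; this needs a running Spark 1.6 environment with a `sqlContext`):

```python
# Assumes pyspark was launched with something like:
#   pyspark --packages com.databricks:spark-csv_2.10:1.5.0
# so the com.databricks.spark.csv data source is available.

# Load the CSV into a DataFrame, treating the first line as the header
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("file:///tmp/sample.csv"))

# Write the DataFrame out as line-delimited JSON
df.write.json("file:///tmp/sample_out_json")
```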
In Spark 2+, Spark itself provides a CSV loader to create a DataFrame and write it out in whatever format you want (JSON, Parquet, ORC):
https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv
https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.json
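Using those two APIs, the Spark 2+ version is a sketch like this, assuming a `SparkSession` named `spark` and the sample paths from this thread (it needs a running Spark 2+ environment):

```python
# Built-in CSV reader: no external package needed in Spark 2+
df = spark.read.option("header", "true").csv("file:///tmp/sample.csv")

# coalesce(1) produces a single output file; drop it for large data
df.coalesce(1).write.json("file:///tmp/sample_out_json")

# The same DataFrameWriter handles the other formats, e.g.:
#   df.write.parquet("file:///tmp/sample_out_parquet")
#   df.write.orc("file:///tmp/sample_out_orc")
```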