
How to create a Parquet file programmatically from an XML/JSON input file, without an .avsc file and without Impala

I want to convert my input file (XML/JSON) to Parquet. I already have one solution that works with Spark and creates the required Parquet file.


However, due to other client requirements, I might need to create a solution that does not involve the Hadoop ecosystem (Hive, Impala, Spark, or MapReduce).


Also, the Kite SDK seems to use an .avsc file to create Parquet data; kindly correct me if I am wrong. I might be short-sighted, but it looks like it needs an Avro schema file.
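To make the .avsc point concrete: if an Avro schema is unavoidable, one option might be to derive it from the JSON records themselves instead of hand-writing it. Below is a rough sketch of that idea (Python, standard library only), assuming flat JSON records with no nested objects or arrays. `infer_avro_schema` is a hypothetical helper of my own, not a Kite SDK or Avro API. Fields missing from some records, or holding `null`, become `["null", T]` unions with a `null` default, and mixed int/float values are widened to `double`.

```python
import json

# Hypothetical mapping from json.loads value types to Avro primitive types.
# Exact type() lookup keeps bool from being treated as int.
_AVRO_TYPES = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_avro_schema(records, name="Record"):
    """Hypothetical helper: derive an Avro record schema from flat JSON records."""
    records = list(records)
    seen_types = {}  # field name -> set of Avro type names observed
    counts = {}      # field name -> number of records containing the field
    for rec in records:
        for key, value in rec.items():
            if value is None:
                avro_type = "null"
            elif type(value) in _AVRO_TYPES:
                avro_type = _AVRO_TYPES[type(value)]
            else:
                # Assumption of this sketch: nested objects/arrays are not handled.
                raise ValueError(f"unsupported value for field {key!r}: {value!r}")
            seen_types.setdefault(key, set()).add(avro_type)
            counts[key] = counts.get(key, 0) + 1

    fields = []
    for key in sorted(seen_types):
        types = set(seen_types[key])
        if counts[key] < len(records):
            types.add("null")  # field is absent in some records
        if {"long", "double"} <= types:
            types.discard("long")  # widen mixed numbers to double
        non_null = sorted(types - {"null"})
        if len(non_null) > 1:
            raise ValueError(f"conflicting types for field {key!r}: {non_null}")
        if "null" in types:
            # "null" first so that a null default is valid for the union.
            field_type = ["null", non_null[0]] if non_null else "null"
            fields.append({"name": key, "type": field_type, "default": None})
        else:
            fields.append({"name": key, "type": non_null[0]})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    sample = [json.loads(s) for s in (
        '{"id": 1, "name": "a", "score": 2.5}',
        '{"id": 2, "name": "b", "score": 3}',
        '{"id": 3, "tags": null}',
    )]
    print(json.dumps(infer_avro_schema(sample), indent=2))
```

The resulting dict could be written out with `json.dumps` as an .avsc file, or handed to whichever Avro/Parquet writer ends up being used; that hand-off is not shown and depends on the library chosen.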


So, is there any library that can create Parquet files programmatically from self-describing files such as XML or JSON?


Note: If this does not seem like a proper approach, I would like to understand why it is not recommended, so that I can learn something or understand the areas I might have missed.
