Is there a way in Hadoop to convert a multi-line JSON file into a single-line-per-row JSON file?

I am trying to ingest a JSON file into Hive. If the JSON file is a single line, or has one record per line, I'm able to handle it with get_json_object or a lateral view. Is there another tool or approach for converting a multi-line JSON file into a single-line JSON file? Could you explain or provide a link?



@Bala Vignesh Nagamuthu Venkatesan You could use Apache NiFi to handle both the ingestion and the transformation of the JSON file. For example, if you have a multi-line JSON file like this:

{
   "book": [
      {
         "language": "Java",
         "edition": "third",
         "author": "Herbert Schildt"
      },
      {
         "language": "C++",
         "edition": "second",
         "author": "E.Balagurusamy"
      }
   ]
}

You could use a NiFi flow that splits the JSON into individual records and then merges them back together.
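To illustrate what the split-and-merge does (outside NiFi), here is a minimal Python sketch. It assumes the whole file fits in memory and that the records sit in a top-level `book` array as in the sample; the function name `to_json_lines` is hypothetical. Unlike the flow output shown below, it joins the records with a newline instead of nothing, which gives one record per line, the layout a line-oriented reader such as a Hive JSON table generally expects. The sample output in this thread also carries a `_key` field that does not appear in the sample input; the sketch does not add one.

```python
import json

# Sample multi-line document from the post (records in a "book" array).
MULTILINE_JSON = """
{
   "book": [
      {
         "language": "Java",
         "edition": "third",
         "author": "Herbert Schildt"
      },
      {
         "language": "C++",
         "edition": "second",
         "author": "E.Balagurusamy"
      }
   ]
}
"""


def to_json_lines(text: str, array_key: str = "book") -> str:
    """Hypothetical helper: split the array under `array_key` into records
    and merge them back as one compact JSON object per line."""
    document = json.loads(text)
    records = document[array_key]  # "split" step: one item per record
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    return "\n".join(lines) + "\n"  # "merge" step, newline as the delimiter


if __name__ == "__main__":
    print(to_json_lines(MULTILINE_JSON), end="")
```

For a real file you would read its contents first, e.g. `to_json_lines(open("books.json").read())`, where `books.json` is a placeholder name.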


Using the sample multi-line JSON above, the output of that flow is a single-line JSON entry. Note that the line below really is a single line, even though the page formatting may wrap it onto two.

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}

You could transform this further using some of the other JSON processors available, or with a JSON-to-JSON Jolt transformation; there is a great explanation by @Yolanda M. Davis here:

If you post the structure of your JSON, I'm sure we can get it to work for you using either Hive/Hadoop or NiFi.


🙂 And of course, you could then write the result to HDFS using the PutHDFS processor.

Instead of writing everything to a single line, can this be broken down so that each record is written on its own line, such as:

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}


I tried using a ReplaceText processor with append('\n'), but that didn't seem to work.
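If you already have the concatenated single-line output shown earlier, one way to get one object per line outside NiFi is to decode the objects one after another with Python's `json.JSONDecoder.raw_decode`, which returns each decoded object together with the index where it ended, and then re-serialize each object on its own line. This is a sketch under the assumption that the input is a sequence of complete JSON objects with nothing but optional whitespace between them; the helper names are hypothetical, and it is not a fix for the ReplaceText configuration itself.

```python
import json

# Concatenated single-line output from earlier in the thread: objects back to back.
CONCATENATED = (
    '{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}'
    '{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}'
)


def split_concatenated_json(text: str) -> list:
    """Hypothetical helper: decode back-to-back JSON values in order."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)  # returns (value, end index)
        objects.append(obj)
    return objects


def to_one_per_line(text: str) -> str:
    """Re-serialize each decoded object compactly on its own line."""
    objects = split_concatenated_json(text)
    return "\n".join(json.dumps(o, separators=(",", ":")) for o in objects) + "\n"


if __name__ == "__main__":
    print(to_one_per_line(CONCATENATED), end="")
```

Decoding and re-encoding, rather than inserting a newline wherever `}{` appears, avoids breaking records whose string values happen to contain `}{`.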

Use the Jolt processor (JoltTransformJSON) to flatten the JSON onto a single line. Here is an example NiFi dataflow showing how to configure the Jolt processor to parse out JSON fields: