Support Questions

anandt8dev · ‎12-01-2017

I'm using ConvertRecord processor to convert incoming Avro format records to Json records using Avro schema

Each value of avro record should be mapped to JsonArray element in the Json record and _not directly as Json element

Incoming Avro records are like below: (formatted) I've attached file with such avro-binary records

56702205798738-7d4a2b1c-bf7c-4aad-936f-b6f894ad117.txt

[ {
  "ID" : 1,
  "Name" : "Jackey",
  "Age" : 23,
  "JoinedOn" : "1990-11-11"
}, {
  "ID" : 2,
  "Name" : "Kiwi",
  "Age" : 32,
  "JoinedOn" : "1990-11-11"
}]

Avro Schema:

{
    "type": "record",
    "name": "EmployeeRecord",
    "fields": [
        {
            "name": "timestamp",
            "type": "long",
            "default": 1507541500000
        },
        {
            "name": "database",
            "type": "string",
            "default": "testdb"
        },
        {
            "name": "table_name",
            "type": "string",
            "default": "employees"
        },
        {
            "name": "columns",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "columnValues",
                    "fields": [
                        {
                            "name": "ID",
                            "type": "long",
                            "default": 0
                        },
                        {
                            "name": "Name",
                            "type": "string",
                            "default": "defaultName"
                        },
                        {
                            "name": "Age",
                            "type": [
                                "null",
                                "long"
                            ]
                            "default": 0
                        },
                        {
                            "name": "JoinedOn",
                            "type": "string"
                            "default": "defaultDate"
                        }
                    ]
                }
            }
        }
    ]
}

What I think I should get for each record:

 {
  "timestamp" : 1507541500000,
  "database" : "testdb",
  "table_name" : "employees",
  "columns" : [ 
	{
    	 "ID" : 1,
         "Name":  "Jackey",
 	 "Age" : 23,
 	 "JoinedOn" : "1990-11-11"
	}]
}

But, this is what I am getting: (columns : null)

 {
  "timestamp" : 1507541500000,
  "database" : "testdb",
  "table_name" : "employees",
  "columns" : null
}

Can someone comment what's going wrong here?

anandt8dev · ‎12-02-2017

@Matt Burgess Hi Matt, can you please help? Basically I'm trying to read from short-avro schema and write using complex-avro schema. Complex-avro schema has an array in which I want to map the short-avro fields, is this doable?

mburgess · ‎12-02-2017

In this case, you have two different schemas, one for the input and one for the output. The one you list above is the output schema, but you will also need an input schema such as:

{
    "type": "record",
    "name": "employeeInfo",
    "fields": [
        { "name": "ID", "type": "long", "default": 0 },
        { "name": "Name", "type": "string", "default": "defaultName" },
        { "name": "Age", "type": ["null", "long"], "default": 0 },
        { "name": "JoinedOn", "type": "string", "default": "defaultDate" }
    ]
}

Because your input and output schemas are not the same, ConvertRecord is not the right choice for this operation; the doc says "The Reader and Writer must be configured with "matching" schemas. By this, we mean the schemas must have the same field names".

I believe you will need two UpdateRecords in a row:

1) Move the employee info from the top-level into the array (using a Replacement Value Strategy of "Record Path Value"), with an example user-defined property for the ID field having property name "/columns/ID" and value "/ID".

2) I'm not sure where you are getting your values for timestamp, database, and table_name, but you can probably set those with another UpdateRecord, this time with a Replacement Value Strategy of "Literal Value", possibly using Expression Language to get the values from flow file attributes ("/table_name" = "${table.name}" for example), or functions ("/timestamp" = "${now():toNumber()}" for example).

Cloudera Community

Support Questions

Avro Schema for converting to JsonArray returning 'null'