
XML to parquet


I am reading some XML documents with this structure:

 

<Object Attr1="..." Attr2="..." Attr3="..." />

<Object Attr1="..." Attr2="..." Attr3="..." />

....

....

....

<Object Attr1="..." Attr2="..." Attr3="..." />

 

And I need to convert them to parquet.

I am trying the ConvertRecord processor, with an XMLReader as the Record Reader and a ParquetRecordSetWriter 1.11.4 as the Record Writer.

The XMLReader has the property "Expect Records as Array" set to false and the property "Schema Registry" set to an AvroSchemaRegistry 1.11.4, where I defined the schema as:

 

{
  "type" : "record",
  "name" : "Object",
  "namespace" : "Object",
  "fields" : [
    { "name" : "Attr1", "type" : "string" },
    { "name" : "Attr2", "type" : "string" },
    { "name" : "Attr3", "type" : "string" }
  ]
}
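
To make the intended mapping concrete, here is a minimal Python sketch of the records I expect from the input above: one record per Object element, with the three attributes as string fields. It is only an illustration of the expected result, not how NiFi's XMLReader works internally. The wrapping `Objects` root element, the function name `objects_to_records`, and the sample attribute values are assumptions added for the example. The wrapper is needed here because Python's parser rejects a document with several top-level elements.

```python
import xml.etree.ElementTree as ET

# Field names taken from the Avro schema above.
FIELDS = ("Attr1", "Attr2", "Attr3")


def objects_to_records(xml_fragment: str) -> list:
    """Return one dict per <Object/> element, keyed by the schema's field names."""
    # The input has several top-level elements and no root, so it is wrapped
    # in a hypothetical <Objects> root to make it parseable.
    root = ET.fromstring(f"<Objects>{xml_fragment}</Objects>")
    # A missing attribute raises KeyError, mirroring the non-nullable schema fields.
    return [{name: obj.attrib[name] for name in FIELDS} for obj in root.iter("Object")]


# Made-up sample values standing in for the elided "..." attribute contents.
sample = (
    '<Object Attr1="a" Attr2="b" Attr3="c" />\n'
    '<Object Attr1="d" Attr2="e" Attr3="f" />'
)
records = objects_to_records(sample)
```

Each dict in `records` corresponds to one row I would expect in the Parquet output.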

 

How can I fix it?

 
