Support Questions
Find answers, ask questions, and share your expertise

Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

 
7 REPLIES 7

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Super Guru
@Vengai Magan

Assuming that you are having csv file in some directory and you are having hive external table created already

Flow:

1.ListFile //list the files in the directory
2.FetchFile //Fetch the listed files from the directory
3.Convert Record //Record Reader Csv Reader, Record Writer Avro Set Writer --> converting Csv data into Avro 4.PutHDFS //Put the avro data file into hive external table location.

78608-flow.png

Refer to this and this links to configure convert record processor.

In addition old way of converting csv to avro is described here.

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Super Guru
@Vengai Magan

As you are using CSV Reader there are lot of ways to access(extract) the Schema.

1.Use 'Schema Text' Property

2.Use 'Schema Name' Property

3.Use String Fields From Header

Assume my csv data is like below

id,name,dept
1,foo,sales
2,bar,rnd

1.Use 'Schema Text' Property

By using this access strategy we are defining schema in Schema Text property of CSVReader controller service

78640-using-schematext.png

Treat the first line as header property to true as we are having header included in the csv file

AvroSetWriter Controller service configs:

78641-avrosetwriter.png

Output will be avro data file having embed avro schema in it and id field type is integer.

2.Use 'Schema Name' Property:

Add the schema.name attribute to the flowfile and use avro schema registry add new property with the same schema name with value as avro schema in it (or) define as a variable with the avro schema and use the variable name in schema name property.

{
    "namespace": "nifi",
    "name": "sch",
    "type": "record",
    "fields": [
        { "name": "id", "type": ["null","int"]},
        { "name": "name", "type": ["null","string"]},
        { "name": "dept", "type": ["null","string"]}
    ]
}

3.Use String Fields From Header:

CsvReader controller service configs:

78644-use-stringfields.png

We are using string type for the all fields from the header

AvroSetWriter Configs:

Will be same as mentioned above

But the id field type will be string in avro data file because we are using string type for all the header fields.

i suggest using Convert Record processor instead of ConvertCsvToAvro processor as convert record processor designed to work with chunks of data which results significantly good performance compared to ConvertCsvToAvro processor.

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Hi @shu, which csvreader controller service to set for files without header. I really thank you for your help as I'm new to Nifi.

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Super Guru
@Vengai Magan

If there is no header in the csv file then we need to define matching avro schema in CsvReader so that convert record processor can read the incoming csv data.

80381-csvreader.png

Treat First Line as Header

false

By using this property we are going to read all the records from csv file with the schema that defined in Schema Text property.

assume my incoming csv data is like below without header

1,foo,sales
2,bar,rnd
{
    "namespace": "nifi",
    "name": "sch",
    "type": "record",
    "fields": [
        { "name": "id", "type": ["null","int"]},
        { "name": "name", "type": ["null","string"]},
        { "name": "dept", "type": ["null","string"]}
    ]
}

Avro Schema for the incoming csv data is mentioned above.

i have attached sample template convertrecord-199607.xml save the template and upload to your nifi instance and change the csvreader controller service(avro schema) according to your incoming csv data.

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Hi @Shu, Thanks and I'll follow the steps mentioned and get back

Re: Please throw some light on the best way to load csv into Hive external table in Avro format using Nifi

Hi @Shu, the process loads the avro data file similar to inferavro & csvtoavro (old way). Please let me know how to find the avro schema. Should we extract schema and then use it?