How do I convert a CSV file to a pandas DataFrame?

Does anyone have tips for converting a CSV file to a pandas DataFrame and writing the DataFrame to HDFS?

I have tried the following:

GetFile -> ExecuteScript -> PutHDFS

GetFile -> ExecuteStreamCommand -> PutHDFS

#!/usr/bin/env python3
import pandas
import sys

def csv_to_pandas(csv_file):
    return pandas.read_csv(csv_file)
flow_file = session.get()

new_flow_file = csv_to_pandas(flow_file)

session.transfer(new_flow_file, REL_SUCCESS)

In both cases my script can't see the session var.

NameError: name 'session' is not defined.

NiFi and the related services are running on an Ubuntu 18 server.

Any tips are welcome 🙂


To convert a CSV file to a pandas DataFrame, use the pd.read_csv() function.


It takes a path to a CSV file (or an open file-like object) and returns the contents as a DataFrame.
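
A minimal sketch of that call is below. The sample data and column names are made up for illustration, and io.StringIO stands in for a real file on disk, so swap in your own path.

```python
import io

import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a CSV file on disk.
csv_text = "id,name,score\n1,alice,9.5\n2,bob,7.0\n"

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)  # column types inferred from the data
print(df)
```

With a real file this would simply be pd.read_csv("your_file.csv").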


To write a DataFrame back out as CSV, use its to_csv() method.
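
On the NameError: as far as I know, `session` is only provided inside ExecuteScript, and its Python engine is Jython, which cannot load C-extension packages such as pandas. With ExecuteStreamCommand the script runs as an ordinary Python 3 process: the flow file content arrives on stdin, and whatever the script writes to stdout becomes the outgoing content for PutHDFS. Here is a rough sketch under that assumption. The function name csv_stream_to_csv is hypothetical, and in-memory streams are used so the example runs outside NiFi.

```python
import io
import sys

import pandas as pd


def csv_stream_to_csv(in_stream, out_stream):
    """Read CSV text from in_stream into a DataFrame and write it to out_stream as CSV."""
    df = pd.read_csv(in_stream)
    # Any DataFrame transformations would go here.
    df.to_csv(out_stream, index=False)  # index=False omits the row index column
    return df


# Demonstration with in-memory streams standing in for stdin/stdout.
demo_in = io.StringIO("id,name\n1,alice\n2,bob\n")
demo_out = io.StringIO()
demo_df = csv_stream_to_csv(demo_in, demo_out)

# In a script launched by ExecuteStreamCommand, the call would instead be
# (assumption: the processor runs python3 with this script as its argument):
# csv_stream_to_csv(sys.stdin, sys.stdout)
```

The script never touches `session` or `REL_SUCCESS`; routing is handled by the processor itself.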

