How to convert a CSV file to a pandas DataFrame?

Does anyone have tips for converting a CSV file to a pandas DataFrame and writing the DataFrame to HDFS?

I have tried the following:

GetFile -> ExecuteScript -> PutHDFS

GetFile -> ExecuteStreamCommand -> PutHDFS

#!/usr/bin/env python3
import pandas
import sys

def csv_to_pandas(csv_file):
    return pandas.read_csv(csv_file)

flow_file = session.get()
new_flow_file = csv_to_pandas(flow_file)
session.transfer(new_flow_file, REL_SUCCESS)


In both cases my script can't see the session variable:

NameError: name 'session' is not defined.

NiFi and the rest of the stack are running on an Ubuntu 18 server.

Any tips are welcome :)
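
One possible direction for the ExecuteStreamCommand flow, offered as a sketch rather than a confirmed fix: that processor launches the script as an external process, so no NiFi `session` object exists inside it. The flow file content arrives on standard input, and whatever the script writes to standard output becomes the outgoing content. ExecuteScript's Python engine is Jython, which cannot load pandas, so it is not used here. A DataFrame only exists in memory, so this sketch writes it back out as CSV for PutHDFS to store. The function name `transform`, the `dropna` step, and the sample data are illustrative assumptions, not part of the original post.

```python
#!/usr/bin/env python3
import io
import sys

import pandas


def transform(in_stream, out_stream):
    """Read CSV text from in_stream into a DataFrame and write it back as CSV."""
    frame = pandas.read_csv(in_stream)
    # Placeholder for real work on the DataFrame; dropping fully empty rows
    # is only an illustrative assumption.
    frame = frame.dropna(how="all")
    frame.to_csv(out_stream, index=False)
    return frame


# Local check with in-memory streams standing in for stdin and stdout.
sample = io.StringIO("id,name\n1,alice\n2,bob\n")
result = io.StringIO()
frame = transform(sample, result)
print(result.getvalue())

# When run by ExecuteStreamCommand, the script would instead end with:
# transform(sys.stdin, sys.stdout)
```

With the last line swapped in as noted, the flow would stay GetFile -> ExecuteStreamCommand -> PutHDFS, with the processor's command pointing at a python3 interpreter that has pandas installed.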
