How to convert CSV to a pandas DataFrame?

New Contributor

Does anyone have tips for converting a CSV file to a pandas DataFrame and writing the DataFrame to HDFS?

I have tried the following:

GetFile -> ExecuteScript -> PutHDFS

GetFile -> ExecuteStreamCommand -> PutHDFS

#!/usr/bin/env python3
import pandas
import sys

def csv_to_pandas(csv_file):
    return pandas.read_csv(csv_file)

flow_file = session.get()
new_flow_file = csv_to_pandas(flow_file)
session.transfer(new_flow_file, REL_SUCCESS)


In both cases, my script cannot see the session variable:

NameError: name 'session' is not defined.

NiFi and the related services are running on an Ubuntu 18 server.

Any tips are welcome :)

Re: How to convert CSV to a pandas DataFrame?

New Contributor

To convert CSV to a pandas DataFrame, use the pandas read_csv() function (pd.read_csv()).

It reads an external CSV file and returns its contents as a DataFrame.
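
As a minimal sketch of that call: pd.read_csv() accepts either a file path or any file-like object, so an in-memory buffer works the same way as a file on disk. The load_csv helper name and the sample data below are made up for illustration and are not part of the original question.

```python
import io

import pandas as pd


def load_csv(source):
    """Read CSV data from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO("id,name\n1,alice\n2,bob\n")
df = load_csv(sample)

# Show the number of rows and columns that were parsed.
print(df.shape)
```

Passing a path such as load_csv("data.csv") should behave the same way, assuming the file exists and has a header row.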

To convert a DataFrame back to CSV, use the DataFrame.to_csv() method.
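
Regarding the NameError in the question: as far as I know, the session variable is only provided inside ExecuteScript's own script engines, and a plain python3 process started by ExecuteStreamCommand never receives it. ExecuteStreamCommand instead sends the flow file content to the command's standard input and takes whatever the command writes to standard output as the new content. A rough sketch of a read_csv()/to_csv() round trip written against streams is below; the csv_round_trip name and the sample data are hypothetical, and the in-memory buffers only stand in for stdin and stdout.

```python
import io

import pandas as pd


def csv_round_trip(in_stream, out_stream):
    """Parse CSV from in_stream into a DataFrame, then write it back as CSV."""
    df = pd.read_csv(in_stream)
    # Any pandas processing on df would go here.
    df.to_csv(out_stream, index=False)  # index=False omits the row-index column
    return df


# Demo with in-memory buffers. Under ExecuteStreamCommand the call would
# presumably be csv_round_trip(sys.stdin, sys.stdout) (an assumption about
# how the flow is configured, not something confirmed in this thread).
source = io.StringIO("id,name\n1,alice\n2,bob\n")
sink = io.StringIO()
csv_round_trip(source, sink)
result = sink.getvalue()
```

With a flow of GetFile -> ExecuteStreamCommand -> PutHDFS, the CSV text written to standard output would then become the content that PutHDFS stores. A DataFrame itself exists only in the Python process's memory, so it has to be serialized (to CSV here) before it can be written to HDFS.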
