Does anyone have tips for converting a CSV file to a pandas DataFrame in NiFi and writing the DataFrame to HDFS?
I have tried the following two flows, both with the script below:
GetFile -> ExecuteScript -> PutHDFS
GetFile -> ExecuteStreamCommand -> PutHDFS
#!/usr/bin/env python3
import pandas
import sys

def csv_to_pandas(csv_file):
    return pandas.read_csv(csv_file)

flow_file = session.get()
new_flow_file = csv_to_pandas(flow_file)
session.transfer(new_flow_file, REL_SUCCESS)
In both cases the script cannot see the session variable:
NameError: name 'session' is not defined.
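For the ExecuteStreamCommand flow, here is a minimal sketch of a possible direction. As far as I understand, ExecuteStreamCommand runs an external process, passes the flow file content to it on stdin, and takes its stdout as the outgoing content, so the NiFi session object (which only ExecuteScript binds for its embedded script engines) would not exist there. The sketch assumes that behavior, and assumes pandas is installed for the python3 that NiFi invokes. The main helper and its parameters are illustrative names, not part of NiFi.

```python
#!/usr/bin/env python3
import io
import sys

import pandas


def csv_to_pandas(csv_stream):
    # read_csv accepts a file-like object as well as a path.
    return pandas.read_csv(csv_stream)


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    # Assumption: the flow file content arrives on stdin.
    text = in_stream.read()
    if not text.strip():
        return  # empty input: nothing to convert
    df = csv_to_pandas(io.StringIO(text))
    # Placeholder: any DataFrame processing would go here.
    # Assumption: whatever is written to stdout becomes the new content.
    df.to_csv(out_stream, index=False)


if __name__ == "__main__":
    main()
```

Writing CSV back out is only a placeholder for whatever format should end up in HDFS; PutHDFS would then store whatever the script prints.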
NiFi and the rest of the stack are running on an Ubuntu 18 server.
Any tips are welcome 🙂