
NiFi ExecuteStreamCommand with a Python script on HDFS

New Contributor

Hi All,

I am trying to convert a sas7bdat file into a delimited text file, so I created a Python script for the conversion, and it works fine.

  • Python and NiFi are installed on an edge node of the local system
  • Before conversion, the compressed sas7bdat file is 3 GB
  • After conversion, the text file is around 6 GB
  • The conversion takes 3.30 hrs, since it is executed on the local system

I configured the NiFi flow as ListFile -> ExecuteStreamCommand, where the ExecuteStreamCommand processor executes the Python script.

Is there any way to execute the script in HDFS itself?

Is there an alternative way to speed up the conversion?

The Python script looks like this:

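#!/usr/bin/env python
# Convert a SAS .sas7bdat file into a pipe-delimited text file.
# Usage: script.py <input.sas7bdat> <output_dir>
# Pass ${absolute.path}/${filename} as the first command line argument
import os
import sys

from sas7bdat import SAS7BDAT

inputFile = sys.argv[1]
outpath = sys.argv[2]

# Output file: same base name as the input, with a .txt extension,
# joined to the output directory (works with or without a trailing slash)
filName = os.path.splitext(os.path.basename(inputFile))[0]
outfile = os.path.join(outpath, filName + '.txt')

# Load the whole dataset into a pandas DataFrame, then write it with '|' as separator
with SAS7BDAT(inputFile) as f:
    df = f.to_data_frame()

df.to_csv(outfile, index=False, sep='|')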
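One possible speed-up, offered as a sketch under assumptions rather than a tested fix: the script above builds one huge DataFrame with `to_data_frame()` before writing anything. pandas ships its own SAS reader, `pandas.read_sas`, which can return the data in chunks; it is partly implemented in Cython, so it is often faster than the pure-Python `sas7bdat` package, but I have not benchmarked it on a 3 GB file. The function names `write_chunks` and `convert`, the 100,000-row chunk size, and the `encoding="infer"` default are my own choices, not from the original script.

```python
#!/usr/bin/env python
"""Sketch: stream a .sas7bdat file to a pipe-delimited text file in chunks."""
import os
import sys
from typing import Iterable

import pandas as pd


def write_chunks(chunks: Iterable[pd.DataFrame], out_path: str, sep: str = "|") -> int:
    """Write DataFrame chunks to one delimited file, header only for the first chunk.

    Returns the total number of data rows written.
    """
    rows = 0
    with open(out_path, "w", newline="") as out:
        for i, chunk in enumerate(chunks):
            chunk.to_csv(out, sep=sep, index=False, header=(i == 0))
            rows += len(chunk)
    return rows


def convert(input_file: str, out_dir: str, chunksize: int = 100_000,
            encoding: str = "infer") -> str:
    """Hypothetical helper: convert input_file into out_dir/<basename>.txt."""
    base = os.path.splitext(os.path.basename(input_file))[0]
    out_path = os.path.join(out_dir, base + ".txt")
    # With chunksize set, read_sas returns an iterator of DataFrames.
    # Assumption: encoding="infer" reads the text encoding from the file header;
    # with encoding=None pandas would leave string columns as raw bytes.
    reader = pd.read_sas(input_file, format="sas7bdat",
                         chunksize=chunksize, encoding=encoding)
    try:
        write_chunks(reader, out_path)
    finally:
        reader.close()
    return out_path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same two arguments as the original script: input path, output directory
    convert(sys.argv[1], sys.argv[2])
```

Because only one chunk is held in memory at a time, memory use stays roughly constant regardless of file size; if `"infer"` does not work for your files or pandas version, pass an explicit codec such as `"latin-1"`.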


Expert Contributor

Try running:

tailf /proc/<pid_execution_script>/fd/2

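<pid_execution_script> is the process ID of the running script on the server where NiFi runs, not the PID of the NiFi service. File descriptor 2 is the script's standard error, so this shows any error output while the script runs.

Sometimes the processor does not register that the script has terminated. I do not know how to solve this 😞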
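Following `/proc/<pid>/fd/2` only shows something if the script actually writes to stderr. A minimal sketch, assuming the conversion is restructured to process the data in chunks: wrap the chunk iterator so that each chunk logs a progress line to stderr with an explicit flush. The name `with_progress` is hypothetical, and this only makes progress visible; it does not address the termination problem.

```python
import sys
import time
from typing import Iterable, Iterator, Optional, Sized, TextIO, TypeVar

T = TypeVar("T", bound=Sized)


def with_progress(chunks: Iterable[T], stream: Optional[TextIO] = None) -> Iterator[T]:
    """Yield each chunk unchanged and log cumulative rows plus elapsed time."""
    out = stream if stream is not None else sys.stderr  # resolved at call time
    start = time.monotonic()
    rows = 0
    for i, chunk in enumerate(chunks, start=1):
        yield chunk
        # Reached once the caller has finished with this chunk and asks for the next
        rows += len(chunk)
        elapsed = time.monotonic() - start
        # flush=True so `tail -f` on fd 2 sees the line immediately
        print(f"chunk {i}: {rows} rows done, {elapsed:.1f}s elapsed", file=out, flush=True)
```

Usage would be along the lines of `for chunk in with_progress(chunks): chunk.to_csv(...)`.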

New Contributor

Can we perform the same task using the ExecuteScript processor?

New Contributor

What if the incoming flowfile is the sas7bdat file that we want to convert? What changes would be needed for that?
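ExecuteStreamCommand writes the incoming flowfile's content to the command's standard input and, by default, uses the command's standard output as the content of the outgoing flowfile. So if the flowfile itself is the sas7bdat file, the script can read stdin instead of a path argument. A minimal sketch, assuming that pandas' `read_sas` needs a seekable input (a pipe is not), so stdin is first spooled to a temporary file. The function names and the `-` argument that switches on stdin mode are hypothetical choices, and a 3 GB input needs that much free space in the temp directory.

```python
#!/usr/bin/env python
"""Sketch: read a sas7bdat flowfile from stdin, write pipe-delimited text to stdout."""
import os
import shutil
import sys
import tempfile
from typing import BinaryIO, Iterable, TextIO

import pandas as pd


def spool_to_tempfile(stream: BinaryIO, suffix: str = ".sas7bdat") -> str:
    """Copy a (possibly non-seekable) binary stream to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as tmp:
        shutil.copyfileobj(stream, tmp, 16 * 1024 * 1024)  # 16 MiB copy buffer
    return path


def write_chunks(chunks: Iterable[pd.DataFrame], out: TextIO, sep: str = "|") -> int:
    """Write DataFrame chunks to an open text stream, header only once; return row count."""
    rows = 0
    for i, chunk in enumerate(chunks):
        chunk.to_csv(out, sep=sep, index=False, header=(i == 0))
        rows += len(chunk)
    return rows


def main() -> None:
    path = spool_to_tempfile(sys.stdin.buffer)
    try:
        # Assumption: encoding="infer" takes the text encoding from the file header
        reader = pd.read_sas(path, format="sas7bdat", chunksize=100_000, encoding="infer")
        try:
            write_chunks(reader, sys.stdout)  # stdout becomes the outgoing flowfile
        finally:
            reader.close()
    finally:
        os.remove(path)  # clean up the spooled copy


# Hypothetical convention: run as `script.py -` to read the flowfile from stdin
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    main()
```

In this setup the ListFile step would presumably be replaced by something that loads the file content into the flowfile (for example FetchFile after ListFile), and ExecuteStreamCommand's output stream then carries the converted text.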
