
NiFi ExecuteStreamCommand with Python script on HDFS

New Contributor

Hi All,

I am trying to convert a sas7bdat file into a delimited text file. I wrote a Python script for the conversion, and it works fine.

  • Python and NiFi are installed on an edge node (local system)
  • The compressed sas7bdat file is 3 GB before conversion
  • The text file is around 6 GB after conversion
  • The conversion takes 3.30 hrs because it runs on the local system

I configured the NiFi flow as ListFile -> ExecuteStreamCommand, which executes the Python script.

Is there any way to execute the script in HDFS itself?

Is there any alternative way to speed up the conversion?

The Python script looks like this:

#!/usr/bin/env python

import os
import sys

from sas7bdat import SAS7BDAT

# Pass ${absolute.path}/${filename} and the output directory as command line arguments
inputFile = sys.argv[1]
outpath = sys.argv[2]

# Name the output file after the input file, with a .txt extension
filName = os.path.splitext(os.path.basename(inputFile))[0]
outfile = os.path.join(outpath, filName + '.txt')

# Load the whole SAS dataset into a pandas DataFrame
with SAS7BDAT(inputFile) as f:
    df = f.to_data_frame()

# Write the DataFrame as a pipe-delimited text file
df.to_csv(outfile, index=False, sep='|')
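
The script above builds the full DataFrame with to_data_frame() before writing anything. One possible variation, sketched here for Python 3 and not measured against the original, is to iterate over the SAS7BDAT reader and write each row as it is read, so the whole dataset never has to sit in memory. The function names write_delimited and convert_streaming are hypothetical, and the output formatting (for example of missing values and dates) may differ from what pandas to_csv produces.

```python
import csv


def write_delimited(rows, out, sep="|"):
    """Write an iterable of rows to an open text stream, one line per row.

    Returns the number of rows written (including any header row).
    """
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def convert_streaming(input_file, output_file, sep="|"):
    """Convert a sas7bdat file row by row instead of via a DataFrame."""
    # Imported here so write_delimited can be used without the sas7bdat package.
    from sas7bdat import SAS7BDAT

    # Iterating the reader yields one list per row; with the default
    # settings the first row yielded is the column header.
    with SAS7BDAT(input_file) as reader, open(output_file, "w", newline="") as out:
        return write_delimited(reader, out, sep)
```

Calling convert_streaming(inputFile, outfile) would take the place of the to_data_frame() and to_csv() steps. Whether this is actually faster depends on where the time goes, so it is worth timing on a smaller file first.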

3 REPLIES

Re: NiFi ExecuteStreamCommand with Python script on HDFS

Rising Star

Try running

tailf /proc/<pid_execution_script>/fd/2

where pid_execution_script is the process ID of the running script on the server where NiFi runs, not the PID of the NiFi service.

Sometimes the processor does not receive the termination of the script. I do not know how to solve this :(

Re: NiFi ExecuteStreamCommand with Python script on HDFS

New Contributor

Can we perform the same task using the ExecuteScript processor?

Re: NiFi ExecuteStreamCommand with Python script on HDFS

New Contributor

What if the incoming flowfile is itself the sas7bdat file that we want to convert? What changes would be needed in that case?
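
For the flowfile case, ExecuteStreamCommand pipes the flowfile content to the command's standard input and turns its standard output into the outgoing flowfile. A rough, untested sketch under these assumptions: Python 3, a SAS7BDAT reader that needs a real file path rather than a pipe, and enough local disk for a temporary copy. The helper name spool_to_temp_file is hypothetical.

```python
import csv
import os
import shutil
import sys
import tempfile


def spool_to_temp_file(stream, suffix=".sas7bdat"):
    """Copy a binary stream to a temporary file and return the file's path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def main():
    # Imported here so the helper above works without the sas7bdat package.
    from sas7bdat import SAS7BDAT

    # The flowfile content arrives on stdin as raw bytes.
    path = spool_to_temp_file(sys.stdin.buffer)
    try:
        # Rows written to stdout become the content of the outgoing flowfile.
        writer = csv.writer(sys.stdout, delimiter="|", lineterminator="\n")
        with SAS7BDAT(path) as reader:
            for row in reader:
                writer.writerow(row)
    finally:
        # Always clean up the temporary copy, even if the conversion fails.
        os.remove(path)
```

To use this as the processor's command, the script would end with a call to main(), and the file path arguments from the original script would no longer be needed. For a 3 GB input the temporary copy adds disk I/O, so this trades some speed for fitting into the flowfile model.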