Member since 02-26-2017 | 12 Posts | 3 Kudos Received | 1 Solution
04-04-2017 06:16 AM
Hi Matt, I created a simple implementation of a NiFi OutputStreamCallback in a Python ExecuteScript processor and successfully transferred data to the next processor in my flow. However, when I try to enrich the code and add the desired logic, the data is simply not transferred. I don't see any compilation error, and the logs even suggest that ExecuteScript ran without errors. I'd appreciate it if you could help. Below is the snippet, which simply calls a function and writes data to the flow file.
import urllib2
import json
import datetime
import csv
import time
import sys
import traceback
from org.apache.nifi.processor.io import OutputStreamCallback
from org.python.core.util import StringUtil
class WriteContentCallback(OutputStreamCallback):
    def __init__(self, content):
        self.content_text = content
    def process(self, outputStream):
        try:
            outputStream.write(StringUtil.toBytes(self.content_text))
        except:
            traceback.print_exc(file=sys.stdout)
            raise
page_id = "dsssssss"
access_token = "sdfsdfsf%sdfsdf"
def scrapeFacebookPageFeedStatus(page_id, access_token):
    flowFile = session.create()
    flowFile = session.write(flowFile, WriteContentCallback("Hello there this is my data"))
    flowFile = session.write()
    session.transfer(flowFile, REL_SUCCESS)
    has_next_page = False
    num_processed = 0 # keep a count on how many we've processed
    scrape_starttime = datetime.datetime.now()
    while has_next_page:
        print "Scraping %s Page: %s\n" % (page_id, scrape_starttime)
        has_next_page = False
    print "\nDone!\n%s Statuses Processed in %s" % \
        (num_processed, datetime.datetime.now() - scrape_starttime)
if __name__ == '__main__':
    scrapeFacebookPageFeedStatus(page_id, access_token)
    flowFile = session.create()
    flowFile = session.write(flowFile, WriteContentCallback("and your data"))
    session.transfer(flowFile, REL_SUCCESS)
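Two things in the snippet above may be worth checking. First, everything that creates and transfers a flow file sits behind the `__name__ == '__main__'` guard (as the indentation is read here), and inside a script engine `__name__` is not guaranteed to be `'__main__'`; logging its value would confirm whether the guarded block runs at all. Second, `session.write()` is called once with no arguments, which would fail if that line were ever reached. The following is a minimal sketch of the same create/write/transfer sequence called unconditionally. It runs outside NiFi, so `FakeSession` and the string `"success"` are hypothetical stand-ins for the `session` and `REL_SUCCESS` objects that ExecuteScript provides, and the callback here is a plain class rather than a subclass of `OutputStreamCallback`.

```python
import io


class FakeSession(object):
    """Hypothetical stand-in for the `session` object ExecuteScript injects."""

    def __init__(self):
        self.transferred = []

    def create(self):
        # A dict plays the role of a flow file in this sketch.
        return {"content": b""}

    def write(self, flow_file, callback):
        # Hand the callback a byte stream, then keep whatever it wrote.
        buf = io.BytesIO()
        callback.process(buf)
        flow_file["content"] = buf.getvalue()
        return flow_file

    def transfer(self, flow_file, relationship):
        self.transferred.append((relationship, flow_file["content"]))


class WriteContentCallback(object):
    """In NiFi this would extend OutputStreamCallback (assumption: same shape)."""

    def __init__(self, content):
        self.content_text = content

    def process(self, output_stream):
        output_stream.write(self.content_text.encode("utf-8"))


def scrape_and_emit(session, rel_success):
    # One create, one write with a callback, one transfer.
    flow_file = session.create()
    flow_file = session.write(flow_file, WriteContentCallback(u"Hello there this is my data"))
    session.transfer(flow_file, rel_success)


session = FakeSession()
REL_SUCCESS = "success"

# Called directly at module level, with no __name__ guard.
scrape_and_emit(session, REL_SUCCESS)
```

In the real processor the stand-ins would be dropped and only the unguarded call pattern kept.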
03-23-2017 09:30 AM
Hi @Vasilis Vagias, great article. I am scraping Facebook page contents using Python and want to use the ExecuteScript processor to get all the posts returned by a Python function and pass them on to a Solr processor for indexing. Currently I write the contents returned by Facebook to a file; instead, I want to write those contents to the output stream and pass them on to the next processor. Can you please share the steps, following the example given in your article? Can I use the output stream object in any Python function and write records with it? I don't think creating an inner class is mandatory. Also, does it allow writerow()-style functionality? I'd appreciate it if you could show me the way here. Thanks.
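On the writerow() part of the question, one possible approach is to let `csv.writer` fill an in-memory text buffer inside the callback and then push the encoded result to the output stream in a single write. The sketch below is written for Python 3 so it can be tried outside NiFi; `CsvRowsCallback` and the sample rows are hypothetical, `io.BytesIO` stands in for the stream NiFi would pass to `process`, and under Jython 2.7 the buffer type and byte conversion would need adapting (for example `StringIO.StringIO` and `StringUtil.toBytes`, as in the callback earlier on this page).

```python
import csv
import io


class CsvRowsCallback(object):
    """Hypothetical callback; in NiFi it would extend OutputStreamCallback."""

    def __init__(self, rows):
        self.rows = rows

    def process(self, output_stream):
        # csv.writer needs a text target, so buffer the rows in memory first.
        text = io.StringIO(newline="")
        writer = csv.writer(text)
        for row in self.rows:
            writer.writerow(row)
        # The output stream takes bytes, so encode once at the end.
        output_stream.write(text.getvalue().encode("utf-8"))


# Sample rows standing in for scraped posts (made-up values).
rows = [["status_id", "message"], ["1", "hello, world"]]

out = io.BytesIO()  # stand-in for the stream NiFi hands to process()
CsvRowsCallback(rows).process(out)
```

Buffering keeps the CSV quoting logic in the standard library; for very large result sets, writing row by row to the stream may be preferable.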