nifi convert from orc

Contributor

I am having trouble converting .orc files for further transformation in NiFi. Some of the .orc files are older and come from a source that does not allow me to change anything in them. As a result, some of the files contain corrupted information. I am able to overcome this obstacle by using orc-tools, a Java application. Within NiFi, I can convert the .orc files to JSON with orc-tools by using an ExecuteStreamCommand processor, configured as follows:
'Command Path': orc-tools
'Command Arguments Strategy': 'Command Arguments Property'
'Command Arguments': 'meta;--recover;-d;-j;-p;${absolute.path}${filename}'
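As a rough illustration of what this configuration runs, the sketch below splits the semicolon-delimited argument string into an argument list, following the processor's default ';' argument delimiter. The helper name `build_command` and the sample file path are hypothetical; in NiFi, `${absolute.path}${filename}` is resolved by the processor before the command is invoked.

```python
def build_command(command_path, command_arguments, delimiter=";"):
    """Return the argument list implied by a delimited argument string."""
    # In this sketch, empty pieces are dropped so a trailing delimiter
    # does not produce a blank argument.
    args = [arg for arg in command_arguments.split(delimiter) if arg]
    return [command_path] + args


# Hypothetical resolved value of ${absolute.path}${filename}
resolved_file = "/data/incoming/example.orc"
argv = build_command("orc-tools", "meta;--recover;-d;-j;-p;" + resolved_file)
print(argv)
```

The resulting list is what a call such as `subprocess.run(argv, capture_output=True)` would need in order to reproduce the same invocation outside NiFi.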

If I subsequently wire this to an 'ExtractText' processor, I can indeed see an appropriate JSON output, as expected.

The problem I am having is that I want to join the metadata attribute information in NiFi with the newly created JSON output, combining everything into one document to be sent to an Elasticsearch index for subsequent querying, etc.

I cannot seem to find a way to access the results of the 'output stream' from the ExecuteStreamCommand processor, and that is causing significant frustration. Basically, the first processor after ExecuteStreamCommand receives the 'output stream' as well as the metadata attribute information from NiFi, but any additional processors no longer have access to the 'output stream' information.

I think the simplest approach would be:

ExecuteStreamCommand --> QueryRecord --> PutElasticsearchHttp

However, the only query I can devise in QueryRecord gets me everything except the results of the 'output stream'. That's where I need help: I can't figure out what the field name would be. This is the best query I have so far, but it only gives me the metadata attribute information, not the 'orc to json' information:
SELECT
uuid,
type,
path,
"mime.type" AS mime_type,
job_name,
hash,
filename,
"file.size" AS file_size,
"file.lastModifiedTime" AS file_last_modified_time,
ext
FROM
FLOWFILE

Any help would be greatly appreciated.
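For illustration only, the merge described in this question can be sketched outside NiFi in Python: take the text produced by the command, take the attribute key/value pairs, and build one document. This is a minimal sketch under stated assumptions, not a NiFi API and not a confirmed solution to the question. The function name `merge_attributes`, the `orc` key, and the sample values are all hypothetical, and replacing dots in attribute names with underscores is just one possible convention.

```python
import json


def merge_attributes(command_output, attributes, content_key="orc"):
    """Combine command output and attribute key/value pairs into one dict."""
    try:
        # Use the parsed structure when the output is a single JSON document.
        payload = json.loads(command_output)
    except json.JSONDecodeError:
        # Otherwise keep the raw text so nothing is lost.
        payload = command_output
    # Assumption: dots in attribute names are replaced with underscores.
    document = {name.replace(".", "_"): value for name, value in attributes.items()}
    document[content_key] = payload
    return document


# Hypothetical sample values, not taken from a real flow.
sample_output = '{"rows": 3}'
sample_attributes = {"filename": "example.orc", "mime.type": "application/octet-stream"}
print(json.dumps(merge_attributes(sample_output, sample_attributes), indent=2))
```

Nesting the command output under its own key keeps it from colliding with attribute names, which is a design choice of this sketch rather than a requirement.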

2 REPLIES

Community Manager

@arutkwccu Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @MattWho and @cotopaul, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator



Contributor

No, I never received a reply. I was able to solve the problem on my own, eventually.