Created 11-03-2023 04:57 PM
I am having trouble converting files FROM .orc for further transformation in NiFi. Some of the .orc files are older and come from a source that does not allow me to change anything in them. As a result, some of the files contain corrupted data. I can work around this by using orc-tools, a Java application. I can convert the .orc files to JSON with orc-tools inside NiFi through an ExecuteStreamCommand processor, configured as follows:
'Command Path': orc-tools
'Command Arguments Strategy': 'Command Arguments Property'
'Command Arguments': 'meta;--recover;-d;-j;-p;${absolute.path}${filename}'
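To make the argument handling concrete, here is a minimal Python sketch of the command line this configuration should produce. It assumes the processor's default ';' Argument Delimiter and handles only plain ${attribute} references, which is a simplified stand-in for NiFi Expression Language, not the real evaluator. The directory and file name in the usage are hypothetical.

```python
import re

# Simplified stand-in for NiFi Expression Language: only plain ${name}
# references are handled; functions such as ${filename:toUpper()} are not.
_ATTR_REF = re.compile(r"\$\{([^}:]+)\}")


def build_argv(command_path, command_arguments, attributes, delimiter=";"):
    """Approximate the argv run when 'Command Arguments Strategy' is
    'Command Arguments Property' (hypothetical helper, not NiFi code)."""
    def substitute(arg):
        # Replace each ${name} with the attribute value; a missing attribute
        # becomes an empty string.
        return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), arg)

    args = [substitute(a) for a in command_arguments.split(delimiter)]
    return [command_path] + args


if __name__ == "__main__":
    attrs = {"absolute.path": "/data/incoming/", "filename": "part-0001.orc"}
    argv = build_argv("orc-tools",
                      "meta;--recover;-d;-j;-p;${absolute.path}${filename}",
                      attrs)
    print(argv)
    # ['orc-tools', 'meta', '--recover', '-d', '-j', '-p', '/data/incoming/part-0001.orc']
```

Note that `${absolute.path}${filename}` is a plain concatenation, so the sketch only yields a valid path if the directory attribute already ends with a separator.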
If I connect this to an 'ExtractText' processor, I can see the expected JSON output.
The problem is that I want to combine the NiFi metadata attribute information with the newly created JSON output and send everything to an Elasticsearch index for later querying.
I cannot find a way to access the results of the 'output stream' from the ExecuteStreamCommand processor, and this has caused me significant frustration. The first processor after ExecuteStreamCommand receives the 'output stream' as well as the NiFi metadata attribute information, but any additional processors no longer have access to the 'output stream' information.
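The issue turns on the fact that a NiFi FlowFile has two separate parts: attributes and content. The following plain Python model, which is not NiFi code, sketches what ExecuteStreamCommand does with them: the incoming content is piped to the command's stdin, the command's standard output becomes the content of the new FlowFile, and the input attributes are copied onto it. The `FlowFile` class and `execute_stream_command` function are hypothetical simplifications, and a small Python one-liner stands in for orc-tools.

```python
import json
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Simplified model: a FlowFile is a map of attributes plus a content payload.
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def execute_stream_command(flowfile, argv):
    """Pipe the incoming content to argv's stdin and return (relationship, FlowFile).

    The new FlowFile keeps a copy of the input attributes, and its content is
    whatever the command wrote to stdout.
    """
    result = subprocess.run(argv, input=flowfile.content, capture_output=True)
    attrs = dict(flowfile.attributes)
    attrs["execution.status"] = str(result.returncode)  # exit code kept as an attribute
    relationship = "output stream" if result.returncode == 0 else "nonzero status"
    return relationship, FlowFile(attributes=attrs, content=result.stdout)


if __name__ == "__main__":
    incoming = FlowFile({"filename": "part-0001.orc"})
    # Stand-in for orc-tools: a tiny Python program that prints one JSON object.
    fake_tool = [sys.executable, "-c", "import json; print(json.dumps({'rows': 2}))"]
    relationship, out = execute_stream_command(incoming, fake_tool)
    print(relationship)                # output stream
    print(out.attributes["filename"])  # part-0001.orc (attribute carried over)
    print(json.loads(out.content))     # {'rows': 2} (stdout is now the content)
```

In this model, the JSON produced by the command is never an attribute; it is the payload of the FlowFile, and the attributes sit beside it.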
I think the simplest approach would be:
ExecuteStreamCommand --> QueryRecord --> PutElasticsearchHttp
However, I can only devise a query in QueryRecord that returns everything except the results of the 'output stream'. That's where I need help: I can't figure out what the field name would be. This is the best query I have so far, but it only gives me the metadata attribute information, not the ORC-to-JSON output:
SELECT
uuid,
type,
path,
"mime.type" AS mime_type,
job_name,
hash,
filename,
"file.size" AS file_size,
"file.lastModifiedTime" AS file_last_modified_time,
ext
FROM
FLOWFILE
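The end goal, a single document that holds both the attributes and the orc-tools output, can be modeled in plain Python as below. This only illustrates the target shape for the Elasticsearch document, not how NiFi would build it. It reuses the attribute names and aliases from the query above, assumes the command output parses as one JSON value, and nests that output under a hypothetical `orc` field so it cannot collide with attribute names.

```python
import json

# Attribute names from the query above, mapped to the aliases it uses
# (dotted names become underscore names; the rest are kept as-is).
ATTRIBUTE_ALIASES = {
    "uuid": "uuid",
    "type": "type",
    "path": "path",
    "mime.type": "mime_type",
    "job_name": "job_name",
    "hash": "hash",
    "filename": "filename",
    "file.size": "file_size",
    "file.lastModifiedTime": "file_last_modified_time",
    "ext": "ext",
}


def merge_for_index(attributes, content, content_field="orc"):
    """Combine selected FlowFile attributes with the parsed command output.

    `content_field` is a hypothetical field name; attributes missing from the
    FlowFile are simply omitted, and unlisted attributes are ignored.
    """
    doc = {alias: attributes[name]
           for name, alias in ATTRIBUTE_ALIASES.items() if name in attributes}
    doc[content_field] = json.loads(content)  # assumes one JSON value in the content
    return json.dumps(doc, sort_keys=True)


if __name__ == "__main__":
    attrs = {"uuid": "1234", "mime.type": "application/json", "file.size": "2048"}
    print(merge_for_index(attrs, b'{"rows": 2}'))
    # {"file_size": "2048", "mime_type": "application/json", "orc": {"rows": 2}, "uuid": "1234"}
```

Nesting the output under one field, rather than flattening it into the top level, keeps the attribute columns and the ORC data clearly separated in the index.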
Any help would be greatly appreciated.
Created 11-03-2023 06:02 PM
@arutkwccu Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho @cotopaul who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created 01-02-2024 08:20 AM
No, I never received a reply. I was able to solve the problem on my own, eventually.