Member since: 11-16-2015 · Posts: 905 · Kudos Received: 665 · Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 442 | 09-30-2025 05:23 AM |
|  | 777 | 06-26-2025 01:21 PM |
|  | 676 | 06-19-2025 02:48 PM |
|  | 863 | 05-30-2025 01:53 PM |
|  | 11441 | 02-22-2024 12:38 PM |
03-15-2017
11:38 PM
3 Kudos
Yes, it is possible with ExecuteScript if nothing else. Try the following Groovy script in your ExecuteScript processor (note the import of OutputStreamCallback, and the closing `</property>` tag needed for well-formed XML):

```groovy
import org.apache.nifi.processor.io.OutputStreamCallback

def flowFile = session.get()
if (!flowFile) return

class WriteCallback implements OutputStreamCallback {
    Map attrs
    WriteCallback(attributes) {
        attrs = attributes
    }
    void process(OutputStream outputStream) {
        outputStream.write('<root>\n'.bytes)
        attrs.each { k, v ->
            outputStream.write("<property>\n\t<name>$k</name>\n\t<value>$v</value>\n</property>\n".bytes)
        }
        outputStream.write('</root>'.bytes)
    }
}

def wb = new WriteCallback(flowFile.attributes)
flowFile = session.write(flowFile, wb)
flowFile = session.putAttribute(flowFile, org.apache.nifi.flowfile.attributes.CoreAttributes.MIME_TYPE.key(), 'application/xml')
session.transfer(flowFile, REL_SUCCESS)
```

This should pretty-print your attributes in a "properties-style" XML format. Of course, you can edit the script to give you whatever schema you like.
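For illustration, assuming each `<property>` element is closed with a matching `</property>` tag, a flow file whose attributes include `filename` and `path` (the values here are made up) would produce output along these lines:

```xml
<root>
<property>
	<name>filename</name>
	<value>test.txt</value>
</property>
<property>
	<name>path</name>
	<value>./data</value>
</property>
</root>
```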
03-15-2017
01:11 PM
1 Kudo
In addition to QueryDatabaseTable, you may be interested in the GenerateTableFetch processor. It is similar to QueryDatabaseTable except that it does not execute the SQL queries itself; instead it generates them and sends them out in flow files. This allows you to distribute the fetching in parallel across a NiFi cluster. In an upcoming release, GenerateTableFetch will accept incoming flow files, so you could enhance the workflow with the ListDatabaseTables processor, sending those tables to GenerateTableFetch and thus parallelizing the fetching of multiple pages of multiple tables.
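For a concrete sense of what GenerateTableFetch emits, the generated statements are roughly of the following form (the table name, column name, and page size here are hypothetical, and the exact paging syntax depends on the configured database type):

```sql
-- page 1: hypothetical "users" table, max-value column "id", page size 10000
SELECT * FROM users WHERE id <= 20000 ORDER BY id LIMIT 10000 OFFSET 0
-- page 2
SELECT * FROM users WHERE id <= 20000 ORDER BY id LIMIT 10000 OFFSET 10000
```

Each such query lands in its own flow file, so a downstream ExecuteSQL can run the pages concurrently across the cluster.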
03-15-2017
12:56 PM
1 Kudo
Currently NiFi does not support XLS as a format, but there has been a community contribution to add a ConvertExcelToCSV processor under NIFI-2613.
03-15-2017
12:52 PM
1 Kudo
For CSV files, if you know the number and type of the column values, you can use SplitText (to get one row per flow file) followed by ExtractText, supplying a regular expression to pull the column values out into flow file attributes. Then you can use ReplaceText to build a SQL INSERT statement manually (using NiFi Expression Language to access the attributes). For other formats like Avro, since we don't currently have a ConvertAvroToSQL processor, you'd have to convert them to another format for now. Work is underway on a generic system of type conversions, such that you could specify Avro as your input format and perhaps "SQL INSERT" as your output format, thereby effectively making the generic processor work like a ConvertAvroToSQL processor.
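As a sketch of the CSV approach for a hypothetical three-column file (the `csv` property name, the table name, and the column names are all assumptions):

```
ExtractText (dynamic property):
  csv = ^([^,]+),([^,]+),([^,]+)$      # yields attributes csv.1, csv.2, csv.3

ReplaceText:
  Replacement Strategy = Always Replace
  Replacement Value    = INSERT INTO mytable (name, age, city) VALUES ('${csv.1}', '${csv.2}', '${csv.3}')
```

Each row flow file then carries a ready-to-run INSERT statement, which you could send to a PutSQL processor.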
03-15-2017
12:46 PM
1 Kudo
Is your processor in its own NAR, or have you added it to a NiFi NAR (such as nifi-hive-bundle or nifi-hdfs-bundle)? If the former, have you added the nifi-hadoop-libraries NAR as a parent to your NAR? This will give you access to the Hadoop JARs/classes via a parent classloader. To add this NAR as a parent, add the following to the `<dependencies>` section of your custom processor's NAR module (not the processor module itself):

```xml
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-hadoop-libraries-nar</artifactId>
    <type>nar</type>
</dependency>
```

Can you describe your use case a little more? If your files are already in ORC format, you should be able to use PutHDFS to place them onto the Hadoop cluster. If they are in some other format, you might be able to use some conversion processors (including ConvertAvroToORC) and then PutHDFS to land the resultant ORC files into the cluster.
03-15-2017
12:41 PM
1 Kudo
According to this, Salesforce requires authentication via OAuth 2.0. Please see this HCC article for a discussion (and related links) on how to achieve it. That question is about the GetHttp processor, but the approach should apply to InvokeHttp as well.
03-13-2017
02:27 AM
1 Kudo
It does convert Avro to ORC, but ironically it does so by converting the Avro to JSON first. We should allow ORC (if prudent, possibly after a ConvertAvroToORC processor) and JSON as valid input formats (perhaps via the mime.type attribute?) to PutHiveStreaming. Do you mind writing up a Jira for this? Please and thank you 🙂
03-10-2017
08:48 PM
Try forming your Database Driver Location value as a URL, such as "file:///C:/path/to/sqljdbc4.jar"; I think folks have had trouble with Windows pathnames. As for a "dynamic" Database Driver Location value: that property accepts NiFi Expression Language. Of course there's no flow file available from which to grab attributes, but you can use the NiFi Variable Registry to set the value of that (and other) properties. This would allow you to change the values in one place, and/or to have different values for different environments (e.g., dev, test, production). Is that what you mean by not hard-coding the location?
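A minimal sketch of the Variable Registry approach (the custom properties file name and the `jdbc.driver.location` variable are illustrative):

```
# nifi.properties
nifi.variable.registry.properties=./conf/custom.properties

# conf/custom.properties
jdbc.driver.location=file:///C:/drivers/sqljdbc4.jar

# DBCPConnectionPool -> Database Driver Location:
${jdbc.driver.location}
```

Each environment (dev, test, production) gets its own custom.properties, so the flow itself never changes.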
03-08-2017
02:44 PM
I'm not familiar enough with Sqoop to know if they have any options that don't involve a max-value column. As you point out, if there's no way from a row to tell if it is "new", then you have to check the whole table.
03-08-2017
02:34 PM
Is there anything else in the logs after that, perhaps a "Caused by" section?