Member since: 11-16-2015
Posts: 890
Kudos Received: 647
Solutions: 245
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 847 | 02-22-2024 12:38 PM
 | 796 | 02-02-2023 07:07 AM
 | 2127 | 12-07-2021 09:19 AM
 | 3421 | 03-20-2020 12:34 PM
 | 11548 | 01-27-2020 07:57 AM
02-12-2018
04:42 PM
1 Kudo
NIFI-978 addresses this capability; I have a pull request up with the improvement, and perhaps it will make it into NiFi 1.6.0.
02-12-2018
04:14 PM
3 Kudos
It is possible with ExecuteScript and Groovy, as long as no SecurityManager is set on the JVM that would prevent Groovy from reaching the ProcessorNode, which is a private member variable of the StandardProcessContext. The following script can be used in ExecuteScript to add the parent process group ID as an attribute to an incoming flow file:

```groovy
def flowFile = session.get()
if (!flowFile) return
// Reach into the StandardProcessContext's private ProcessorNode to get the parent group ID
def processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
session.transfer(flowFile, REL_SUCCESS)
```
02-08-2018
05:35 PM
3 Kudos
In the article you mention, the ID of the connection is specified as an attribute, and the InvokeHttp processor uses it to build a URL to poll for status. In your case, you just need the list of connection IDs, and you can create a flow file for each connection.

You can use the "/process-groups/root/connections" REST endpoint to get a list of connections at the root process group. If you want connections for child process groups, you'd first need to get the child process groups' IDs and use those in the connections endpoint to get the connections for each group. That can recurse down through each child process group and may be unwieldy, so for this example I'm going to assume you simply want to monitor your queues in the root process group.

With InvokeHttp and the aforementioned REST endpoint, you'll get back a JSON object with a field called "connections", which is an array of connections. You can then use SplitJson with a JSONPath of "$.connections" to create a flow file for each connection, and EvaluateJsonPath with a JSONPath of "$.uri" to extract each connection's URL. From there you can continue with the flow described in the other article, and for each flow file it will retrieve the status for that connection.
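The SplitJson/EvaluateJsonPath steps can be sketched outside NiFi as well. Here is a minimal Python illustration using a made-up payload shaped like the connections response (the "connections" array and per-connection "uri" field are the parts the flow relies on; the ids and URLs below are invented for illustration):

```python
import json

# Sample payload shaped like the /process-groups/root/connections response
# (ids and URLs invented for illustration).
response = json.loads("""
{
  "connections": [
    {"id": "abc-123", "uri": "https://nifi.example.com/nifi-api/connections/abc-123"},
    {"id": "def-456", "uri": "https://nifi.example.com/nifi-api/connections/def-456"}
  ]
}
""")

# SplitJson with "$.connections": one "flow file" per array element.
flow_files = response["connections"]

# EvaluateJsonPath with "$.uri": extract each connection's status URL.
status_urls = [conn["uri"] for conn in flow_files]
```

Each URL would then feed an InvokeHttp to poll that connection's status, as in the article.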
02-07-2018
08:19 PM
1 Kudo
Avro schemas can be confusing the first couple of times you create them 🙂 In your case you could use the following:

```json
{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": "int"},
    {"name": "text", "type": "string"}
  ]
}
```

If you can have missing values, then you can replace the type with a union. For example, if "category" can be missing, its field entry becomes {"name": "category","type": ["null","int"]}.
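To see what the union buys you, here is a small Python sketch that mimics the matching rule rather than using a real Avro library (the `matches` helper and the simplified type table are made up for illustration): a plain "int" field rejects a missing value, while a ["null","int"] union accepts it.

```python
# Simplified illustration of Avro type matching -- not a real Avro
# implementation, just the union rule.
PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int),
    "null": lambda v: v is None,
}

def matches(avro_type, value):
    # A union is written as a list of types; the value must match one branch.
    if isinstance(avro_type, list):
        return any(matches(t, value) for t in avro_type)
    return PRIMITIVES[avro_type](value)
```

With this, matches("int", None) is False, but matches(["null", "int"], None) is True, which is exactly why the union lets "category" be missing.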
02-07-2018
05:59 PM
2 Kudos
You don't have to extract the fields to attributes if you are converting the contents to a different format. Instead, you can use ConvertRecord with a CSVReader configured for a custom format (a pipe delimiter, for instance) and name your fields in the Avro schema. Then in ConvertRecord you can set a JsonRecordSetWriter to convert to JSON. The same approach works for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format.

If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, adding user-defined properties to extract the group(s) into their associated attributes (the property name is the field name, such as "id" or "sequence", and the value is the grouping expression, perhaps $2, $3, etc.)
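The ExtractText idea can be illustrated with a plain regex. This is a sketch assuming a four-field pipe-delimited record with the field names from the schema above; the sample line and the exact pattern are invented for illustration:

```python
import re

# One capture group per pipe-delimited field, analogous to the regex you'd
# give ExtractText; user-defined properties would then map each group to an
# attribute such as "id" or "sequence".
pattern = re.compile(r'^([^|]*)\|([^|]*)\|([^|]*)\|(.*)$')

line = "rec-001|5|42|hello world"   # invented sample record
m = pattern.match(line)
attributes = {
    "id": m.group(1),
    "sequence": m.group(2),
    "category": m.group(3),
    "text": m.group(4),
}
```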
01-30-2018
12:59 PM
2 Kudos
What version of NiFi are you using? The timezone parameter was added in NiFi 1.2.0 / HDF 3.0 (NIFI-2908).
01-24-2018
04:57 PM
So right now it appears you are trying to do validation and extraction at the same time, since you don't want "case 2" to move down the stream. If your new ReplaceText from this comment is more performant than the one from the original question, you can use RouteOnContent first to exclude the files that do not have the required header and footer. Since there will now be two pattern matching processors, you may find that it is less performant, but it's probably worth a try. Another option is ExecuteScript with a fast scripting language like Groovy or Javascript/Nashorn, but the overhead of the interpreted script might be worse than the improvement of looking only for headers/footers rather than a whole regex.
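The RouteOnContent precheck amounts to a cheap framing test before the expensive full-content pattern. A minimal Python sketch, with invented header/footer markers standing in for your required text:

```python
# Cheap precheck: reject content that lacks the required header and footer
# before running any expensive full-content regex. The marker strings are
# invented examples, not from the original question.
HEADER = "BEGIN_RECORD"
FOOTER = "END_RECORD"

def has_required_framing(content: str) -> bool:
    # Substring position checks are far cheaper than a whole-content regex.
    return content.startswith(HEADER) and content.rstrip().endswith(FOOTER)
```

Only content passing this test would continue to the ReplaceText step; the rest routes away, which is the point of putting RouteOnContent first.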
01-23-2018
07:00 PM
1 Kudo
How is your data coming into NiFi? If it is a single flow file with all the rows (such as ExecuteSQL returning an Avro file with records in it), then you can use SplitAvro, and downstream each flow file can be processed separately with no looping required. If your input is a text file you can use SplitText, if JSON then SplitJson, etc. If instead you have a number (say 10) and you need to fetch rows with ids 1-10, you can use ExecuteSQL with a query that returns all rows with id <= 10.

If I am misunderstanding your use case and you do need to loop, then after you get your loop variable into an attribute (perhaps with EvaluateJsonPath, as you mention), you can use RouteOnAttribute to see if it is time to exit the loop (${loopVariable:gt(0)}, for example). Otherwise you can use UpdateAttribute to increment or decrement the counter and send that output back to the beginning of the loop.
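The loop pattern (RouteOnAttribute to test the attribute, UpdateAttribute to decrement it) can be sketched as plain code. The attribute name and starting value below are invented for illustration:

```python
# Simulate the RouteOnAttribute / UpdateAttribute loop: the flow file's
# "loopVariable" attribute is tested (like ${loopVariable:gt(0)}) and
# decremented each pass until it reaches zero.
flow_file = {"loopVariable": 3}
iterations = 0

while flow_file["loopVariable"] > 0:       # RouteOnAttribute: stay in the loop
    iterations += 1                        # ...the per-iteration work...
    flow_file["loopVariable"] -= 1         # UpdateAttribute: decrement the counter
```

When the counter hits zero, RouteOnAttribute routes the flow file out of the loop instead of back to the start.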
01-23-2018
06:24 PM
You can use either ConvertRecord or ConvertAvroToJSON to convert your incoming Avro data to JSON. If the incoming Avro files do not have a schema embedded in them, then you will have to provide it, either to an AvroReader (for ConvertRecord) or to the "Avro schema" property (for ConvertAvroToJSON).
01-22-2018
06:32 PM
1 Kudo
Perhaps try ReplaceText first, to match your beginning and end text and replace them with an empty string. Then, if you need the content as an attribute, you can use ExtractText with (.*). Do you definitely need the value in an attribute? If you can, keep it in the content after the ReplaceText processor.
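The ReplaceText-then-ExtractText idea in plain code, with invented begin/end markers standing in for your actual text:

```python
import re

# ReplaceText step: match the beginning and end text and replace each with
# an empty string. The <<START>>/<<END>> markers are invented placeholders.
content = "<<START>>the value you want<<END>>"
stripped = content.replace("<<START>>", "").replace("<<END>>", "")

# ExtractText step with (.*): capture the remaining content into an attribute.
value_attr = re.match(r"(.*)", stripped).group(1)
```

If the downstream processors can work on content directly, you can stop after the first step and skip the attribute entirely.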