Member since
01-27-2023
229
Posts
74
Kudos Received
45
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1775 | 02-23-2024 01:14 AM |
| | 2311 | 01-26-2024 01:31 AM |
| | 1441 | 11-22-2023 12:28 AM |
| | 3597 | 11-22-2023 12:10 AM |
| | 3682 | 11-06-2023 12:44 AM |
07-10-2023
11:59 PM
@Sivaluxan, I am not quite sure you have the correct Database Driver Class Name. I am extracting data out of BigQuery using the combination of GenerateTableFetch and ExecuteSQLRecord and I receive no error message at all. In terms of configuration, I have the following:
- Database Connection URL: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=your-project-id-here;OAuthType=0;OAuthServiceAcctEmail=your-email-for-service-account-here;OAuthPvtKeyPath=path_on_nifi_server_where_the_service_account_json_is_located;
- Database Driver Class Name: com.simba.googlebigquery.jdbc.Driver
- Database Driver Location: full_path_to_jars_location/
07-10-2023
08:44 AM
@steven-matison thanks for your answer 🙂 You can download a template here: download Instead of GenerateFlowFile, I have another processing section, but nevertheless, the relevant part starts with AttributesToJson and goes up until the PutBigQuery processors 🙂
07-10-2023
12:15 AM
@stevenmatison, @MattWho, @SAMSAL: have you ever encountered such behavior? 😁
07-07-2023
07:05 AM
Hi guys, please help me out with a strange behavior when using PutBigQuery. I am using Apache NiFi 1.19.1. My flow is as follows:

Step a: I have a GenerateTableFetch and an ExecuteSQLRecord, which extract some data out of a database.

Step b: The data gets loaded into a GCS bucket, using PutGCSObject.

Step c: Once the data has been saved into the GCS bucket, I have an UpdateAttribute processor linked to the success queue. Within this UpdateAttribute processor, I have defined the following three attributes: TABLE_NAME = ${generatetablefetch.tableName:toUpper()}, EXECUTION_DATE = ${now():toNumber()}, MESSAGE = 1

Step d: The success queue then links to an AttributesToJSON processor, with the properties modified as follows: Destination = flowfile-content, Attributes List = TABLE_NAME, EXECUTION_DATE, MESSAGE

Step e: Via success, I link to a ConvertRecord, where I change from JSON to AVRO. The JSON Reader and the AVRO Writer are both defined with the following schema: { "namespace": "example.avro", "type": "record", "name": "DOMAIN.LOGGING_STATUS_EXECUTION", "fields": [ { "name": "TABLE_NAME", "type": "string" }, { "name": "EXECUTION_DATE", "type": [ "null", { "type": "long", "logicalType": "local-timestamp-millis" } ] }, { "name": "MESSAGE", "type": "int" } ] }

Step f: My first test was with PutBigQueryBatch. I defined my Dataset, my Table Name, Load File Type = AVRO, Create Disposition = CREATE_IF_NEEDED and Write Disposition = WRITE_APPEND. When executing the processor on the AVRO file (from step e), the data gets loaded correctly into my BigQuery table. My second test was with PutBigQuery. I defined my Dataset, my Table Name, the Record Reader as an AVRO Reader using the embedded AVRO schema, and Transfer Type = BATCH. When executing the processor on the AVRO file (from step e), the data gets loaded into my BigQuery table, but all the values are NULL ... and no matter how long I wait, they remain NULL.

Here is a screenshot of how the data looks in the same table, where row 1 = PutBigQuery and row 2 = PutBigQueryBatch, using the same flow on the same data. The table has the following column data types and no partitioning: TABLE_NAME = STRING, EXECUTION_DATE = DATETIME, MESSAGE = INTEGER. Has anybody else experienced this behavior and, if yes, how did you solve it? Thank you 🙂
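For reference on steps c and d: `${now():toNumber()}` emits epoch milliseconds, and since NiFi attributes are always strings, AttributesToJSON writes every value as a JSON string, which the JSON Reader must then coerce to the schema types. The sketch below simulates that FlowFile content (the table-name value is illustrative, not from the post) as a way to sanity-check what PutBigQuery's reader actually receives:

```python
import json
from datetime import datetime, timezone

# Simulate steps c and d. NiFi attributes are plain strings, so
# AttributesToJSON emits every value as a JSON string; the JSON Reader
# has to coerce them to the Avro schema types (string / long / int).
now_millis = int(datetime.now(timezone.utc).timestamp() * 1000)  # ${now():toNumber()}
payload = {
    "TABLE_NAME": "LOGGING_STATUS_EXECUTION",  # illustrative value
    "EXECUTION_DATE": str(now_millis),
    "MESSAGE": "1",
}
flowfile_content = json.dumps(payload)

# Sanity check: the epoch-millis long converts back to a sane timestamp,
# which is what the 'local-timestamp-millis' logical type represents.
restored = datetime.fromtimestamp(int(payload["EXECUTION_DATE"]) / 1000, tz=timezone.utc)
```

If the string-to-long coercion or the logical-type conversion silently fails inside one of the two processors, NULL columns like the ones described above are a plausible symptom worth investigating.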
Labels:
- Apache NiFi
07-05-2023
02:03 AM
1 Kudo
@c3turner7, For 1: that is actually not an error and you can ignore it. It does get written as ERROR, but as far as I know that line is an INFO and it causes no issue/problem. For 2: based on my tests, you can ignore that message on Windows; I received it constantly and NiFi works without any issue. For 3: Try recording your screen and see if any error gets printed in the CMD before it closes. When this happened to me, I had an issue with Java and it got printed in the logs or within the CMD window. Nevertheless, what CMDs are you trying to execute and why? They all have a purpose, and if you start running them blindly you might cause other problems and make your debugging much harder than it already is. For 4: those are INFO lines, meaning that they are not affecting your application and you can ignore them. Now, circling back to your problem, what do you mean when you say that you do not get past the loading screen? Are you staring at a blank page, or is the logo displayed constantly? A screenshot might help. In addition, I strongly suggest you extract all the ERROR lines from your logs (nifi-app.log and nifi-bootstrap.log) and paste them here. How did you configure nifi.properties and bootstrap.conf?
06-26-2023
12:02 AM
@Carson, Like @joseomjr wrote (but did not give the entire link by mistake), you should take a look at the following article, as it describes exactly what you need --> https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148 Basically, you add the property in your NiFi processor, reference your parameter value in it, and afterwards you call it in your script: myValue1 = myProperty1.getValue()
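To make that pattern concrete, here is a minimal Jython sketch for ExecuteScript, assuming a dynamic (user-added) property named `myProperty1` on the processor; the property and attribute names are illustrative. This fragment only runs inside NiFi's ExecuteScript processor, where `session`, `REL_SUCCESS`, and the property variables are injected by the framework:

```python
# ExecuteScript (Jython) body -- 'session', 'REL_SUCCESS', and 'myProperty1'
# are provided by NiFi at runtime; this is not standalone-runnable.
flowFile = session.get()
if flowFile is not None:
    # Read the dynamic property added on the processor
    myValue1 = myProperty1.getValue()
    # Example use: copy the property value onto the FlowFile as an attribute
    flowFile = session.putAttribute(flowFile, 'my.value', myValue1)
    session.transfer(flowFile, REL_SUCCESS)
```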
06-25-2023
11:53 PM
@Wpq, I do not think that NiFi supports the full syntax of Java regular expressions directly in the way you are trying to use it, especially on attributes via NiFi's Expression Language. What I would recommend you try is to replace that entire regex with NiFi's EL, something like: ${ip:startsWith('10.') or ip:startsWith('127.') or ip:startsWith('169.254.') or (ip:startsWith('172.') and ip:substring(4, 6):matches('1[6-9]|2[0-9]|3[0-1]')) or ip:startsWith('192.168.')} The above code is not 100% correct, I am more than certain, but you can extrapolate from that example and rewrite your query along those lines. The NiFi EL functions you should use are: OR: ${ fileSize:lt(64):or( ${fileSize:gt(128)} )} StartsWith: ${ filename:startsWith('fizz') } Matches: ${ filename:matches('fizz.*txt') } And: ${ fileSize:gt(64):and( ${fileSize:lt(128)} )}
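To sanity-check the address ranges before porting them into EL, the same private/loopback/link-local check can be written as one anchored regex. This is an illustrative pattern, not taken from the original post; Python's `re` accepts it, and the equivalent Java regex has the same shape:

```python
import re

# Matches RFC 1918 private ranges plus loopback and link-local prefixes.
# Illustrative pattern -- verify it against your own IP list before use.
PRIVATE_IP = re.compile(
    r'^(10\.'                         # 10.0.0.0/8
    r'|127\.'                         # 127.0.0.0/8 loopback
    r'|169\.254\.'                    # 169.254.0.0/16 link-local
    r'|172\.(1[6-9]|2[0-9]|3[01])\.'  # 172.16.0.0/12
    r'|192\.168\.)'                   # 192.168.0.0/16
)

def is_private(ip):
    return PRIVATE_IP.match(ip) is not None
```

Note the 172.16.0.0/12 branch: the second octet must fall in 16-31, which is exactly what the `substring(4, 6):matches(...)` part of the EL expression above approximates.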
06-21-2023
07:46 AM
@drewski7, in this case, have a look at @steven-matison 's answer, because that is the solution to your problem.
06-21-2023
05:58 AM
@drewski7, UpdateRecord works as fast as you design it to work 🙂 For example, using UpdateRecord, I managed to generate 6 columns on a FlowFile with more than 200k lines in less than 7 seconds. For AVRO files of 100MB, doing pretty much the same action takes around 15-20 seconds. If you are using UpdateRecord to generate 100+ columns and each of these columns uses a lookup to check something else, or applies many functions across multiple columns, it is normal that it will take a long time to process. Besides that, if you are using UpdateRecord on FlowFiles with millions of rows, again, it will take longer to process. So, in order to make your flow faster, you first need to identify where the bottleneck is. First things first, check the type of the file you are reading and the type of the file you are writing into; each type has its pluses and minuses. Next, I suggest you take a look at the number of rows in each FlowFile --> processing 1M rows is slower than processing 500k rows. Afterwards, you should check the functions you are applying in UpdateRecord and see if you can optimize them in any way.
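As a rough back-of-the-envelope check on the figures above (200k rows in about 7 seconds), and assuming throughput scales linearly with row count, which is an assumption rather than a guarantee:

```python
# Back-of-the-envelope throughput estimate from the figures in the post.
rows = 200_000
seconds = 7
throughput = rows / seconds          # rows processed per second

# Assuming linear scaling, a 1M-row FlowFile would take roughly:
est_seconds_1m = 1_000_000 / throughput

print(round(throughput))      # ~28571 rows/s
print(round(est_seconds_1m))  # ~35 s
```

If your actual times are far above an estimate like this, the extra cost is likely in the per-row work (lookups, many chained functions) rather than in the row count itself.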
06-20-2023
09:32 AM
Sorry, but I am not understanding what you are trying to do. First of all, GenerateFlowFile does not accept incoming connections, meaning that you cannot use it mid-flow, especially if you depend on some other actions. What are you trying to do with GenerateFlowFile exactly?

> We are trying make it as one flow file, with what ever we loaded on the per/daily load.

So are you executing your flow once per day at a specific hour, or? How do you know what should be added to your flow file? What exactly are you trying to achieve? Are you extracting something from your database? Unfortunately you have described your use case very vaguely. If you do require assistance, I strongly recommend you provide a more detailed description of what you are doing, what you are trying to achieve, what you tried, and why it failed. Coming back to your original post: once you have saved your data in your database using PutDatabaseRecord, you can continue via a success queue into your next processor and do whatever you need. That way, once the data is saved in your database, you can call your stored procedure as expected.