Member since: 02-15-2021
Posts: 15
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6172 | 01-13-2022 06:50 AM
 | 1653 | 02-17-2021 05:35 AM
04-12-2022
06:47 AM
I have created an Apache Ni-Fi pipeline to transfer data from a MySQL database in an AWS RDS instance into a Timescale database (PostgreSQL) located in my local Linux Ubuntu instance, running on VirtualBox on top of Windows 10. The target table has a composite primary key and has been created as follows:

CREATE TABLE periodic_measurements (
  time_stamp TIMESTAMP WITH TIME ZONE NOT NULL,
  original_db SMALLINT NULL,
  original_id BIGINT NULL,
  apartment_id INTEGER NOT NULL,
  room_id INTEGER NOT NULL,
  sensor_id INTEGER NOT NULL,
  metric SMALLINT NOT NULL,
  mvalue NUMERIC NULL,
  PRIMARY KEY (time_stamp, apartment_id, room_id, sensor_id, metric));

Here are some details about the ETL data pipeline:

• The MySQL source database has a timestamp column called period with datatype datetime, which therefore does not natively contain the time zone. However, the implementation uses the Django framework, which is configured to use time zones (USE_TZ=True).
• In the first processor of the Ni-Fi pipeline (QueryDatabaseTable) the column period is renamed time_stamp and formatted as follows: date_format(period, '%Y-%m-%d %H:%i:%s').
• The output of the QueryDatabaseTable processor in Ni-Fi comes in AVRO format and the property Use Avro Logical Types is set to true. This means that the time_stamp column should be formatted as AVRO long / timestamp-millis.
• After a series of processors that use the AVRO format (UpdateRecord, QueryRecord), the data goes to the PutDatabaseRecord processor to be inserted into Timescale (PostgreSQL). In PostgreSQL the time_stamp column has datatype timestamp with time zone, as seen above.

When running the data pipeline, in the data provenance in Apache Ni-Fi I can see the following.

Data in the output of QueryDatabaseTable (first processor):

Obj avro.schema {"type":"record","name":"periodic_measurements","namespace":"any.data","fields":[{"name":"original_id","type":["null","long"]},{"name":"time_stamp","type":["null","string"]},{"name":"apartment_id","type":["null","int"]},{"name":"room_id","type":["null","int"]},{"name":"sensor_id","type":["null","int"]},{"name":"metric","type":["null","int"]},{"name":"valueSum","type":["null","double"]},{"name":"valueDiv","type":["null","double"]}]}

{ "original_id" : 3341103630, "time_stamp" : "2019-12-08 03:20:00", "apartment_id" : 269, "room_id" : 1204, "sensor_id" : 2562, "metric" : 1, "valueSum" : 12267.1010819471, "valueDiv" : 600.0 }

Data in the input of PutDatabaseRecord (last processor):

Obj avro.schema {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"time_stamp","type":["null","string"]},{"name":"apartment_id","type":["null","int"]},{"name":"room_id","type":["null","int"]},{"name":"sensor_id","type":["null","int"]},{"name":"metric","type":["null","int"]},{"name":"original_db","type":["null","string"]},{"name":"original_id","type":["null","long"]},{"name":"mvalue","type":["null","double"]}]}

{ "time_stamp" : "2019-08-21 06:30:00", "apartment_id" : 269, "room_id" : 1196, "sensor_id" : 2554, "metric" : 1, "original_db" : "1", "original_id" : 2347470787, "mvalue" : 23.014902551831668 }

Only part of the data is copied to the destination, since I get the following kind of errors in Ni-Fi:

PutDatabaseRecord[id=017e1008-dfd4-1d38-4b7c-6efa32ed611a] Failed to put Records to database for StandardFlowFileRecord[uuid=5ec613aa-b836-437e-a855-31907519e95b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1649245325204-268, container=default, section=268], offset=472323, length=49990],offset=0,name=6f15f3ef-79f4-43e0-83a9-ee0a2a65d106,size=49990]. Routing to failure.: Batch entry 341 INSERT INTO public.periodic_measurements (time_stamp, apartment_id, room_id, sensor_id, metric, original_db, original_id, mvalue) VALUES ('2019-03-31 04:00:00+03',269,0,4699,2,1,1226523146,94.19718804413162) was aborted: ERROR: duplicate key value violates unique constraint "1551_709_periodic_measurements_pkey" Detail: Key (time_stamp, apartment_id, room_id, sensor_id, metric)=(2019-03-31 04:00:00+03, 269, 0, 4699, 2) already exists. Call getNextException to see other errors in the batch

Note that the duplicate key error occurs at the time of the change into daylight saving time (the time zone is UTC+2 when daylight saving is not in effect). The strangest thing is that it happens when the time is moved one hour forward (not backward); a minimal reproduction of this collision is sketched just below. Also, in all the error cases the composite key always has one field whose value is zero (it was NULL in the source database before the data transformation in the pipeline).
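For reference, the collision at the spring-forward boundary can be reproduced with plain java.time, independently of Ni-Fi and the databases. This is only an illustration of the observation above; the class name DstGapDemo is made up and the zone Europe/Helsinki is my assumption for "UTC+2 with daylight saving time":

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Helsinki"); // assumed UTC+2/UTC+3 zone

        // 03:00 local does not exist on 2019-03-31 (clocks jump 03:00 -> 04:00),
        // so java.time resolves it by moving forward by the length of the gap...
        ZonedDateTime a = LocalDateTime.of(2019, 3, 31, 3, 0).atZone(zone);
        // ...while 04:00 local is a valid wall-clock time on that day.
        ZonedDateTime b = LocalDateTime.of(2019, 3, 31, 4, 0).atZone(zone);

        System.out.println(a); // 2019-03-31T04:00+03:00[Europe/Helsinki]
        System.out.println(b); // 2019-03-31T04:00+03:00[Europe/Helsinki]
        System.out.println(a.toInstant().equals(b.toInstant())); // true
    }
}

PostgreSQL resolves nonexistent local times in the same way, so if the zone-less strings produced by date_format() are interpreted in such a zone on insert, the two distinct source values 03:00:00 and 04:00:00 end up as the same timestamptz, which would be consistent with the duplicate key (2019-03-31 04:00:00+03, ...) reported above.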
When going to the target database, I get the following result:

SELECT COUNT(*) FROM periodic_measurements WHERE time_stamp BETWEEN '2019-03-31 03:59:59+03' AND '2019-03-31 04:00:01+03';
 count
-------
     0
(1 row)

I get the same error when I use a plain PostgreSQL database as the destination (with no Timescale hypertable and no compression). Note that with another pipeline (Qlik Replicate) there is no duplicate key error, so the problem is somewhere in Ni-Fi.

All this raises the following questions:
• Why, in the data provenance, is the format of the field time_stamp AVRO string rather than long / timestamp-millis, as it should be given that the property Use Avro Logical Types is set to true in the QueryDatabaseTable processor?
• Why is there no entry in the target database with timestamp value 2019-03-31 04:00:00+03 if this value causes a duplicate key error?
• What is causing the error in Apache Ni-Fi? Any hints?

Resetting the key manually, as suggested at "postgresql duplicate key violates unique constraint", is not an option, since the composite key is a natural key formed by a timestamp and ids which have a meaning.

Used SW versions:
• Linux Ubuntu: 18.04
• Apache Ni-Fi: 1.15.2
• PostgreSQL: 14.2
• Timescale: 2.6.0

Thanks, Bernardo Di Chiara
Labels:
- Apache NiFi
03-22-2022
01:36 AM
1 Kudo
Hi André! Thanks for your answer! That was the problem!
03-21-2022
09:58 AM
I have created a data pipeline with Apache Ni-Fi (version 1.15.2) to transfer data from a MySQL database in an AWS RDS instance into Timescale (PostgreSQL). Ni-Fi runs in Linux Ubuntu 18.04 in VirtualBox on top of Windows 10. The pipeline has been tested successfully by using as a target the Timescale database located in my local Linux instance, without using SSL. Then I have modified the pipeline to use Timescale Cloud as a target (https://portal.timescale.cloud/login). The database name is: periodic-measurements

On the Timescale Cloud service page I can see these credentials:
• Host: <host-string>
• Port: 10250
• User: tsdbadmin
• Password: <timescale-cloud-service-password>
• Service URI: postgres://tsdbadmin:<timescale-cloud-service-password>@<host-string>:10250/defaultdb?sslmode=require
• SSL mode: require
• Allowed IP addresses: 0.0.0.0/0

I use the PutDatabaseRecord processor to write data into the PostgreSQL database; it uses the DBCPConnectionPool controller service. I have copied the CA certificate obtained from the Timescale Cloud service page into the following directory in my Linux instance, with the following file name: /etc/ca-certificates/ts-cloud-ca.pem

These are the configuration parameters in the DBCPConnectionPool controller service:
• Database Connection URL: jdbc:postgres://<host-string>:10250/periodic-measurements
• Database Driver Class Name: org.postgresql.Driver
• Database Driver Location: /usr/share/java/postgresql/postgresql-42.3.1.jar
• Database User: tsdbadmin
• Password: <timescale-cloud-service-password>
• sslmode: require
• sslrootcert: /etc/ca-certificates/ts-cloud-ca.pem

When enabling the controller service I get the following error:

StandardControllerServiceNode[service=DBCPConnectionPool[id=017e1005-818e-1423-7951-125faf9dff4d], name=Connector to Timescale Cloud, active=true] Failed to invoke @OnEnabled method due to java.sql.SQLException: No suitable driver
causes: org.apache.nifi.processor.exception.ProcessException: No suitable driver for the given Database Connection URL: No suitable driver for the given Database Connection URL
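For reference, the same target can also be exercised outside Ni-Fi with a bare JDBC program. This is only a connectivity sketch under my assumptions: the class name TimescaleCloudCheck is made up, the URL follows the stock PostgreSQL driver's documented jdbc:postgresql:// scheme with its sslmode and sslrootcert URL parameters, and the placeholders are the ones from above:

import java.sql.Connection;
import java.sql.DriverManager;

public class TimescaleCloudCheck {
    public static void main(String[] args) throws Exception {
        // DriverManager selects a driver by asking every registered driver
        // whether it accepts the URL; org.postgresql.Driver accepts URLs
        // of the form jdbc:postgresql://host:port/database.
        String url = "jdbc:postgresql://<host-string>:10250/periodic-measurements"
                + "?sslmode=verify-ca"
                + "&sslrootcert=/etc/ca-certificates/ts-cloud-ca.pem";
        try (Connection c = DriverManager.getConnection(
                url, "tsdbadmin", "<timescale-cloud-service-password>")) {
            System.out.println("Connected to: "
                    + c.getMetaData().getDatabaseProductVersion());
        }
    }
}

Compiled and run with the driver jar on the classpath (java -cp /usr/share/java/postgresql/postgresql-42.3.1.jar:. TimescaleCloudCheck), this isolates the JDBC URL and SSL settings from the rest of the Ni-Fi configuration.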
I get the same result if I change the DBCPConnectionPool controller service settings with sslmode: verify-ca. Note that I have managed to connect successfully to the Timescale Cloud database with Grafana from my Linux instance by using the same credentials.

Any hint what could be wrong? Thanks, Bernardo
Labels:
- Apache NiFi
01-13-2022
06:50 AM
I managed to fix the issue. It was related to the version of the JDBC driver. I did the following.

In the Linux browser, go to https://dev.mysql.com/downloads/connector/j/, select:
• Ubuntu Linux
• 18.04
and download.

In a CLI terminal, go to the Downloads directory:
$ cd /home/bernardo/Downloads

Install the deb package:
$ sudo dpkg -i mysql-connector-java_8.0.26-1ubuntu18.04_all.deb

Check where the connector is located in the file system:
$ dpkg -L mysql-connector-java | fgrep jar
/usr/share/java/mysql-connector-java-8.0.26.jar

Clean /usr/share/java from the previous JDBC driver:
$ sudo rm mysql-connector-java-5.1.45.jar
$ sudo rm -r com
$ sudo rm -r META-INF/

Install the connector: unpack the .jar file:
$ sudo jar xvf /usr/share/java/mysql-connector-java-8.0.26.jar

To find the driver class name, open /usr/share/java/META-INF/services/java.sql.Driver:
$ cat /usr/share/java/META-INF/services/java.sql.Driver
com.mysql.cj.jdbc.Driver

The driver class name is: com.mysql.cj.jdbc.Driver

Restart Apache Ni-Fi and modify the configuration of the controller service with:
• Database Driver Class Name: com.mysql.cj.jdbc.Driver
• Database Driver Location: /usr/share/java/mysql-connector-java-8.0.26.jar
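As a quick sanity check that the freshly installed connector is loadable from a JVM (a one-off sketch of mine, not part of the Ni-Fi configuration; the class name DriverCheck is made up):

public class DriverCheck {
    public static void main(String[] args) throws Exception {
        // Loading the class also registers the driver with java.sql.DriverManager;
        // a ClassNotFoundException here would mean the jar or classpath is wrong.
        Class<?> driver = Class.forName("com.mysql.cj.jdbc.Driver");
        System.out.println("Loaded JDBC driver: " + driver.getName());
    }
}

It can be compiled with javac DriverCheck.java and run with java -cp /usr/share/java/mysql-connector-java-8.0.26.jar:. DriverCheck.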
01-12-2022
01:23 AM
I have corrected a typo in the Ni-Fi controller service configuration:
• Database Connection URL: jdbc:mysql://127.0.0.1:33061/<local_db_name>

This way I solved the controller service problem and I have been able to enable it. However, when I run the QueryDatabaseTable processor, I get the following error: Cannot create PoolableConnectionFactory - Communications link failure

QueryDatabaseTable[id=017e1003-c2d8-14cf-4e34-feee76411595] Unable to execute SQL select query SELECT * FROM periodic_measurements_test due to java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.): javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
↳ causes: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.
↳ causes: java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.)
↳ causes: org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.)

The processor configuration is default, except for:
• Database Connection Pooling Service: <DBCPConnectionPool controller service name>
• Database Type: MySQL
• Table Name: periodic_measurements_test

The problem might be due to the SSH tunnel, but I cannot understand where the problem is.

$ netstat -nplt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address    State    PID/Program name
tcp        0      0 127.0.0.1:33061    0.0.0.0:*          LISTEN   29483/ssh
tcp        0      0 127.0.0.1:3306     0.0.0.0:*          LISTEN   -
tcp        0      0 127.0.0.1:8443     0.0.0.0:*          LISTEN   -

And in a Linux terminal I can still access the MySQL database through the CLI:
$ mysql -u <db_user> -p -h 127.0.0.1 -P 33061

The controller service configuration is still:
• Database Connection URL: jdbc:mysql://127.0.0.1:33061/<db_name>
• Database Driver Class Name: com.mysql.jdbc.Driver
• Database Driver Location: /usr/share/java/mysql-connector-java-5.1.45.jar
• Database User: <db_user>
• Password: <db_password>
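Since the failure is an SSLHandshakeException rather than a refused connection, the tunnel itself is probably fine. One way to narrow it down (a hedged diagnostic sketch, not a confirmed fix; the class name TunnelCheck is made up, while useSSL is a documented Connector/J property) is to try the same tunneled URL from a standalone JVM with TLS turned off for the test (the SSH tunnel already encrypts the link):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class TunnelCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "<db_user>");         // placeholder from above
        props.setProperty("password", "<db_password>"); // placeholder from above
        // If the connection succeeds with TLS disabled, the problem is the
        // TLS version negotiation between the old 5.1 driver and the JDK,
        // not the SSH tunnel.
        props.setProperty("useSSL", "false");
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:33061/<db_name>", props)) {
            System.out.println("Connected: "
                    + c.getMetaData().getDatabaseProductVersion());
        }
    }
}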
01-12-2022
01:20 AM
I thought it could be an issue related to the driver version, so I re-installed the JDBC driver. I have cleaned /home/bernardo/Downloads and /usr/share/java from the old MySQL related files. Then I have done the installation again.

Install the JDBC connector:
$ sudo apt-get install libmysql-java

The .jar file is in /usr/share/java/mysql-connector-java-5.1.45.jar

Unpack the .jar file:
$ sudo jar xvf /usr/share/java/mysql-connector-java-5.1.45.jar

To find the driver class name, open /usr/share/java/META-INF/services/java.sql.Driver:
$ cat /usr/share/java/META-INF/services/java.sql.Driver
com.mysql.jdbc.Driver
com.mysql.fabric.jdbc.FabricMySQLDriver

The driver class name is: com.mysql.jdbc.Driver

This is the updated configuration in the DBCPConnectionPool 1.15.2 controller service in Apache Ni-Fi:
• Database Driver Class Name: com.mysql.jdbc.Driver
• Database Driver Location: /usr/share/java/mysql-connector-java-5.1.45.jar
• Database Connection URL: jbdc:mysql://127.0.0.1:33061/<db_name>
• Database User: <db_user>
• Password: <db_password>

I have obtained the same result:
causes: org.apache.nifi.processor.exception.ProcessException: No suitable driver for the given Database Connection URL

Then I checked if the problem was in the SSH tunnel:
$ netstat -nplt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address    State    PID/Program name
tcp        0      0 127.0.0.1:33061    0.0.0.0:*          LISTEN   4497/ssh
tcp        0      0 127.0.0.1:3306     0.0.0.0:*          LISTEN   -
tcp6       0      0 ::1:33061          :::*               LISTEN   4497/ssh
$ cat /proc/4497/status
Name: ssh
State: S (sleeping)

The SSH process is in sleeping state. In a separate terminal I accessed the remote MySQL database:
$ mysql -u <db_user> -p -h 127.0.0.1 -P 33061

Finally, I tried to connect to the local MySQL database in my Linux instance by using the following Ni-Fi settings:
• Database Driver Class Name: com.mysql.jdbc.Driver
• Database Driver Location: /usr/share/java/mysql-connector-java-5.1.45.jar
• Database Connection URL: jbdc:mysql://127.0.0.1:3306/<local_db_name>
• Database User: <local_db_user>
• Password: <local_db_password>

I have obtained the same result:
causes: org.apache.nifi.processor.exception.ProcessException: No suitable driver for the given Database Connection URL

Still, in a separate Linux terminal I am able to access the local MySQL database:
$ mysql -u <local_db_user> -p -h 127.0.0.1 -P 3306
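"No suitable driver" means that java.sql.DriverManager found no registered driver willing to accept the connection URL. A small standalone check can show what each registered driver thinks of it (a diagnostic sketch of mine; the class name ListDrivers is made up and the URL is copied verbatim from the settings above):

import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class ListDrivers {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // register the 5.1 driver
        // URL copied verbatim from the controller service settings above.
        String url = "jbdc:mysql://127.0.0.1:3306/<local_db_name>";
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver d = drivers.nextElement();
            // acceptsURL() is what DriverManager calls internally; "false"
            // from every driver reproduces "No suitable driver".
            System.out.println(d.getClass().getName()
                    + " accepts the URL? " + d.acceptsURL(url));
        }
    }
}

Run with the connector jar on the classpath, e.g. java -cp /usr/share/java/mysql-connector-java-5.1.45.jar:. ListDrivers.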
01-05-2022
10:00 AM
The file mysql-connector-java-8.0.26.jar is in /usr/share/java. Also:
$ cat /usr/share/java/META-INF/services/java.sql.Driver
com.mysql.cj.jdbc.Driver

So, I have made the following changes to the DBCPConnectionPool 1.15.2 controller service configuration:
• Database Driver Class Name: com.mysql.cj.jdbc.Driver
• Database Driver Location: /usr/share/java/mysql-connector-java-8.0.26.jar

The other parameters are not changed:
• Database Connection URL: jbdc:mysql://127.0.0.1:33061/<db_name>
• Database User: <user>
• Password: <password>

$ netstat -nplt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address    State    PID/Program name
tcp        0      0 127.0.0.1:33061    0.0.0.0:*          LISTEN   4873/ssh
tcp6       0      0 ::1:33061          :::*               LISTEN   4873/ssh

Still I get an error:
causes: org.apache.nifi.processor.exception.ProcessException: No suitable driver for the given Database Connection URL
01-04-2022
11:43 PM
My Linux is running on VirtualBox 6.1 on top of Windows 10 Pro.
01-04-2022
09:14 AM
Hi! I am trying to connect to a MySQL RDS instance in AWS with Apache NiFi on Linux Ubuntu. The MySQL database is behind an SSH tunnel, which is running in a dedicated terminal, and the database is accessible from another Linux terminal:

$ ssh -i /home/bernardo/.ssh/sshbernardo3.pem -N -L 33061:<db_name>.<id>.eu-west-1.rds.amazonaws.com:3306 <user>@<ip_address> -p <port>

I have downloaded and unzipped the JDBC driver from https://downloads.mysql.com/archives/c-j/. The file java.sql.Driver is located in the directory /usr/share/java/META-INF/services in Linux.

I have installed the JDBC connector in Linux and set up the user to use JDBC:
$ sudo apt-get install libmysql-java
$ cd /home/bernardo
$ gedit .bashrc

Added the following lines at the end of the file:
# Setting up the user to use JDBC
CLASSPATH=$CLASSPATH:/usr/share/java/mysql-connector-java-8.0.26.jar
export CLASSPATH

$ source .bashrc

In NiFi I have used a DBCPConnectionPool 1.15.2 controller service which is configured as follows:
• Database Connection URL: jbdc:mysql://127.0.0.1:33061/<db_name>
• Database Driver Class Name: java.sql.Driver
• Database Driver Location: /usr/share/java/META-INF/services
• Database User: <user>
• Password: <password>

When enabling the controller service, I get the following error:

StandardControllerServiceNode[service=DBCPConnectionPool[id=2440f624-017e-1000-9467-8bb12dda6ca6], name=<controller_service_name>, active=true] Failed to invoke @OnEnabled method due to java.lang.NoSuchMethodException: java.sql.Driver.<init>()
causes: java.lang.InstantiationException: java.sql.Driver
causes: org.apache.nifi.processor.exception.ProcessException: Creating driver instance is failed: Creating driver instance is failed

Any hint? This is the SW I have installed:
• Ubuntu 18.04.6
• openjdk version "11.0.13" 2021-10-19
• javac 11.0.13
• nifi-1.15.2

Thanks, Bernardo
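For what it is worth, the reported exception can be reproduced with the JDK alone: java.sql.Driver is an interface (the file under META-INF/services merely names the vendor class that implements it), so it has no constructor to invoke. A minimal reproduction (the class name InterfaceInstantiation is made up):

public class InterfaceInstantiation {
    public static void main(String[] args) throws Exception {
        // java.sql.Driver is an interface, so it declares no constructor;
        // reflective instantiation therefore fails, matching the NiFi log:
        // java.lang.NoSuchMethodException: java.sql.Driver.<init>()
        Class<?> c = Class.forName("java.sql.Driver");
        c.getDeclaredConstructor().newInstance();
    }
}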
Labels:
- Apache NiFi
03-03-2021
10:36 PM
2 Kudos
I have managed to solve the issue by using the RouteText processor. I have added a property containing a regular expression that matches the rows I want to omit, and I have configured the corresponding relationship to be terminated locally (a sketch of the matching idea follows below). Still, it would be nice to know how to implement the same functionality by using the ExtractText processor.
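To illustrate the idea (a plain-Java sketch of the line matching only; the class name RouteTextSketch, the regular expression, and the sample rows are all made up, not the ones from my flow):

import java.util.List;
import java.util.regex.Pattern;

public class RouteTextSketch {
    public static void main(String[] args) {
        // Hypothetical expression: omit lines that start with '#'.
        Pattern omit = Pattern.compile("^#.*");
        List<String> lines = List.of("# header", "1,23.5", "# comment", "2,24.1");
        // Lines matching the RouteText property's expression go to a matched
        // relationship; auto-terminating that relationship drops them.
        lines.stream()
             .filter(line -> !omit.matcher(line).matches())
             .forEach(System.out::println); // prints 1,23.5 and 2,24.1
    }
}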