Member since
01-07-2021
5
Posts
1
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
47 | 01-14-2021 12:53 PM | |
54 | 01-13-2021 01:55 PM |
01-14-2021
12:53 PM
Figured out an alternative way. I developed a Oracle PL/SQL function which takes table name as an argument, and produces a series of queries like "SELECT * FROM T1 OFFSET x ROWS FETCH NEXT 10000 ROWS ONLY". The number of queries is based on the number of rows of the table, which is a statistics number in the catalog table. If the table has 1M rows, and I want to have 100k rows in each batch, it will produces 10 queries. I use ExecuteSQLRecord to call this function, which effectively does the job of NiFi processor GenerateTableFetch. My next processor (e.g. ExecuteSQLRecord again) can now have 10 concurrent tasks working in parallel.
... View more
01-13-2021
03:30 PM
I use ExecuteSQLRecord to run a query and write to CSV format. The table has 10M rows. Although I can split the output into multiple flow files, the query is executed by only a single thread and is very slow. Is there a way to partition the query into multiple queries so that the next processor can run multiple concurrent tasks, each one process one partition? It would be like: GenerateTableFetch -> ExecuteSQLRecord (with concurrent tasks) The problem is that GenerateTableFetch only accepts table name as input. It does not accept customized queries. Please advise if you have solutions. I am new to NiFi. So I would like your details. Thank you in advance.
... View more
Labels:
01-13-2021
01:55 PM
1 Kudo
I figured out a workaround myself and hope it's useful for others. I use the following query to generate another query to be executed by the next step. This query converts Oracle date values to the preferred strings at the global level so it save the development effort at column level or table level. SELECT LISTAGG( CASE WHEN COLUMN_ID =1 THEN 'SELECT ' || CASE WHEN DATA_TYPE IN ('DATE','TIMESTAMP') THEN 'TO_CHAR(' || COLUMN_NAME || ',''YYYY-MM-DD HH24:MI:SS'') AS ' || COLUMN_NAME ELSE COLUMN_NAME END ELSE CASE WHEN DATA_TYPE IN ('DATE','TIMESTAMP') THEN 'TO_CHAR(' || COLUMN_NAME || ',''YYYY-MM-DD HH24:MI:SS'') AS ' || COLUMN_NAME ELSE COLUMN_NAME END END ,',') WITHIN GROUP (ORDER BY COLUMN_ID) || ' FROM ' || '${db.table.name}' AS MY_RECORD from user_tab_columns where table_name = '${db.table.name}' ;
... View more
01-07-2021
03:21 PM
I tried "java.arg.8=-Duser.timezone=America/New_York". It does not work for me. I posted one question earlier: https://stackoverflow.com/questions/65620632/why-do-executesqlrecord-and-csvrecordsetwriter-updated-the-time-zone-of-datetime
... View more
01-07-2021
12:20 PM
Hello! I am new to NiFi. I hope someone here can advise me about my problem with time zone. I have these processors: ListDatabaseTables -> GenerateTableFetch -> ExecuteSQLRecord (writing to csv file by CSVRecordWriter) -> ... ... PutSQL (loading csv file to MySQL using Load Data command) The source DB is Oracle. CSVRecordWriter has the following properties: Schema Write Strategy -> Do Not Write Schema Schema Access Strategy -> Inherit Record Schema Schema Name -> ${schema.name} Schema Text -> ${avro.schema} Date Format -> yyyy-MM-dd Time Format -> HH:mm:ss Timestamp Format -> yyyy-MM-dd HH:mm:ss My source DB and the target DB are both in US east time zone. However, I noticed that the output of ExecuteSQLRecord having time values converted to UTC (added to 5 hours). That results in the wrong time values in the target DB. There may be some ways to convert each date/time column individually, but that will require a huge amount of development effort. Is there a way to handle this issue properly at global level, or at least at table level? Please note that Time Format needs to be acceptable to MySQL Load Data. Thank you in advance! Gary
... View more
Labels: