Member since 07-29-2020
574 Posts
320 Kudos Received
175 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 248 | 12-20-2024 05:49 AM |
| | 283 | 12-19-2024 08:33 PM |
| | 294 | 12-19-2024 06:48 AM |
| | 245 | 12-17-2024 12:56 PM |
| | 238 | 12-16-2024 04:38 AM |
06-18-2024
01:27 PM
Hi @MikeH, have you tried adjusting the file age properties? My guess is that when a user drops thousands of files into their folder, copying them all takes time (depending on how big the files are, of course). Say it takes a couple of minutes on average: in that case you can set Minimum File Age to 2 minutes, so the processor only picks up files that have been sitting there for at least 2 minutes, and anything still being copied, whose modified date is less than 2 minutes old, won't get picked up. I know it's not perfect, but it allows some distribution without getting stuck on a folder with many files. The higher you set the minimum age, the fewer files you pick up on each run, so adjust accordingly. If that helps, please accept the solution. Thanks
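To make the idea concrete, here is a minimal sketch in Python of what a minimum-age filter does. This is only an illustration, not NiFi code: files_older_than and demo are made-up names, and the file names and ages are invented for the example.

```python
import os
import tempfile
import time


def files_older_than(directory, min_age_seconds, now=None):
    """Return the names of files last modified at least min_age_seconds ago."""
    now = time.time() if now is None else now
    selected = []
    for entry in os.scandir(directory):
        if entry.is_file() and now - entry.stat().st_mtime >= min_age_seconds:
            selected.append(entry.name)
    return sorted(selected)


def demo():
    """Create one settled file and one recently touched file, then filter them."""
    with tempfile.TemporaryDirectory() as folder:
        now = time.time()
        # (file name, seconds since last modification) -- hypothetical values
        for name, age in (("settled.csv", 300), ("still_copying.csv", 10)):
            path = os.path.join(folder, name)
            with open(path, "w") as handle:
                handle.write("data")
            os.utime(path, (now - age, now - age))
        # 120 seconds plays the role of a 2 minute Minimum File Age
        return files_older_than(folder, min_age_seconds=120, now=now)


if __name__ == "__main__":
    print(demo())
```

Only the file that has not been modified for at least 2 minutes is returned; the one still being written is left for a later run.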
06-14-2024
05:58 AM
Hi @Dave0x1, not sure if this is related, but if you are using release 2.0.0 M1 or M2 and deploying Python extensions, please see this: https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Release-2-0-M1-amp-M2-High-CPU-Utilization-Issue/m-p/389020
06-14-2024
02:35 AM
3 Kudos
Hi, usually when you get this error it means the certificate is not set up correctly to work with NiFi. For example, if you are using a wildcard certificate for all nodes, that is not supported by NiFi: https://docs.cloudera.com/cfm/2.0.4/nifi-toolkit-guide/topics/nifi-wildcard_certificates.html For more information about NiFi certificate recommendations, please see: TLS/SSL certificate requirements and recommendations | CDP Private Cloud (cloudera.com). If you find this helpful, please accept the solution. Thanks
06-14-2024
02:12 AM
1 Kudo
Hi @Thar11027, I stand corrected. Let's be more specific, and you can't get more specific than looking at the code itself on GitHub :). It turns out that PutDatabaseRecord uses a DatabaseAdapter, an interface that is implemented for each database engine type and passed through the DB service associated with this processor (DBCPConnectionPool). Those adapters are responsible for generating the SQL for each statement type (insert, update, delete, ...). For MySQL there is an adapter called MySQLDatabaseAdapter, and if you look at the method that generates the upsert statement you will find that it uses the following syntax:
StringBuilder statementStringBuilder = new StringBuilder("INSERT INTO ")
.append(table)
.append("(").append(columns).append(")")
.append(" VALUES ")
.append("(").append(parameterizedInsertValues).append(")")
.append(" ON DUPLICATE KEY UPDATE ")
.append(parameterizedUpdateValues);
return statementStringBuilder.toString();
Notice the use of the "ON DUPLICATE KEY UPDATE" syntax. If you look up what that means in MySQL (https://blog.devart.com/mysql-upsert.html), you will find that yes, it checks whether the record key exists and, if it does, performs an update instead. However, that only happens when the incoming row collides with the table's primary key or a unique index. In your case it works for the Transaction table because, as you mentioned, transaction_id is the primary key and you are probably passing this column as part of the record data. For the other table, however, the id is set to auto increment and you are probably not passing it as part of the record, relying instead on the non-primary-key column id_from_core. I'm not sure whether you can change your table so that this column is the primary key (or at least has a unique index); otherwise you will have to do a lookup to find out whether the record exists, maybe get the id, and then do your upsert with the id, though I'm not sure how that will work with auto increment set.
Another option, which I tend to use myself to avoid adding more processors or controller services, is to create a stored procedure that defers all the update-or-insert checking to SQL, then use the PutSQL processor to execute the stored procedure, passing all columns to it. This can be cumbersome if you have many columns, which seems to be your case. To avoid passing each column, you can pass the record as a JSON string and parse the JSON in MySQL to get the column values. Hope that helps.
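To see why the key matters, here is a small sketch using Python's built-in sqlite3 as a stand-in. Please note the assumptions: this is SQLite, not MySQL, so the clause is spelled ON CONFLICT ... DO UPDATE (SQLite 3.24 or newer) instead of ON DUPLICATE KEY UPDATE, and the customer table, the name column and the sample values are made up for the example. The behaviour it illustrates is the same idea: the update branch only fires when the insert collides with a key.

```python
import sqlite3


def upsert_twice(unique_on_core_id):
    """Upsert the same id_from_core twice and return the rows left in the table."""
    conn = sqlite3.connect(":memory:")
    unique = "UNIQUE" if unique_on_core_id else ""
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        f"id_from_core TEXT {unique}, "
        "name TEXT)"
    )
    # The conflict can only be detected on a column that has a key on it.
    # Without a unique key on id_from_core, the only key is the generated id.
    conflict_col = "id_from_core" if unique_on_core_id else "id"
    sql = (
        "INSERT INTO customer (id_from_core, name) VALUES (?, ?) "
        f"ON CONFLICT({conflict_col}) DO UPDATE SET name = excluded.name"
    )
    conn.execute(sql, ("C-1", "Alice"))
    conn.execute(sql, ("C-1", "Alice Smith"))
    rows = conn.execute(
        "SELECT id_from_core, name FROM customer ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(upsert_twice(unique_on_core_id=False))  # second statement inserts a duplicate
    print(upsert_twice(unique_on_core_id=True))   # second statement updates the row
```

With only the auto-increment id as a key, the generated id never collides, so the second statement is just another insert. Once id_from_core carries a unique key, the same statement updates the existing row.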
06-13-2024
12:25 PM
Hi @MattWho, @pvillard, I just found another bug in the 2.0.0 M3 release: when I try to view the content of a flowfile, I get an error screen instead. When I checked the log I found the following:
2024-06-13 15:10:13,718 ERROR [NiFi Web Server-536] o.a.nifi.web.ContentViewerController Content preparation failed for Content Viewer [/nifi-standard-content-viewer-2.0.0-M3]
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 6]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2567)
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2593)
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2601)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:765)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3659)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2747)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:867)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:753)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4992)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4898)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3885)
at org.apache.nifi.web.StandardContentViewerController.doGet(StandardContentViewerController.java:90)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:527)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
at org.eclipse.jetty.ee10.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1379)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
at org.springframework.security.web.FilterChainProxy.lambda$doFilterInternal$3(FilterChainProxy.java:231)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:365)
at org.springframework.security.web.access.intercept.AuthorizationFilter.doFilter(AuthorizationFilter.java:100)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:58)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:233)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:186)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:352)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:268)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilterChain(DoSFilter.java:462)
at org.apache.nifi.web.server.filter.DataTransferExcludedDoSFilter.doFilterChain(DataTransferExcludedDoSFilter.java:51)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilter(DoSFilter.java:317)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilter(DoSFilter.java:282)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
at org.eclipse.jetty.ee10.servlet.Dispatcher.include(Dispatcher.java:154)
at org.apache.nifi.web.ContentViewerController.doGet(ContentViewerController.java:243)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:527)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
at org.eclipse.jetty.ee10.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1379)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
at org.springframework.security.web.FilterChainProxy.lambda$doFilterInternal$3(FilterChainProxy.java:231)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:365)
at org.springframework.security.web.access.intercept.AuthorizationFilter.doFilter(AuthorizationFilter.java:100)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.apache.nifi.web.security.log.AuthenticationUserFilter.doFilterInternal(AuthenticationUserFilter.java:57)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.oauth2.server.resource.web.authentication.BearerTokenAuthenticationFilter.doFilterInternal(BearerTokenAuthenticationFilter.java:145)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:94)
at org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:56)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.apache.nifi.web.security.csrf.CsrfCookieFilter.doFilterInternal(CsrfCookieFilter.java:43)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.csrf.CsrfFilter.doFilterInternal(CsrfFilter.java:117)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.apache.nifi.web.security.csrf.SkipReplicatedCsrfFilter.doFilterInternal(SkipReplicatedCsrfFilter.java:59)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:62)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:374)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:233)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:191)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:352)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:268)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:208)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilterChain(DoSFilter.java:462)
at org.apache.nifi.web.server.filter.DataTransferExcludedDoSFilter.doFilterChain(DataTransferExcludedDoSFilter.java:51)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilter(DoSFilter.java:317)
at org.eclipse.jetty.ee10.servlets.DoSFilter.doFilter(DoSFilter.java:282)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:90)
at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:75)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.apache.nifi.web.server.log.RequestAuthenticationFilter.doFilterInternal(RequestAuthenticationFilter.java:59)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:824)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:436)
at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:575)
at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:851)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:181)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:181)
at org.eclipse.jetty.server.Server.handle(Server.java:179)
at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:635)
at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:105)
at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2$1.run(HttpStreamOverHTTP2.java:133)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:211)
at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:158)
at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:449)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
at org.eclipse.jetty.io.ssl.SslConnection$SslEndPoint.onFillable(SslConnection.java:574)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:390)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:150)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
at java.base/java.lang.Thread.run(Thread.java:1583)
It seems from the error that it is expecting JSON content but is getting something else. In my flow I noticed this happens when I use ReplaceText to change the content of the flowfile from JSON to a SQL script, so I was able to replicate it with the following flow (the JSON content first, then the SQL script that ReplaceText swaps in):
{
"type" : "record",
"namespace" : "Tutorialspoint",
"name" : "Employee",
"fields" : [
{ "name" : "Name" , "type" : "string" },
{ "name" : "Age" , "type" : "int" }
]
}
exec DISC_INGEST.sp_IngestDisciplineTemplate
''
,'715B36EE-9D2F-43C7-A2D6-1044CAFCBF30_P2108.xlsx'
,'Test'
,'Estimating Template'
,'EstimatingTemplateTable'
,''
Click List Queue on the ReplaceText success relationship, then view the content, and you will get the error above! As a workaround I'm able to download the content successfully, though. By the way, can I submit these issues on GitHub? It seems the developers are more engaged there; however, I don't see an option to submit issues there. Thanks S
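For what it's worth, here is a minimal sketch in Python of the kind of fallback I would expect from a viewer. It is not NiFi's implementation: render_for_viewer is a made-up name, and the idea that the flowfile still declares application/json after ReplaceText is only my assumption about what triggers the parse.

```python
import json


def render_for_viewer(content, declared_mime_type):
    """Pretty-print JSON when it parses; otherwise fall back to plain text."""
    if declared_mime_type == "application/json":
        try:
            return "json", json.dumps(json.loads(content), indent=2)
        except json.JSONDecodeError:
            # Content is not JSON despite the declared type: show it as-is.
            pass
    return "text", content


if __name__ == "__main__":
    sql_script = "exec DISC_INGEST.sp_IngestDisciplineTemplate '', 'Test'"
    print(render_for_viewer(sql_script, "application/json"))
    print(render_for_viewer('{"name": "Employee"}', "application/json"))
```

The point of the sketch is only that a failed parse degrades to showing the raw text instead of an error screen.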
Labels: Apache NiFi
06-13-2024
04:43 AM
2 Kudos
Hi @Thar11027, I don't think there is an UPSERT statement in MySQL, if I'm not wrong. I think it is being treated as a regular insert, and hence you are seeing duplicate entries. If you want to use the PutDatabaseRecord processor, then you have to create two: one for insert and another for update. To decide which one to run, you have to do a lookup to see whether a customer with the same core id exists. For that you can use the LookupRecord processor (refer to: https://community.cloudera.com/t5/Community-Articles/Data-flow-enrichment-with-NiFi-part-1-LookupRecord-processor/ta-p/246940) to enrich your data with the customer core id if it exists; then, if the record is found (meaning the id exists), you route to update, otherwise you route to insert. Hope that helps. If it does, please accept the solution. Thanks
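The routing decision itself is simple; here is a minimal sketch in Python of the logic, outside NiFi. The names route_records and core_id and the sample records are hypothetical, and in the real flow the set of existing ids would come from the lookup service rather than a hard-coded set.

```python
def route_records(records, existing_core_ids):
    """Split records into (to_update, to_insert) by whether core_id already exists."""
    to_update, to_insert = [], []
    for record in records:
        if record["core_id"] in existing_core_ids:
            to_update.append(record)   # found by the lookup: route to update
        else:
            to_insert.append(record)   # not found: route to insert
    return to_update, to_insert


if __name__ == "__main__":
    incoming = [
        {"core_id": "C-1", "name": "Alice"},
        {"core_id": "C-9", "name": "Bob"},
    ]
    print(route_records(incoming, existing_core_ids={"C-1"}))
```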
06-12-2024
01:54 PM
Hi, initially I thought this was fixed in M3 and felt quite relieved at not having to manually rename the scripts folder to bin and restart NiFi twice for that: once to create the initial extension folders under the work folder, and a second time after renaming scripts to bin. It turns out that it works the very first time; however, upon subsequent restarts the Python extension processors start going into an invalid state. When I checked the log I found that it is still looking for the bin folder:
2024-06-12 15:58:22,105 ERROR [Initialize ExcelTableToJson] org.apache.nifi.NiFi An Unknown Error Occurred in Thread VirtualThread[#118,Initialize ExcelTableToJson]/runnable@ForkJoinPool-1-worker-2: java.lang.RuntimeException: Failed to launch Process for Python Processor [ExcelTableToJson] Version [2.0.0-SNAPSHOT]
java.lang.RuntimeException: Failed to launch Process for Python Processor [ExcelTableToJson] Version [2.0.0-SNAPSHOT]
at org.apache.nifi.py4j.StandardPythonBridge.getProcessForNextComponent(StandardPythonBridge.java:262)
at org.apache.nifi.py4j.StandardPythonBridge.createProcessorBridge(StandardPythonBridge.java:123)
at org.apache.nifi.py4j.StandardPythonBridge.lambda$createProcessor$7(StandardPythonBridge.java:140)
at org.apache.nifi.python.processor.PythonProcessorProxy.lambda$new$0(PythonProcessorProxy.java:78)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:309)
Caused by: java.io.IOException: Cannot run program "F:\nifi-2.0.0-M3-test\.\work\python\extensions\ExcelTableToJson\2.0.0-SNAPSHOT\bin\python": CreateProcess error=2, The system cannot find the file specified
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1170)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1089)
at org.apache.nifi.py4j.PythonProcess.launchPythonProcess(PythonProcess.java:283)
at org.apache.nifi.py4j.PythonProcess.start(PythonProcess.java:129)
at org.apache.nifi.py4j.StandardPythonBridge.getProcessForNextComponent(StandardPythonBridge.java:243)
... 4 common frames omitted
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.base/java.lang.ProcessImpl.create(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:500)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:159)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1126)
... 8 common frames omitted
2024-06-12 15:58:22,109 ERROR [Initialize ExcelTableToJson] org.apache.nifi.NiFi An Unknown Error Occurred in Thread VirtualThread[#116,Initialize ExcelTableToJson]/runnable@ForkJoinPool-1-worker-7: java.lang.RuntimeException: Failed to launch Process for Python Processor [ExcelTableToJson] Version [2.0.0-SNAPSHOT]
java.lang.RuntimeException: Failed to launch Process for Python Processor [ExcelTableToJson] Version [2.0.0-SNAPSHOT]
at org.apache.nifi.py4j.StandardPythonBridge.getProcessForNextComponent(StandardPythonBridge.java:262)
at org.apache.nifi.py4j.StandardPythonBridge.createProcessorBridge(StandardPythonBridge.java:123)
at org.apache.nifi.py4j.StandardPythonBridge.lambda$createProcessor$7(StandardPythonBridge.java:140)
at org.apache.nifi.python.processor.PythonProcessorProxy.lambda$new$0(PythonProcessorProxy.java:78)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:309)
Caused by: java.io.IOException: Cannot run program "F:\nifi-2.0.0-M3-test\.\work\python\extensions\ExcelTableToJson\2.0.0-SNAPSHOT\bin\python": CreateProcess error=2, The system cannot find the file specified
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1170)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1089)
at org.apache.nifi.py4j.PythonProcess.launchPythonProcess(PythonProcess.java:283)
at org.apache.nifi.py4j.PythonProcess.start(PythonProcess.java:129)
at org.apache.nifi.py4j.StandardPythonBridge.getProcessForNextComponent(StandardPythonBridge.java:243)
... 4 common frames omitted
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.base/java.lang.ProcessImpl.create(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:500)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:159)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1126)
... 8 common frames omitted
When I rename the scripts folder to bin and restart NiFi again, it starts working again! Also, I'm not sure why there is no separate python.log file anymore like in M1 and M2; instead, everything related to Python extensions is logged to the main nifi-app.log, which makes it harder to find. I have already posted a comment on GitHub under pull request 12514, which seems to be related to this issue. @pvillard
***UPDATE***: I think I know why this is happening and under what circumstances. I noticed that whenever this happens, the bin folder is actually being created when the dependencies are downloaded. For example, if you look at the log above, I'm creating a Python processor that reads an Excel table and converts it into an array of JSON records. This processor uses the following dependencies: ['pandas','numpy','openpyxl']. When they are downloaded, a bin folder is created with the following file inside: f2py.exe. I think this confuses NiFi: once it finds the bin folder, it assumes that is where the venv files are, so the processor goes into an invalid state when it can't find them there. That is why I think it is better not to hard-code the folder paths and go with whichever one is found first (in this case the bin folder). I think it would be better to make this configurable depending on the environment NiFi is deployed on. Thanks
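To illustrate what I mean, here is a minimal sketch in Python of a lookup that checks for the interpreter itself instead of just the folder. This is not how NiFi resolves the path; find_venv_python and demo are made-up names, and the two candidate locations are simply the usual venv layouts on Windows and on Linux/macOS.

```python
import os
import tempfile


def find_venv_python(env_home):
    """Return the interpreter path inside a virtual environment, or None.

    Looks for the executable itself rather than for the folder, so a bin
    folder that only holds other tools (such as f2py.exe) is skipped.
    """
    candidates = [
        os.path.join(env_home, "Scripts", "python.exe"),  # Windows venv layout
        os.path.join(env_home, "bin", "python"),          # Linux/macOS venv layout
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Recreate the situation above: bin holds only f2py.exe, Scripts holds python.exe."""
    with tempfile.TemporaryDirectory() as env_home:
        for folder, filename in (("bin", "f2py.exe"), ("Scripts", "python.exe")):
            os.makedirs(os.path.join(env_home, folder))
            with open(os.path.join(env_home, folder, filename), "w"):
                pass
        return os.path.relpath(find_venv_python(env_home), env_home)


if __name__ == "__main__":
    print(demo())
```

With this kind of check, the extra bin folder created by the dependency download would not matter, because it does not contain the interpreter.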
Labels: Apache NiFi
06-12-2024
09:03 AM
3 Kudos
Big shout-out to @MattWho, who has been incredibly helpful in this community. I have learned a lot from his answers and posts, whether in direct replies to issues I posted or in replies to others. I don't think anyone can match the knowledge and the level of detail he possesses when writing about NiFi.
06-12-2024
07:28 AM
2 Kudos
Dear @steven-matison, let me say first that you are AWESOME for going as far as you did to help me figure this out. I can't thank you enough. I think my lesson learned here is that I should not rely only on the release notes to see whether a major issue like this has been addressed, and should probably always review the bug fixes in Jira to help me decide whether an upgrade is worth it. Having said that, I still believe something as critical as a memory leak and high CPU utilization should have been mentioned as part of the highlights: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version2.0.0-M3 I really wish I had known about this earlier; it would have saved me days of troubleshooting and the stress of being under the watch of IT, because they get a notification every time CPU utilization crosses a certain threshold for some amount of time. I started testing M3 yesterday, and I can confirm that so far there are no issues with CPU utilization. I was about to announce to the team that an upgrade to M3 is needed. One thing I'm not sure of is that I'm still seeing more Python processes in the Task Manager than I have deployed. Not sure if this is normal, or whether @MattWho or @pvillard have anything to say about this. Finally, you also saved me the time of trying to figure out why M3 is working and whether this is a known issue, so that I can post my results to the community in case someone else runs into it. God knows how much time I would have spent on this. Hopefully this post is enough to help others avoid the headache I had to go through. Thanks
06-12-2024
06:47 AM
Hi, just a word of advice so you have better luck getting your posts noticed and possibly getting a resolution: if you can shorten your JSON input next time so that it isolates only the problem, that would be more helpful. You don't have to post the 99% that works along with the 1% that doesn't, as long as trimming it doesn't affect the overall structure. For example, if you have 50 fields you can post just 1-2 fields, and if you have an array of 50 elements, 1-2 elements should be enough. Coming to your problem: if you know that you will always have only two amounts, as you specified, then you can intercept them in the first shift spec and assign the proper field names (Amount1, Amount2) as follows:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "CustomFields": {
          "*": {
            "ViewName": {
              "Amount": {
                "@(2,Value)": "[&5].Amount1"
              },
              "*": {
                "@(2,Value)": "[&5].&1"
              }
            }
          }
        },
        "PortfolioSharing": {
          "*": {
            "@(0,PortfolioId)": "[&3].sharing_PortfolioId",
            "@(0,CustomerId)": "[&3].sharing_CustomerId",
            "@(0,PersonalId)": "[&3].sharing_PersonalId",
            "@(0,ReasonId)": "[&3].sharing_ReasonId",
            "@(0,ReasonProgId)": "[&3].sharing_ReasonProgId",
            "@(0,ReasonName)": "[&3].sharing_ReasonName",
            "@(0,Comment)": "[&3].sharing_Comment",
            "@(0,TypeId)": "[&3].sharing_TypeId",
            "@(0,TypeProgId)": "[&3].sharing_TypeProgId",
            "@(0,TypeName)": "[&3].sharing_TypeName"
          }
        },
        "Amount": "[&1].Amount2",
        "*": "[&1].&"
      }
    }
  },
  {
    "operation": "modify-default-beta",
    "spec": {
      "*": {
        // trx_customer
        "ResidentNonresident": "@(1,Resident/Non-resident)",
        "NationalityCountryofIncorporation": "@(1,Nationality/CountryofIncorporation)",
        "PermanantTownCity": "@(1,PermanantTown/City)",
        "SubsidiaryAssociateofanotherorganization": "@(1,Subsidiary/Associateofanotherorganization)",
        "Howdidyougettoknowaboutus": "@(1,Howdidyougettoknowaboutus?)",
        // trx_portfolio
        "UseBankAccountFromCustomer": "@(1,UseBankAccountFromCustomer?)"
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "*": {
        // trx_customer
        "Resident/Non-resident": "",
        "Nationality/CountryofIncorporation": "",
        "PermanantTown/City": "",
        "Subsidiary/Associateofanotherorganization": "",
        "NatureOf_Business": "",
        "Howdidyougettoknowaboutus?": "",
        // trx_portfolio
        "UseBankAccountFromCustomer?": ""
      }
    }
  }
]
Hope that solves your problem. If you found this helpful, please accept the solution. Thanks
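If it helps to see the intent of the first shift operation outside of Jolt, here is a rough Python sketch of the Amount handling only. It does not run Jolt and it ignores the PortfolioSharing part; flatten_record is a made-up name, and the input shape (a CustomFields array of ViewName/Value pairs plus a top-level Amount) is my reading of the spec, so the sample record is hypothetical.

```python
def flatten_record(record):
    """Flatten one record: custom-field Amount -> Amount1, top-level Amount -> Amount2."""
    flat = {}
    # Each custom field becomes a top-level key named after its ViewName,
    # except the one called Amount, which is renamed to Amount1.
    for field in record.get("CustomFields", []):
        name = field["ViewName"]
        flat["Amount1" if name == "Amount" else name] = field["Value"]
    # Every other top-level field is copied as-is, except Amount -> Amount2.
    for key, value in record.items():
        if key == "CustomFields":
            continue
        flat["Amount2" if key == "Amount" else key] = value
    return flat


if __name__ == "__main__":
    sample = {
        "Name": "Portfolio A",
        "Amount": 100,
        "CustomFields": [
            {"ViewName": "Amount", "Value": 5},
            {"ViewName": "Region", "Value": "EU"},
        ],
    }
    print(flatten_record(sample))
```

The two amounts no longer collide because each one is renamed at the level where it is found, which is what the "Amount" branches in the shift spec do.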