Member since: 06-19-2017
Posts: 62
Kudos Received: 1
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 646 | 03-17-2022 10:37 AM
 | 480 | 12-10-2021 04:25 AM
 | 811 | 08-18-2021 02:20 PM
 | 1623 | 07-16-2021 08:41 AM
 | 467 | 07-13-2021 07:03 AM
02-02-2023
08:55 AM
Hello All, We have a Spark program that executes multiple queries against Hive tables. Currently the queries are executed using the Tez engine from Spark. I set sqlContext.sql("SET hive.execution.engine=spark") in the program with the understanding that the queries would then run through Spark. We are using HDP 2.6.5 and Spark 2.3.0 in our cluster. Can someone confirm whether this is the correct approach, since we do not want to run the queries through the Tez engine and would like Spark to execute them directly? In the config file /etc/spark2/conf/hive-site.xml we do not have any execution-engine property set; it contains only the Kerberos and metastore properties. Thanks
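A minimal PySpark sketch of the setup described above, assuming a Spark 2.3 job with Hive support enabled; the database and table names are hypothetical placeholders, not from the original program:

```python
from pyspark.sql import SparkSession

# Spark 2.3 entry point with Hive support (wraps the older sqlContext/HiveContext).
spark = SparkSession.builder \
    .appName("hive-query-job") \
    .enableHiveSupport() \
    .getOrCreate()

# The property set in the program, as described in the post.
spark.sql("SET hive.execution.engine=spark")

# Example query against a Hive table (db/table names are hypothetical).
df = spark.sql("SELECT * FROM db_name.some_hive_table LIMIT 10")
df.show()
```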
Labels:
- Apache Spark
07-12-2022
03:08 PM
@Chakkara As far as I remember, the distributed map cache does not guarantee consistency. You could use HBase or HDFS to store the success or failure status of the processors for the downstream application. Once the success/failure status is saved in HBase, you can retrieve it with the FetchHBaseRow processor using the row ID. Build a REST API NiFi flow to pull the status from HBase, for example HandleHttpRequest --> FetchHBaseRow --> HandleHttpResponse. You can then call that HTTP API via a shell script/curl and invoke the script from Control-M, as in the sketch below.
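A minimal shell sketch of such a status check; the endpoint URL, port and row-ID convention are hypothetical and depend on how the HandleHttpRequest flow is built:

```bash
#!/bin/bash
# Hypothetical endpoint exposed by the HandleHttpRequest processor in the status-API flow.
STATUS_URL="http://nifi-host:9090/flow-status"
ROW_ID="$1"   # e.g. the file name or batch id used as the HBase row key

# Query the NiFi REST flow; it fetches the row from HBase and returns the stored status.
status=$(curl -s "${STATUS_URL}?rowId=${ROW_ID}")

if [ "$status" = "SUCCESS" ]; then
    exit 0      # Control-M treats exit code 0 as success
else
    exit 1      # non-zero exit code signals failure to Control-M
fi
```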
03-18-2022
12:08 AM
1 Kudo
Hi @VR46 Thanks for the analysis. As it did not work initially with Python requests, I tried the same call with a Java Spring Boot RestTemplate, and it started the Oozie workflow. The difference I found was that the username and password had to be Base64 encoded. I applied the same in Python and it finally worked.
03-17-2022
10:37 AM
Hi @VR46, I solved the issue by encoding the username and password with the base64 module and passing them in the POST request. It worked.
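A minimal Python sketch of that approach, assuming the Knox-proxied Oozie endpoint and test credentials from the original question; the workflow XML file name is a placeholder:

```python
import base64
import requests

# Endpoint and credentials as in the question's example; adjust for your gateway.
url = "https://knowxhost:8443/gateway/cdp-proxy-api/oozie/v1/jobs?action=start"
user, password = "guestuser", "guestpwd"

# Build the Basic Auth header manually with base64, as described in the post.
token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
headers = {
    "Authorization": f"Basic {token}",
    "Content-Type": "application/xml;charset=UTF-8",
}

# Workflow configuration XML (placeholder file name).
with open("Job.xml", "rb") as f:
    response = requests.post(url, headers=headers, data=f.read(), verify=False)

print(response.status_code, response.text)
```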
12-10-2021
04:25 AM
Hi, Please find a sample flow that lists and fetches files with the ListSFTP and FetchSFTP processors and puts them into the target HDFS path. The key properties for each processor are summarized in the sketch after this list.
1. ListSFTP - keeps listening to the input folder on the file-share server, for example /opt/landing/project/data. When a new file arrives, ListSFTP picks up only the file name and passes it to the FetchSFTP processor, which fetches the file from the source folder.
2. Once the latest file has been identified by ListSFTP, FetchSFTP fetches the file from the source path.
3. In the PutHDFS processor, configure the values for your project and the required target folder. If your cluster is Kerberos enabled, add the Kerberos controller service so NiFi can access HDFS.
4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
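A rough sketch of the main properties involved; the values are placeholders for your environment, and the property names are as I recall them from the processor configuration dialogs:

```
ListSFTP
  Hostname      : fileshare-server
  Port          : 22
  Username      : sftp_user
  Remote Path   : /opt/landing/project/data

FetchSFTP
  Hostname      : fileshare-server
  Port          : 22
  Username      : sftp_user
  Remote File   : ${path}/${filename}

PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /data/project/landing
  Kerberos Principal / Keytab    : (required on a Kerberized cluster)
```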
11-24-2021
11:58 AM
Hi, Currently we are using NiFi to pull files from an SFTP server and put them into HDFS with the ListSFTP and FetchSFTP processors. We can track whether each flow succeeded or failed, store the ingestion status in persistent storage such as HBase or HDFS, and query the ingestion status at any time. Another option, which we used in previous projects, is to pull the files from NAS storage to the local file system (edge or gateway node) with an SFTP copy command and then put them into HDFS with hdfs commands; the whole process was done in a shell script scheduled through Control-M, roughly as in the sketch below. By the way, what do you mean by "all flow in same place"? We can develop a single NiFi flow that pulls the files from the SFTP server and puts them into the target HDFS path.
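A minimal sketch of that shell-script alternative; the host name, paths and file pattern are placeholders, not from the original projects:

```bash
#!/bin/bash
set -e

SRC_HOST="nas-server"                 # placeholder SFTP/NAS host
SRC_DIR="/nas/export/project"         # placeholder source directory
LOCAL_DIR="/data/staging/project"     # landing directory on the edge/gateway node
HDFS_DIR="/data/project/incoming"     # target HDFS directory

# 1. Copy the files from the NAS/SFTP server to the edge node.
sftp -b - "${SRC_HOST}" <<EOF
lcd ${LOCAL_DIR}
cd ${SRC_DIR}
mget *.csv
EOF

# 2. Put the files into HDFS and clean up the local copies.
hdfs dfs -put -f "${LOCAL_DIR}"/*.csv "${HDFS_DIR}/"
rm -f "${LOCAL_DIR}"/*.csv
```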
11-22-2021
01:46 AM
Hi, Since no JSON structure is mentioned in the question, could you please look at the JoltTransformJSON NiFi processor? Refer to the Jolt specification in this example: https://stackoverflow.com/questions/61697356/extract-first-element-array-with-jolt Using the Jolt processor we can perform many kinds of transformations on a JSON file, for example a spec along the lines of the sketch below. Thanks
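As a rough illustration only (the field name items is hypothetical, since the question does not show the JSON), a Jolt spec that keeps just the first element of an array could look like this:

```json
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "items": "=firstElement(@(1,items))"
    }
  }
]
```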
11-13-2021
04:03 PM
Hi, I am receiving the same error when using beeline commands in an Oozie shell action: one node connects and another node fails to connect. The same beeline command works from the edge/gateway node, but beeline does not work via the Oozie shell action. Is there any specific reason for that?
10-23-2021
12:23 AM
Hi, Could you please check whether the user has permission to trigger Oozie in the Ranger policy, and also verify that your Oozie workflow XML file is present in the HDFS path? Normal Basic Auth is fine for accessing the Oozie REST APIs. I am able to perform POST and GET requests against the Oozie workflow successfully and monitor the workflow status in the same script.
10-14-2021
02:02 AM
Hi @COE, Thank you for the confirmation. Yes, I mentioned one of the working DROP PARTITION queries in the post. We were in a situation where we wanted to use the date functions inside the DROP PARTITION clause. We will instead do the 14-day calculation in the script and pass the resulting value to the DROP PARTITION statement, roughly as in the sketch below.
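A minimal sketch of that approach, reusing the table and partition column from the original question; the JDBC connection string is a placeholder and the date arithmetic assumes GNU date:

```bash
#!/bin/bash
# Compute the cutoff date (14 days ago) in yyyyMMdd format, matching the evt_date partition format.
# Assumes GNU date; adjust for other platforms.
CUTOFF=$(date -d "14 days ago" +%Y%m%d)

# Pass the computed literal into the DROP PARTITION statement instead of calling
# date functions inside the partition spec (which Hive's parser rejects).
beeline -u "jdbc:hive2://hiveserver:10000/default" \
  -e "ALTER TABLE audit_logs DROP PARTITION (evt_date < '${CUTOFF}');"
```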
10-13-2021
05:47 AM
Hi @arunek95 Yes, the workaround was applied by following the community posts. As of now we don't have a root cause for why so many files were in OPENFORWRITE state on those particular two days in our cluster. https://community.cloudera.com/t5/Support-Questions/Cannot-obtain-block-length-for-LocatedBlock/td-p/117517 Thanks
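For reference, a sketch of the kind of HDFS commands typically involved in this sort of workaround; this is an illustration rather than the exact steps from the linked post, and the file path is taken from the error in the original question:

```bash
# List files that are still open for write under the ranger audit directory.
hdfs fsck /ranger/audit -files -openforwrite

# Force lease recovery on a file left in OPENFORWRITE state so its block length can be resolved.
hdfs debug recoverLease -path /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log -retries 3
```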
10-13-2021
05:41 AM
Hello All, I am trying to drop the partitions older than 14 days in Hive, and the Hive query below throws an error.
Hive query: ALTER TABLE audit_logs DROP PARTITION (evt_date < 'date_format(date_sub(from_unixtime(unix_timestamp('20211013','yyyyMMdd'),'yyyy-MM-dd'),14),'yyyyMMdd')');
The table is partitioned on evt_date as a string, and evt_date is stored in the yyyyMMdd format, i.e. 20211013, as a string.
Error: Error while compiling statement: FAILED: ParseException line 4:118 cannot recognize input near ''date_format(date_sub(from_unixtime(unix_timestamp('' '20211013' '','' in constant
The input to the unix_timestamp('20211013','yyyyMMdd') call also arrives in yyyyMMdd string format from another Oozie parameter/program.
The standalone select query select date_format(date_sub(from_unixtime(unix_timestamp('20211013','yyyyMMdd'),'yyyy-MM-dd'),14),'yyyyMMdd'); returns 20210929.
ALTER TABLE audit_logs DROP PARTITION (evt_date < '20211029'); works fine.
Could someone assist with this? Thanks @hive
Labels:
- Apache Hive
10-03-2021
05:26 AM
Hi All, I am running a distcp command which copies the HDFS audit-log folder to another HDFS folder for further processing. The distcp command worked fine until about two weeks ago and started failing last week. I checked the detailed MR logs and found that only particular files fail to copy, while the other audit-log folders/files (Kafka, Hive, NiFi and HBase) are copied; only some specific files fail.
distcp command: hadoop distcp -filters $filter_file_loc ranger/audit /data/audit_logs/staging
Distribution: Cloudera Data Platform version 7.1.7
Please find the detailed error messages below.
java.io.IOException: File copy failed: hdfs://namenode/ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log --> hdfs://namenode /data/audit_logs/staging /audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log
Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1024772623-10.107.146.29-1593441936031:blk_1183449574_109711397; getBlockSize()=64553182; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.107.145.208:9866,DS-b11e932b-0460-47b7-a281-3743ecf9c581,DISK]]} of /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:370)
at org.apache.hadoop.hdfs.DFSInputStream.getLastBlockLength(DFSInputStream.java:279)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:260)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:203)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:187)
at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1056)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1019)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:351)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:954)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:331)
@distcp
Labels:
- Apache Hadoop
- MapReduce
09-23-2021
03:20 AM
Hi All, I am able to launch the Oozie workflow using the Oozie web-service API via the curl utility, and can perform HTTP GET and POST requests that way.
Example curl command:
curl -u guestuser:guestpwd -X POST -H "Content-Type: application/xml;charset=UTF-8" -d @Job.xml "https://knowxhost:8443/gateway/cdp-proxy-api/oozie/v1/jobs?action=start"
url: https://knowxhost:8443/gateway/cdp-proxy-api/oozie/v1/job/0000009-210910123241282-oozie-oozi-W
Also, in Python I can fetch the executed Oozie workflow:
response = requests.get(url, auth=HTTPBasicAuth('guestuser','guestpwd '), verify=False).text
But I am unable to perform the POST request and am getting a 500 error:
url = ' https://knowxhost:8443/gateway/cdp-proxy-api/oozie/v1/jobs?action=start'
response = requests.post(url, auth=HTTPBasicAuth('guestuser','guestpwd'), verify=False, data=xml, headers=headers).text
Exception (Knox error page):
HTTP ERROR 500 javax.servlet.ServletException: javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
URI: /gateway/cdp-proxy-api/oozie/v1/jobs
STATUS: 500
MESSAGE: javax.servlet.ServletException: javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
SERVLET: cdp-proxy-api-knox-gateway-servlet
CAUSED BY: javax.servlet.ServletException: javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
CAUSED BY: javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
CAUSED BY: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
CAUSED BY: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
CAUSED BY: java.io.IOException: java.io.IOException: Service connectivity error.
CAUSED BY: java.io.IOException: Service connectivity error.
Detailed stack trace:
GatewayFilter.java:doFilter(169)) - Gateway processing failed: javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
javax.servlet.ServletException: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
at org.apache.shiro.web.servlet.AdviceFilter.cleanup(AdviceFilter.java:196)
at org.apache.shiro.web.filter.authc.AuthenticatingFilter.cleanup(AuthenticatingFilter.java:155)
at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:148)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:450)
at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.filter.ResponseCookieFilter.doFilter(ResponseCookieFilter.java:49)
at org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:58)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.filter.XForwardedHeaderFilter.doFilter(XForwardedHeaderFilter.java:50)
at org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:58)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.GatewayFilter.doFilter(GatewayFilter.java:167)
at org.apache.knox.gateway.GatewayFilter.doFilter(GatewayFilter.java:92)
at org.apache.knox.gateway.GatewayServlet.service(GatewayServlet.java:135)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:228)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.apache.knox.gateway.trace.TraceHandler.handle(TraceHandler.java:51)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.apache.knox.gateway.filter.CorrelationHandler.handle(CorrelationHandler.java:41)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.apache.knox.gateway.filter.PortMappingHelperHandler.handle(PortMappingHelperHandler.java:106)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.websocket.server.WebSocketHandler.handle(WebSocketHandler.java:115)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.shiro.subject.ExecutionException: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:389)
at org.apache.knox.gateway.filter.ShiroSubjectIdentityAdapter.doFilter(ShiroSubjectIdentityAdapter.java:71)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
... 77 more
Caused by: java.security.PrivilegedActionException: java.io.IOException: java.io.IOException: Service connectivity error.
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.knox.gateway.filter.ShiroSubjectIdentityAdapter$CallableChain.call(ShiroSubjectIdentityAdapter.java:142)
at org.apache.knox.gateway.filter.ShiroSubjectIdentityAdapter$CallableChain.call(ShiroSubjectIdentityAdapter.java:74)
at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
... 83 more
Caused by: java.io.IOException: java.io.IOException: Service connectivity error.
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.failoverRequest(ConfigurableHADispatch.java:269)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequest(ConfigurableHADispatch.java:163)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.failoverRequest(ConfigurableHADispatch.java:263)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequest(ConfigurableHADispatch.java:163)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.failoverRequest(ConfigurableHADispatch.java:263)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequest(ConfigurableHADispatch.java:163)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.failoverRequest(ConfigurableHADispatch.java:263)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequest(ConfigurableHADispatch.java:163)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequestWrapper(ConfigurableHADispatch.java:125)
at org.apache.knox.gateway.dispatch.DefaultDispatch.doPost(DefaultDispatch.java:343)
at org.apache.knox.gateway.dispatch.GatewayDispatchFilter$PostAdapter.doMethod(GatewayDispatchFilter.java:182)
at org.apache.knox.gateway.dispatch.GatewayDispatchFilter.doFilter(GatewayDispatchFilter.java:125)
at org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:58)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.ranger.authorization.knox.RangerPDPKnoxFilter.doFilter(RangerPDPKnoxFilter.java:171)
at org.apache.ranger.authorization.knox.RangerPDPKnoxFilter.doFilter(RangerPDPKnoxFilter.java:110)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.identityasserter.common.filter.AbstractIdentityAssertionFilter.doFilterInternal(AbstractIdentityAssertionFilter.java:193)
at org.apache.knox.gateway.identityasserter.common.filter.AbstractIdentityAssertionFilter.access$000(AbstractIdentityAssertionFilter.java:53)
at org.apache.knox.gateway.identityasserter.common.filter.AbstractIdentityAssertionFilter$1.run(AbstractIdentityAssertionFilter.java:161)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.knox.gateway.identityasserter.common.filter.AbstractIdentityAssertionFilter.doAs(AbstractIdentityAssertionFilter.java:156)
at org.apache.knox.gateway.identityasserter.common.filter.AbstractIdentityAssertionFilter.continueChainAsPrincipal(AbstractIdentityAssertionFilter.java:146)
at org.apache.knox.gateway.identityasserter.common.filter.CommonIdentityAssertionFilter.doFilter(CommonIdentityAssertionFilter.java:94)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.filter.rewrite.api.UrlRewriteServletFilter.doFilter(UrlRewriteServletFilter.java:57)
at org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:58)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:349)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:263)
at org.apache.knox.gateway.filter.ShiroSubjectIdentityAdapter$CallableChain$1.run(ShiroSubjectIdentityAdapter.java:90)
at org.apache.knox.gateway.filter.ShiroSubjectIdentityAdapter$CallableChain$1.run(ShiroSubjectIdentityAdapter.java:87)
... 90 more
Caused by: java.io.IOException: Service connectivity error.
at org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:188)
at org.apache.knox.gateway.ha.dispatch.ConfigurableHADispatch.executeRequest(ConfigurableHADispatch.java:159)
... 123 more
Can someone assist with getting this issue resolved? The user has access to all the Hadoop components via Knox.
Labels:
- Apache Oozie
09-18-2021
10:00 PM
Hi @Daggers You can store the Avro schema file in an HDFS folder and point your Hive table at it. There is an avro.schema.url table property whose value can be passed when creating the Hive table, as in the sketch below. This approach also works for versioning the Avro schema file in HDFS for the respective table. You can explore Schema Registry as well.
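A minimal HiveQL sketch of that approach; the database, table, location and schema path are placeholders:

```sql
-- Avro-backed table whose schema is read from a versioned .avsc file in HDFS.
CREATE EXTERNAL TABLE demo_db.events_avro
STORED AS AVRO
LOCATION '/data/demo/events'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events/v1/events.avsc');
```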
09-05-2021
05:30 AM
Hi, We orchestrate the NiFi flow status (completed/failed) through Control-M. These are the steps we follow:
1. Control-M puts the files in the source folder (SFTP server) and waits for the status through a status API (you have to build a separate REST API flow using the HandleHttpRequest processor to expose the status message).
2. NiFi ListSFTP keeps listening for the file (.txt); once a new file has been placed by Control-M, NiFi processes the file and loads the content into the database. The database processor has Success and Failure relationships. For success flowfiles you can capture the status as success with the value 1 using an UpdateAttribute processor and store that value in the distributed cache (or another storage area) using the relevant processor; do the same for the failure flow with the value -1.
3. With the status stored in the distributed cache/other storage area, you can query it with the corresponding Fetch processor and pass it to the HandleHttpResponse processor for the waiting Control-M job (Control-M should keep polling the status until it receives 1 or -1, as in the polling sketch after this list).
4. When Control-M finds the value 1, the flow is a success; if it finds -1, processing has failed. Thanks
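A minimal polling sketch for the Control-M side, assuming a hypothetical status endpoint exposed by the HandleHttpRequest flow and a file/batch identifier passed as the first argument:

```bash
#!/bin/bash
# Hypothetical status endpoint and file/batch identifier.
STATUS_URL="http://nifi-host:9090/ingestion-status"
FILE_ID="$1"

# Poll until the flow reports 1 (success) or -1 (failure).
while true; do
    status=$(curl -s "${STATUS_URL}?file=${FILE_ID}")
    if [ "$status" = "1" ]; then
        echo "Ingestion succeeded"; exit 0
    elif [ "$status" = "-1" ]; then
        echo "Ingestion failed"; exit 1
    fi
    sleep 30   # wait before polling again
done
```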
08-18-2021
02:20 PM
Hello, If your input JSON comes in the format below:
{ "TOT_NET_AMT" : "55.00", "H_OBJECT" : "File", "H_GROSS_AMNT" : ["55.00","58.00"], "TOT_TAX_AMT" : "9.55" }
and the value of H_GROSS_AMNT is a list of strings ["55.00","58.00"] rather than the string "55.00,58.00", then you can use the JoltTransformJSON NiFi processor with a Jolt spec to convert it into the required output:
{ "TOT_NET_AMT" : "55.00", "H_OBJECT" : "File", "H_GROSS_AMNT" : "58.00", "TOT_TAX_AMT" : "9.55" }
JoltTransformJSON configuration - Jolt specification:
[ { "operation": "modify-overwrite-beta", "spec": { "H_GROSS_AMNT": "=lastElement(@(1,H_GROSS_AMNT))" } } ]
If H_GROSS_AMNT is not a list of strings but only a comma-separated string, then please follow these steps in order: extract the whole JSON into a single attribute using the ExtractText processor (the attribute will hold the entire JSON content); next use EvaluateJsonPath, which extracts each JSON element into its own attribute; once you have all four attributes populated after EvaluateJsonPath, construct the JSON in a ReplaceText processor. H_GROSS_AMNT will still hold the comma-separated string, and you can use ${H_GROSS_AMNT:substringAfterLast(',')} to extract the last element from the string value. Examples of EvaluateJsonPath and the expression language: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#jsonpath
07-30-2021
01:09 PM
Hi, Once you have extracted the header into a flowfile attribute using the ExtractText processor, you can either convert the header attribute into flowfile content or keep the header value as an attribute. The Stack Overflow post explains extracting the header into a flowfile attribute and then passing the header as a file to the destination. To convert a flowfile attribute to flowfile content, use a ReplaceText processor, where you can reference flowfile attributes. The success relationship of ReplaceText will then contain only the header as the flowfile content; the original CSV content is replaced by the header. That flowfile content can be transferred to the destination or to the next processor in the flow. Hope this is the information you are looking for. Thanks
07-19-2021
07:59 AM
Hi, If the controller services are created at the parent process group level, you will not get those controller services when you create a template for only a sub process group. You will either have to create the template at the parent process group level, or define the DistributedMapCache controller service inside the process group from which you want to create the template. For example, 'Stream Ingestion' is the immediate parent process group of the 'TestSample' process group. If I create a template for the 'TestSample' process group, the template does not contain the DistributedMapCacheServer controller service, because that controller service is scoped to the 'Stream Ingestion' process group. Controller services created within a process group are available to (and can be referenced by) all descendant components. Thanks
07-19-2021
07:22 AM
Hi alexmarco, I did not find an 'Upload/Attach' option here to upload the template file. Could you please follow the steps/screenshots mentioned above? They should work for your example as well. Thanks
07-16-2021
08:41 AM
Hi alexmarco, If your final JSON format is fixed and the input JSON always arrives in the same format (only the values differ), then you can extract the keyword value ("foo" or whatever new value) into a flowfile attribute and use that attribute in a subsequent ReplaceText processor to pass the value.
ExtractText processor --> UpdateAttribute processor --> ReplaceText processor
1. Add a new property (keyword_value) in ExtractText with the expression ("keyword":.*)
2. Remove the spaces and double quotes from the extracted keyword_value attribute in the UpdateAttribute processor. (You can also put this logic directly in the ReplaceText processor, since ReplaceText supports the NiFi expression language; in that case the UpdateAttribute processor is optional and can be dropped from the flow.)
3. Insert the keyword_value in the ReplaceText processor (keeping the rest of the final JSON) as in the sample.
4. Connect the success relationship to the InvokeHTTP processor.
* CreateKeywordvalueAttribute (ExtractText processor) expression: ("keyword":.*)
* UpdateAttribute processor: ${keyword_value:substringAfter(':'):trim():replace('"', '')}
* FinalReplaceText processor: place the JSON below into the Replacement Value property of the processor
{
"requests": [
{
"source": "blablabla",
"params": {
"keywords": [
"${keyword_value}"
],
"sub-field1": "1"
}
}
],
"field1": "1",
"field2": "2",
"field3": false
}
I have attached the sample tested flow (.xml) for your reference. Please accept the solution if it works as expected. Thanks, Adhi
Tags:
- NiFi
07-13-2021
07:03 AM
We have solved this with the help of the Wait and Notify processors: we route the .hql file to PutHiveQL, which in turn routes its success/failure relationship to Notify. The Wait processor holds the .csv file and releases it to PutHDFS once the signal arrives from the Notify processor.
07-13-2021
06:52 AM
Hi Deepika, I assume you are using the ConsumeKafka or ConsumeKafka_2_0 NiFi processor. When you select StandardSSLContextService for the SSL Context Service property, click the right arrow next to it (as indicated in the image below). The StandardSSLContextService properties will then be shown, and you can enter values for Keystore Filename, Keystore Password, Key Password, Truststore Filename, Truststore Type, etc. After providing the SSL property values, enable the controller service and start the ConsumeKafka processor (see the ConsumeKafka_2_0 processor properties screenshot). Please let us know whether these steps work as expected. Thanks
07-01-2021
12:45 PM
Hi, The InvokeHTTP processor writes a couple of attributes with values, for example invokehttp.status.code = 401 and invokehttp.status.message = Unauthorized. When a flowfile with these attributes arrives on the Failure or No Retry relationship, you can use a ReplaceText processor (see the ReplaceText processor properties and the sketch below) to overwrite the original flowfile content with a new message so that you can send it by email. If you need more of the response body, please check whether you can capture it in an attribute of the InvokeHTTP processor so that it can also be used in the ReplaceText processor when overwriting the original flowfile.
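A rough sketch of a Replacement Value using those attributes; the wording of the message is just an example, and ${filename} is the standard core flowfile attribute:

```
ReplaceText processor
  Replacement Strategy : Always Replace
  Replacement Value    : InvokeHTTP failed for ${filename}: ${invokehttp.status.code} - ${invokehttp.status.message}
```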
07-01-2021
04:22 AM
Hi, We are processing a ZIP file that contains multiple timestamped files (.hql, .csv) in a distributed manner. We check the file extension, and depending on whether it is .hql or .csv we route the file to the PutHiveQL or PutHDFS processor respectively. The ZIP file contains the files below (in timestamp order, starting with for example t1 or a system timestamp), which must be extracted and processed in order. table_info.zip table_info_t1.hql
table_info_t1_1.csv
table_info_t1_2.csv
table_info_t2.hql
table_info_t2_1.csv
table_info_t2_2.csv
table_info_latest.hql
table_info_latest.csv
Please find the NiFi flow and the RouteOnAttribute property below. Is there any way to make the flow wait until PutHiveQL has executed first and then signal PutHDFS to run next, for each timestamp group one by one in order? Can we group the files for each timestamp, process the .hql file, and then put the .csv files into HDFS? @Nifi
Labels:
- Apache NiFi
06-11-2021
03:16 AM
Hi, Thanks for the reply. As I mentioned, I ran insert into tableA select * from tableB (assume this is the delta calculation query). Yes, we have a managed table in Hive, and the Hive version is 3.0, where ACID is enabled by default. We created the Hive managed table through Hue and executed the delta query via the NiFi PutHiveQL processor, and we got the error intermittently.
05-17-2021
04:16 AM
Hi, Thanks for the information. We had added the single property set hive.txn.stats.enabled=false and were still getting the issue intermittently from NiFi. My solution architect found the cause after investigating the Hive source code on GitHub, I believe. So we set both of the properties below: set hive.txn.stats.enabled=false set hive.stats.autogather=false The error disappeared and never came back. I would like to understand how to relate the Hive source code to those properties, and how to troubleshoot this kind of strange issue.
04-22-2021
01:58 AM
Hi All, We are getting the error below intermittently for some tables on Hive 3.1. I am passing a sample query like insert into table test as select * from some table to the NiFi PutHive3QL processor. The Hive target table is a managed table stored in Avro format.
Error: PutHive3QL[id=d1652e76-54f7-30cf-8dc8-b1934cee3c26] Failed to update Hive for StandardFlowFileRecord[uuid=7447e7f0-4d1e-40fa-9965-5c98f7d11341,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1618924236521-4, container=default, section=4], offset=154416, length=3631],offset=0,name=20210421_test_branch_info,size=3631] due to java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask. MetaException(message:Cannot change stats state for a transactional table ra_test.test_branch_info without providing the transactional write state for verification (new write ID -1, valid write IDs null; current state {"BASIC_STATS":"true","COLUMN_STATS":{}; new state {"BASIC_STATS":"true","COLUMN_STATS":{}); it is possible that retrying the operation will succeed, so routing to retry: java.sql.SQLException: Error while compiling statement: FAILED: Execution Error,
Option tried: if we rerun the query for the same table through the NiFi PutHive3QL processor, the query executes fine; sometimes it fails only on the first run.
Hive version: 3.0 and NiFi 1.11.4.
Please advise what to check on the Hive side for this error. Thanks
Labels:
- Apache Hive
- Apache NiFi
02-11-2021
03:29 AM
@MattWho In continuation of the logic above: I implemented it and it is working fine. I have a three-node NiFi cluster with nodes a, b and c, where node a is always the primary node. If a runs out of space, we unzip the file on another node, distribute the pieces to nodes b and c, push the files to the HDFS folder, and capture the success and failure messages (in an UpdateAttribute processor) as described in the posts above. After that, in the MergeRecord step, the queue waits for some time and merging only happens after around 10 minutes. I have a dependency on the merging step because I have to share the flowfile status with another API that is requesting it. How can we speed up the MergeRecord processor here? When all three nodes have a good amount of memory, the flowfiles are pushed to HDFS, the status is updated (through UpdateAttribute), and merging happens very quickly, as expected on a single node. But when the primary node a is out due to the space issue, the flow unpacks the .zip file and distributes the files to the other nodes (b, c) to put into HDFS; we capture each success/failure status of PutHDFS (using UpdateAttribute), and MergeRecord takes some time to merge the statuses. Is this because the other two nodes are processing slowly, or is there some other reason? I have attached a screenshot of the partial flow.
01-25-2021
04:20 AM
Hi All, I am trying to build a JSON string using an ifElse condition in the ReplaceText processor, but I am getting an Invalid Expression error.
Working ifElse condition: ${status:contains('0'):ifElse('Success','Failed')}
The string below is not working and throws an invalid expression error. Please assist.
${status:contains('0'):ifElse('{ "filename":"${overall_filename}", "status":"${status}", "message":"${overall_message}" }', '{ "filename":"${overall_filename}", "status":"${exc_status}", "message":"${overall_message}" }' ) }
Thanks
Labels:
- Apache NiFi