Member since 03-03-2017

- 74 Posts
- 10 Kudos Received
- 2 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2997 | 06-13-2018 12:02 PM |
|  | 5772 | 11-28-2017 10:32 AM |
			
    
	
		
		
01-08-2019 02:20 PM
Hi,

I am trying to connect to my S3 bucket with the ListS3 processor, as shown in the picture.

I got my credentials from the credentials file:

	λ cat credentials
	[default]
	aws_access_key_id = xxxxxx
	aws_secret_access_key = xxxxxx
	aws_session_token = FQoGZXIvYXdzEBoxxxxxxxx

But I got this error:

	sktq1ehdf1nn01.ccta.dk:9091 ListS3[id=288c3ee5-0168-1000-ffff-ffffe596788e] ListS3[id=288c3ee5-0168-1000-ffff-ffffe596788e] failed to process due to com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 05B234720FA788D4), S3 Extended Request ID: g0d8XYmhVRcP+HLuVUEEFREd486cVPQD+h1DL7RTG5KSoU3HGAlFJU0cVBP1RATyjprjRgSp5aI=; rolling back session: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 05B234720FA788D4)

I then tried to renew my keys manually with aws configure and set a fresh pair of credentials in my ListS3 processor, but got the same errors.

It seems like the ACCESS_KEY property in ListS3 is not the same as the aws_access_key_id value from the credentials file.
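One detail worth noting: the credentials file above contains an aws_session_token, which means these are temporary (STS) credentials, and the access key/secret key pair alone is generally not valid without the token — which can surface exactly as InvalidAccessKeyId. As a local sanity check, a minimal sketch (pure Python stdlib; the path in the usage comment is hypothetical) that reads all three fields the way a session-token-aware client would need them:

```python
import configparser

def read_aws_credentials(path, profile="default"):
    """Parse an AWS credentials file and return the fields a
    session-token-aware client needs."""
    cp = configparser.ConfigParser()
    cp.read(path)
    section = cp[profile]
    return {
        "access_key_id": section.get("aws_access_key_id"),
        "secret_access_key": section.get("aws_secret_access_key"),
        # Only present for temporary (STS) credentials; when this is
        # set, the key pair alone will not authenticate.
        "session_token": section.get("aws_session_token"),
    }

# Example (path is hypothetical):
# creds = read_aws_credentials("C:/Users/me/.aws/credentials")
```

If session_token is not None, the processor needs a credential provider that supports session tokens, or a permanent key pair instead.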
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache NiFi
    
	
		
		
12-18-2018 07:32 AM
That's great news, thank you Dan.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-14-2018 06:44 AM
Hi @Dan Chaffelson, thanks for the quick response. Here is my code; this is just a POC, so it may seem somewhat unstructured, just a playground:
	import os
	import sys
	import getpass
	import json
	import smtplib
	from nipyapi import nifi, config, templates, canvas
	from config import ConfigIni
	# Disable urllib3 certificate warnings
	from requests.packages.urllib3 import disable_warnings
	disable_warnings()
	class NifiInstance:
		""" The NifiInstance class facilitating easy to use
		methods utilizing the NiPyApi (https://github.com/Chaffelson/nipyapi)
		wrapper library.
		Arguments:
			url         (str): Nifi host url, defaults to environment variable `NIFI_HOST`.
			username    (str): Nifi username, defaults to environment variable `NIFI_USERNAME`.
			password    (str): Nifi password, defaults to environment variable `NIFI_PASSWORD`.
			verify_ssl  (bool): Whether to verify SSL connection - UNUSED as of now.
		"""
		def __init__(self, url=None, username=None, password=None, verify_ssl=False):
			config.nifi_config.host = self._get_url(url)
			config.nifi_config.verify_ssl = verify_ssl
			config.nifi_config.username = username
			self._authenticate(username, password)
		def _get_url(self, url):
			if not url:
				try:
					url = os.environ['NIFI_HOST']
				except KeyError:
					url = input('Nifi host: ')
			if '/nifi-api' not in url:
				if not url[-1] == '/':
					url = url + '/'
				url = url + 'nifi-api'
			return url
		def _authenticate(self, username=None, password=None):
			if not username:
				try:
					config.nifi_config.username = os.environ['NIFI_USERNAME']
				except KeyError:
					config.nifi_config.username = input('Username: ')
			if not password:
				try:
					password = os.environ['NIFI_PASSWORD']
				except KeyError:
					password = getpass.getpass('Password: ')
			access_token = None
			try:
				access_token = nifi.AccessApi().create_access_token(username=config.nifi_config.username,password=password)            
			except nifi.rest.ApiException as e:
				print('Exception when calling AccessApi->create_access_token: {}\n'.format(e))
			# Key by the resolved username (the local `username` arg may be None)
			config.nifi_config.api_key[config.nifi_config.username] = access_token
			config.nifi_config.api_client = nifi.ApiClient(header_name='Authorization', header_value='Bearer {}'.format(access_token))        
		def list_processors_in_processorgroup(self,pg_id=None):
			listen = canvas.list_all_processors(pg_id)
			#jlisten= json.loads(listen)
			for item in listen:
				#print (str(item.id))
				print (str(item.status.name))
		
		
		def processor_status(self,p_id=None):
			pro=canvas.get_processor(p_id, 'id')
			return pro.status.run_status
		
	# Get urls and groups to list from the ini file.
	start_init = ConfigIni('c:/temp/nipyapi/monitor.ini')
	#groups are returned as list of groups 
	groups= start_init.get_group_id()
	url = start_init.get_url()
	n = NifiInstance(url, 'myuser','mypassword')
	#test list_all_processor 
	test=canvas.list_all_processors('5b641351-34d0-3def-a376-7824fbe9cc0f')
	for item in test:
		print(str(item.id))
My nipyapi version is:

	C:\Temp\nipyapi
	(nipyapi) λ pip freeze | grep nipyapi
	nipyapi==0.11.0
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-13-2018 03:06 PM
Hi,

I need to extract the processors for a particular Process Group with id = 5b641351-34d0-3def-a376-7824fbe9cc0f.

This Process Group contains 10 processors and another Process Group with 5 processors. When I want to extract the processors, I only get the 5 processors from my "sub" Process Group:

	test=canvas.list_all_processors('5b641351-34d0-3def-a376-7824fbe9cc0f')
	for item in test:
		print(str(item.id))

Result:

	(nipyapi) λ python main.py
	c7aad022-0ab3-353f-90cb-0999781d6309
	7287c6b0-c9a8-3436-9902-52fb806e7c42
	d3bd0289-490a-389c-ba90-c27c48a1e453
	fb278074-6249-3aeb-88f8-d3bee857be76

I was expecting a list of 10 processor ids for the Process Group I call canvas.list_all_processors with. It seems that it takes the deepest Process Group within the Process Group given in the argument.

Does this work by design? And is there another way to get the processors top-down within the group asked upon?

@Dan Chaffelson
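The top-down traversal being asked for can be sketched against a mock group tree (plain Python dicts standing in for process groups — this is not the nipyapi API, and all names here are made up):

```python
# A mock process-group tree: each group has its own processors plus
# zero or more child groups (structure and names are hypothetical).
root = {
    "id": "root-group",
    "processors": ["p1", "p2"],
    "children": [
        {"id": "sub-group", "processors": ["p3"], "children": []},
    ],
}

def collect_processors(group):
    """Collect processors from this group and every descendant,
    top-down (the parent's own processors first)."""
    found = list(group["processors"])
    for child in group["children"]:
        found.extend(collect_processors(child))
    return found
```

Starting from the top-level group, this yields the parent's processors followed by those of every nested group, which is the behaviour the question expects from a recursive listing.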
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache NiFi
			
    
	
		
		
11-10-2018 10:41 AM
Hi, I have a large number of XML files stored in HBase; the files contain binary data like PDF, Word, etc. The column contents holds the content of the XML file.

I want to replace the binary value in the XML tag DokumentFilIndhold with the value "Content Removed":

	REGEXP_REPLACE(contents,"(?s)<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>", "Content Removed")

The regular expression seems to work exactly as expected when I test it with https://regexr.com/. But when I run the query on my data it cuts off the contents, so it is no longer a valid XML file.

Does the function REGEXP_REPLACE have some limitations, or is my expression wrong? The value is up to 65000 chars.

It's urgent for me to find a solution, so any idea will be very well received.
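As a side check, the same pattern (with the inline (?s) DOTALL flag, which the Java regex engine behind Hive's REGEXP_REPLACE also supports) behaves correctly in Python's re module even on values longer than 65000 characters; a minimal sketch with made-up XML:

```python
import re

# Hypothetical XML value with a large binary payload inside the tag,
# larger than the 65000-char values mentioned above.
payload = "A" * 70000
doc = ('<ns0:Dokument><ns0:DokumentFilIndhold encoding="base64">'
       + payload +
       '</ns0:DokumentFilIndhold><ns0:Titel>ok</ns0:Titel></ns0:Dokument>')

# Same pattern as the REGEXP_REPLACE call: (?s) makes '.' match newlines,
# and the non-greedy .*? stops at the first closing tag.
pattern = r"(?s)<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>"
cleaned = re.sub(pattern, "Content Removed", doc)
```

The surrounding elements survive intact, which suggests the pattern itself is sound and the truncation comes from somewhere else in the pipeline.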
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
			
    
	
		
		
07-06-2018 06:39 AM
Hi @Vinicius Higa Murakami,

When I run it on the Hive CLI, it fails:

	hive -e " select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');"
	2018-07-06 08:27:22,044 WARN  [main] conf.HiveConf: HiveConf of name hive.mapred.supports.subdirectories does not exist
	Logging initialized using configuration in file:/etc/hive/2.5.0.0-1245/0/hive-log4j.properties
	[Fatal Error] :1:21: Open quote is expected for attribute "xmlns:xsi" associated with an element type "tns:root".
	FAILED: SemanticException [Error 10014]: Line 1:8 Wrong arguments ''root/second/four'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathString@629984eb of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathString with arguments {<tns:root xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance xmlns:tns=http://test.com xmlns=http://xmlns.oracle.com/pcbpel/adapter/noname> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>:java.lang.String, root/second/four:java.lang.String} of size 2

I have tried it on a newer installation as well, just a few builds older than yours:

	Beeline version 1.2.1000.2.6.4.0-91 by Apache Hive
	Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91)
	Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91)

But it still doesn't return anything.

Fortunately I'm leaving for vacation today and won't be looking into this for the next 2 weeks; I had hoped to solve this before. Thanks for your help and suggestions.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
07-05-2018 06:58 AM
Thank you very much @Vinicius Higa Murakami.

When I run this in Beeline:

	select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');

It returns the following:

	Connected to: Apache Hive (version 1.2.1000.2.5.0.0-1245)
	Driver: Hive JDBC (version 1.2.1000.2.5.0.0-1245)
	Transaction isolation: TRANSACTION_REPEATABLE_READ
	0: jdbc:hive2://sktudv01hdp01.ccta.dk:2181,sk> use adm_sfo_sit;
	No rows affected (3.891 seconds)
	0: jdbc:hive2://sktudv01hdp01.ccta.dk:2181,sk> select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');
	+------+--+
	| _c0  |
	+------+--+
	|      |
	+------+--+
	1 row selected (0.076 seconds)
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
07-02-2018 07:06 AM
Hi,

Thanks, but in my Hive version it doesn't return anything:

	select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','/root/second/four');

Returns ""
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
06-29-2018 01:15 PM

1 Kudo
							 <tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname">
  <tns:second>
    <tns:third>10379</tns:third>
    <tns:four>stats</tns:four>
    <tns:five>1</tns:five>
    <tns:six>
      <tns:DokumentFilIndhold>K</tns:DokumentFilIndhold>
    </tns:six>
    <tns:seven>2018-06-28T12:57:36</tns:seven>
    <tns:eight>2018-06-28T13:02:28</tns:eight>
  </tns:second>
</tns:root>

This is my test XML. Testdb is an external table on HBase on a Hadoop cluster. I have tried to select the value of element "four" from my Hive table, where column contents is the XML:

	use testdb;
	select
	FROM_UNIXTIME(CEIL((CAST(SPLIT(row_key, '\\|')[0] AS BIGINT))/1000)) AS received_at
	, SPLIT(row_key, '\\|')[1] AS session_id
	,row_key
	,xpath_string(contents,'//tns:root/tns:second/tns:third/tns:four' ) AS test
	where row_key = '1530262082747|08004d28-3cf6-4446-bae1-93d43c08c189';

But it returns an empty string value. From my research I can see it probably has something to do with the namespace, but I haven't found anything useful out there yet.

Does anyone know of any way to either work with the namespace or ignore it in the Hive query?
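For comparison, the namespace sensitivity can be reproduced with Python's stdlib ElementTree: the elements must be addressed by their namespace URI, not the tns: prefix. (In XPath 1.0 tools, //*[local-name()='four'] is a commonly suggested prefix-agnostic workaround, though I have not verified it against Hive's xpath_string.)

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the test XML above, same namespace.
doc = """<tns:root xmlns:tns="http://test.com">
  <tns:second>
    <tns:third>10379</tns:third>
    <tns:four>stats</tns:four>
  </tns:second>
</tns:root>"""

root = ET.fromstring(doc)

# An unqualified path matches nothing: these elements live in the
# http://test.com namespace, not in "no namespace".
assert root.find("second/four") is None

# Qualifying each step with the namespace URI (Clark notation) works.
four = root.find("{http://test.com}second/{http://test.com}four")
print(four.text)  # -> stats
```

This is the same failure mode as the empty Hive result: a path written without the namespace silently matches nothing.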
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
			
    
	
		
		
06-13-2018 12:02 PM
I found the solution myself, much simpler than I first thought: just cast the json type to text and Avro will accept it.

	select cast (json column as text) columnName from table
						
					
				
			
			
			
			
			
			
			
			
			
		 
        












