Member since: 03-03-2017
Posts: 74
Kudos Received: 9
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2581 | 06-13-2018 12:02 PM |
 | 4658 | 11-28-2017 10:32 AM |
01-08-2019
02:20 PM
Hi,

I am trying to connect to my S3 bucket with the ListS3 processor, as shown in the picture. I took my credentials from the credentials file:

λ cat credentials
[default]
aws_access_key_id = xxxxxx
aws_secret_access_key = xxxxxx
aws_session_token = FQoGZXIvYXdzEBoxxxxxxxx

But I get this error:

sktq1ehdf1nn01.ccta.dk:9091 ListS3[id=288c3ee5-0168-1000-ffff-ffffe596788e] ListS3[id=288c3ee5-0168-1000-ffff-ffffe596788e] failed to process due to com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 05B234720FA788D4), S3 Extended Request ID: g0d8XYmhVRcP+HLuVUEEFREd486cVPQD+h1DL7RTG5KSoU3HGAlFJU0cVBP1RATyjprjRgSp5aI=; rolling back session: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 05B234720FA788D4)

I then tried to manually renew my keys with aws configure and set a fresh pair of credentials in my ListS3 processor, but got the same error. It seems that the ACCESS_KEY property in ListS3 is not the same as the aws_access_key_id value from the credentials file.
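One possible explanation, offered as an assumption rather than a confirmed diagnosis: the credentials file in the post contains an aws_session_token, which usually indicates temporary credentials. Temporary keys are generally only valid together with their session token, so supplying the access key and secret key alone can lead to an InvalidAccessKeyId response. The sketch below uses only Python's standard configparser to check whether a profile carries a session token. The function names and the sample values are hypothetical.

```python
import configparser


def read_profile(credentials_text, profile="default"):
    """Parse AWS-style credentials text and return one profile as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return dict(parser[profile])


def uses_temporary_credentials(profile_values):
    """Return True when the profile carries a non-empty session token."""
    return bool(profile_values.get("aws_session_token"))


# Sample text mirroring the layout of the file in the post (placeholder values).
SAMPLE = """
[default]
aws_access_key_id = xxxxxx
aws_secret_access_key = xxxxxx
aws_session_token = FQoGZXIvYXdzEBoxxxxxxxx
"""

profile = read_profile(SAMPLE)
if uses_temporary_credentials(profile):
    print("Profile has a session token: the key pair alone is probably not enough.")
```

If the check reports a session token, it may be worth verifying in the NiFi documentation for your version how ListS3 can be given credentials that include the token, rather than only the Access Key and Secret Key properties.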
Labels: Apache NiFi
12-18-2018
07:32 AM
That's great news, thank you Dan.
12-14-2018
06:44 AM
Hi @Dan Chaffelson, thanks for the quick response. Here is my code. This is just a POC, so it may seem somewhat unstructured; it is just a playground.
import os
import sys
import getpass
import json
import smtplib

from nipyapi import nifi, config, templates, canvas
from config import ConfigIni

# Disable urllib3 certificate warnings
from requests.packages.urllib3 import disable_warnings
disable_warnings()


class NifiInstance:
    """ The NifiInstance class facilitating easy to use
    methods utilizing the NiPyApi (https://github.com/Chaffelson/nipyapi)
    wrapper library.

    Arguments:
        url (str): Nifi host url, defaults to environment variable `NIFI_HOST`.
        username (str): Nifi username, defaults to environment variable `NIFI_USERNAME`.
        password (str): Nifi password, defaults to environment variable `NIFI_PASSWORD`.
        verify_ssl (bool): Whether to verify SSL connection - UNUSED as of now.
    """

    def __init__(self, url=None, username=None, password=None, verify_ssl=False):
        config.nifi_config.host = self._get_url(url)
        config.nifi_config.verify_ssl = verify_ssl
        config.nifi_config.username = username
        self._authenticate(username, password)

    def _get_url(self, url):
        # Fall back to the environment, then to an interactive prompt
        if not url:
            try:
                url = os.environ['NIFI_HOST']
            except KeyError:
                url = input('Nifi host: ')
        # Make sure the url ends with the REST API path
        if '/nifi-api' not in url:
            if not url.endswith('/'):
                url = url + '/'
            url = url + 'nifi-api'
        return url

    def _authenticate(self, username=None, password=None):
        if not username:
            try:
                config.nifi_config.username = os.environ['NIFI_USERNAME']
            except KeyError:
                config.nifi_config.username = input('Username: ')
        if not password:
            try:
                password = os.environ['NIFI_PASSWORD']
            except KeyError:
                password = getpass.getpass('Password: ')
        access_token = None
        try:
            access_token = nifi.AccessApi().create_access_token(
                username=config.nifi_config.username, password=password)
        except nifi.rest.ApiException as e:
            print('Exception when calling AccessApi->create_access_token: {}\n'.format(e))
        # Key the token by the resolved username (the argument itself may be None)
        config.nifi_config.api_key[config.nifi_config.username] = access_token
        config.nifi_config.api_client = nifi.ApiClient(
            header_name='Authorization',
            header_value='Bearer {}'.format(access_token))

    def list_processors_in_processorgroup(self, pg_id=None):
        listen = canvas.list_all_processors(pg_id)
        # jlisten = json.loads(listen)
        for item in listen:
            # print(str(item.id))
            print(str(item.status.name))

    def processor_status(self, p_id=None):
        pro = canvas.get_processor(p_id, 'id')
        return pro.status.run_status


# Get the url and the groups to list from the ini file.
start_init = ConfigIni('c:/temp/nipyapi/monitor.ini')
# Groups are returned as a list of groups
groups = start_init.get_group_id()
url = start_init.get_url()
n = NifiInstance(url, 'myuser', 'mypassword')

# Test list_all_processors
test = canvas.list_all_processors('5b641351-34d0-3def-a376-7824fbe9cc0f')
for item in test:
    print(str(item.id))
My nipyapi version is:

C:\Temp\nipyapi (nipyapi)
λ pip freeze | grep nipyapi
nipyapi==0.11.0
C:\Temp\nipyapi
12-13-2018
03:06 PM
Hi,

I need to extract the processors for a particular Processor Group with id = 5b641351-34d0-3def-a376-7824fbe9cc0f. This Processor Group contains 10 processors and another Processor Group with 5 processors. When I try to extract the processors, I only get the 5 processors from my "sub" Processor Group:

test = canvas.list_all_processors('5b641351-34d0-3def-a376-7824fbe9cc0f')
for item in test:
    print(str(item.id))

Result:

(nipyapi) λ python main.py
c7aad022-0ab3-353f-90cb-0999781d6309
7287c6b0-c9a8-3436-9902-52fb806e7c42
d3bd0289-490a-389c-ba90-c27c48a1e453
fb278074-6249-3aeb-88f8-d3bee857be76

I was expecting a list of 10 processor IDs for the Processor Group I call canvas.list_all_processors with. It seems that it takes the deepest Processor Group within the Processor Group given in the argument. Does this work by design? And is there another way to get the processors from the top down within the group asked about? @Dan Chaffelson
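If the aim is to separate the processors that sit directly in the requested group from those in nested groups, one option is to filter a returned list by each processor's parent group. The sketch below is a minimal illustration on stand-in objects. It assumes each entity exposes component.parent_group_id, as NiFi processor entities do; the helper names and the sample IDs are hypothetical. It does not explain why list_all_processors returned only the nested processors in this case.

```python
from types import SimpleNamespace


def processors_directly_in(processors, pg_id):
    """Keep only the processors whose immediate parent group is pg_id."""
    return [p for p in processors if p.component.parent_group_id == pg_id]


def fake_processor(proc_id, parent_group_id):
    """Build a stand-in for a processor entity (hypothetical, illustration only)."""
    return SimpleNamespace(
        id=proc_id,
        component=SimpleNamespace(parent_group_id=parent_group_id),
    )


# Two processors in the parent group and one in a nested group.
processors = [
    fake_processor("proc-a", "parent-group"),
    fake_processor("proc-b", "parent-group"),
    fake_processor("proc-c", "child-group"),
]

top_level = processors_directly_in(processors, "parent-group")
print([p.id for p in top_level])
```

With a real connection, the same filter could be applied to whatever list the library returns, provided the entities carry the parent group ID.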
Labels: Apache NiFi
11-10-2018
10:41 AM
Hi,

I have a large number of XML files stored in HBase. The files contain binary data such as PDF, Word, etc. The column contents holds the content of the XML file. I want to replace the binary value in the XML tag DokumentFilIndhold with the value "Content Removed":

REGEXP_REPLACE(contents,"(?s)<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>", "Content Removed")

The regular expression seems to work exactly as expected when I test it with https://regexr.com/. But when I run the query on my data, it cuts off the contents, so it is no longer a valid XML file. Does the function REGEXP_REPLACE have some limitations, or is my expression wrong? The value is up to 65000 characters. It is urgent for me to find a solution, so any ideas will be very well received.
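One way to rule the expression itself in or out is to run it against a small sample outside Hive and check that the result still parses. The sketch below does this with Python's re and ElementTree. Hive uses Java regular expressions, so this is only an approximation, although this particular pattern uses syntax common to both. The sample document, its namespace URI, and the function name are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in the post; re.DOTALL plays the role of the (?s) flag.
PATTERN = re.compile(
    r"<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>",
    re.DOTALL,
)


def strip_binary(xml_text):
    """Replace every DokumentFilIndhold element with a short marker text."""
    return PATTERN.sub("Content Removed", xml_text)


# Hypothetical sample with a payload that spans a line break.
SAMPLE = (
    '<ns0:Dokument xmlns:ns0="http://example.com/ns0">'
    "<ns0:Titel>report</ns0:Titel>"
    "<ns0:DokumentFilIndhold>JVBERi0x\nLjQK</ns0:DokumentFilIndhold>"
    "</ns0:Dokument>"
)

cleaned = strip_binary(SAMPLE)
ET.fromstring(cleaned)  # raises ParseError if the result is not well-formed
print(cleaned)
```

If the pattern behaves correctly on representative samples like this, the truncation is more likely to come from somewhere other than the expression, for example from how the value is stored, read, or displayed.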
Labels: Apache Hive
07-06-2018
06:39 AM
Hi @Vinicius Higa Murakami,

When I run it on the Hive CLI, it fails:

hive -e " select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');"
2018-07-06 08:27:22,044 WARN [main] conf.HiveConf: HiveConf of name hive.mapred.supports.subdirectories does not exist
Logging initialized using configuration in file:/etc/hive/2.5.0.0-1245/0/hive-log4j.properties
[Fatal Error] :1:21: Open quote is expected for attribute "xmlns:xsi" associated with an element type "tns:root".
FAILED: SemanticException [Error 10014]: Line 1:8 Wrong arguments ''root/second/four'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.xml.UDFXPathString.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathString@629984eb of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathString with arguments {<tns:root xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance xmlns:tns=http://test.com xmlns=http://xmlns.oracle.com/pcbpel/adapter/noname> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>:java.lang.String, root/second/four:java.lang.String} of size 2
I have tried it on a newer installation as well, just a few builds older than yours:

Beeline version 1.2.1000.2.6.4.0-91 by Apache Hive
Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91)
Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91)

But it still doesn't return anything. Fortunately, I'm leaving for vacation today and won't be looking into this for the next 2 weeks. I had hoped to solve this before then. Thanks for your help and suggestions.
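The parser error in the CLI output reports a missing opening quote for xmlns:xsi, and the arguments echoed in the exception show the attribute values without their quotes. That is consistent with the shell consuming the inner double quotes, because the whole query is itself wrapped in double quotes. The sketch below reproduces the effect with Python's shlex, which follows POSIX shell quoting rules. It assumes the command was run from a POSIX-style shell, and the short XML snippet is a stand-in for the real document.

```python
import shlex

# Query wrapped in double quotes, with unescaped double quotes inside it.
COMMAND = """hive -e "select xpath_string('<a x="1"><b>v</b></a>','a/b');" """

# Same command with the inner double quotes escaped by backslashes.
ESCAPED = r'''hive -e "select xpath_string('<a x=\"1\"><b>v</b></a>','a/b');"'''

# shlex.split tokenizes the way a POSIX shell would build the argument list.
query = shlex.split(COMMAND)[2]
escaped_query = shlex.split(ESCAPED)[2]

print(query)          # attribute quotes are gone, so the XML is not well-formed
print(escaped_query)  # attribute quotes survive
```

Putting the query in a file and running it with hive -f is another way to sidestep shell quoting altogether.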
07-05-2018
06:58 AM
Thank you very much @Vinicius Higa Murakami.

When I run this in beeline:

select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');

it returns the following:

Connected to: Apache Hive (version 1.2.1000.2.5.0.0-1245)
Driver: Hive JDBC (version 1.2.1000.2.5.0.0-1245)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://sktudv01hdp01.ccta.dk:2181,sk> use adm_sfo_sit;
No rows affected (3.891 seconds)
0: jdbc:hive2://sktudv01hdp01.ccta.dk:2181,sk> select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','root/second/four');
+------+--+
| _c0 |
+------+--+
| |
+------+--+
1 row selected (0.076 seconds)
07-02-2018
07:06 AM
Hi,

Thanks, but in my Hive version it doesn't return anything:

select xpath_string('<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname"> <tns:second><tns:third>10379</tns:third><tns:four>stats</tns:four><tns:five>1</tns:five><tns:six><tns:DokumentFilIndhold>K</tns:DokumentFilIndhold></tns:six><tns:seven>2018-06-28T12:57:36</tns:seven><tns:eight>2018-06-28T13:02:28</tns:eight></tns:second></tns:root>','/root/second/four');

Returns ""
06-29-2018
01:15 PM
1 Kudo
<tns:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://test.com" xmlns="http://xmlns.oracle.com/pcbpel/adapter/noname">
<tns:second>
<tns:third>10379</tns:third>
<tns:four>stats</tns:four>
<tns:five>1</tns:five>
<tns:six>
<tns:DokumentFilIndhold>K</tns:DokumentFilIndhold>
</tns:six>
<tns:seven>2018-06-28T12:57:36</tns:seven>
<tns:eight>2018-06-28T13:02:28</tns:eight>
</tns:second>
</tns:root>

This is my test XML. Testdb is an external table on HBase on a Hadoop cluster. I have tried to select the value of the element "four" from my Hive table, where the column contents holds the XML:

use testdb;
select
FROM_UNIXTIME(CEIL((CAST(SPLIT(row_key, '\\|')[0] AS BIGINT))/1000)) AS received_at
, SPLIT(row_key, '\\|')[1] AS session_id
,row_key
,xpath_string(contents,'//tns:root/tns:second/tns:third/tns:four' ) AS test
where row_key = '1530262082747|08004d28-3cf6-4446-bae1-93d43c08c189';
But it returns an empty string. From my research I can see it probably has something to do with the namespace, but I haven't found anything useful out there yet. Does anyone know of a way to either work with the namespace or ignore it in the Hive query?
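Independently of namespace handling, note that in the sample XML the element four is a sibling of third, not its child, so a path that goes through tns:third would not match in any XPath engine. The sketch below uses Python's ElementTree rather than Hive's xpath UDFs, so it only illustrates the document structure and a namespace-agnostic lookup by local name. The trimmed XML and the helper name are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the test XML from the post.
XML = (
    '<tns:root xmlns:tns="http://test.com">'
    "<tns:second>"
    "<tns:third>10379</tns:third>"
    "<tns:four>stats</tns:four>"
    "</tns:second>"
    "</tns:root>"
)
NS = {"tns": "http://test.com"}
root = ET.fromstring(XML)

# Path through "third": no match, because "four" is not a child of "third".
via_third = root.find("tns:second/tns:third/tns:four", NS)

# Path that follows the actual structure.
direct = root.find("tns:second/tns:four", NS)


def find_by_local_name(element, name):
    """Return the first descendant whose tag matches name, ignoring namespaces."""
    for node in element.iter():
        if node.tag.rsplit("}", 1)[-1] == name:
            return node
    return None


print(via_third)
print(direct.text)
print(find_by_local_name(root, "four").text)
```

In XPath, the analogous namespace-agnostic expression would be along the lines of //*[local-name()='four'], though whether xpath_string accepts it in a given Hive version is something to verify.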
Labels: Apache Hive
06-13-2018
12:02 PM
I found the solution myself. It was much simpler than I first thought: just cast the JSON type to text and Avro will accept it.

select cast (json column as text ) columnName from table