Member since
09-13-2019
03-18-2020
09:20 AM
Hello all, I am having some issues with encodings in flow files and I'm truly stumped. I have some flow files that contain invalid UTF-8 bytes, and when I attempt to read them into my Python script with flow_file = sys.stdin.read() it fails with the error "'utf-8' codec can't decode byte 0xd1 in position 25729: invalid continuation byte". When I grabbed the full content of the flow file and ran it in my local Python console, it appeared to fix the issue. Specifically,
import unidecode
string = """PIÑEDA""" # example string, not content of flow file
unidecode.unidecode(string)
Out[28]: 'PINEDA'
works just fine. But when I attempt this in an ExecuteStreamCommand Python script
#!/usr/bin/python3
import sys
import unidecode
try:
    flow_file = sys.stdin.read()
    sys.stdout.write(flow_file)
except UnicodeDecodeError as e:
    flow_file = unidecode.unidecode(sys.stdin.read())
    sys.stdout.write(flow_file)
it produces no error, but returns a 0 byte file in the output stream. Why does the code work differently on my local machine than in NiFi, and how do I fix this?
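A likely explanation, offered as a sketch rather than a confirmed diagnosis: in the local console the string is already decoded text, whereas in the script sys.stdin is a text stream that decodes the incoming bytes as UTF-8. The first sys.stdin.read() consumes the stream before raising, so the second read inside the except block most likely returns an empty string, which would account for the 0 byte output. Byte 0xd1 is Ñ in Latin-1, so the example below assumes Latin-1 as the fallback encoding; that is a guess about the data, not something the post confirms. The helper names decode_bytes and to_ascii are hypothetical, and to_ascii uses the standard library's unicodedata as a stand-in for unidecode so the sketch has no third-party dependency.

```python
#!/usr/bin/python3
import sys
import unicodedata


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 (an assumed source encoding)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("latin-1")


def to_ascii(text: str) -> str:
    """Strip diacritics with the standard library (stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks and any other non-ASCII characters are dropped here.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Read the stream exactly once, as bytes, so a decode failure loses nothing.
    raw = sys.stdin.buffer.read()
    sys.stdout.write(to_ascii(decode_bytes(raw)))
```

Unlike unidecode, to_ascii silently drops characters that have no ASCII decomposition, so swapping unidecode.unidecode back in for that one function may be preferable where the package is available to the interpreter NiFi runs.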
- Tags:
- NiFi
02-10-2020
08:20 AM
I'm attempting to use ListS3 to access a bucket in S3 and I've set up an AWSCredentialsProviderControllerService. However, when I run this processor it returns with
The security token included in the request is invalid (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId)
I can run from the AWS CLI without issue, and both the command line and NiFi are running as the same user. I'm stumped as to why it fails in NiFi but not on the CLI when I run as the nifi user. It seems like NiFi can't see my profilename profile, which is already configured with the region, the role_arn, and the credential_source.
On the cli, I run
aws s3 cp s3://bucket/path/to/file.file /path/to/dest --profile profilename
The configs for the Controller Service are
Use Default Credentials: false
Access Key: No value set
Secret Key: No value set
Credentials File: No value set
Profile Name: profilename
Use Anonymous Credentials: false
Assume Role ARN: arn:aws:iam:00000000000:role/bucket
Assume Role Session Name: default
Session Time: 3600
I'm just not clear why this isn't working in NiFi but is working on the cli, and I'm at a loss as to how to fix it.
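One thing that may be worth ruling out in the settings above: IAM role ARNs normally have the form arn:aws:iam::<12-digit account id>:role/<role name>, with an empty region field that produces a double colon after iam. The Assume Role ARN as posted has a single colon and an 11-digit account id, though it may simply have been redacted for the post. The sketch below is a rough format check only; the regex and the function name looks_like_role_arn are illustrative assumptions, it covers only the standard aws partition, and it says nothing about whether the role exists or can be assumed.

```python
import re

# Assumed shape of an IAM role ARN in the standard "aws" partition:
#   arn:aws:iam::<12-digit account id>:role/<role name, optionally with a path>
ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def looks_like_role_arn(arn: str) -> bool:
    """Return True if the string has the general shape of an IAM role ARN."""
    return ROLE_ARN.match(arn) is not None


# The value as posted (single colon, 11 digits) and a hypothetical corrected form.
for candidate in (
    "arn:aws:iam:00000000000:role/bucket",
    "arn:aws:iam::000000000000:role/bucket",
):
    print(candidate, looks_like_role_arn(candidate))
```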
09-13-2019
10:56 AM
I'm trying to convert JSON to CSV using the ConvertRecord processor, but the only error I'm getting back is "Could not parse incoming data". As this is not very descriptive, I'm at a loss as to how to diagnose the issue.
I know that my Avro schema is valid because A) NiFi doesn't throw an error regarding the schema when I insert it into the Schema Registry and B) I tested my schema on here and it didn't give me an issue.
I also know that my JSON is valid because I can load it in Python using json.loads() and it doesn't give me any problems. I'm just not quite sure where I've gone wrong, nor how to fix it.
JSON
```
{ "DOC" : { "DOCID" : "1234" , "Subjects" : { "Subject_xref" : [ "2233" ] } , "TXT" : { "COUNTRY" : [ "United States" ] , "ESTATE" : [ "Mount Vernon" ] , "PERSON" : [ "George Washington" ] } , "RAW_TXT" : "George Washington lived in his family home, Mount Vernon, located in the United States." , "RELINFO" : [ { "ID" : "REL-1234-100" , "RELTYPE" : "PER-PROP" , "PERID" : "PER-1234-009" , "PROPID" : "PROP-1234-001" , "SENTID" : "1234-SENT-001" , "PROP_NORM" : "Mount Vernon" , "PROP_MENTION" : "Mount Vernon" , "PER_NORM" : "George Washington" , "PER_MENTION" : "George Washington" } ] , "ENTINFO" : [ { "ID" : "PER-1234-009" , "TYPE" : "PERSON" , "NORM" : "George Washington" , "REFID" : "PER-1234-009" , "MENTION" : "George Washington" } , { "ID" : "CTRY-1234-003" , "TYPE" : "COUNTRY" , "NORM" : "United States" , "REFID" : "CTRY-1234-003" , "MENTION" : "United States." } , { "ID" : "PROP-1234-001" , "TYPE" : "ESTATE" , "NORM" : "Mount Vernon" , "REFID" : "PROP-1234-001" , "MENTION" : "Mount Vernon" } ] } }
```
Avro
```
{ "type" : "record" , "namespace" : "name.space" , "name" : "nlp_output" , "fields" : [ { "name" : "DOC" , "type" : { "name" : "DOCDocument" , "type" : "record" , "namespace" : "doc.name.space" , "fields" : [ { "name" : "DOCID" , "type" : [ "long" , "null" ] , "default" : null } , { "name" : "Subjects" , "type" : { "name" : "Subjects" , "type" : "record" , "namespace" : "subjects.name.space" , "fields" : [ { "name" : "SubjectIdentificationID" , "aliases" : [ "Subject_xref" ] , "type" : [ "long" , "null" ] , "default" : null } ] }} , { "name" : "TXT" , "type" : { "name" : "TXT" , "type" : "record" , "namespace" : "text.name.space" , "fields" : [ { "name" : "COUNTRY" , "type" : { "type" : "array" , "items" : [ "string" , "null" ]} , "default" : null, "doc" : "" } , { "name" : "ESTATE" , "type" : { "type" : "array" , "items" : [ "string" , "null" ]} , "default" : null, "doc" : "" } , { "name" : "PERSON" , "type" : { "type" : "array" , "items" : [ "string" , "null" ]} , "default" : null, "doc" : "" } ] }} , { "name" : "RAW_TXT" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "RELINFO" , "type" : { "name" : "RelatedEntities" , "type" : "record" , "namespace" : "relent.name.space" , "fields" : [ { "name" : "ID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "RELTYPE" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PERID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PROPID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "SENTID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PROP_NORM" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PROP_MENTION" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PER_NORM" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "PER_MENTION" , "type" : [ "string" , "null" ] , "default" : null } ] }} , { "name" : "ENTINFO" , "doc" : "Sentences stripped of tags for ease of reading" , "type" : { "name" : "Entities" , "type" : "record" , "namespace" : "entities.name.space" , "fields" : [ { "name" : "ID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "TYPE" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "NORM" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "REFID" , "type" : [ "string" , "null" ] , "default" : null } , { "name" : "MENTION" , "type" : [ "string" , "null" ] , "default" : null } ] }} ] }} ] }
```
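A schema can be valid Avro and the JSON can be valid JSON while the two still disagree about shape. Comparing the samples above, RELINFO and ENTINFO are arrays of objects in the JSON but single records in the schema, Subject_xref is an array in the JSON but a long in the schema, and DOCID is a quoted string in the JSON but a long in the schema. The sketch below is a deliberately simplified structural check written for illustration, not NiFi's actual reader logic: the function check is hypothetical, it handles only records, arrays, unions, and the string/long/null primitives, ignores aliases, and applies no type coercion (a real record reader may well coerce "1234" to a number, so that particular finding may be harmless). It runs on a trimmed-down version of the schema and data.

```python
import json


def check(value, schema, path="$"):
    """Return a list of mismatches between a JSON value and an Avro-style schema."""
    if isinstance(schema, list):  # union: fine if any branch matches
        if any(not check(value, branch, path) for branch in schema):
            return []
        return [f"{path}: {type(value).__name__} matches no branch of {schema}"]
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            if not isinstance(value, dict):
                return [f"{path}: expected record, got {type(value).__name__}"]
            problems = []
            for field in schema["fields"]:
                name = field["name"]
                if name in value:  # absent fields are skipped in this sketch
                    problems += check(value[name], field["type"], f"{path}.{name}")
            return problems
        if kind == "array":
            if not isinstance(value, list):
                return [f"{path}: expected array, got {type(value).__name__}"]
            problems = []
            for i, item in enumerate(value):
                problems += check(item, schema["items"], f"{path}[{i}]")
            return problems
        return check(value, kind, path)  # e.g. {"type": "string"}
    expected = {"string": str, "long": int, "null": type(None)}[schema]
    if isinstance(value, expected) and not isinstance(value, bool):
        return []
    return [f"{path}: expected {schema}, got {type(value).__name__}"]


# Trimmed-down excerpts of the schema and JSON from the post.
schema = json.loads("""
{"type": "record", "name": "DOCDocument", "fields": [
  {"name": "DOCID", "type": ["long", "null"]},
  {"name": "RELINFO", "type": {"type": "record", "name": "RelatedEntities",
    "fields": [{"name": "ID", "type": ["string", "null"]}]}}
]}
""")
data = json.loads('{"DOCID": "1234", "RELINFO": [{"ID": "REL-1234-100"}]}')

problems = check(data, schema)
for problem in problems:
    print(problem)
```

If the array-versus-record mismatches turn out to be the cause, one option would be to declare RELINFO and ENTINFO as arrays whose items are the existing record types, mirroring how COUNTRY, ESTATE, and PERSON are already declared.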