Member since 08-16-2018
5 Posts
2 Kudos Received
0 Solutions
12-27-2024
01:21 AM
1 Kudo
Thanks for getting back! Sounds about right then; I'm guessing running http only is unsupported in that case. Patching start.sh locally is a bit of a non-starter here. It's a bit frustrating, unless any of the NiFi folks can weigh in. @MattWho @SAMSAL, can you confirm https is the only way? Cheers!
12-23-2024
01:35 AM
1 Kudo
@anon12345 Just to clarify, is this presumably over and above your modifications to start.sh? I can't find anywhere that says plain http is supported in 2.0, but I'm ready to give up trying to get https playing nice with traefik, so this whole complication is redundant anyway. Here's what I've tried:

❯ docker run --rm --name nifi \
    -p 8080:8080 \
    -e NIFI_WEB_HTTP_PORT=8080 \
    -e NIFI_WEB_HTTP_HOST=0.0.0.0 \
    -e NIFI_WEB_HTTPS_PORT= \
    -e NIFI_WEB_HTTPS_HOST= \
    -e NIFI_WEB_PROXY_HOST=localhost:8080 \
    -e NIFI_CLUSTER_IS_NODE=false \
    -e SINGLE_USER_CREDENTIALS_USERNAME=nifi \
    -e SINGLE_USER_CREDENTIALS_PASSWORD=nifipassword \
    -e NIFI_SECURITY_KEYSTORE= \
    -e NIFI_SECURITY_KEYSTOREPASSWD= \
    -e NIFI_SECURITY_KEYPASSWD= \
    -e NIFI_SECURITY_TRUSTSTORE= \
    -e NIFI_SECURITY_TRUSTSTOREPASSWD= \
    apache/nifi:2.0

HTTPS still enabled. 🙁
08-16-2018
05:13 PM
Hi @Steven Matison

Many thanks for your response. That all sounds very interesting. I have no experience with Kafka, but I will check it out and see if it fits into what I’m trying to achieve here.

Unless I’m misunderstanding, my major issue is that there is no common identifier between the datasets to deduplicate on, so I have to rely on an external tool (such as dedupe) to do some fancy data science work when clustering the duplicates, e.g. looking at forename, surname and address and deciding whether the records should be clustered. There is also an element of training involved, which would need to happen outside NiFi because dedupe is an external tool, and that complicates things further. I suppose if I enriched the data with a common cluster id I could then fire this to Kafka for the data compaction bit, which would match what you have above.

Anyway, good to know I’m going along the right track, so thanks again for your answer. ScrollElasticsearchHttp is interesting to read about!

Cheers!
Gavin.
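To make the "enrich with a cluster id, then compact" idea concrete, here is a minimal sketch in plain Python. It assumes the compaction being discussed is Kafka-style log compaction, where only the most recent value per key is retained, and it simulates that behaviour with a dictionary rather than calling Kafka. The function name `compact_by_key` and the sample records are hypothetical.

```python
def compact_by_key(records):
    """Simulate log compaction: keep only the latest value seen for each key.

    `records` is an iterable of (key, value) pairs in arrival order.
    Here the key stands in for the cluster id that dedupe would assign.
    """
    latest = {}
    for key, value in records:
        # A later record with the same key overwrites the earlier one.
        latest[key] = value
    return latest


if __name__ == "__main__":
    # Hypothetical records: two rows that dedupe placed in the same cluster.
    stream = [
        ("cluster-1", {"forename": "G", "surname": "Smith"}),
        ("cluster-2", {"forename": "Ann", "surname": "Lee"}),
        ("cluster-1", {"forename": "Gavin", "surname": "Smith"}),
    ]
    print(compact_by_key(stream))
```

Note that this keeps the last record per cluster rather than merging fields from all of them; if a merged "golden record" is needed, that merge would have to happen before the records are keyed and published.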
08-16-2018
03:03 PM
Hi there!

I’ve just heard about Apache NiFi through word of mouth and am wondering if somebody could point me in the right direction with my use case. My team’s recently been thrown into the deep end with some requirements and would really appreciate the help.

Problem:

Our end game is to build a federated search of customers over a variety of large, separate datasets which hold varying degrees of differing data about individuals, so it’s primarily an entity resolution problem. I was thinking NiFi could help query our various databases, merge the results, deduplicate the entries via an external tool and then push the result to an Elasticsearch instance for our applications to query.

Roughly speaking, something like this (haven’t tried implementing this flow yet!):

pasted-graphic-2.png

So, for example’s sake, the first flow would leave the following data in the result database:

first.png

Then run https://github.com/dedupeio/dedupe over this database table, which will add cluster ids to aid the record linkage, e.g.:

second.png

The second flow would then feed this result into the Elasticsearch instance for use by the API and front-end querying.

Questions:

1. Does this approach sound feasible?
2. How would I trigger dedupe to run to ultimately cluster the duplicates after the merged content was pushed to the database?
3. The corollary question: how would the second flow know when to fetch results for pushing into Elasticsearch? Periodic polling?

Thanks for any insight anyone can give me regarding this. I’d be happy to consider any other bits of tech stack people might suggest if there is an entirely better way to approach it, as I’d like this to be as robust as possible. I appreciate this isn’t primarily a NiFi question, and I haven’t considered any CDC process here to capture updates to the datasets, so I’d imagine this would get even more complicated…

P.S. I’ve watched the Hortonworks talk here https://youtu.be/fblkgr1PJ0o?t=3149, which I found helpful and which mentioned these community forums.

Cheers,
Gavin.
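On the periodic polling question, one possible shape for the second flow is sketched below in plain Python with an in-memory SQLite database. It is only an illustration under stated assumptions: the table `merged_customers`, its `cluster_id` and `indexed` columns, and the `push` callback (standing in for the Elasticsearch write) are all hypothetical names, not part of NiFi or dedupe. The idea is that a row becomes eligible once dedupe has filled in its cluster id, and is flagged after it has been pushed so that it is not sent twice.

```python
import sqlite3


def create_demo_table(conn):
    """Create the hypothetical result table written by the first flow."""
    conn.execute(
        "CREATE TABLE merged_customers ("
        " id INTEGER PRIMARY KEY,"
        " forename TEXT,"
        " surname TEXT,"
        " cluster_id INTEGER,"          # NULL until dedupe has clustered the row
        " indexed INTEGER DEFAULT 0)"   # set to 1 once the row has been pushed
    )


def poll_once(conn, push):
    """Push every clustered, not-yet-indexed row and return how many were sent."""
    rows = conn.execute(
        "SELECT id, forename, surname, cluster_id FROM merged_customers"
        " WHERE cluster_id IS NOT NULL AND indexed = 0 ORDER BY id"
    ).fetchall()
    for row in rows:
        push(row)  # stand-in for the write to the search index
        conn.execute(
            "UPDATE merged_customers SET indexed = 1 WHERE id = ?", (row[0],)
        )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_demo_table(conn)
    conn.executemany(
        "INSERT INTO merged_customers (forename, surname, cluster_id)"
        " VALUES (?, ?, ?)",
        [("Ann", "Lee", 1), ("Anne", "Lee", 1), ("Bob", "Ray", None)],
    )
    sent = poll_once(conn, print)
    print("rows pushed:", sent)
```

Calling `poll_once` on a schedule gives the polling behaviour. In NiFi itself a scheduled database-query processor such as QueryDatabaseTable could probably play a similar role, though whether it fits this exact table layout is untested here.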
Labels: Apache NiFi