Member since 08-21-2018
11 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1301 | 08-23-2018 04:49 AM |
12-17-2018
03:16 PM
The problem with this is that RouteText returns one line for any match. If the HTML has many a href elements with no line break between them, they all come back on a single line, and I want to produce a line for each a href match. How do I do it? For example:
<a href="something.html">something</a><a href="anotherthing.html">another</a><a href="yetmore.html">Yet More</a>
RegExpression for a href block.
RouteText returns just one line:
<a href="something.html">something</a><a href="anotherthing.html">another</a><a href="yetmore.html">Yet More</a>
What I want is a new line for each match:
<a href="something.html">something</a>
<a href="anotherthing.html">another</a>
<a href="yetmore.html">Yet More</a>
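One way to picture the per-match split, sketched in plain JavaScript outside NiFi: match every anchor element with a global, non-greedy regular expression and join the matches with newlines. The helper name `splitAnchors` and the exact pattern are assumptions for illustration, not the expression used in the flow.

```javascript
// Hypothetical helper (not a NiFi API): put each <a ...>...</a> element on its own line.
function splitAnchors(html) {
  // Non-greedy ".*?" stops at the first closing </a>; the "g" flag returns every match.
  var anchorPattern = /<a\s[^>]*href=[^>]*>.*?<\/a>/gi;
  var matches = html.match(anchorPattern) || [];
  return matches.join("\n");
}

var sampleHtml =
  '<a href="something.html">something</a>' +
  '<a href="anotherthing.html">another</a>' +
  '<a href="yetmore.html">Yet More</a>';

console.log(splitAnchors(sampleHtml));
```

The same idea could probably be expressed as a regex replacement that inserts a line break after each closing tag, but that depends on how the processors in the flow are configured.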
10-29-2018
12:55 AM
Found the solution, thanks to a template found on the Apache wiki. Fetch the HTML and feed it to RouteText. This produces one file containing all matches, with each match on its own line. Then use SplitText to break that file into individual flow files, one per line.
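The two steps can be pictured in plain JavaScript, as a rough model rather than NiFi code: first keep only the lines that match (the RouteText step), then turn each remaining line into its own item (the SplitText step). `routeMatchingLines` and `splitIntoItems` are hypothetical names, and the `href=` test is an assumed matching rule.

```javascript
// Rough model of the flow; these functions are hypothetical, not NiFi APIs.

// RouteText step: one output containing every matching line, one match per line.
function routeMatchingLines(text, pattern) {
  return text
    .split(/\r?\n/)
    .filter(function (line) { return pattern.test(line); })
    .join("\n");
}

// SplitText step: one item (standing in for a flow file) per non-empty line.
function splitIntoItems(text) {
  return text.split(/\r?\n/).filter(function (line) { return line.length > 0; });
}

var page = [
  "<html><body>",
  '<a href="something.html">something</a>',
  "<p>no link here</p>",
  '<a href="anotherthing.html">another</a>',
  "</body></html>"
].join("\n");

var routed = routeMatchingLines(page, /href=/i);
var items = splitIntoItems(routed);
console.log(items);
```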
10-26-2018
06:43 AM
I am scraping a page for multiple links. I use ExtractText on the HTML with Enable Repeating Capture set to true, which gives me the attributes URL.0, URL.1 ... URL.n on the flow file. I want to process each of these individually. How do I either turn these attributes into new flow files or convert them into something that allows access to each one? The number of links found is not fixed. I did try ExecuteScript, but it killed NiFi. This was my first attempt at using ExecuteScript:
var OutputStreamCallback = Java.type("org.apache.nifi.processor.io.OutputStreamCallback");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

var flowFile = session.get();
if (flowFile != null) {
    var urls = flowFile.getAttribute('urls');
    if (urls == null) {
        // No 'urls' attribute to split: send the original to failure.
        session.transfer(flowFile, REL_FAILURE);
    } else {
        var myAttr = urls.split("|");
        for (var i = 0; i < myAttr.length; i++) {
            // Each child flow file carries one URL as its content.
            var newFlowFile = session.create(flowFile);
            newFlowFile = session.write(newFlowFile,
                new OutputStreamCallback(function (outputStream) {
                    outputStream.write(myAttr[i].getBytes(StandardCharsets.UTF_8));
                }));
            session.transfer(newFlowFile, REL_SUCCESS);
        }
        // The original is no longer needed once the children are transferred.
        session.remove(flowFile);
        // No explicit session.commit(): ExecuteScript commits the session after the script returns.
    }
}
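Because the number of URL.n attributes varies, one option is to collect them by prefix before creating the child flow files. The sketch below uses a plain object in place of the flow file's attribute map; `collectIndexed` is a hypothetical helper, the sample values are made up, and the `URL` prefix follows the attribute names described in the post.

```javascript
// Hypothetical helper: gather attributes named "<prefix>.<n>" in numeric order.
function collectIndexed(attributes, prefix) {
  var start = prefix + ".";
  function indexOf(key) {
    return Number(key.slice(start.length));
  }
  return Object.keys(attributes)
    .filter(function (key) {
      // Keep only keys such as "URL.0", "URL.1", ... and skip everything else.
      return key.indexOf(start) === 0 && /^\d+$/.test(key.slice(start.length));
    })
    .sort(function (a, b) { return indexOf(a) - indexOf(b); })
    .map(function (key) { return attributes[key]; });
}

// Stand-in for a flow file's attributes (values are made up).
var sampleAttributes = {
  "URL.1": "http://example.com/b",
  "URL.0": "http://example.com/a",
  "URL.10": "http://example.com/c",
  "filename": "page.html"
};

console.log(collectIndexed(sampleAttributes, "URL"));
```

Sorting by the numeric suffix rather than by the key string keeps URL.10 after URL.2 instead of after URL.1.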
10-19-2018
12:19 AM
I am now researching DBCPConnectionPoolLookup 1.7.1, but I can find no help on it at all. How does one use it? I want to define a DBCPService and then use DBCPConnectionPoolLookup 1.7.1 with the database.name variable to select the connection and execute SQL. It seems one can't have two DBCPConnectionPools with the same ODBC driver.
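Going by the description in the NiFi documentation, DBCPConnectionPoolLookup appears to pick one of several registered DBCPService instances using the value of the flow file's database.name attribute, and to fail when that attribute is missing. The JavaScript below is only a conceptual model of that lookup, not NiFi code; `makeLookup` and the pool names are made up for illustration.

```javascript
// Conceptual model only; makeLookup and the pool names are hypothetical.
function makeLookup(registeredPools) {
  return function getPool(attributes) {
    var name = attributes["database.name"];
    if (name === undefined) {
      throw new Error("Attribute database.name is required");
    }
    if (!Object.prototype.hasOwnProperty.call(registeredPools, name)) {
      throw new Error("No connection pool registered for " + name);
    }
    return registeredPools[name];
  };
}

// Each entry stands in for a separately configured DBCPConnectionPool.
var getPool = makeLookup({
  sales: "DBCPConnectionPool for the sales database",
  hr: "DBCPConnectionPool for the hr database"
});

console.log(getPool({ "database.name": "hr" }));
```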
10-18-2018
05:42 AM
I have two separate process groups accessing two separate databases on the same SQL Server instance. I have created two DBCPConnectionPools, with the only real difference being the database name in the SQL Server connection string. One works; the other stays in the Enabling state indefinitely when I try to enable it. I suspect this relates to something that can only be used once, but I am unsure. Help appreciated. Lee
10-08-2018
11:45 PM
I wish to convert three attributes to a specific JSON document, and I am unsure of the best method. The attributes are fromDate, toDate and id, and the target JSON is:
{
  "start": "fromDate.value",
  "end": "toDate.value",
  "where": {
    "myid": "id.value"
  }
}
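As a minimal sketch of the target shape, assuming the three attribute values are already available as strings, the JSON can be assembled from an object rather than by string concatenation. `buildQuery` is a hypothetical helper and the sample values are made up; inside NiFi the same result would come from whichever processor or script ends up building the content.

```javascript
// Hypothetical helper: build the JSON document from the three attribute values.
function buildQuery(attributes) {
  return JSON.stringify({
    start: attributes.fromDate,
    end: attributes.toDate,
    where: { myid: attributes.id }
  });
}

// Made-up sample values standing in for the flow file attributes.
console.log(buildQuery({ fromDate: "2018-10-01", toDate: "2018-10-08", id: "42" }));
// prints {"start":"2018-10-01","end":"2018-10-08","where":{"myid":"42"}}
```

Letting `JSON.stringify` produce the text takes care of quoting and of the commas between fields.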
08-24-2018
04:13 AM
I am unable to solve this issue. The process works in the Advanced editor. The JOLT spec:
[{
"operation": "shift",
"spec": {
"@.source": "header.source",
"@.author": "header.author",
"@.publishedAt": "header.date",
"@.title": "header.title",
"@.description": "header.description",
"@.urlToImage": "header.image",
"@.url": "header.url",
"@": "source"
}
},
{
"operation": "default",
"spec": {
"triples":{},
"header": {
"id":"${uuid}",
"type": "News",
"ingestMethod": "NiFi",
"injestionMethod" : "${methodName}",
"injestionUrl" : "${url}${searchQuery}${apiKey}${fromDate}${domains}",
"injestDate":"${now()}"
}
}
}
]
The feed coming from a SplitJson:
{
"source": {
"id": "abc-news-au",
"name": "ABC News (AU)"
},
"author": "ABC Radio National",
"title": "The Talented Mr Daly",
"description": "Peter Daly projects confidence and success. He wears gold rings, gold cufflinks, and a gold watch. The market, he says, is his backyard and he knows it \"damn well\". But the 59-year-old is actually in a world of trouble. The corporate watchdog, ASIC, accuses h…",
"url": "http://www.abc.net.au/radionational/programs/backgroundbriefing/the-talented-mr-daly/10158316",
"urlToImage": "http://www.abc.net.au/radionational/image/10158320-1x1-700x700.jpg",
"publishedAt": "2018-08-25T22:05:00Z"
}
08-23-2018
04:49 AM
1 Kudo
I have solved the problem.
It was a simple issue, but I am new to this and none of the examples online show this fix.
First, you need a keystore; there is a link above on creating one.
Then, and this is the step that was missed, you need to download the certificate from the site providing the data:
echo -n | openssl s_client -connect newsapi.org:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
Then import it into your keystore:
sudo keytool -import -keystore truststore.jks -file /tmp/examplecert.crt -alias <sitename>
Then set up the controller service for the StandardSSLContextService.
08-22-2018
11:08 PM
I added this answer yesterday, but it is not there today. YES and YES: I can reach newsapi.org via ping from the NiFi server.
08-22-2018
05:17 AM
Thank you @Steven Matison, this solved the setup of the StandardSSLContextService, with the help of this article on setting up cacerts, but I still get an UnknownHostException. Any ideas, anyone? The same thing happens with GetHTTP as well.
08-21-2018
04:55 AM
I have created an API key to access newsapi.org. I have a simple query: https://newsapi.org/v2/everything?q=bitcoin&apiKey=<key> Using a GetHTTP processor I get one of two errors: an SSL error with https://, or an UnknownHostException if I use http://. If I use an InvokeHTTP processor, I just get unknown host. I can curl the request from the server, so it is not blocked. When I set up a StandardSSLContextService, I am unsure where the certificate path is and what other information is meant to be in it. It all seems a bit of a bother simply to consume a JSON API feed.
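For checking the request outside NiFi, the query URL can be assembled with the standard URL API so the parameters are encoded correctly. This is a minimal sketch; `buildNewsApiUrl` is a hypothetical helper and `YOUR_KEY` is a placeholder, not a real key.

```javascript
// Hypothetical helper: assemble the request URL with properly encoded parameters.
function buildNewsApiUrl(query, apiKey) {
  var url = new URL("https://newsapi.org/v2/everything");
  url.searchParams.set("q", query);
  url.searchParams.set("apiKey", apiKey);
  return url.toString();
}

// "YOUR_KEY" is a placeholder for the key issued by newsapi.org.
console.log(buildNewsApiUrl("bitcoin", "YOUR_KEY"));
// prints https://newsapi.org/v2/everything?q=bitcoin&apiKey=YOUR_KEY
```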