Connecting to DataSift HTTPS API using NiFi GetHTTP

Expert Contributor

Hi all,

Is it possible to use the GetHTTP processor in NiFi to connect to the DataSift streaming API and receive live streaming data? I have used GetHTTP for an HTTP API, but for HTTPS we need an SSL context as well as a username and password. Any ideas how to connect to an HTTPS URL with NiFi?

1 ACCEPTED SOLUTION

Rising Star

You will need to create and configure an SSLContextService for the processor to use so that it can establish trust with the certificate being presented by the DataSift service. curl works because it is tying into the default system truststore for you.

To provide a similar experience as curl on the command line, you will need to configure the truststore properties for your SSL Context Service instance with:

  • Truststore Filename: the cacerts file from your Java installation
    • If $JAVA_HOME is set on your system, it should point you in the right direction. If not, the location of cacerts varies by environment, but it is approximately the following for each OS:
      • OS X: /Library/Java/JavaVirtualMachines/jdk<version>.jdk/Contents/Home/jre/lib/security/cacerts
      • Windows: C:\Program Files\Java\jdk<version>\jre\lib\security\cacerts
      • Linux: /usr/lib/jvm/java-<version>/jre/lib/security/cacerts (you can also use $(readlink -f "$(which java)") to resolve the real Java installation behind the java symlink)
  • Truststore Type: JKS
  • Truststore Password: The default password of "changeit" if you are using the default Java keystore

Once this controller service is created and enabled, update the associated GetHTTP processor's SSL Context Service property to reference it.
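The `readlink -f $(which java)` tip can be wrapped in a small helper that prints the cacerts path to use as the Truststore Filename. This is a sketch under stated assumptions: the function name `find_cacerts` is hypothetical, it relies on `readlink -f` (GNU coreutils or a recent macOS), and it checks only the two usual layouts, `jre/lib/security/cacerts` (JDK 8 and earlier) and `lib/security/cacerts` (JDK 9 and later, which no longer ship a separate `jre/` directory).

```bash
# find_cacerts (hypothetical helper): given a path to a java binary, possibly a
# symlink such as /usr/bin/java, print the path of the bundled cacerts file.
find_cacerts() {
  local java_bin home candidate
  [ -n "$1" ] || return 1
  # Resolve symlinks to the real binary, e.g. /usr/lib/jvm/<jdk>/bin/java
  java_bin=$(readlink -f "$1") || return 1
  # Strip the trailing bin/java to get the installation directory
  home=$(dirname "$(dirname "$java_bin")")
  # JDK 8 and earlier keep cacerts under jre/; JDK 9+ keep it under lib/.
  # If java lives in <jdk>/jre/bin, "home" is already the jre directory,
  # so the second candidate covers that case as well.
  for candidate in "$home/jre/lib/security/cacerts" "$home/lib/security/cacerts"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

Typical use would be `find_cacerts "$(which java)"`; the printed path is what goes into the Truststore Filename property.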

10 Replies

Expert Contributor

For DataSift I have a curl command with an HTTPS URL, i.e. https://stream.datasift.com/fb409968ceacb8e588bb82de95c59958 -H 'Auth: suri:dba37513923299cbb5bcbff766bacd3d'. When I run the curl command it works, but when I use the same URL in GetHTTP it throws an SSL error, and the InvokeHTTP processor won't fetch anything. Any ideas?

Expert Contributor

Thanks @Aldrin Piri, that was really helpful. My SSL issue is not sorted yet. But I was wondering: when I stream the data using PutHDFS, it errors saying the JSON file already exists, even though I created a new JSON file just before starting NiFi and then streamed data to that file in HDFS. Do I have to write some Expression Language to say that if the file reaches a certain size a new file should be created, or what is the best way forward? Thank you.

Rising Star

Is this for the GetHTTP? If so, yes, EL would be the best path forward to create unique files via the Filename property. Alternatively, you can use an UpdateAttribute processor to update the filename attribute to a new name in the flow if there is additional context or knowledge of the file that helps in that process.
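For example, a Filename value along these lines would give every flow file a distinct name before it reaches PutHDFS. This is an illustrative sketch: the `datasift-` prefix and `.json` suffix are placeholders, and it assumes the property you put it in (GetHTTP's Filename, or a `filename` property added to UpdateAttribute) evaluates Expression Language. `now()`, `format()` and `UUID()` are standard Expression Language functions.

```
datasift-${now():format('yyyyMMdd-HHmmss-SSS')}-${UUID()}.json
```

The timestamp keeps the files roughly ordered in HDFS, while `UUID()` guards against two flow files created in the same millisecond ending up with the same name.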

Regarding the SSL issues, could you provide more information about what is not working? I would like to make sure we get you on the right track here, or address any bugs that may be lurking behind the scenes in that process.

Thanks!

Expert Contributor

@Aldrin Piri I am trying to use the InvokeHTTP processor for the following DataSift HTTPS URL. The SSLContext has now been set up. In the InvokeHTTP properties I entered the HTTPS URL and put the Auth credentials I was using with curl into the Basic Authentication username and password. It starts up but doesn't pull any data. How do we stop it, other than restarting NiFi? Also, once the connection is made I want to keep it alive and don't want it to time out as it does in GetHTTP.

https://stream.datasift.com/fb409968ceacb8e588bb82de95c59958 -H 'Auth: suri:dba37513923299cbb5bcbff766bacd3d'

[Screenshot: 1304-screenshot-from-2016-01-13-11-01-24.png]
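A note on the curl command above, offered as a hedged sketch rather than a confirmed fix: `-H 'Auth: user:key'` sends a custom request header literally named `Auth`, whereas a Basic Authentication username/password (like curl's `-u user:key`) produces a standard `Authorization: Basic <base64 of user:key>` header. These are different headers, so an endpoint that expects `Auth` will not necessarily accept Basic credentials. The two hypothetical bash helpers below only print each header form for comparison; their names and arguments are placeholders.

```bash
# Hypothetical helpers: print the header each style of credentials produces.

# curl -H 'Auth: USER:KEY' sends this custom header exactly as written
custom_auth_header() {
  printf 'Auth: %s:%s\n' "$1" "$2"
}

# Basic authentication (curl -u USER:KEY, or a processor's Basic Authentication
# Username/Password) sends "Authorization: Basic " + base64("USER:KEY")
basic_auth_header() {
  local encoded
  # tr removes the line wrapping base64 adds to long output
  encoded=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$encoded"
}
```

For example, `basic_auth_header user pass` prints `Authorization: Basic dXNlcjpwYXNz`, while `custom_auth_header user pass` prints `Auth: user:pass`. If I read the InvokeHTTP documentation correctly, user-added (dynamic) properties are sent as request headers, so a dynamic property named `Auth` would be the closer equivalent of curl's `-H` flag; that is worth verifying against your NiFi version.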

Just a side comment: the default cacerts truststore shipped with the JRE does not always contain all the certificates needed. I ran into an issue where a web page with a valid certificate was trusted by the OS's default CA certificate handling, but Java considered the certification path incomplete.

I am using Ubuntu, and to mitigate this issue you can import all the certificates from Ubuntu's ca-certificates package into a Java truststore to be used with NiFi.

To import all CA certificates from Ubuntu into your truststore, you can use the openssl pkcs12 export tool:

openssl pkcs12 -export -nokeys -in /etc/ssl/certs/ca-certificates.crt -out /etc/nifi/truststore.p12

where /etc/nifi/truststore.p12 is the truststore to set in the SSLContextService. The export password you enter at openssl's prompt becomes the Truststore Password. Remember to also change the Truststore Type to PKCS12 (not JKS).

If you are unlucky, as I was, you may run into an issue where the JRE is unable to parse the PKCS12 file generated by openssl (OpenJDK has this problem with IBM-generated files, see https://bugzilla.redhat.com/show_bug.cgi?id=961069; it seems Java's PKCS12 implementation is a case of 'we had to do it, but we don't mind, use JKS').

In that case, you can instead import all /etc/ssl/certs/*.pem files into a JKS truststore using keytool from the JDK distribution (this is bash code):

for file in /etc/ssl/certs/*.pem; do keytool -noprompt -importcert -storetype JKS -keystore /etc/nifi/truststore.jks -storepass changeit -file "$file" -alias "$(basename "$file" .pem)"; done

Now you have a JKS truststore that Java can read (it was written by Java's own keytool, so we can at least hope Java reads it back). Set this truststore in the SSLContextService and all the certificates Ubuntu provides will be trusted.

To verify that the import worked, compare the number of *.pem files with the number of certificates in the truststore:

ls -1 /etc/ssl/certs/*.pem | wc -l

keytool -list -storepass changeit -keystore /etc/nifi/truststore.jks | grep -c fingerprint

The two numbers should be equal.

Contributor

Hi @Aldrin Piri

I am facing the same challenge. I configured the SSL context service after adding the Facebook certificate to the default Java cacerts truststore, but my GetHTTP processor shows an IllegalArgumentException for the URL. The screenshot is below. I would appreciate it if you could help me with this.

Regards,

Omer

[Screenshot: 13163-error.png]

Rising Star

Hi @omer alvi,

You are getting an illegal character in the query, which I assume is the | (pipe) character. You may need to URL-encode the query in your URL. Luckily, you can achieve this with NiFi Expression Language; of note is the urlEncode function, documented at https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#urlencode.
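To illustrate the idea, and why the encoding belongs on the query value rather than on the whole URL, here is a minimal percent-encoding sketch in bash. It is not NiFi's implementation: as far as I know, `urlEncode` uses Java-style form encoding, which also turns `:` and `/` into `%3A` and `%2F` (so encoding an entire URL would break it) and writes spaces as `+`. The hypothetical `urlencode` function below keeps only RFC 3986 unreserved characters, assumes ASCII input, and uses placeholder values for the token, parameter name and host.

```bash
# urlencode (hypothetical helper): percent-encode every character except the
# RFC 3986 unreserved set A-Z a-z 0-9 - . _ ~  (ASCII input assumed)
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # printf "'c" yields the character's numeric code; %02X renders it as hex
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Encode only the query value, then assemble the URL around it
token=$(urlencode 'APP_ID|APP_SECRET')
url="https://example.com/endpoint?token=${token}"
```

Here `$url` ends up as `https://example.com/endpoint?token=APP_ID%7CAPP_SECRET`. In NiFi the analogous move would be to apply `urlEncode` only to the value inside the URL property, e.g. `...?token=${token:urlEncode()}`, where `token` is a hypothetical attribute holding the raw value.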

Contributor

Hi @Aldrin Piri

Great! It worked.

Thanks a lot for your support 🙂

Cheers,

Omer