09-15-2021
07:08 PM
Yes, the above comment about the functionality no longer working is accurate. NiFi has gone through many enhancements that are not reflected in this article. I do not plan on updating the article for the following reason: the objective of this article was to get NiFi running on K8s at a time when a NiFi-on-K8s offering did not exist within Hortonworks HDF. NiFi on K8s has recently been GA'd. It's called DataFlow Experience. Docs here: https://docs.cloudera.com/dataflow/cloud/index.html. If you want NiFi on K8s, that is your new playground.
08-30-2021
01:20 PM
1 Kudo
File processing and autoscaling seem to have an antinomic relationship. They don't have to. Autoscaling often drives inefficient behavior. Why? The reflexive response: "It autoscales." Autoscaling still requires sound design principles; without them, it won't autoscale well.
Autoscaling within a distributed framework requires proper data dichotomization along with processing/service layer decoupling.
Anti-Pattern
Take an example of where I've seen the most heartburn.
Large files (i.e. zip/tar/etc.) land in an s3/storage area. The knee-jerk reaction is to feed them into a distributed processing engine that autoscales, assuming the "autoscaling" part will handle them appropriately. It may. More likely you're flipping a coin and lighting a candle, hoping it all works out. What if the file sizes are heterogeneous and the variance between them is significant? What about error handling? Does it have to be all or nothing (meaning all files are processed or all fail)?
What if autoscaling is driven through smaller processing units (groups)?
Here we take the same payloads but defrag them into smaller units. Each unit requires its own, appropriately sized compute footprint, driving resource consumption efficiencies.
Technical Details
For example, a payload (myExamplePayload.zip) contains 1000 JSON files. Instead of throwing this entire payload at a single compute cluster (requiring the maximum number of resources possible, aka the top-line compute profile)...defrag the payload.
As the payload arrives in s3, CDF-X (Cloudera Data Flow Experience) listens for new files with its s3 processor. CDF-X pulls the payload, decompresses it, and writes the individual files back to s3. For example:
s3://My-S3-Bucket/decompressed/file1.json, s3://My-S3-Bucket/decompressed/file2.json, ...
In parallel, CDF-X generates a CDE (Cloudera Data Engineering, Spark) job spec as a JSON payload. The job spec includes the file locations; each job would be given roughly ~100 file names and locations. Since CDF-X knows the file sizes, it can also hint to CDE how much compute is required to process them. This step is not strictly necessary: the unit of work has already been defragged to something manageable, so CDE autoscaling should kick in and perform well. Once the job spec is created, CDF-X calls CDE over REST, sending the job spec. CDE accepts the job spec and arguments (file locations) and runs the micro workloads. Each workload has its own heterogeneous compute profile and autoscales independently.
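To make the hand-off concrete, here is a minimal Python sketch of the kind of call made for each defragged unit. The endpoint path, job name, token handling, and job-spec fields are illustrative assumptions, not the exact CDE API contract:

import requests

# Assumed values -- substitute your CDE virtual cluster jobs API and access token.
CDE_JOBS_API = "https://<your-cde-vc>.cloudera.site/dex/api/v1"
CDE_TOKEN = "<your-access-token>"

def submit_micro_workload(file_batch, executor_hint=2):
    """Submit one defragged unit of work (~100 files) as a CDE job run."""
    job_spec = {
        "arguments": file_batch,        # the s3 locations for this unit
        "executorHint": executor_hint,  # assumed field: compute hint derived from file sizes
    }
    resp = requests.post(
        f"{CDE_JOBS_API}/jobs/myExampleJob/run",   # assumed job name and path
        headers={"Authorization": f"Bearer {CDE_TOKEN}"},
        json=job_spec,
    )
    resp.raise_for_status()
    return resp.json()

# One micro workload per ~100-file batch; each runs and autoscales independently.
batch = [f"s3://My-S3-Bucket/decompressed/file{i}.json" for i in range(1, 101)]
print(submit_micro_workload(batch))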
Wrapping Up
Defragmentation of large singleton payloads enables autoscaling to run more efficiently. Autoscaling is an incredibly powerful capability that is often misused when sound design principles aren't applied. Leveraging these simple patterns allows for ease of operations, cost control, and manageability.
08-27-2021
02:24 PM
2 Kudos
Recently I was tasked with building a possible pattern for handling slowly changing dimensions (Type 2, to be specific) within CDP. The obvious answer is to use Hive ACID, but clearly that answer is too generic. I needed to build a pipeline similar to what I used to build as an Informatica developer and verify the legitimacy of the solution/pattern. Here we go.
Why?
The objective was to build one possible pattern (of many) for handling SCD Type 2 dimensions with CDP. Outcome: I was able to repeat a typical ETL workflow with CDP. How? By taking off my horse blinders... Really.
What are SCDs?
For those who may not be familiar with slowly changing dimensions within an EDW context, here is a great quick read: Types Of Dimension Tables
How?
Data lands in a sourcing area
Super typical
Pull source data & run it through a cleaning/processing layer for staging
Ready for merging against SCD table
Run an ACID merge between stage and SCD Table
Not rocket science
I don't believe you...I wanna try this in CDP Public Cloud or Private Cloud
Perfect. This article runs through a demo using CDP to do exactly that.
What is required for this demo?
CDP Public Cloud Account
Cloudera Data Engineering...which includes airflow
Cloudera Data Warehouse (Hive LLAP)
Assets required to produce this demo:
Spark Code
https://github.com/sunileman/spark-CDE-SCD
Airflow Dag
https://github.com/sunileman/airflow-scd-dag
Data
products.psv (Type 2 dim)
product_changes.psv (changes that need to be applied to the product dimension table)
Workflow Details
Airflow will orchestrate the workflow. First, a CDE Spark job will pick up the product changes from s3. Spark will perform the required ETL and then write the output to a staging table (Hive external). Lastly, using the new Airflow CDW operator, a Hive ACID merge will be executed between the staging table and the product dimension. Not rocket science. I know.
Data/Tables Setup
For this demo, there is a single dimension table: product. The other table is product_ext, an external table holding the raw data that needs to be applied to the product dimension. Very common stuff.
Product DDL
Note: Hive 3 tables are internal (managed) with full ACID support by default
CREATE TABLE IF NOT EXISTS product(
product_id INT,
product_name STRING,
aisle_id int,
department_id int,
start_date date,
end_date date,
is_current string DEFAULT 'Y')
Product_ext Schema
Note: Replace <YOUR-S3-BUCKET> with your s3 bucket.
CREATE EXTERNAL TABLE IF NOT EXISTS product_ext(
product_id INT,
product_name STRING,
aisle_id int,
department_id int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3a://<YOUR-S3-BUCKET>/sunman/products/'
tblproperties ("skip.header.line.count"="1");
Load the product dimension table
insert into product
select product_id, product_name, aisle_id, department_id, current_date() as start_date, null as end_date, 'Y' as is_current from product_ext
Lastly, upload the product_changes.psv file to s3
Note: Replace <YOUR-S3-BUCKET> with your s3 bucket
s3a://<YOUR-S3-BUCKET>/sunman/product_changes/
Recap: A product dimension table has been created. A file with changes that need to be applied against the product dimension table has been uploaded to s3.
Cloudera Data Warehousing
Via CDW, a Hive ACID merge statement will merge the staging table with the product dimension. This will be triggered by Airflow using the new CDW operator. More on that later.
Grab the JDBC URL from the virtual warehouse
Note: SSO must be disabled
For example:
jdbc:hive2://zzz.cloudera.site/default;transportMode=http;httpPath=cliservice;socketTimeout=60;ssl=true;retries=3;
Create an Airflow CDW Connection
To execute CDW HQL statement(s), an Airflow connection to CDW is required. The connection is referenced in the Airflow dag; more on that later.
How to create a CDW Airflow connection: Automating data pipelines using Apache Airflow in Cloudera Data Engineering.
Important: Make note of the conn Id. It will be used later in this article
Create a CDE Spark Job
This job reads the product_changes.psv file (which contains the changes that need to be applied to the product dimension), performs cleansing/ETL, and stages the changes as an external Hive table.
The code is available on my Github page. If you're not interested in building the code, no worries. I have also provided the Spark jar which can be downloaded instead.
https://github.com/sunileman/spark-CDE-SCD/blob/master/target/scala-2.11/spark-scd_2.11-0.1.jar
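The linked job is written in Scala; purely as an illustration of the staging step, here is a rough PySpark sketch of the same idea. The column names come from the DDL above, and the source path and table name follow the job configs used later in the article; none of this is the exact logic of the linked jar:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd-stage-product-changes").getOrCreate()

# Assumed locations/names -- align with your own bucket and job arguments.
source_loc = "s3a://<YOUR-S3-BUCKET>/sunman/product_changes/"
stage_table = "product_staged_cleansed"

changes = (spark.read
           .option("header", "true")
           .option("delimiter", "|")
           .csv(source_loc))

# Light cleansing: normalize types and drop rows missing a product_id.
staged = (changes
          .withColumn("product_id", F.col("product_id").cast("int"))
          .withColumn("aisle_id", F.col("aisle_id").cast("int"))
          .withColumn("department_id", F.col("department_id").cast("int"))
          .dropna(subset=["product_id"]))

# Stage as a table the CDW merge step can read (the linked job stages it as an external Hive table).
staged.write.mode("overwrite").saveAsTable(stage_table)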
CDE Spark Job Details
Note: For this demo to work, all job specs/args/configs must match the screen shot below
Note: Update <YOUR-S3-BUCKET> with your s3 bucket
Create a CDE Airflow Job
The airflow dag flow is the following:
First, a Spark CDE job will be called to stage the product changes into an external Hive table. Then CDW will be called to perform the Hive ACID Merge between the product dimension and staging table. The code for the dag is here:
https://github.com/sunileman/airflow-scd-dag
The dag file is airflow-scd.py
Open it and update the cli_conn_id. This is the Airflow CDW connection created earlier:
##https://docs.cloudera.com/data-engineering/cloud/manage-jobs/topics/cde-airflow-dag-pipeline.html
hive_cdw_job = CDWOperator(
task_id='cdw_hive_job',
dag=dag,
cli_conn_id='<YOUR-AIRFLOW-CDW-CONNECTION-ID>',
hql=hive_merge,
schema='default',
use_proxy_user=False,
query_isolation=False
)
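The hql argument references a hive_merge string defined elsewhere in the dag. Purely as an illustration of what an SCD Type 2 merge passed to the operator might look like (column names taken from the product DDL above, staging table name from the CLI configs below; the actual merge logic lives in the linked dag):

# Illustrative sketch only -- classic SCD Type 2 merge using the "duplicate the
# source with a NULL join key" trick; a real job would typically also add a
# change-detection predicate so unchanged rows are not expired and re-inserted.
hive_merge = """
MERGE INTO product AS t
USING (
  SELECT s.*, s.product_id AS join_key FROM product_staged_cleansed s
  UNION ALL
  SELECT s.*, CAST(NULL AS INT) AS join_key
  FROM product_staged_cleansed s
  JOIN product p ON s.product_id = p.product_id AND p.is_current = 'Y'
) AS sub
ON sub.join_key = t.product_id
WHEN MATCHED AND t.is_current = 'Y' THEN
  UPDATE SET end_date = current_date(), is_current = 'N'
WHEN NOT MATCHED THEN
  INSERT VALUES (sub.product_id, sub.product_name, sub.aisle_id, sub.department_id,
                 current_date(), NULL, 'Y')
"""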
Run the SCD workflow
Generally, Airflow jobs can be executed through the UI. Since I parametrized the dag, at this time the only way to execute the Airflow job with run-time configs is through the CLI.
Download CDE CLI: Using the Cloudera Data Engineering command line interface
Note: Update <YOUR-S3-BUCKET> with your s3 bucket name
./cde job run --config c_stageTable='product_staged' --config c_sourceLoc='s3a://<YOUR-S3-BUCKET>/sunman/product_changes/' --config c_stageCleansedTable=product_staged_cleansed --config c_dimTable=product --name EDW-SCD-WorkFlow
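The --config values in this command surface inside Airflow as dag_run.conf. Purely as a generic Airflow illustration (not necessarily how the linked dag is wired), a task can read them like this:

# Generic Airflow illustration -- parameter names come from the CLI call above.
def print_runtime_configs(**context):
    conf = context["dag_run"].conf or {}
    print(conf.get("c_stageTable"), conf.get("c_dimTable"))

# Or referenced in any templated operator field:
# "{{ dag_run.conf['c_sourceLoc'] }}"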
The cde job run command will return an Airflow job ID. Return to the CDE UI. After the dag completes, run the following query:
SELECT * FROM PRODUCT WHERE PRODUCT_ID = 232
Product ID 232 is one of several products that were updated.
Wrap It Up!
This end-to-end SCD pipeline demonstrated the capability to use the right tool for the right job to accomplish a highly important EDW task. All driven by Airflow's super powerful orchestration engine handing off work between Apache Spark and Apache Hive: Spark for data processing, Hive for EDW (ACID merge).
01-20-2021
12:55 PM
2 Kudos
Credits to @mbawa (Mandeep Singh Bawa), who co-built all the assets in this article. Thank you!
We (Mandeep and I) engaged on a customer use case where Cloudera Data Engineering (Spark) jobs were triggered once a file lands in S3 (details on how to trigger CDE from Lambda here). Triggering CDE jobs is quite simple; however, we needed much more. Here are a few of the requirements:
Decoupling the ingestion layer from the processing layer
Decoupling apps (senders) from Spark: apps can send and forget payloads without the burden of configuring Spark (number of executors, memory/cpu, etc.), the concern of Spark availability (upgrades, resource availability, etc.), or application impacts from CDE API changes
Real-time changes to where CDE jobs are sent (multi CDE)
Monitoring job status, with alerts
Monitoring job run times, with alerts for out-of-spec runtimes
Failover to a secondary CDE
Throttling
Authentication
It may look as though we are trying to make NiFi into an orchestration engine for CDE. That's not the case. Here we are trying to fill some core objectives, leveraging capabilities within the platform to accomplish the above-stated tasks. CDE comes with Apache Airflow, a much richer orchestration engine. Here we are integrating AWS triggers, multiple CDE clusters, monitoring, alerting, and a single API for multiple clusters.
Artifacts
NiFi CDE Jobs Pipeline Workflow
Streams Messaging Cluster (Kafka)
CDF clusters (NiFi)
Heavy usage of NiFi parameters
High-Level WorkFlow
At a high level, the NiFi workflow does the following:
Exposes a single REST endpoint for CDE job submission
Balances CDE workloads between multiple CDE clusters
If only a single CDE cluster is available, queues jobs until compute bandwidth is available
Queues jobs if CDE clusters are too busy
Jobs placed in the queue will be re-run
If the number of retries for a job spec is greater than 3 (parameterized), an alert is triggered
Monitors jobs from start to finish
Alerts if a job fails or its run time exceeds a predetermined max run time (i.e. a job runs for 10 minutes and the max run time for jobs is set to 5 minutes)
Setup
The following NiFi parameters are required:
api_token: CDE token (more on this later); set to ${cdeToken}
job-runtime-threshold-ms: max time a job should run before an alert is triggered
kbrokers: Kafka brokers
ktopic-fail: Kafka topic cde-job-failures
ktopic-inbound-jobs: Kafka topic cde-jobs
ktopic-job-monitoring: Kafka topic cde-job-monitoring
ktopic-job-runtime-over-limit: Kafka topic cde-job-runtime-alert
ktopic-retry: Kafka topic cde-retry
username: CDE machine user
password: CDE machine user password
primary-vc-token-api: CDE token API (more on this later)
primary_vc_jobs_api: CDE primary cluster jobs API (more on this later)
secondary-vc-available: Y/N; if a secondary CDE cluster is available, set to Y, else N
secondary_vc_jobs_api: CDE secondary cluster jobs API, if a secondary cluster is available
run_count_limit: max number of concurrent running jobs per CDE cluster, i.e. 20
wait-count-max: max retry count; if a job cannot be submitted to CDE (i.e. because it is too busy), how many times NiFi should retry before adding the job to the Kafka ktopic-fail topic, i.e. 5
start_count_limit: max number of concurrent starting jobs per CDE cluster, i.e. 20
Note: When you run the workflow for the first time, the Kafka topics will generally be created for you automatically.
Detailed WorkFlow
Once a CDE job spec is sent to NiFi, NiFi does the following:
Writes the job spec to the Kafka ktopic-inbound-jobs (NiFi parameter) topic
Pulls jobs from Kafka:
New jobs: ktopic-inbound-jobs (NiFi parameter) topic
Retry jobs: ktopic-retry (NiFi parameter) topic
Monitoring jobs: ktopic-job-monitoring (NiFi parameter) topic
Fetches CDE API tokens
Checks if the primary cluster's current run count is less than run_count_limit (NiFi parameter)
Checks if the primary cluster's current starting count is less than start_count_limit (NiFi parameter)
If run or start counts are not within limits, retries the same logic on the secondary cluster (if available, secondary-vc-available)
If run/start counts are within limits, the job spec is submitted to CDE
If run/start counts are not within limits for the primary and secondary CDE and the number of retries is less than wait-count-max (NiFi parameter), the job spec is written to the Kafka ktopic-retry topic (NiFi parameter)
Monitoring
NiFi calls CDE to determine the current status of a job ID (pulled from ktopic-job-monitoring):
If the job ended successfully, nothing more happens here.
If the job ended with a failure, the job spec is written to the Kafka ktopic-fail topic.
If the job is running and the run time is less than job-runtime-threshold-ms, the job spec is written back to ktopic-job-monitoring; else an alert is sent (NiFi parameter).
CDE APIs
To get started, CDE primary and secondary (if available) cluster API details are needed in NiFi as parameters.
To fetch the token API, click the pencil icon, then click the Grafana URL. The URL will look something like this:
https://service.cde-zzzzzz.moad-aw.aaaaa-aaaa.cloudera.site/grafana/d/sK1XDusZz/kubernetes?orgId=1&refresh=5s
Set the NiFi parameter primary-vc-token-api to the first part of the URL:
service.cde-zzzzzz.moad-aw.aaaaa-aaaa.cloudera.site
Now get the jobs API for both primary and secondary (if available). For a virtual cluster, click the pencil icon, then click Jobs API URL to copy the URL. The jobs URL will look something like this:
https://aaa.cde-aaa.moad-aw.aaa-aaa.cloudera.site/dex/api/v1
Take the first part of the URL and set the NiFi parameter primary_vc_jobs_api. Do the same steps for secondary_vc_jobs_api:
aaa.cde-aaa.moad-aw.aaa-aaa.cloudera.site
Run a CDE job
Inside the NiFi workflow, there is a test flow to verify the NiFi CDE jobs pipeline works. To run the flow, inside InvokeHTTP, set the URL to one of the NiFi nodes. Run it, and if the integration is working successfully, you will see a job running in CDE. Enjoy! Oh, by the way, I plan on publishing a video walking through the NiFi flow.
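For a quick smoke test from outside NiFi's own test flow, a client could post a job spec to the pipeline's REST endpoint along these lines. The host, port, path, and spec fields are assumptions; match them to the listener configured in the workflow:

import requests

# Assumed endpoint -- point this at one of your NiFi nodes and the
# listener path/port configured in the workflow.
NIFI_ENDPOINT = "https://<nifi-node>:9876/cde-jobs"

# Assumed job-spec shape -- align the fields with what the flow expects.
job_spec = {
    "job": "my-cde-job-name",
    "arguments": ["s3://My-S3-Bucket/incoming/file1.json"],
}

# verify=False only for self-signed test environments.
resp = requests.post(NIFI_ENDPOINT, json=job_spec, verify=False)
print(resp.status_code, resp.text)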
11-09-2020
01:34 PM
1 Kudo
Recently I ran into a scenario that required connecting my Spark IntelliJ IDE to a Kafka DataHub. I'm not going to claim the status of a pro at secure IDE setup, so novices in the security realm may find this article useful. This article goes through the steps of setting up a Spark Scala IDE (IntelliJ), with a supplied working code example, to connect securely to a Kafka DataHub over the SASL_SSL protocol using the PLAIN SASL mechanism.
Artifacts
https://github.com/sunileman/spark-kafka-streaming
Scala Object
https://github.com/sunileman/spark-kafka-streaming/blob/master/src/main/scala/KafkaSecureStreamSimpleLocalExample.scala
The Scala object accepts 2 inputs:
Target Kafka topic
Kafka broker(s)
Prerequisites
Kafka DataHub instances
Permissions set up in Ranger to be able to read/write from Kafka
IntelliJ (or similar) with the Scala plugin installed
Workload username and password
TrustStore
Andre Sousa Dantas De Araujo did a great job explaining (very simply) how to get the certificate from CDP and create a truststore. Just a few simple steps here: https://github.com/asdaraujo/cdp-examples#tls-truststore
I stored it here on my local machine, which is referenced in the Spark Scala code: ./src/main/resources/truststore.jks
JaaS Setup
Create a jaas.conf file:
KafkaClient {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="YOUR-WORKLOAD-USER"
password="YOUR-WORKLOAD-PASSWORD";
};
I stored mine here, which is referenced in the Spark Scala code: ./src/main/resources/jaas.conf
Spark Session (Scala Code)
Master is set to local
Set spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to the location of your jaas.conf
Set spark.kafka.ssl.truststore.location to the location of your truststore
val spark = SparkSession.builder
.appName("Spark Kafka Secure Structured Streaming Example")
.master("local")
.config("spark.kafka.bootstrap.servers", kbrokers)
.config("spark.kafka.sasl.kerberos.service.name", "kafka")
.config("spark.kafka.security.protocol", "SASL_SSL")
.config("kafka.sasl.mechanism", "PLAIN")
.config("spark.driver.extraJavaOptions", "-Djava.security.auth.login.config=./src/main/resources/jaas.conf")
.config("spark.executor.extraJavaOptions", "-Djava.security.auth.login.config=./src/main/resources/jaas.conf")
.config("spark.kafka.ssl.truststore.location", "./src/main/resources/truststore.jks")
  .getOrCreate()
Write to Kafka
The data in the dataframe is hydrated via a csv file. Here I will simply read the dataframe and write it back out to a Kafka topic:
val ds = streamingDataFrame.selectExpr("CAST(id AS STRING)", "CAST(text AS STRING) as value")
.writeStream.format("kafka")
.outputMode("update")
.option("kafka.bootstrap.servers", kbrokers)
.option("topic", ktargettopic)
.option("kafka.sasl.kerberos.service.name", "kafka")
.option("kafka.ssl.truststore.location", "./src/main/resources/truststore.jks")
.option("kafka.security.protocol", "SASL_SSL")
.option("kafka.sasl.mechanism", "PLAIN")
.option("checkpointLocation", "/tmp/spark-checkpoint2/")
.start()
  .awaitTermination()
Run
Supply the JVM option providing the location of the jaas.conf:
-Djava.security.auth.login.config=/PATH-TO-YOUR-jaas.conf
Supply the program arguments. My code takes 2, the Kafka topic and the Kafka broker(s):
sunman my-kafka-broker:9093
That's it! Run it and enjoy secure SparkStreaming+Kafka glory.
11-03-2020
12:41 AM
Hi, I went through my testing again. Unfortunately, I had missed the step where I need to change/add the parameters on my command line. I changed my TeraGen, TeraSort, and TeraValidate parameters and got better results: TeraGen: 1 min 57 sec; TeraSort: 22 min 55 sec; TeraValidate: 1 min 23 sec. Thank you very much for your writeup again.
10-12-2020
10:29 AM
@sunile_manjee I am not very familiar with AWS ELB, but you can try to use the HandleHttpRequest and HandleHttpResponse processors and check if they serve your use case.
09-11-2020
12:47 PM
1 Kudo
Recently I was engaged in a use case where CDE processing needed to be triggered once data landed in s3. The s3 trigger in AWS would be via a Lambda function: as the files/data land in s3, an AWS Lambda function is triggered, which then calls CDE to process the data/files. At trigger time, the Lambda function receives the names and locations of the files the trigger was executed upon. The file locations/names are then passed on to CDE to pick up and process accordingly.
Prerequisites to run this demo
AWS account
s3 Bucket
Some knowledge of Lambda
CDP and CDE
Artifacts
AWS Lambda function code
https://github.com/sunileman/spark-kafka-streaming/blob/master/src/main/awslambda/triggerCDE.py
CDE Spark Job, main class com.cloudera.examples.SimpleCDERun
Code for class com.cloudera.examples.SimpleCDERun
https://github.com/sunileman/spark-kafka-streaming
Prebuilt jar
https://sunileman.s3.amazonaws.com/CDE/spark-kafka-streaming_2.11-1.0.jar
Processing Steps
Create a CDE Job (Jar provided above)
Create a Lambda function on an s3 bucket (Code provided above)
Trigger on put/post
Load a file or files on s3 (any file)
AWS Lambda is triggered by this event and calls CDE. The call to CDE includes the locations and names of all files the trigger was executed upon
CDE will launch, process the files, and end gracefully
It's quite simple.
Create a CDE Job
Name: Any Name. I called it testjob
Spark Application: Jar file provided above
Main Class: com.cloudera.examples.SimpleCDERun
Lambda
Create an AWS Lambda function to trigger on put/post for s3. The Lambda function code is simple: it calls CDE for each file posted to s3. The function is provided in the artifacts section above.
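The actual function is linked in the artifacts section; its general shape is roughly the following. The CDE endpoint, token handling, and job name are placeholders/assumptions here, not the exact code from the linked repo:

import json
import urllib.request

# Placeholder values -- the linked function in the artifacts section shows the real wiring.
CDE_JOBS_API = "https://<your-cde-vc>.cloudera.site/dex/api/v1"
CDE_TOKEN = "<your-access-token>"
CDE_JOB_NAME = "testjob"

def lambda_handler(event, context):
    """Triggered on s3 put/post; passes each file's location to a CDE job run."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        payload = json.dumps({"arguments": [f"s3a://{bucket}/{key}"]}).encode()
        req = urllib.request.Request(
            f"{CDE_JOBS_API}/jobs/{CDE_JOB_NAME}/run",   # assumed endpoint path
            data=payload,
            headers={
                "Authorization": f"Bearer {CDE_TOKEN}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            print("CDE response:", resp.read().decode())
    return {"statusCode": 200}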
The following are the s3 properties:
Trigger CDE
Upload a file to s3. Lambda will trigger the CDE job. For example, I uploaded a file test.csv to s3. Once the file was uploaded, Lambda calls CDE to execute a job on that file
Lambda Log
The first arrow shows the file name (test.csv). The second arrow shows the CDE JobID, which in this case returned the number 14.
In CDE, Job Run ID: 14
In CDE, the stdout logs show that the job received the location and name of the file that Lambda was triggered upon.
As I said in my last post, CDE is making things super simple. Enjoy.
08-28-2020
09:44 AM
2 Kudos
The all new Cloudera Data Engineering Experience
I recently had the opportunity to work with Cloudera Data Engineering to stream data from Kafka. It's quite interesting how I was able to deploy code without much worry about how to configure the back-end components.
Demonstration
This demo will pull from the Twitter API using NiFi and write the payload to a Kafka topic named "twitter". Spark Streaming on Cloudera Data Engineering Experience (CDE) will pull from the twitter topic, extract the text field from the payload (which is the tweet itself), and write back to another Kafka topic named "tweet". The following is an example of a twitter payload; the objective is to extract only the text field.
What is Cloudera Data Engineering?
Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform that allows you to submit Spark jobs to an auto-scaling cluster. CDE enables you to spend more time on your applications and less time on infrastructure.
How do I begin with Cloudera Data Engineering?
Complete setup instructions here.
Prerequisites
Access to CDE
Some understanding of Apache Spark
Access to a Kafka cluster. In this demo, I use Cloudera DataHub, Streams Messaging, for rapid deployment of a Kafka cluster on AWS
An IDE. I use IntelliJ. I do provide the jar later on in this article
Twitter API developer access: https://developer.twitter.com/en/portal/dashboard
Setting up a twitter stream
I use Apache NiFi deployed via Cloudera DataHub on AWS.
Source Code
I posted all my source code here. If you're not interested in building the jar, that's fine. I've made the job jar available here.
Oct 26, 2020 update - I added source code for how to connect CDE to Kafka DH, available here. Users should be able to run the code as is without the need for a jaas or keytab.
Kafka Setup
This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Create two Kafka topics:
twitter: this topic is used to ingest the firehose data from the twitter API
tweet: this topic is used post tweet extraction, performed via Spark Structured Streaming
NiFi Setup
This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Use the GetTwitter processor (which requires a twitter API developer account, free) and write to the Kafka twitter topic.
Spark Code (Scala)
Load up the Spark code on your machine from here: https://github.com/sunileman/spark-kafka-streaming
Fire off an sbt clean and package. A new jar will be available under target: spark-kafka-streaming_2.11-1.0.jar
The jar is available here
What does the code do?
It will pull from the source Kafka topic (twitter), extract the text value from the payload (which is the tweet itself), and write to the target topic (tweet).
CDE
Assuming CDE access is available, navigate to Virtual Clusters -> View Jobs and click on Create Job.
Job Details
Name: job name
Spark Application File: this is the jar created from the sbt package, spark-kafka-streaming_2.11-1.0.jar. Another option is to simply provide the URL where the jar is available: https://sunileman.s3.amazonaws.com/CDE/spark-kafka-streaming_2.11-1.0.jar
Main Class: com.cloudera.examples.KafkaStreamExample
Arguments
arg1: source Kafka topic: twitter
arg2: target Kafka topic: tweet
arg3: Kafka brokers: kafka1:9092,kafka2:9092,kafka3:9092
From here jobs can be created and run, or simply created.
Click on Create and Run to view the job run and the metrics about the streaming. At this point, only the text (tweet) from the twitter payload is being written to the tweet Kafka topic. That's it! You now have a Spark Structured Streaming job running on CDE, fully autoscaled. Enjoy.
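The linked job is written in Scala; as an illustration of the same pull-extract-write transformation, a rough PySpark equivalent could look like this. The broker list and checkpoint path are placeholders, the topic names come from the article, and the SASL_SSL/truststore options from the earlier article are omitted for brevity:

from pyspark.sql import SparkSession
from pyspark.sql.functions import get_json_object, col

spark = SparkSession.builder.appName("tweet-text-extract").getOrCreate()

# Read the raw twitter firehose payloads from the source topic.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092")
       .option("subscribe", "twitter")
       .load())

# Keep only the tweet text from each JSON payload.
tweets = raw.select(
    get_json_object(col("value").cast("string"), "$.text").alias("value"))

# Write the extracted text to the target topic.
query = (tweets.writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092")
         .option("topic", "tweet")
         .option("checkpointLocation", "/tmp/tweet-extract-checkpoint/")
         .start())

query.awaitTermination()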
08-14-2020
06:37 AM
@DivyaKaki The exception implies that the complete trust chain does not exist to facilitate a successful mutual TLS handshake between this NiFi and the target NiFi-Registry. NiFi uses the keystore and truststore configured in its nifi.properties, and NiFi-Registry uses the keystore and truststore configured in its nifi-registry.properties file. Openssl can be used to display the public certificates for the complete trust chain:
openssl s_client -connect <nifi-registry-hostname>:<port> -showcerts
openssl s_client -connect <nifi-hostname>:<port> -showcerts
For each public cert you will see:
-----BEGIN CERTIFICATE-----
MIIESjCCAzKgAwIBAgINAeO0mqGNiqmBJWlQuDANBgkqhkiG9w0BAQsFADBMMSAw
HgYDVQQLExdHbG9iYWxTaWduIFJvb3QgQ0EgLSBSMjETMBEGA1UEChMKR2xvYmFs
U2lnbjETMBEGA1UEAxMKR2xvYmFsU2lnbjAeFw0xNzA2MTUwMDAwNDJaFw0yMTEy
MTUwMDAwNDJaMEIxCzAJBgNVBAYTAlVTMR4wHAYDVQQKExVHb29nbGUgVHJ1c3Qg
U2VydmljZXMxEzARBgNVBAMTCkdUUyBDQSAxTzEwggEiMA0GCSqGSIb3DQEBAQUA
A4IBDwAwggEKAoIBAQDQGM9F1IvN05zkQO9+tN1pIRvJzzyOTHW5DzEZhD2ePCnv
UA0Qk28FgICfKqC9EksC4T2fWBYk/jCfC3R3VZMdS/dN4ZKCEPZRrAzDsiKUDzRr
mBBJ5wudgzndIMYcLe/RGGFl5yODIKgjEv/SJH/UL+dEaltN11BmsK+eQmMF++Ac
xGNhr59qM/9il71I2dN8FGfcddwuaej4bXhp0LcQBbjxMcI7JP0aM3T4I+DsaxmK
FsbjzaTNC9uzpFlgOIg7rR25xoynUxv8vNmkq7zdPGHXkxWY7oG9j+JkRyBABk7X
rJfoucBZEqFJJSPk7XA0LKW0Y3z5oz2D0c1tJKwHAgMBAAGjggEzMIIBLzAOBgNV
HQ8BAf8EBAMCAYYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMBIGA1Ud
EwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFJjR+G4Q68+b7GCfGJAboOt9Cf0rMB8G
A1UdIwQYMBaAFJviB1dnHB7AagbeWbSaLd/cGYYuMDUGCCsGAQUFBwEBBCkwJzAl
BggrBgEFBQcwAYYZaHR0cDovL29jc3AucGtpLmdvb2cvZ3NyMjAyBgNVHR8EKzAp
MCegJaAjhiFodHRwOi8vY3JsLnBraS5nb29nL2dzcjIvZ3NyMi5jcmwwPwYDVR0g
BDgwNjA0BgZngQwBAgIwKjAoBggrBgEFBQcCARYcaHR0cHM6Ly9wa2kuZ29vZy9y
ZXBvc2l0b3J5LzANBgkqhkiG9w0BAQsFAAOCAQEAGoA+Nnn78y6pRjd9XlQWNa7H
TgiZ/r3RNGkmUmYHPQq6Scti9PEajvwRT2iWTHQr02fesqOqBY2ETUwgZQ+lltoN
FvhsO9tvBCOIazpswWC9aJ9xju4tWDQH8NVU6YZZ/XteDSGU9YzJqPjY8q3MDxrz
mqepBCf5o8mw/wJ4a2G6xzUr6Fb6T8McDO22PLRL6u3M4Tzs3A2M1j6bykJYi8wW
IRdAvKLWZu/axBVbzYmqmwkm5zLSDW5nIAJbELCQCZwMH56t2Dvqofxs6BBcCFIZ
USpxu6x6td0V7SvJCCosirSmIatj/9dSSVDQibet8q/7UK4v4ZUN80atnZz1yg==
-----END CERTIFICATE-----
The above is just an example public cert from the openssl command run against google.com:443. You will need to make sure that every certificate in the chain, when run against the NiFi UI, is added to the truststore on NiFi-Registry, and vice versa. You'll need to restart NiFi and NiFi-Registry before changes to your keystore or truststore files will be read in. Hope this helps, Matt
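As an illustration of the import step (alias and filenames are placeholders): after saving one of the certificates from the openssl output to a .pem file, it can be added to the other side's truststore with keytool, for example:
keytool -importcert -trustcacerts -alias nifi-node1 -file nifi-node1-cert.pem -keystore truststore.jks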