Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4018 | 08-20-2018 08:26 PM |
| | 1924 | 08-15-2018 01:59 PM |
| | 2356 | 08-13-2018 02:20 PM |
| | 4067 | 07-23-2018 04:37 PM |
| | 4972 | 07-19-2018 12:52 PM |
08-28-2020
09:44 AM
2 Kudos
The all new Cloudera Data Engineering Experience

I recently had the opportunity to work with Cloudera Data Engineering to stream data from Kafka. It's quite interesting how I was able to deploy code without much worry about how to configure the back-end components.

Demonstration

This demo will pull from the Twitter API using NiFi and write the payload to a Kafka topic named "twitter". Spark Structured Streaming on Cloudera Data Engineering Experience (CDE) will pull from the twitter topic, extract the text field from the payload (which is the tweet itself), and write back to another Kafka topic named "tweet". The following is an example of a twitter payload; the objective is to extract only the text field.

What is Cloudera Data Engineering?

Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform that allows you to submit Spark jobs to an auto-scaling cluster. CDE enables you to spend more time on your applications and less time on infrastructure.

How do I begin with Cloudera Data Engineering (CDE)?

Complete setup instructions here.

Prerequisites

- Access to CDE
- Some understanding of Apache Spark
- Access to a Kafka cluster. In this demo, I use Cloudera DataHub, Streams Messaging, for rapid deployment of a Kafka cluster on AWS.
- An IDE. I use IntelliJ, but I do provide the jar later on in this article.
- Twitter API developer access: https://developer.twitter.com/en/portal/dashboard

Setting up a twitter stream

I use Apache NiFi deployed via Cloudera DataHub on AWS.

Source Code

I posted all my source code here. If you're not interested in building the jar, that's fine; I've made the job jar available here. Oct 26, 2020 update: I added source code for how to connect CDE to a Kafka DataHub cluster, available here. Users should be able to run that code as is, without the need for a jaas file or keytab.

Kafka Setup

This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Create two Kafka topics:

- twitter: used to ingest the firehose data from the Twitter API
- tweet: used post tweet extraction, performed via Spark Structured Streaming

NiFi Setup

This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Use the GetTwitter processor (which requires a Twitter API developer account, free) and write to the Kafka twitter topic.

Spark Code (Scala)

Load up the Spark code on your machine from here: https://github.com/sunileman/spark-kafka-streaming

Fire off an sbt clean and package. A new jar will be available under target: spark-kafka-streaming_2.11-1.0.jar. The jar is also available here.

What does the code do? It will pull from the source Kafka topic (twitter), extract the text value from the payload (which is the tweet itself), and write to the target topic (tweet). A minimal sketch of this logic is shown at the end of this post.

CDE

Assuming CDE access is available, navigate to Virtual Clusters > View Jobs and click on Create Job.

Job Details

- Name: the job name
- Spark Application File: the jar created from the sbt package, spark-kafka-streaming_2.11-1.0.jar. Another option is to simply provide the URL where the jar is available: https://sunileman.s3.amazonaws.com/CDE/spark-kafka-streaming_2.11-1.0.jar
- Main Class: com.cloudera.examples.KafkaStreamExample
- Arguments:
  - arg1, the source Kafka topic: twitter
  - arg2, the target Kafka topic: tweet
  - arg3, the Kafka brokers: kafka1:9092,kafka2:9092,kafka3:9092

From here, jobs can be created and run, or simply created. Click on Create and Run to view the job run and the metrics about the stream. At this point, only the text (tweet) from the twitter payload is being written to the tweet Kafka topic.
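For reference, here is a minimal sketch of what such a job can look like. This is not the actual KafkaStreamExample source, just an illustration of the same read-extract-write pattern using the standard Spark Structured Streaming Kafka source and sink (the object name and checkpoint path are arbitrary examples):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, get_json_object}

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    // arg1 = source topic, arg2 = target topic, arg3 = broker list
    val Array(sourceTopic, targetTopic, brokers) = args

    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()

    // Read the raw twitter payload from the source topic
    val tweets = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", sourceTopic)
      .load()
      // Kafka values arrive as bytes; cast to string, then pull out the tweet text
      .selectExpr("CAST(value AS STRING) AS json")
      .select(get_json_object(col("json"), "$.text").alias("value"))

    // Write only the tweet text to the target topic
    val query = tweets.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", targetTopic)
      .option("checkpointLocation", "/tmp/kafka-stream-checkpoint")
      .start()

    query.awaitTermination()
  }
}
```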
That's it! You now have a Spark Structured Streaming job running on CDE, fully autoscaled. Enjoy!
08-14-2020
06:37 AM
@DivyaKaki The exception implies that the complete trust chain does not exist to facilitate a successful mutual TLS handshake between this NiFi and the target NiFi-Registry. NiFi uses the keystore and truststore configured in its nifi.properties file, and NiFi-Registry uses the keystore and truststore configured in its nifi-registry.properties file. Openssl can be used to display the public certificates for the complete trust chain:

openssl s_client -connect <nifi-registry-hostname>:<port> -showcerts
openssl s_client -connect <nifi-hostname>:<port> -showcerts

For each public cert, you will see:

-----BEGIN CERTIFICATE-----
MIIESjCCAzKgAwIBAgINAeO0mqGNiqmBJWlQuDANBgkqhkiG9w0BAQsFADBMMSAw
HgYDVQQLExdHbG9iYWxTaWduIFJvb3QgQ0EgLSBSMjETMBEGA1UEChMKR2xvYmFs
U2lnbjETMBEGA1UEAxMKR2xvYmFsU2lnbjAeFw0xNzA2MTUwMDAwNDJaFw0yMTEy
MTUwMDAwNDJaMEIxCzAJBgNVBAYTAlVTMR4wHAYDVQQKExVHb29nbGUgVHJ1c3Qg
U2VydmljZXMxEzARBgNVBAMTCkdUUyBDQSAxTzEwggEiMA0GCSqGSIb3DQEBAQUA
A4IBDwAwggEKAoIBAQDQGM9F1IvN05zkQO9+tN1pIRvJzzyOTHW5DzEZhD2ePCnv
UA0Qk28FgICfKqC9EksC4T2fWBYk/jCfC3R3VZMdS/dN4ZKCEPZRrAzDsiKUDzRr
mBBJ5wudgzndIMYcLe/RGGFl5yODIKgjEv/SJH/UL+dEaltN11BmsK+eQmMF++Ac
xGNhr59qM/9il71I2dN8FGfcddwuaej4bXhp0LcQBbjxMcI7JP0aM3T4I+DsaxmK
FsbjzaTNC9uzpFlgOIg7rR25xoynUxv8vNmkq7zdPGHXkxWY7oG9j+JkRyBABk7X
rJfoucBZEqFJJSPk7XA0LKW0Y3z5oz2D0c1tJKwHAgMBAAGjggEzMIIBLzAOBgNV
HQ8BAf8EBAMCAYYwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMBIGA1Ud
EwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFJjR+G4Q68+b7GCfGJAboOt9Cf0rMB8G
A1UdIwQYMBaAFJviB1dnHB7AagbeWbSaLd/cGYYuMDUGCCsGAQUFBwEBBCkwJzAl
BggrBgEFBQcwAYYZaHR0cDovL29jc3AucGtpLmdvb2cvZ3NyMjAyBgNVHR8EKzAp
MCegJaAjhiFodHRwOi8vY3JsLnBraS5nb29nL2dzcjIvZ3NyMi5jcmwwPwYDVR0g
BDgwNjA0BgZngQwBAgIwKjAoBggrBgEFBQcCARYcaHR0cHM6Ly9wa2kuZ29vZy9y
ZXBvc2l0b3J5LzANBgkqhkiG9w0BAQsFAAOCAQEAGoA+Nnn78y6pRjd9XlQWNa7H
TgiZ/r3RNGkmUmYHPQq6Scti9PEajvwRT2iWTHQr02fesqOqBY2ETUwgZQ+lltoN
FvhsO9tvBCOIazpswWC9aJ9xju4tWDQH8NVU6YZZ/XteDSGU9YzJqPjY8q3MDxrz
mqepBCf5o8mw/wJ4a2G6xzUr6Fb6T8McDO22PLRL6u3M4Tzs3A2M1j6bykJYi8wW
IRdAvKLWZu/axBVbzYmqmwkm5zLSDW5nIAJbELCQCZwMH56t2Dvqofxs6BBcCFIZ
USpxu6x6td0V7SvJCCosirSmIatj/9dSSVDQibet8q/7UK4v4ZUN80atnZz1yg==
-----END CERTIFICATE-----

The above is just an example public cert, from the openssl command run against google.com:443. You will need to make sure that every certificate in the chain presented when running against the NiFi UI is added to the truststore on NiFi-Registry, and vice versa; an example keytool import is shown below. You'll need to restart NiFi and NiFi-Registry before changes to your keystore or truststore files are read in. Hope this helps, Matt
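For reference, importing a saved public certificate (.pem) into a JKS truststore with keytool looks like the following; the alias, file name, and keystore path are illustrative, so adjust them to your environment:

```
keytool -importcert \
  -alias nifi-registry-ca \
  -file registry-ca.pem \
  -keystore /opt/nifi/conf/truststore.jks \
  -storepass <truststore-password>
```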
08-11-2020
01:40 PM
2 Kudos
Image Courtesy: k9s
I recently ran into a scenario where I needed to gather Hive logs on the new Data Warehouse Experience on AWS. The "old" way of fetching logs was to SSH into the nodes, but Data Warehouse Experience is deployed on K8s, so SSHing is off the table. A tool like K9s is therefore key. This is a quick article demonstrating how to use K9s to fetch logs from Data Warehouse Experience deployed on AWS K8s.
Prerequisites
Data Warehouse Experience
K9s installed on your machine
AWS ARN (instructions provided below)
AWS configure (CLI) pointing to your AWS env. Simply run aws configure in the CLI and point it at the correct AWS subscription (see the example below)
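For example, the aws configure prompts look like this (the values shown are placeholders):

```
$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-west-2
Default output format [None]: json
```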
AWS ARN
Your AWS ARN is required to successfully connect K9s to CDW (DW-X).
On AWS, go to IAM > Users > Search for your user name:
Click on your username to fetch the ARN:
Kubeconfig
Connecting to DW-X using K9s requires a kubeconfig. DW-X makes this available under DW-X > Environments > Your Environment > Show Kubeconfig.
Click on the copy option and save the contents to a file on your machine's file system. For example, I stored the kubeconfig contents here: /Users/sunile.manjee/.k9s/kubeconfig.yml
ARN
To access K8s from K9s, your ARN will need to be added under Grant Access:
K9s
Now everything is set up to connect to the DW-X K8s cluster using K9s. Reference the kubeconfig.yml file when launching K9s:
k9s --kubeconfig /Users/sunile.manjee/.k9s/kubeconfig.yml
That's it. From here the logs are made available, along with a ton of other metrics. For more information on how to use K9s, see k9scli.io.
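If you ever want to script log collection rather than browse interactively, the same kubeconfig also works with plain kubectl; the pod and namespace names below are placeholders:

```
kubectl --kubeconfig /Users/sunile.manjee/.k9s/kubeconfig.yml get pods -A
kubectl --kubeconfig /Users/sunile.manjee/.k9s/kubeconfig.yml \
  logs <hive-pod-name> -n <dw-namespace>
```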
06-05-2020
11:17 AM
@LearnerAdmin It is not clear to me what you are asking when you say "add NIFI CA in authorities". Instructions on using the NiFi TLS Toolkit can be found here: https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#tls_toolkit Using the client/server TLS Toolkit operational mode, covered here: https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#client-server will give you the ability to run a NiFi CA "server" that signs the NiFi node certificates created using the "client" mode. Thanks, Matt
06-04-2020
08:28 AM
Probably worth pointing out that the behaviour of insertInto and saveAsTable can differ under certain conditions: https://towardsdatascience.com/understanding-the-spark-insertinto-function-1870175c3ee9 https://stackoverflow.com/questions/47844808/what-are-the-differences-between-saveastable-and-insertinto-in-different-savemod A short sketch of the difference follows.
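Here is a minimal sketch of the by-name vs. by-position difference those links describe; the table name and data are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object InsertIntoVsSaveAsTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InsertIntoVsSaveAsTable")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    df.write.saveAsTable("people") // creates the table with columns (id, name)

    val reordered = df.select("name", "id")

    // saveAsTable in append mode resolves columns by NAME: rows land correctly
    reordered.write.mode("append").saveAsTable("people")

    // insertInto matches columns by POSITION: the "name" strings are cast into
    // the integer "id" column (yielding nulls), and the ids land in "name"
    reordered.write.mode("append").insertInto("people")
  }
}
```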
05-17-2020
08:41 PM
Hi @kettle
As this thread was marked 'Solved' in June of 2016, you would have a better chance of receiving a useful response by starting a new thread. A new thread will also give you the opportunity to provide details specific to your use of the PutSQL processor and/or Phoenix, which could help others give a more tailored answer to your question.
05-15-2020
07:15 PM
Hello! If I insert a string containing a single quote (') or a double quote ("), PutSQL to Phoenix returns syntax errors. How should I solve this?
05-06-2020
09:59 AM
1 Kudo
@rahulsharma,
The View solution in the original post option is more helpful when a discussion goes beyond one or two pages. For example, if somebody marks a post on page 2 as the solution, clicking on View solution in the original post will bring you back to the first page, under the original question.
We understand that this can be confusing. Hopefully, this explanation should help.
Regards,
Vidya
05-04-2020
10:38 AM
1 Kudo
The EFM (Edge Flow Manager) makes it super simple to write flows for MiNiFi to execute wherever it may be located (laptops, refineries, phones, OpenShift, etc.). All agents (MiNiFi) are assigned an agent class. Once an agent is turned on, it phones home to EFM for run-time instructions. The run-time instructions are set at the class level, meaning all agents within a class run the same instruction (flow) set. There can be zero to many classes. In this example, I will capture Windows Security Events via MiNiFi and ship them to NiFi over Site-to-Site (S2S).
Download the MiNiFi MSI and set the class name. In this example, I set the class name to test6. This property is set at install time (MSI) or by going directly into minifi.properties. Also notice the setting nifi.c2.enable=true; this informs MiNiFi that run-time flow instructions will be received from EFM (the relevant properties are sketched below). Start MiNiFi.
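The minifi.properties entries involved look roughly like the following; the EFM host, port, and heartbeat period are example values, so check your own EFM deployment for the actual URLs:

```
nifi.c2.enable=true
nifi.c2.agent.class=test6
nifi.c2.rest.url=http://<efm-host>:10080/efm/api/c2-protocol/heartbeat
nifi.c2.rest.url.ack=http://<efm-host>:10080/efm/api/c2-protocol/acknowledge
nifi.c2.agent.heartbeat.period=5000
```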
MiNiFi can be configured to send data to multiple endpoints (e.g., Kafka, NiFi, Event Hub, etc.). In this example, data will be sent to NiFi over S2S. On NiFi, create an input port:
Capture the port ID. This will be used in EFM later on:
On EFM, open class test6. This is where we design the flow for all agents whose class is set to test6:
To capture Windows events via MiNiFi, add ConsumeWindowsEventLog processor to the canvas:
Configure the processor to pull events. In this example, MiNiFi will listen for Windows Security Events:
To send data from MiNiFi to NiFi, add Remote Process Group to the canvas. Provide a NiFi endpoint:
Connect ConsumeWindowsEventLog processor to the Remote Process Group. Provide the NiFi Input Port ID captured earlier:
Flow is ready to publish:
Click on Publish. MiNiFi will phone home at a set interval (nifi.c2.agent.heartbeat.period). Once that occurs, MiNiFi will receive the new run-time flow instructions, and data will start flowing into NiFi.
The EFM makes it super simple to capture Windows events and universally ship them anywhere, without the ball and chain of most agent/platform designs.
04-20-2020
11:25 PM
How do you debug the scripts? I used bash -x tpcds-setup.sh but could not find the error, and when I used your method, it also reported errors.