11-12-2020
11:19 AM
1 Kudo
[FLaNK] Smart Weather Applications with Flink SQL
Sometimes you want to acquire, route, transform, live query, and analyze all the weather data in the United States as the reports arrive. With FLaNK, it's a trivial process.
From Kafka to Kudu for Any Schema of Any Type of Data - No Code, Two Steps
The Schema Registry has full Swagger-ized, runnable REST API documentation. Handle integration, DevOps, and migration in a simple script.
Here are your schemas: upload, edit, and compare.
Validating Data Against a Schema With Your Approved Level of Tolerance. You want extra fields allowed, you got it:
Feed that data to beautiful visual applications running in Cloudera Machine Learning.
You like drill-down maps, you got them:
Query your data fast with Apache Hue against Apache Kudu tables through Apache Impala:
Let's ingest all the US weather stations even though they are a zipped directory of a ton of XML files:
Weather Ingest is Easy Automagically!
View All Your Topic Data Enabled by Schema Registry Even in Avro Format:
Reference:
Ingesting all weather data with Apache
Source
Build
Query
Kafka Insert
Schemas
Schemas1
Schemas2
SQL
INSERT INTO weathernj
SELECT `location`, station_id, latitude, longitude, observation_time, weather,
       temperature_string, temp_f, temp_c, relative_humidity, wind_string, wind_dir, wind_degrees, wind_mph,
       wind_kt, pressure_in, dewpoint_string, dewpoint_f, dewpoint_c
FROM weather
WHERE `location` IS NOT NULL
  AND `location` <> 'null'
  AND TRIM(`location`) <> ''
  AND `location` LIKE '%NJ';
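The WHERE clause above keeps only rows whose location is present, is not the literal string 'null', is not blank, and ends in NJ. As a rough illustration only, the same predicate could be mirrored in JavaScript, for example to filter data points on the web page side. The function name `isNewJerseyLocation` is hypothetical and is not part of the original project.

```javascript
// Client-side mirror of the WHERE clause in the Flink SQL statement above.
// isNewJerseyLocation is a hypothetical helper name used only for illustration.
function isNewJerseyLocation(location) {
  // `location` IS NOT NULL
  if (location === null || location === undefined) return false;
  // `location` <> 'null' (guards against the literal string "null")
  if (location === 'null') return false;
  // TRIM(`location`) <> ''
  if (String(location).trim() === '') return false;
  // `location` LIKE '%NJ' (the value must end with the two letters NJ)
  return String(location).endsWith('NJ');
}
```

Like the SQL LIKE pattern, the check is case-sensitive and does not tolerate trailing whitespace after NJ.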
Example Slack Output
12:56
=========================================================
http://forecast.weather.gov/images/wtf/small/ovc.png
Location Cincinnati/Northern Kentucky International Airport, KY
Station KCVG
Temperature: 49.0 F (9.4 C)
Humdity: 83
Wind East at 3.5 MPH (3 KT)
Overcast
Dewpoint 44.1 F (6.7 C)
Observed at Tue, 27 Oct 2020 11:52:00 -0400
---- tracking info ----
UUID: 2cb6bd67-148c-497d-badf-dfffb4906b89
Kafka offset: 0
Kafka Timestamp: 1603818351260
=========================================================
[FLaNK] Smart Weather Websocket Application - Kafka Consumer
This is based on Koji Kawamura's excellent GIST:
As part of my Smart Weather Application, I wanted to display weather information on a web page as it arrives, using WebSockets. Koji has an excellent NiFi flow that does it. I tweaked it and added a few things, since I am not using Zeppelin. I am hosting my web page with NiFi as well.
We simply supply a webpage that makes a WebSocket connection to NiFi and NiFi keeps a cache in HBase to know what the client is doing. This cache is updated by consuming from Kafka. We can then feed events as they happen to the page.
Here is the JavaScript for the web page interface to WebSockets:
<script>
// Serialize a typed message and send it over the open WebSocket.
function sendMessage(type, payload) {
  websocket.send(makeMessage(type, payload));
}

// Wrap the payload in a {type, payload} JSON envelope.
function makeMessage(type, payload) {
  return JSON.stringify({
    'type': type,
    'payload': payload
  });
}

var wsUri = "ws://edge2ai-1.dim.local:9091/test";
var websocket = new WebSocket(wsUri);

// On connect, publish the contents of the "kafkamessage" element.
websocket.onopen = function(evt) {
  sendMessage('publish', {
    // .value assumes "kafkamessage" is a form field; the DOM element itself does not serialize to useful JSON
    "message": document.getElementById("kafkamessage").value
  });
};

websocket.onerror = function(evt) { console.log('ERR', evt); };

// Each message is a JSON array of weather observations; append them to the page.
websocket.onmessage = function(evt) {
  var dataPoints = JSON.parse(evt.data);
  var output = document.getElementById("results");
  var dataBuffer = "<p>";
  for (var i = 0; i < dataPoints.length; i++) {
    dataBuffer += " <img src=\"" + dataPoints[i].icon_url_base + dataPoints[i].icon_url_name + "\"> " + dataPoints[i].location +
      dataPoints[i].station_id + "@" + dataPoints[i].latitude + ":" +
      dataPoints[i].longitude + "@" + dataPoints[i].observation_time +
      dataPoints[i].temperature_string + "," + dataPoints[i].relative_humidity + "," +
      dataPoints[i].wind_string + "<br>";
  }
  output.innerHTML = output.innerHTML + dataBuffer + "</p><br>";
};
</script>
Video Walkthrough: https://www.twitch.tv/videos/797412192?es_id=bbacb7cb39
Source Code: https://github.com/tspannhw/SmartWeather/tree/main
Kafka Topic
weathernj Schema
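The onmessage handler in the page script above concatenates fields from each data point straight into innerHTML. If any field could ever contain markup, it may be worth escaping values first. Here is a minimal sketch of that idea, assuming the same field names the page script reads. The helper names `escapeHtml`, `renderObservation`, and `renderObservations` are hypothetical and do not come from the SmartWeather repository, and the separators between fields are slightly more spaced out than in the original loop.

```javascript
// Hypothetical helpers for illustration; not taken from the SmartWeather repository.

// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the markup for one observation from the fields the page script uses.
function renderObservation(point) {
  const icon = escapeHtml(point.icon_url_base) + escapeHtml(point.icon_url_name);
  return ' <img src="' + icon + '"> ' +
    escapeHtml(point.location) + ' ' + escapeHtml(point.station_id) +
    ' @ ' + escapeHtml(point.latitude) + ':' + escapeHtml(point.longitude) +
    ' @ ' + escapeHtml(point.observation_time) + ' ' +
    escapeHtml(point.temperature_string) + ', ' +
    escapeHtml(point.relative_humidity) + ', ' +
    escapeHtml(point.wind_string) + '<br>';
}

// Wrap a batch of observations in the same <p>...</p><br> structure as the page script.
function renderObservations(points) {
  return '<p>' + points.map(renderObservation).join('') + '</p><br>';
}
```

In the handler, the loop body could then be replaced with something like `output.innerHTML += renderObservations(dataPoints);`.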
The schema registry has a live Swagger interface to its REST API
NiFi Flow Overview
Ingest Via REST All US Weather Data from Zipped XML
As Data Streams In, We Can Govern It
Ingested Data is Validated Against Its Schema Then Pushed to Kafka as Avro
We consume that Kafka data and store it in Kudu for analytics
We host a web page for our Websockets Application in NiFi with 4 simple processors.
Listen and Put Web Socket Messages Between NiFi Server and Web Application
Kafka Data is Cached for Websocket Applications
Set the Port for WebSockets via Jetty Web Server
Use HBase As Our Cache
We can monitor our Flink SQL application from the Global Flink Dashboard
We can query our weather data stored in Apache Kudu via Apache Impala through Apache Hue
Kudu Visualizations of Our Weather Data in Cloudera Visual Applications
10-29-2020
04:02 AM
Excellent tutorial !! I've downloaded your ImageProcessor processor and it worked just fine. I see you are using an ExtractImageMetadata processor in the end of the download image flow. Is it another custom processor you have built? If so, can you share the github repo, please? Thank you so much, best regards from Brazil!
10-22-2020
11:18 AM
Thanks, Tim. My whole idea is that developers should be able to replay messages from provenance for at least 5 days, as per the requirements. I'm assuming the only solution is to bump up the provenance storage to achieve replay capability. Please let me know your thoughts!
10-09-2020
07:16 AM
1 Kudo
https://www.datainmotion.dev/2020/09/devops-working-with-parameter-contexts.html
Download the flow, back it up, and store it in Git.
Copy the flow to an archive.
Remove it from production.
https://www.datainmotion.dev/2019/11/nifi-toolkit-cli-for-nifi-110.html
10-07-2020
02:18 PM
Thanks a lot @TimothySpann for your time and insight. Your advice saves me from future futile efforts to hack/mess up with guava jar libs. I will probably try different approach to load Kafka topics into Hive. Thanks again.
09-09-2020
09:13 AM
Hi @Debangshu It worked with 1.10.0 and 1.11.3, thanks mate for the resolution. Thanks David
08-19-2020
10:01 AM
https://www.datainmotion.dev/2020/08/deleting-schemas-from-cloudera-schema.html
08-06-2020
06:11 AM
That's good news. That's a tricky one. Glad things are working for you.
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to count the fact that Flink has state as 'storage', but regarding Flink SQL, I may make another post later about the ways to interact with and access different kinds of data. (As someone also noticed, Impala is not here either, because it is not a store in itself but works with stored data.)
07-21-2020
09:19 AM
1 Kudo
The easiest way to grab monitoring data is via the NiFi REST API. Everything in the NiFi UI is done through REST calls, which you can also make programmatically. Please read the NiFi docs; they are linked directly from your running NiFi application and are also available on the web. They are very thorough and have all the information you could want: https://nifi.apache.org/docs/nifi-docs/

If you are not running NiFi 1.11.4, I recommend that you upgrade. This is supported by Cloudera on multiple platforms.

NiFi REST API: https://nifi.apache.org/docs/nifi-docs/rest-api/
There's also an awesome Python wrapper for that REST API: https://pypi.org/project/nipyapi/

Also, in NiFi flow programming, every time you produce data to Kafka you get metadata back in FlowFile attributes. You can push those attributes directly to a Kafka topic if you want. After your PublishKafkaRecord_2_0 (1.11.4) processor, on the success relationship, read the attributes for the number of records and other data, then use AttributesToJSON and push the result to another topic. You may want a MergeRecord in there to aggregate a few of those together.

If you are interested in Kafka metrics, record counts, or monitoring, use Cloudera Streams Messaging Manager. It provides a full web UI, monitoring tool, alerts, REST API, and everything you need for monitoring every producer, consumer, broker, cluster, topic, message, offset, and Kafka component.

The best way to get NiFi stats is to use the NiFi Reporting Tasks; I like the SQL Reporting Task. SQL Reporting Tasks are very powerful and use standard SELECT * FROM JVM_METRICS style reporting. See my article: https://www.datainmotion.dev/2020/04/sql-reporting-task-for-cloudera-flow.html

Monitoring Articles
https://www.datainmotion.dev/2019/04/monitoring-number-of-of-flow-files.html
https://www.datainmotion.dev/2019/03/apache-nifi-operations-and-monitoring.html

Other Resources
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_9.html
https://www.datainmotion.dev/2019/08/using-cloudera-streams-messaging.html
https://dev.to/tspannhw/apache-nifi-and-nifi-registry-administration-3c92
https://dev.to/tspannhw/using-nifi-cli-to-restore-nifi-flows-from-backups-18p9
https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html
https://www.datainmotion.dev/p/links.html
https://www.tutorialspoint.com/apache_nifi/apache_nifi_monitoring.htm
https://community.cloudera.com/t5/Community-Articles/Building-a-Custom-Apache-NiFi-Operations-Dashboard-Part-1/ta-p/249060
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-metrics-reporting-nar/1.11.4/org.apache.nifi.metrics.reporting.task.MetricsReportingTask/
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.11.4/org.apache.nifi.reporting.script.ScriptedReportingTask/index.html
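As a rough sketch of the REST approach described in this post, the snippet below polls the NiFi controller status and reduces it to a few counters. It assumes an unsecured NiFi instance at a placeholder address, a runtime with a global `fetch` (for example Node 18 or a browser), and that the `/flow/status` resource returns a `controllerStatus` object with `activeThreadCount`, `flowFilesQueued`, and `bytesQueued`. Check the REST API docs for your NiFi version before relying on those names. The base URL and the function names are hypothetical.

```javascript
// Assumed base URL for an unsecured NiFi instance; replace with your own host and port.
const NIFI_BASE_URL = 'http://localhost:8080/nifi-api';

// Build the controller status URL, tolerating a trailing slash on the base URL.
function statusUrl(baseUrl) {
  return baseUrl.replace(/\/+$/, '') + '/flow/status';
}

// Reduce the status payload to a few counters, defaulting missing values to 0.
// The field names are assumptions based on the NiFi REST API documentation.
function summarizeStatus(body) {
  const status = (body && body.controllerStatus) || {};
  return {
    activeThreads: status.activeThreadCount ?? 0,
    flowFilesQueued: status.flowFilesQueued ?? 0,
    bytesQueued: status.bytesQueued ?? 0
  };
}

// Fetch and summarize the current status; throws on a non-2xx response.
async function fetchNiFiStatus(baseUrl = NIFI_BASE_URL) {
  const response = await fetch(statusUrl(baseUrl));
  if (!response.ok) {
    throw new Error('NiFi status request failed: HTTP ' + response.status);
  }
  return summarizeStatus(await response.json());
}
```

A caller could run `fetchNiFiStatus().then(console.log)` on a timer and forward the result wherever it is needed. A secured cluster would also need authentication, which this sketch leaves out.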