12-09-2020
10:16 AM
Look at every processor that has *Record in its name. They handle formats such as CSV, JSON, Parquet, Avro, and logs. See https://www.datainmotion.dev/2020/07/ingesting-all-weather-data-with-apache.html
12-08-2020
08:41 AM
1 Kudo
QueryRecord processor
12-03-2020
12:27 PM
2 Kudos
Switch to record-based processors. Never use ConvertJSONToSQL.
12-03-2020
12:26 PM
1 Kudo
Use PutHive3Streaming instead, reading from Kafka, or use PutORC.
12-01-2020
06:12 AM
1 Kudo
The other server needs to have Site-to-Site remote connections enabled. By default this is turned off. Check these properties: nifi.remote.input.host and nifi.remote.input.http.enabled
See:
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.5.1/building-a-dataflow/content/configure-site-to-site-server-nifi-instance.html
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#site_to_site_properties
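As a rough sketch, the Site-to-Site entries in nifi.properties on the receiving instance could look like the fragment below. The hostname and socket port are placeholders I made up for illustration, not values from this thread, and the secure flag depends on whether that instance runs with TLS.

```properties
# Site-to-Site settings on the receiving NiFi instance (nifi.properties).
# Hostname and port below are hypothetical; use your own.
nifi.remote.input.host=nifi2.example.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=true
```

NiFi reads nifi.properties at startup, so restart the instance after changing these.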
12-01-2020
06:09 AM
Does 0176100c-8d25-196b-1f72-6befa5cab12a match the ID for Input from NiFi 1? Also right-click it to make sure it is started. You may need to do a refresh. Are both clusters or single nodes? Restart both nodes.
11-30-2020
08:52 AM
Check your timeouts, turn off or fix any firewalls, and test the network calls from other machines. It could also be the SFTP server you are reading from.
Connection timed out (Connection timed out); routing to comms.failure: java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.Connect
Increase the timeouts for the network calls. How many NICs do you have, and are they 10Gb+? How much RAM do you have? I recommend 32GB RAM with most of it given to the JVM, and 30-32 cores.
The best practice is to use Cloudera Flow Management with a cluster managed by Cloudera Manager; it will make sure everything is running properly. You can also restart the nodes to get a different leading node.
Usually when you do SFTP, only one node makes the calls, which is why that one node gets the timeout errors when calling the SFTP server. Make the timeout greater. Your SFTP server may be slow, offline, or blocked by a firewall, gateway, proxy, or Linux network settings.
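To test the network call from another machine, a small Node.js script like this one can tell a refused connection apart from a timeout. It is only a sketch: checkTcp is a helper name I made up, and the host, port, and timeout in the usage comment are placeholders, not values from this thread.

```javascript
const net = require("net");

// Try to open a TCP connection and report how the attempt ended.
// Always resolves (never rejects) with { ok, detail }.
function checkTcp(host, port, timeoutMs) {
  return new Promise(function (resolve) {
    const socket = net.connect({ host: host, port: port });
    let settled = false;

    function finish(ok, detail) {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve({ ok: ok, detail: detail });
    }

    // Fires if nothing happens on the socket for timeoutMs, including while connecting.
    socket.setTimeout(timeoutMs);
    socket.once("connect", function () { finish(true, "connected"); });
    socket.once("timeout", function () { finish(false, "timed out after " + timeoutMs + " ms"); });
    socket.once("error", function (err) { finish(false, err.code || err.message); });
  });
}

// Usage (hypothetical SFTP host, standard SSH port):
// checkTcp("sftp.example.com", 22, 5000).then(console.log);
```

Run it from each NiFi node. If only some nodes report a timeout, the problem is more likely a firewall or routing rule than the SFTP server itself.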
11-12-2020
11:19 AM
1 Kudo
[FLaNK] Smart Weather Applications with Flink SQL
Sometimes you want to acquire, route, transform, live query, and analyze all the weather data in the United States while those reports happen. With FLaNK, it's a trivial process to do.
From Kafka to Kudu for Any Schema of Any Type of Data - No Code, Two Steps
The Schema Registry has full Swagger-ized, runnable REST API documentation. Handle integration, DevOps, and migration in a simple script.
Here are your schemas: upload, edit, and compare.
Validating Data Against a Schema With Your Approved Level of Tolerance. You want extra fields allowed, you got it:
Feed that data to beautiful visual applications running in Cloudera Machine Learning.
You like drill-down maps, you got them:
Query your data fast with Hue against Apache Kudu tables through Apache Impala:
Let's ingest all the US weather stations even though they are a zipped directory of a ton of XML files:
Weather Ingest is Easy Automagically!
View All Your Topic Data Enabled by Schema Registry Even in Avro Format:
Reference:
Ingesting all weather data with Apache
Source
Build
Query
Kafka Insert
Schemas
Schemas1
Schemas2
SQL
INSERT INTO weathernj
SELECT `location`, station_id, latitude, longitude, observation_time, weather,
       temperature_string, temp_f, temp_c, relative_humidity, wind_string, wind_dir, wind_degrees, wind_mph,
       wind_kt, pressure_in, dewpoint_string, dewpoint_f, dewpoint_c
FROM weather
WHERE `location` IS NOT NULL
  AND `location` <> 'null'
  AND trim(`location`) <> ''
  AND `location` LIKE '%NJ';
Example Slack Output
12:56
=========================================================
http://forecast.weather.gov/images/wtf/small/ovc.png
Location Cincinnati/Northern Kentucky International Airport, KY
Station KCVG
Temperature: 49.0 F (9.4 C)
Humdity: 83
Wind East at 3.5 MPH (3 KT)
Overcast
Dewpoint 44.1 F (6.7 C)
Observed at Tue, 27 Oct 2020 11:52:00 -0400
---- tracking info ----
UUID: 2cb6bd67-148c-497d-badf-dfffb4906b89
Kafka offset: 0
Kafka Timestamp: 1603818351260
=========================================================
[FLaNK] Smart Weather Websocket Application - Kafka Consumer
This is based on Koji Kawamura's excellent GIST:
As part of my Smart Weather Application, I wanted to display weather information on a webpage as it arrives, using WebSockets. Koji has an excellent NiFi flow that does it. I tweaked it and added some things since I am not using Zeppelin. I am hosting my webpage with NiFi as well.
We simply supply a webpage that makes a WebSocket connection to NiFi and NiFi keeps a cache in HBase to know what the client is doing. This cache is updated by consuming from Kafka. We can then feed events as they happen to the page.
Here is the JavaScript for the web page interface to WebSockets:
<script>
// Send one message to NiFi over the open socket.
function sendMessage(type, payload) {
  websocket.send(makeMessage(type, payload));
}

// Wrap a payload in the {type, payload} JSON envelope.
function makeMessage(type, payload) {
  return JSON.stringify({
    'type': type,
    'payload': payload
  });
}

// WebSocket endpoint served by NiFi for this flow.
var wsUri = "ws://edge2ai-1.dim.local:9091/test";
var websocket = new WebSocket(wsUri);

// On connect, send an initial 'publish' message to NiFi.
websocket.onopen = function(evt) {
  sendMessage('publish', {
    "message": document.getElementById("kafkamessage")
  });
};

websocket.onerror = function(evt) { console.log('ERR', evt); };

// Each message is a JSON array of weather records; append them to the page.
websocket.onmessage = function(evt) {
  var dataPoints = JSON.parse(evt.data);
  var output = document.getElementById("results");
  var dataBuffer = "<p>";
  for (var i = 0; i < dataPoints.length; i++) {
    dataBuffer += " <img src=\"" + dataPoints[i].icon_url_base + dataPoints[i].icon_url_name + "\"> " +
      dataPoints[i].location + " " + dataPoints[i].station_id + "@" +
      dataPoints[i].latitude + ":" + dataPoints[i].longitude + "@" +
      dataPoints[i].observation_time + " " +
      dataPoints[i].temperature_string + "," + dataPoints[i].relative_humidity + "," +
      dataPoints[i].wind_string + "<br>";
  }
  output.innerHTML = output.innerHTML + dataBuffer + "</p><br>";
};
</script>
Video Walkthrough: https://www.twitch.tv/videos/797412192?es_id=bbacb7cb39
Source Code: https://github.com/tspannhw/SmartWeather/tree/main
Kafka Topic
weathernj Schema
The schema registry has a live Swagger interface to its REST API
NiFi Flow Overview
Ingest Via REST All US Weather Data from Zipped XML
As Data Streams In, We Can Govern It
Ingested Data is Validated Against Its Schema Then Pushed to Kafka as Avro
We consume that Kafka data and store it in Kudu for analytics
We host a web page for our Websockets Application in NiFi with 4 simple processors.
Listen and Put Web Socket Messages Between NiFi Server and Web Application
Kafka Data is Cached for Websocket Applications
Set the Port for WebSockets via Jetty Web Server
Use HBase As Our Cache
We can monitor our Flink SQL application from the Global Flink Dashboard
We can query our weather data stored in Apache Kudu via Apache Impala through Hue
Kudu Visualizations of Our Weather Data in Cloudera Visual Applications
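The onmessage handler in the page script earlier in this post writes record fields straight into innerHTML. If you would rather not trust every field coming off the topic, one option is to escape them first. This is just a sketch: escapeHtml and renderObservations are helper names I made up, the " | " separator is my choice, and the field names are the ones the page script already uses.

```javascript
// Escape the HTML-significant characters so field values render as plain text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build one line of markup per weather record, with every field escaped.
function renderObservations(dataPoints) {
  var rows = dataPoints.map(function (p) {
    var icon = escapeHtml(p.icon_url_base + p.icon_url_name);
    var text = [
      p.location,
      p.station_id,
      p.latitude + ":" + p.longitude,
      p.observation_time,
      p.temperature_string,
      p.relative_humidity,
      p.wind_string
    ].map(escapeHtml).join(" | ");
    return '<img src="' + icon + '"> ' + text + "<br>";
  });
  return "<p>" + rows.join("") + "</p>";
}
```

In the page you would then append renderObservations(JSON.parse(evt.data)) to the results element instead of building the string inline.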
11-06-2020
09:22 AM
I don't recommend using the Spark Livy connector anymore. Also, by default it's not set up for use by anything other than Zeppelin. I would connect via Kafka. In NiFi you can check the Livy controller settings and make sure it's not respawning new ones.
11-03-2020
12:44 PM
It seems it is not set up properly. The hostname you connect to has to be the same as the hostname in your SSL certificate.
Caused by: org.springframework.ldap.UncategorizedLdapException: Failed to negotiate TLS session; nested exception is javax.net.ssl.SSLPeerUnverifiedException: hostname of the server '' does not match the hostname in the server's certificate.
Check out https://community.cloudera.com/t5/Support-Questions/NIFI-LDAPS-SEEMS-TO-FAIL/td-p/177054
Have you configured login-identity-providers.xml?
I would recommend upgrading to the latest CFM with Apache NiFi 1.11. The Cloudera Manager install process can set up all your SSL properly. You can open a ticket with Cloudera support through the support portal.
Some SSL links:
https://www.datainmotion.dev/2019/08/find-cacerts-from-java-jre-lib-security.html
https://www.datainmotion.dev/2019/09/openssl-ssl-hosting-in-nifi.html
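To see what the hostname check is doing, here is a small Node.js sketch that uses the built-in tls.checkServerIdentity to compare a hostname against certificate names. NiFi itself runs on Java, so this only illustrates the rule and is not NiFi's code. hostnameMatchesCert is a name I made up, and the certificate details and hostnames are hypothetical.

```javascript
const tls = require("tls");

// Returns true when `hostname` is covered by the certificate's names.
// `cert` follows the shape Node reports for a peer certificate:
// subject.CN plus an optional comma-separated subjectaltname string.
function hostnameMatchesCert(hostname, cert) {
  // checkServerIdentity returns undefined on a match and an Error on a mismatch.
  return tls.checkServerIdentity(hostname, cert) === undefined;
}

// Hypothetical certificate details, for illustration only.
const exampleCert = {
  subject: { CN: "ldap.example.com" },
  subjectaltname: "DNS:ldap.example.com, DNS:ldap01.example.com"
};

console.log(hostnameMatchesCert("ldap.example.com", exampleCert));
console.log(hostnameMatchesCert("10-0-0-5.internal", exampleCert));
```

The practical takeaway is that the host in your LDAP URL needs to be one of the names listed in the server's certificate, not an IP or an alias the certificate does not cover.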