Created on 09-20-2022 10:25 PM - edited 11-03-2022 03:44 PM
In this article, let's build a real-time data visualization to analyze Twitter feeds using Cloudera Data Platform.
Explanation:
Go to the Cloudera DataFlow (CDF) user interface and ensure the CDF service is enabled in your CDP environment.
Import the following flow definition: nifi-twitter-flow.json
Select the imported flow, click Deploy, select the Target Environment, and begin the deployment process.
During deployment, you are prompted for the following parameters, which this NiFi flow requires to function:
It's usually best to delete any historical data from this subdirectory so that you stage only the latest tweets.
The Extra Small NiFi node size is sufficient for this data ingestion.
Once the deployment is done, you will be able to see the flow in the Dashboard.
Go to the Cloudera Data Warehouse (CDW) user interface. Ensure the CDW service is activated in your CDP environment and that a Database Catalog and a Virtual Warehouse compute cluster are available for use.
In the Hue editor, manually load the ISO Language Codes into a table. The default settings in the importer wizard work fine. If you're not sure how to upload data in Hue, see Hue Importer -- Select a file, choose a dialect, create a table.
In the Hue editor, execute twitter-queries.sql. This creates the tables and views required to support the visuals in the Twitter Dashboard. Before running it, change the AWS S3 location in the script to where you've staged the tweets data.
After the queries execute successfully, you can validate the tables and views using the queries below.
SELECT * FROM twtr.iso_language_codes a;
SELECT * FROM twtr.tweets b;
SELECT * FROM twtr.twtr_view c;
SELECT * FROM twtr.tweets_by_minute d;
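If a full SELECT * is slow or returns nothing, a few lighter checks can help narrow down the problem. The following is only a sketch. It assumes the twtr database and the four object names used in the validation queries, and that twtr.tweets is the table pointing at your staged S3 location; the actual definitions are in twitter-queries.sql and may differ.

```sql
-- List the objects that twitter-queries.sql is expected to have created.
SHOW TABLES IN twtr;

-- Inspect the storage location recorded for the tweets table
-- (assumption: this is the table backed by the S3 staging path).
-- Look for the Location row in the output.
DESCRIBE FORMATTED twtr.tweets;

-- Row counts. A zero count on twtr.tweets may mean the location is
-- wrong or that no tweet files have been staged there yet.
SELECT COUNT(*) AS row_count FROM twtr.iso_language_codes;
SELECT COUNT(*) AS row_count FROM twtr.tweets;

-- Sample a few rows instead of returning everything.
SELECT * FROM twtr.twtr_view LIMIT 10;
SELECT * FROM twtr.tweets_by_minute LIMIT 10;
```

If the Location shown by DESCRIBE FORMATTED does not match the path where the NiFi flow writes tweets, correct the location in twitter-queries.sql and rerun it.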
Go to the CDW user interface, select Data Visualization, and add a new Data VIZ.
In the Data Visualization user interface, create a new connection. You must be logged in as admin to do this.
Now that you have a connection to the Hive virtual warehouse, let's create the two datasets required to support the visuals.
Create the first dataset:
Create the second dataset:
It's now time to Import Visual Artifacts. Take a quick look at Importing a dashboard if you're doing it for the first time.
Choose dataviz-twitter-dashboard.json in the import dialog.
Once you get the following screen, click ACCEPT AND IMPORT.
The Twitter Dashboard should now be imported successfully. To see it, go to VISUALS in the top menu and select Twitter Dashboard.
Congratulations on creating your real-time Twitter Dashboard using Cloudera Data Platform!!! To learn more about its implementation, please register here to watch the recording.