You can easily get started using SQL Stream Builder (SSB), powered by Apache Flink, with Apache Pulsar by combining the Cloudera Stream Processing Community Edition (CSP CE) and the StreamNative Pulsar SQL connector. In this tutorial we will go over the steps needed to integrate the two.

Make sure you have a working CSP CE deployment by following the documentation: https://docs.cloudera.com/csp-ce/latest/index.html

Build the StreamNative Pulsar Connector for Apache Flink

https://github.com/streamnative/pulsar-flink/

Clone the repository to your machine, change into it, and check out the version for Flink 1.14:

git clone https://github.com/streamnative/pulsar-flink.git
cd pulsar-flink
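# (Optional sketch) The repository tags connector releases as release-<version>,
# as with release-1.14.3.4 below; listing the Flink 1.14 tags first shows which
# builds are available:
git tag -l 'release-1.14.*'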
git checkout tags/release-1.14.3.4

Build the connector repo using Maven:

mvn clean install -DskipTests

Open SQL Stream Builder (SSB) and add the connector:

Click Connectors on the left menu
Click Register Connector in the top right corner
Name the connector "pulsar"
Select CSV, JSON, AVRO, and RAW
Add the pulsar-flink-sql-connector_2.11-1.14.3.4.jar file (depending on when you build this, the version and name could be slightly different); it is found in the cloned repository under /pulsar-flink/pulsar-flink-sql-connector/target

You could also register the connector along with the properties it requires, but for this tutorial we will skip that part.

Restart the SSB container to add the new jar to the classpath:

docker restart $(docker ps -a --format '{{.Names}}' --filter "name=ssb-sse.(\d)")

Create a Table that uses Pulsar

To get started we will create a very simple table that can read RAW data as a single large string.

Click Console on the right
Add the DDL below to the window
You will need to update the topic, admin-url, and service-url; in the admin-url and service-url, PULSAR_HOST should be the IP address of the machine running the Pulsar instance/container
It's also very important that 'generic' = 'true' is part of the properties, to ensure that SSB provides the correct table/topic names when executing a query; for more information see this ticket: https://github.com/streamnative/pulsar-flink/issues/528
Click Execute

CREATE TABLE pulsar_test (
`city` STRING
) WITH (
'connector' = 'pulsar',
'topic' = 'topic82547611',
'value.format' = 'raw',
'service-url' = 'pulsar://PULSAR_HOST:6650',
'admin-url' = 'http://PULSAR_HOST:8080',
'scan.startup.mode' = 'earliest',
'generic' = 'true'
);

Start a Pulsar Container

Check out the latest Pulsar documentation for more details on running a standalone Pulsar container: https://pulsar.apache.org/docs/getting-started-docker/

Run the docker command to get a standalone Pulsar deployment:

docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.10.1 bin/pulsar standalone
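Before pointing SSB at the broker, it can help to confirm the standalone instance is reachable. Below is a minimal sketch assuming the default port mappings from the docker run command above; the <container> placeholder is whatever name docker assigned (check docker ps), and pre-creating the topic is optional since a standalone Pulsar will auto-create topics on first use.

# Verify the broker's admin REST API responds (standalone should list one cluster)
curl http://localhost:8080/admin/v2/clusters

# Optionally pre-create the topic referenced by the pulsar_test table
docker exec -it <container> bin/pulsar-admin topics create persistent://public/default/topic82547611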
Push messages to Pulsar and start a query in SSB

It's possible to use SSB to write data to Pulsar with an INSERT SQL statement by running the command below in the SSB Console and clicking Execute.

Tip - Parts of console code can be executed by just highlighting the code you wish to execute.

INSERT INTO pulsar_test
VALUES ('data 1'),('data 2'),('data 3');

Flink and SSB come with the ability to generate data with Faker. This table will randomly generate messages that we can use to insert into our Pulsar table.

CREATE TABLE fake_data (
`city` STRING
) WITH (
'connector' = 'faker',
'rows-per-second' = '1',
'fields.city.expression' = '#{Address.city}'
);

The data that is generated can be inserted into our Pulsar table just like a normal SQL query:

INSERT INTO pulsar_test
SELECT * FROM fake_data;

Query Data from Pulsar

The data can also be queried back out the same way, with a sample of the results displayed in the UI.

Tip - When you run a query, only a small sample of the data is shown in the SSB UI by default. To have all records returned in the SSB UI console, change the sampler configuration: click Job Settings -> Sample Behavior -> Sample All Messages.

Select the data from the test table:

SELECT * FROM pulsar_test;

PS - It's also possible to publish messages with the Pulsar client and read them back out. To see how to publish with the Pulsar client, see: https://pulsar.apache.org/docs/getting-started-docker/#produce-a-message
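For example, here is a minimal sketch of producing and consuming with the pulsar-client CLI bundled in the Pulsar container; the <container> placeholder is whatever name docker ps shows, and the topic matches the pulsar_test table above.

# In one terminal: subscribe and wait for messages (-n 0 keeps consuming)
docker exec -it <container> bin/pulsar-client consume topic82547611 -s test-sub -n 0

# In another terminal: produce a message, then watch it arrive in the consumer
# and in a running SELECT * FROM pulsar_test query in SSB
docker exec -it <container> bin/pulsar-client produce topic82547611 --messages "hello-from-pulsar-client"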
The Community Edition of Cloudera Stream Processing (CSP) makes developing stream processors easy, as it can be done right from your desktop or any other development node. Analysts, data scientists, and developers can now evaluate new features, develop SQL-based stream processors locally using SQL Stream Builder powered by Flink, and develop Kafka Consumers/Producers and Kafka Connect Connectors, all locally before moving to production.

Download CSP CE: https://www.cloudera.com/downloads/cdf/csp-community-edition.html
CSP CE Documentation: https://docs.cloudera.com/csp-ce/latest/index.html

Tutorials
Streaming Analytics: Documentation | Tutorial
Streams Messaging: Documentation | Tutorial
Using SQL Stream Builder with Apache Pulsar
Welcome to the Stream Processing Group!
It's our hope that here you will be able to learn more about Stream Processing by asking questions and reading materials that like-minded folks have shared. By using the Stream Processing Community Edition you can easily start to play with the technology on your local machine, allowing you to understand its value for solving your use cases. If you're already an existing Stream Processing user, the Community Edition is your gateway to trying new features and better understanding the value of upgrading your existing deployments to the latest versions.
Cloudera Stream Processing is powered by Apache Flink and Kafka and provides a complete, enterprise-grade stream management and stateful processing solution. The combination of Kafka as the storage streaming substrate, Flink as the core in-stream processing engine, and first-class support for industry-standard interfaces like SQL and REST allows developers, data analysts, and data scientists to easily build hybrid streaming data pipelines that power real-time data products, dashboards, business intelligence apps, microservices, and data science notebooks.
Check out our getting started post linked below to download the community edition and get started with some simple tutorials, find other connectors, or read some blogs to learn more.
Click Here To Get Started