Member since 07-10-2017 | 112 Posts | 2 Kudos Received | 1 Solution
01-14-2019
02:43 AM
Hello,

Structured Streaming is now supported in Cloudera Enterprise 6.1: https://blog.cloudera.com/blog/2018/12/cloudera-enterprise-6-1-0-is-now-available/

Could you retry your test on this version? If you hit the same issue there, let us know.
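For retesting, a minimal smoke test could look like the sketch below. It is only an illustration, not the original poster's job: the function names, the `startingOffsets` choice, and the console sink are assumptions, and it presumes the Spark Kafka source package is available on the cluster.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's Kafka source (hypothetical helper for this sketch)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # assumption: replay the topic from the start
    }


def run_console_smoke_test(bootstrap_servers, topic):
    """Read a topic with Structured Streaming and print values to the console."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StructuredStreamingSmokeTest").getOrCreate()
    df = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka values arrive as bytes; cast them to strings for readability.
    query = (
        df.selectExpr("CAST(value AS STRING) AS value")
        .writeStream.format("console")
        .start()
    )
    query.awaitTermination()
```

If rows appear on the console, the Kafka source and the streaming query are working end to end on that version.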
11-23-2018
07:47 AM
Hello,

Which version of Cloudera Kafka/CDH are you testing? Can you run the following commands:

kafka-topics --zookeeper <zkhost>:2181 --list
kafka-topics --zookeeper <zkhost>:2181 --describe --topic test

Cheers!
Manuel
11-23-2018
07:35 AM
Hello,

Starting with version 2.0.0, Kafka Streams is included with CDK Powered By Apache Kafka; however, it is neither supported nor tested by Cloudera. You can deploy Kafka Streams applications, but they are not guaranteed to work, and the community is unlikely to be able to recommend how to launch and scale them.

Spark Streaming is the tried and tested technology for processing streams of data, and Cloudera has around a hundred customers using it in production. In addition, with Spark Streaming, Cloudera's customers benefit from the superior Spark ecosystem, its libraries, and its machine learning support. Depending on your use case, there may be a way to scale Spark Streaming to fit your requirements.

Cheers!
Manuel
11-08-2018
01:18 AM
1 Kudo
Hello,

You may install and test KSQL, as it is open source, but it is neither tested nor supported by Cloudera. While functionally different, Spark Streaming may provide a solution for your use case.

KSQL is very new, whereas Spark Streaming is the tried and tested technology for processing streams of data, and Cloudera has around a hundred customers using it in production. In addition, with Spark Streaming, Cloudera's customers benefit from the superior Spark ecosystem, its libraries, and its machine learning support.

Hope this answers your question!

Cheers,
Manuel
11-08-2018
01:15 AM
1 Kudo
Hello,

While these components are open source and you are free to install and test them with Cloudera Kafka, they are not supported and have not been tested by Cloudera's dev team.

1 & 2) Starting with version 2.0.0, Kafka Streams is included with CDK Powered By Apache Kafka; however, it is not supported. Spark Streaming is the tried and tested technology for processing streams of data, and Cloudera has around a hundred customers using it in production. In addition, with Spark Streaming, Cloudera's customers benefit from the superior Spark ecosystem, its libraries, and its machine learning support.

3) Schema Registry is currently not supported. As it is open source, you can either install it separately in an unsupported manner or build your own lightweight alternative as described in: http://blog.cloudera.com/blog/2018/07/robust-message-serialization-in-apache-kafka-using-apache-avro-part-1/

4) You may install and test KSQL, as it is open source, but it is neither tested nor supported by Cloudera. While functionally different, Spark Streaming may provide a solution for your use case.

Hope this answers your questions!

Cheers,
Manuel
11-08-2018
01:01 AM
Spark Structured Streaming is currently not supported: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_structured_streaming
11-04-2018
03:10 AM
Could you elaborate on what you are trying to achieve and how? What kind of errors are you receiving?

I'm assuming you are talking about setting up a Spark/Kafka integration using Python as the Spark language, something like the following: http://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers

Could you confirm or amend?

Cloudera's distribution of Kafka officially supports Flume, Spark, and Java clients [1], as these have been tested by our development team. However, Spark Structured Streaming is currently untested and unsupported.

[1]: http://www.cloudera.com/documentation/enterprise/latest/topics/kafka_end_to_end.html#kafka_end_to_end
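To make the assumption concrete, here is a rough sketch of the direct (receiver-less) approach from the linked Spark guide, written in Python. The function names, application name, and batch interval are placeholders of mine, not taken from your setup. It also assumes a Spark 1.x/2.x build with the spark-streaming-kafka-0-8 package on the classpath, since `pyspark.streaming.kafka` was removed in Spark 3.

```python
def kafka_params(brokers):
    """Kafka settings for the direct approach (hypothetical helper)."""
    # The 0.8 integration talks to brokers directly, not to ZooKeeper.
    return {"metadata.broker.list": ",".join(brokers)}


def run_direct_stream(brokers, topic, batch_seconds=5):
    """Print the values of a Kafka topic using a direct DStream."""
    # Imported lazily so the helper above works without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x/2.x only

    sc = SparkContext(appName="KafkaDirectStreamSketch")
    ssc = StreamingContext(sc, batch_seconds)
    stream = KafkaUtils.createDirectStream(ssc, [topic], kafka_params(brokers))
    # Each record is a (key, value) pair; show a few values per batch.
    stream.map(lambda kv: kv[1]).pprint()
    ssc.start()
    ssc.awaitTermination()
```

If your job looks different from this, posting the relevant snippet together with the error would help narrow things down.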
09-06-2017
02:55 AM
Hello,

I understand that you would like to interact with data on the cluster from within R. One idea is to use HttpFS with the R curl package: https://cran.r-project.org/web/packages/curl/vignettes/intro.html

Apache Hadoop HttpFS is a service that provides HTTP access to HDFS. HttpFS has a REST HTTP API supporting all HDFS filesystem operations (both read and write). Common HttpFS use cases are:

- Reading and writing data in HDFS using HTTP utilities (such as curl or wget) and HTTP libraries from languages other than Java.
- Transferring data between HDFS clusters running different versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop DistCp.
- Accessing WebHDFS using the Namenode WebUI port (default port 50070). Access to all data hosts in the cluster is required, because WebHDFS redirects clients to the datanode port (default 50075).

If the cluster is behind a firewall and you use WebHDFS to read and write data to HDFS, then Cloudera recommends that you use the HttpFS server. The HttpFS server acts as a gateway: it is the only system that is allowed to send and receive data through the firewall.

A more ad-hoc solution would be to use the Cloudera Data Science Workbench. Have you given it a try? www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html

Cheers!
Manuel
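To show what an HttpFS request looks like, here is a small Python sketch that builds the REST URL and reads a file. The same URL can be fetched from R with the curl package. The host name, the helper names, and the use of simple `user.name` authentication are assumptions for illustration; 14000 is the usual HttpFS port, and a Kerberized cluster would need SPNEGO authentication instead.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def httpfs_url(host, hdfs_path, op, user, port=14000, **params):
    """Build a WebHDFS-style REST URL served by an HttpFS gateway.

    hdfs_path must be absolute (start with "/").
    """
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def read_file(host, hdfs_path, user):
    """Fetch a file's bytes through HttpFS (needs a reachable gateway)."""
    with urlopen(httpfs_url(host, hdfs_path, "OPEN", user)) as resp:
        return resp.read()
```

For example, `httpfs_url("httpfs.example.com", "/user/alice/data.csv", "OPEN", "alice")` produces a URL you can paste into curl or wget to check connectivity before wiring it into R.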