Member since: 09-15-2018
Posts: 61
Kudos Received: 6
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1834 | 04-17-2020 08:40 AM
 | 9265 | 04-14-2020 04:45 AM
 | 1208 | 04-14-2020 03:12 AM
 | 957 | 10-17-2019 04:47 AM
 | 1333 | 10-17-2019 04:33 AM
11-10-2020
07:12 AM
1 Kudo
Yes, you can use an edge node; however, this is subject to the Cloudera Support terms in place between you and Cloudera.
07-31-2020
08:14 AM
I used to work at Cloudera/Hortonworks, and now I am a Hashmap Inc. consultant. This solution worked perfectly, thank you.
04-18-2020
07:43 AM
Thank you @TonyStank. This helps me.
04-18-2020
06:27 AM
Hey @kaf, Thanks for reaching out to the Cloudera community. You can use the "tail" command and pipe its output to the Kafka console producer if you want to read the whole file first and then continue tailing for subsequently appended lines:

$ tail -f -n +1 <filename> | kafka-console-producer --broker-list <Broker_Host>:9092 --topic <topic_name>

Let me know if this helps.
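For reference, here is a minimal Python sketch of the same idea using the kafka-python client (the file path, broker host, and topic name are placeholders): read the whole file, then keep tailing for newly appended lines.

import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")
with open("<filename>") as f:
    while True:
        line = f.readline()
        if line:
            # Send each line as one message, mirroring tail -f -n +1 | kafka-console-producer
            producer.send("<topic_name>", line.rstrip("\n").encode("utf-8"))
        else:
            producer.flush()
            time.sleep(1)  # no new data yet; wait for the file to grow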
04-17-2020
07:06 AM
Hey @rishav1412, Thanks for reaching out to the Cloudera community. There is no single way/process/configuration in Kafka to stream data from all social media platforms; each platform has its own APIs and policies for data streaming. If you want to stream data from Twitter, you can use any of the following pipelines to send data from Twitter to Kafka topics:

Twitter >> Kafka Connect (Kafka Connect Twitter) >> Kafka Topics
Twitter >> Flume (org.apache.flume.source.twitter.TwitterSource) >> Kafka Topics
Twitter >> NiFi (GetTwitter Processor) >> Kafka Topics

Let me know if this helps.
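To illustrate the idea behind the first pipeline, here is a minimal Python sketch that forwards tweets to a Kafka topic using tweepy (3.x) together with kafka-python, rather than Kafka Connect itself; all credentials, the broker host, and the topic name are placeholders.

import tweepy
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")

class TweetForwarder(tweepy.StreamListener):
    def on_data(self, raw_tweet):
        # Forward each raw JSON tweet to the Kafka topic as-is.
        producer.send("tweets", raw_tweet.encode("utf-8"))
        return True

auth = tweepy.OAuthHandler("<consumer_key>", "<consumer_secret>")
auth.set_access_token("<access_token>", "<access_token_secret>")
tweepy.Stream(auth, TweetForwarder()).filter(track=["cloudera"])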
04-17-2020
06:46 AM
Hey @Manoj690, Thanks for reaching out to the Cloudera community. You can execute a PUT request against the path "/connectors/<Connector_name>/config" to update the configuration of an existing connector, passing a JSON object with the updated parameter(s) in the request body. Example request:

PUT /connectors/<Connector_name>/config
Accept: application/json

{
  "flush.size": "100",
  "rotate.interval.ms": "1000"
}

Let me know if this helps.
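If you prefer to script it, here is a minimal Python sketch using the requests library (the Connect host, port, and connector name are placeholders). Note that PUT replaces the connector's entire configuration, so include every required key, not just the changed ones.

import requests

url = "http://<Connect_Host>:8083/connectors/<Connector_name>/config"
new_config = {
    # The full connector configuration goes here; only two keys shown.
    "flush.size": "100",
    "rotate.interval.ms": "1000",
}
resp = requests.put(url, json=new_config,
                    headers={"Accept": "application/json"})
print(resp.status_code, resp.json())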
04-15-2020
08:15 AM
Hey @saihadoop, Thanks for reaching out to the Cloudera community. After setting up the cluster infrastructure and installing CDH and CM, you can use the Cloudera Manager API[1] to back up the Cloudera Manager configuration of the existing cluster and restore it to the new cluster. [1] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_intro_api.html#concept_dnn_cr5_mr Let me know if this helps. Cheers,
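As a rough sketch, the export/import can be scripted against the CM REST API with Python's requests library (host names, credentials, and the API version below are placeholders; match the version to your CM release):

import requests

auth = ("admin", "admin")

# Export the full deployment (clusters, services, roles, configs) from the old CM...
backup = requests.get("http://<CM_Host>:7180/api/v19/cm/deployment",
                      auth=auth).json()

# ...and restore it into the new CM instance, replacing whatever is there.
requests.put("http://<New_CM_Host>:7180/api/v19/cm/deployment"
             "?deleteCurrentDeployment=true",
             auth=auth, json=backup)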
04-14-2020
08:44 AM
Hey @AndyTech, Thanks for reaching out to the Cloudera community. The commit-id mentioned here isn't related to any Kafka usage terms such as 'commit offsets'; it refers to the Kafka source commit from which the distribution was built. It is not an error, just an info message, and it doesn't impact the Kafka client's functionality in any way. Let me know if this helps. Cheers,
04-14-2020
05:45 AM
Hey @AndyTech, Thanks for reaching out to the Cloudera community. This issue is due to the "kafka-python" module missing from your Python installation. You have to manually install the "kafka-python" module, using the command below, on the edge node and on all the hosts on which the Spark job executes:

$ pip install kafka-python
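Once installed, a quick way to confirm the module is usable (broker host and topic name are placeholders):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")
future = producer.send("<topic_name>", b"hello from kafka-python")
record = future.get(timeout=10)  # raises a KafkaError if the send failed
print(record.topic, record.partition, record.offset)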
04-14-2020
05:31 AM
Hey @sharathkumar13, Thanks for reaching out to the Cloudera community. Can you clarify what you mean by "Do we have options to do?"? Are you looking to use Prometheus and Grafana to monitor the Kafka service?
04-14-2020
04:09 AM
@TonyStank, I appreciate your help. Stay safe.
04-14-2020
03:27 AM
Hey @ping_pong, Thanks for reaching out to the Cloudera community. Do you have TLS enabled for this CDH cluster? What steps did you follow to add the new host to this CDH cluster? After installing all the required parcels/packages, did you start the Cloudera Manager agent using the command below?

$ sudo service cloudera-scm-agent start
04-14-2020
03:08 AM
Hey, If you have an existing subscription for HDP products, try logging in with your existing HDP credentials. If not, try registering on the Cloudera portal. For learning and development purposes, you can try the Hortonworks Sandbox.
11-07-2019
09:35 PM
Thanks all. I went another way: I downloaded it onto the machine, extracted it, and did the profile setting; now it works fine.
11-01-2019
06:31 AM
1 Kudo
Hey, CSD version 2.3 and higher, I think. Regards, Ankit.
10-18-2019
05:52 AM
Hey, Thank you for sharing the outcome and the steps. Much appreciated. Regards.
10-17-2019
05:47 AM
Thanks, that put me in the right direction. For completeness: just setting SPARK_HOME was not sufficient; py4j was missing, and setting PYTHONPATH fixed that issue.

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

Now pyspark shows: version 2.3.0.cloudera3
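A quick sanity check from the same shell session, to confirm the exports above took effect:

# Run in a Python session started after the SPARK_HOME/PYTHONPATH exports.
import pyspark
print(pyspark.__version__)  # should print 2.3.0.cloudera3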
10-17-2019
04:53 AM
Hey, Refer to the Cloudera documentation[1] on "Configuring the Flume Properties File". [1] https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_flume_config.html Please let me know if this helps. Regards, Ankit.
10-17-2019
04:47 AM
Hey, Optimizing your Kafka cluster depends on your cluster usage and use case. Based on your main concern (throughput, CPU utilization, or memory/disk usage), you need to tune different parameters, and some changes may have an impact on other aspects. For example, if acknowledgments ("acks") is set to "all", all brokers that replicate the partitions need to acknowledge that the data was written before the send is confirmed. This ensures data consistency but increases CPU utilization and network latency. Refer to the article "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)"[1] by Jay Kreps (co-founder and CEO at Confluent). [1] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Please let me know if this helps. Regards, Ankit.
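To make the acknowledgment trade-off concrete, here is a minimal kafka-python producer sketch with acks="all" (broker host and topic name are placeholders):

from kafka import KafkaProducer

# acks="all": every in-sync replica must confirm the write before the
# send is considered successful -- stronger durability, higher latency.
producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092",
                         acks="all")
producer.send("<topic_name>", b"payload")
producer.flush()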
09-12-2019
03:22 AM
I didn't use the FQDN; instead, I just added the IP in the /etc/hosts file and used the same host IP in the Kafka config.
02-13-2019
10:07 PM
Hi Tony, Thanks for your reply; I appreciate all the help provided by you and Gzigldrum. Regards, Wert
02-12-2019
06:56 AM
Hello, thanks for your response. The issue is actually resolved; it was a JournalNode edits directory permission issue. I modified the permissions and restarted the JournalNodes successfully. The NameNodes also came back, and I am able to see all the other dependent services. Thanks again 🙂
02-07-2019
07:48 PM
Hello, Kafka Connect is included with the Cloudera Distribution of Apache Kafka 2.0.0 but is not supported at this time. Cloudera recommends using Flume and Sqoop as proven solutions for batch and real-time data loading that complement Kafka's message broker capability[2]. Kindly refer to the mentioned link[1] for more information. [1] https://www.cloudera.com/documentation/kafka/latest/topics/kafka_known_issues.html#xd_583c10bfdbd326ba-590cb1d1-149e9ca9886--6fcb__section_ens_4bf_55 [2] https://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
02-05-2019
08:28 PM
Hello,

1. Writing streaming aggregation to a file: to use append mode with aggregations, you need to set an event-time watermark (using "withWatermark"). Otherwise, Spark doesn't know when to output an aggregation result as "final". A watermark is a threshold specifying how long the system waits for late events. For example (note that the "timestamp" column must be kept in the selection for the watermark to apply):

df2 = df1.filter("code > 300").select("agent", "timestamp").withWatermark("timestamp", "2 minutes").groupBy("agent").count()

2. Reading from Kafka (consumer) using streaming: you have to set the SPARK_KAFKA_VERSION environment variable. When running jobs that require the new Kafka integration, set SPARK_KAFKA_VERSION=0.10 in the shell before launching spark-submit:

# Set the environment variable for the duration of your shell session:
export SPARK_KAFKA_VERSION=0.10
spark-submit <arguments>

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html
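Putting both points together, here is a minimal PySpark Structured Streaming sketch (broker host, topic name, and HDFS paths are placeholders; run with SPARK_KAFKA_VERSION=0.10 exported as above). Note that append mode requires the watermarked event-time column to participate in the aggregation, hence the window in the groupBy:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("kafka-agg").getOrCreate()

# Read from Kafka and keep the record timestamp for the watermark.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "<Broker_Host>:9092")
          .option("subscribe", "<topic_name>")
          .load()
          .selectExpr("CAST(value AS STRING) AS agent", "timestamp"))

# The watermark lets append mode emit each aggregate once no events
# older than 2 minutes can still arrive.
counts = (events.withWatermark("timestamp", "2 minutes")
          .groupBy(window(col("timestamp"), "1 minute"), col("agent"))
          .count())

(counts.writeStream.outputMode("append")
 .format("parquet")
 .option("path", "hdfs:///tmp/agg-output")
 .option("checkpointLocation", "hdfs:///tmp/agg-checkpoint")
 .start()
 .awaitTermination())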
02-04-2019
02:22 AM
Hello, Monitoring consumer group lag using Cloudera Manager seems unlikely; I tried configuring a chart to display the consumer group lag but couldn't generate the desired results. However, on further research I came across a few GitHub projects that provide additional monitoring functionality. One of them is "Kafka Manager" (Yahoo, Apache 2.0 License); I think with this tool you can monitor consumer group lag. Please refer to the mentioned link[1] for more information on Kafka Manager. [1] https://github.com/yahoo/kafka-manager
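If you only need a quick number rather than a dashboard, the lag can also be computed directly with kafka-python (broker host, group id, and topic name are placeholders): per partition, lag is the log-end offset minus the group's committed offset.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="<Broker_Host>:9092",
                         group_id="<consumer_group>",
                         enable_auto_commit=False)
partitions = [TopicPartition("<topic_name>", p)
              for p in consumer.partitions_for_topic("<topic_name>")]
end_offsets = consumer.end_offsets(partitions)
for tp in partitions:
    committed = consumer.committed(tp) or 0
    print("partition %d lag: %d" % (tp.partition, end_offsets[tp] - committed))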
02-03-2019
05:09 AM
1 Kudo
Hello, Loading data directly to Kafka without any service seems unlikely. However, you can execute a simple Kafka console producer to send all your data to the Kafka service. But if your requirement is to save data to HDFS, you need to include a few more services along with Kafka. For example:

Crawlers >> Kafka console producer (or) Spark Streaming >> Flume >> HDFS

As your requirement is to store the data in HDFS and not to stream it, I suggest you execute a Spark job; it will store your data to HDFS. Refer to the mentioned commands to move data to HDFS: initiate a spark-shell, then execute the commands below in order.

val moveFile = sc.textFile("file:///path/to/Sample.log")
moveFile.saveAsTextFile("hdfs:///tmp/Sample.log")