Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 11724 | 09-01-2018 01:27 AM
 | 1096 | 09-01-2018 01:18 AM
 | 3666 | 08-20-2018 09:39 PM
 | 484 | 07-20-2018 04:51 PM
 | 1461 | 07-16-2018 09:41 PM
03-18-2019
07:08 PM
@Junfeng Chen, as mentioned, it depends on your use of it. It will run okay in most deployment patterns, and it can run fine in VMs, but of course having dedicated hardware is always preferred.
... View more
03-18-2019
07:05 PM
@Gaurang Shah If you see broker metrics, then that is where you exposed the port to. Kafka Connect is meant to be run on separate machines from the brokers, but if you are running it on the same ones, then you must expose and monitor two different JMX ports.
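As a rough sketch (ports and file paths below are only placeholders, adjust for your HDP layout), each process gets its own port by exporting JMX_PORT before it starts:
export JMX_PORT=9990        # example port for the broker
kafka-server-start.sh server.properties
export JMX_PORT=9991        # a different example port for Connect on the same host
connect-distributed.sh connect-distributed.properties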
... View more
03-11-2019
07:35 PM
"Can they" - Yes. "Should they" - I would say no. Kafka is very memory and disk sensitive. Depending on your use of it, it could even use more I/O than the combination of the DataNode and NodeManager on the same machine. Personally, I would recommend installing Kafka brokers on dedicated hardware, even separate from the Zookeeper servers it needs, if at all possible. The Spark executors do not need to be running on the Kafka brokers, they should work fine pulling remotely from the YARN NodeManagers.
... View more
03-11-2019
07:31 PM
JMX is not exposed in a property file. It is toggled from two environment variables. Confluent and Apache Kafka installations share the same variables for this. Source code here - https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh#L166-L174 Basically, if you export JMX_PORT to a valid port number, it will open that port for JMX monitoring of any Kafka-related script that you run. The recommendation, however, is to use a cluster of machines running Connect Distributed, as it is a long-running process, and you can scale your metrics collection using tools like Prometheus JMX Exporter combined with a Grafana server for dashboarding.
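For example, a minimal sketch (the port, agent JAR, and YAML paths are placeholders for wherever you install the exporter):
export JMX_PORT=9999
connect-distributed.sh connect-distributed.properties
Or, to scrape directly with Prometheus, attach the JMX Exporter as a Java agent via KAFKA_OPTS instead:
export KAFKA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx_exporter/kafka-connect.yml"
connect-distributed.sh connect-distributed.properties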
... View more
10-09-2018
07:35 PM
btw, these questions are copied from https://data-flair.training/forums/topic/how-client-can-interact-with-hive
... View more
10-09-2018
07:30 PM
Can you please verify / show the Kafka server properties? When you run Kafka within a container, you need to make sure that clients are getting the external address from Zookeeper. A simple port forward is not enough. https://rmoff.net/2018/08/02/kafka-listeners-explained/
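As a sketch of the pattern from that post (hostnames and ports below are placeholders, not your actual values), the broker needs separate internal and external listeners in server.properties:
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
advertised.listeners=INTERNAL://kafka-container:9092,EXTERNAL://host.example.com:9094
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL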
... View more
10-06-2018
06:41 PM
It is a Java class for reading Hadoop SequenceFiles: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
... View more
10-03-2018
10:36 PM
Hello @nicole wells Please find the question that you copied (and my answer) here https://stackoverflow.com/a/51687883/2308683
... View more
09-30-2018
01:31 AM
You only need to use a Schema Registry if you plan on using Confluent's AvroConverter. Note: NiFi can also be used to do CDC from MySQL https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html
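If you do use the AvroConverter, a minimal sketch of the Connect worker settings (the Schema Registry address is a placeholder):
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081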
... View more
09-30-2018
01:27 AM
When brokers terminate, they remove themselves from Zookeeper.
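The broker IDs are registered as ephemeral znodes, so you can watch them disappear with the bundled shell (the Zookeeper address is a placeholder):
zookeeper-shell.sh zk-host:2181 ls /brokers/ids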
... View more
09-24-2018
06:35 PM
Hi @Zach, please see my answer on StackOverflow here: https://stackoverflow.com/a/52266219/2308683 Burrow does essentially the same thing, but in Golang. How you read the data and perform the lag calculations also depends on what is currently being consumed, however, which is not stored immediately within the offsets topic.
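For a quick point-in-time view of lag, the tool shipped with Kafka also works (broker address and group name are placeholders):
kafka-consumer-groups.sh --bootstrap-server broker:9092 --describe --group my-consumer-group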
... View more
09-17-2018
07:48 PM
@ssarkar Is it not possible to use Ambari to install a separate Zookeeper host group, then configure a Kafka host group to use that secondary Zookeeper quorum?
... View more
09-17-2018
07:45 PM
These are spam accounts, by the way. Look at all the "answers" from the other users for every question, and they all link back to dataflair's website.
... View more
09-10-2018
06:55 PM
If you are running Kafka 0.10 or newer, connect-distributed.sh exists somewhere under /usr/hdp/current/kafka already. You can run that process on multiple machines to create a Kafka Connect cluster.
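Every worker that shares the same group.id joins the same Connect cluster, so a minimal sketch of connect-distributed.properties looks something like this (broker addresses and topic names are placeholders):
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster-1
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter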
... View more
09-10-2018
06:46 PM
I think you are asking about adding directories to DataNodes.
dfs.datanode.data.dir in the hdfs-site.xml file is a comma-delimited list of directories where the DataNode will store blocks for HDFS. See also: https://community.hortonworks.com/questions/89786/file-uri-required-for-dfsdatanodedatadir.html
Property | Default | Description
---|---|---
dfs.datanode.data.dir | file://${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
Otherwise, I'm afraid your question doesn't make sense other than running the hdfs dfs -mkdir command to "add a new directory in HDFS"
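For reference, a sketch of both interpretations (paths below are only examples): adding a second local storage directory for the DataNode is just another comma-delimited entry, while creating a directory inside HDFS is a one-liner.
dfs.datanode.data.dir=file:///grid/0/hadoop/hdfs/data,file:///grid/1/hadoop/hdfs/data
hdfs dfs -mkdir -p /data/new_directory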
... View more
09-04-2018
06:58 PM
@Manish Tiwari, perhaps you can look at https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.1/content/data-lake/index.html Otherwise, you can search https://docs.hortonworks.com/ for the keywords you are looking for.
... View more
09-01-2018
01:36 AM
If you expose Kafka via HTTP, then I don't see the downside of exposing Kafka itself. If you did enable HTTPS on the "Kafka REST API" (via Knox, for example: https://knox.apache.org/books/knox-1-1-0/user-guide.html#Kafka), then you should be enabling TLS/SSL on Kafka as well, in which case certificates would be needed to make external clients secure. Kafka should realistically not be treated as a "walled off" service behind the Hadoop network, and you cannot proxy requests through another server without manually setting up that TLS tunnel yourself. Kafka is a common access point for getting data into Hadoop as well, so it should be treated as a first-class "edge ingestion layer" itself. You should take similar care to set up authentication and access rules around every single broker, just like you've done for the Hadoop "edge node". You could alternatively use NiFi to listen on some other random port and forward to a Kafka producer processor; then someone scanning open ports wouldn't be able to detect that it's Kafka responding, it would be NiFi, though you would still have the same problem that people can send arbitrary messages to that socket if it doesn't require authentication.
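If you do enable TLS on the brokers, a rough sketch of the server.properties pieces involved (keystore paths and passwords are placeholders):
listeners=SSL://0.0.0.0:9093
advertised.listeners=SSL://broker1.example.com:9093
ssl.keystore.location=/etc/security/kafka.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/etc/security/kafka.server.truststore.jks
ssl.truststore.password=changeit
security.inter.broker.protocol=SSL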
... View more
09-01-2018
01:27 AM
Nagios / OpsView / Sensu are popular options I've seen. StatsD / CollectD / MetricBeat are daemon metric collectors that run on each server (MetricBeat is somewhat tied to an Elasticsearch cluster, though). Prometheus is a popular option nowadays that scrapes metrics exposed by each local service. I have played around a bit with netdata, though I'm not sure if it can be applied to Hadoop monitoring use cases. DataDog is a vendor that offers lots of integrations such as Hadoop, YARN, Kafka, Zookeeper, etc. Realistically, you need some JMX + system monitoring tool, and a bunch exist.
... View more
09-01-2018
01:18 AM
1 Kudo
A Data Lake is not tied to a platform or technology. Hadoop is not a requirement for a data lake either. IMO, a "data lake project" should not be a project description or the end goal; you can say you got your data from "source X", using "code Y", transformed and analyzed using "framework Z", but the combinations of tools on the market that support such statements are so broad that it really depends on what business use cases you are trying to solve. For example, S3 is replaceable with HDFS or GCS or Azure Storage. Redshift is replaceable with Postgres (and you really should use Athena anyway if the data you want to query is in S3, where Athena is replaceable by PrestoDB), and those can be compared to Google BigQuery. My suggestion would be not to tie yourself to a certain toolset, but if you are in AWS, their own documentation pages are very extensive. Since you are not asking a Hortonworks-specific question, I'm not sure what information you are looking for from this site.
... View more
08-24-2018
06:30 PM
You can enable JMX for metrics + Grafana for visualization, then Ambari Infra for log collection. However, you will not have visibility into consumer lag like Confluent Control Center offers, and you will need to find an external tool to do that for you, such as LinkedIn Burrow. If you are not satisfied with that, Confluent Control Center can be added to an HDP cluster with manual setup: https://docs.confluent.io/current/control-center/docs/installation/install-apache-kafka.html You will need to copy the Confluent Metrics Reporter JARs from the Confluent Enterprise download over onto your HDP Kafka nodes under /usr/hdp/current/kafka
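Once those JARs are on the broker classpath, the reporter is turned on with broker properties roughly like the following, per the linked Confluent docs (the bootstrap address is a placeholder):
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
confluent.metrics.reporter.bootstrap.servers=broker1:9092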
... View more
08-21-2018
08:07 PM
This is a very broad topic, and it might make sense to use a vendor-supported tool like EMR or Qubole. Cloudbreak and Hortonworks themselves don't offer very well-defined backup tools. For example, Hadoop DistCp, mysqldump/pg_dump, and Hive/HBase export only get you so far.
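As an illustration of how far the built-in tools go, a DistCp copy between clusters is roughly (NameNode addresses and paths are placeholders):
hadoop distcp hdfs://active-nn:8020/apps/hive/warehouse hdfs://backup-nn:8020/backups/hive/warehouse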
... View more
08-21-2018
08:05 PM
Hive Streaming tables need to be ORC, right? Do the Avro records automatically get converted?
... View more
08-21-2018
07:53 PM
@Shobhna Dhami It is not listed after "available connectors", so you have not set up the classpath correctly, as I linked to. In Kafka 0.10, you need to run
$ export CLASSPATH=/path/to/extracted-debezium-folder/*.jar # Replace with the real address
$ connect-distributed ... # Start the Connect server
You can also perform a request to the /connector-plugins URL before sending any configuration to verify the Debezium connector was correctly installed.
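For instance, a quick check against the worker's REST API (localhost:8083 is the default listen address, adjust for your setup):
curl http://localhost:8083/connector-plugins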
... View more
08-21-2018
07:46 PM
@Vamshi Reddy Yes, "Confluent" is not some custom version of Apache Kafka. In fact, this process is very repeatable for all other Kafka Connect plugins: download the code, build it against the Kafka version you run, move the package to the Connect server, extract the JAR files onto the Connect server CLASSPATH, and run/restart Connect.
... View more
08-20-2018
11:26 PM
From a non-Hadoop machine, install Java+Maven+Git
git clone https://github.com/confluentinc/kafka-connect-hdfs
cd kafka-connect-hdfs
git fetch --all --tags --prune
git checkout tags/v4.1.2 # This is a Confluent Release number, which corresponds to a Kafka release number
mvn clean install -DskipTests
This should generate some files under the target folder in that directory.
So, using the 4.1.2 example, I would
ZIP up the "target/kafka-connect-hdfs-4.1.2-package/share/java/" folder that was built, then copy that archive and extract it onto every HDP server that I want to run Kafka Connect on, for example under /opt/kafka-connect-hdfs/share/java
From there, you would find your "connect-distributed.properties" file and add a line for
plugin.path=/opt/kafka-connect-hdfs/share/java
Now, run something like this (I don't know the full location of the property files)
connect-distributed /usr/hdp/current/kafka/.../connect-distributed.properties
Once that starts, you can attempt to hit http://connect-server:8083/connector-plugins, and you should see an item for "io.confluent.connect.hdfs.HdfsSinkConnector"
If successful, continue to read the HDFS Connector documentation, then POST the JSON configuration body to the Connect Server endpoint. (or use Landoop's Connect UI tool)
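If that endpoint lists the connector, the final step is a POST roughly like the one below (topic name, HDFS URL, and flush size are only example values; see the HDFS Connector documentation for the full option list):
curl -X POST -H "Content-Type: application/json" http://connect-server:8083/connectors \
  -d '{"name": "hdfs-sink", "config": {"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector", "tasks.max": "1", "topics": "test_topic", "hdfs.url": "hdfs://namenode:8020", "flush.size": "100"}}'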
... View more
08-20-2018
09:39 PM
@Shobhna Dhami Somewhere under /usr/hdp/current/kafka there is a connect-distributed script. You run this and provide a connect-distributed.properties file. Assuming you are running a recent Kafka version (above 0.11.0), in the properties file you would add a line for "plugin.path" that points to a directory containing the extracted package of the Debezium connector. As mentioned in the Debezium documentation: "Simply download the connector’s plugin archive, extract the JARs into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect’s classpath. Restart your Kafka Connect process to pick up the new JARs." Kafka documentation - http://kafka.apache.org/documentation/#connect Confluent documentation - https://docs.confluent.io/current/connect/index.html (note: Confluent is not a "custom version" of Kafka, they just provide a stronger ecosystem around it)
... View more
08-19-2018
05:37 AM
If the end goal is to move data from Kafka to Postgres, you have access to NiFi. Otherwise, exposing the internal Postgres server that is used for Hive, Ambari, Oozie, and other services is probably not a good idea. It would be recommended to run a standalone Postgres server to minimize the blast radius of a failure and maintain service uptime.
... View more
07-31-2018
10:26 PM
@Michael Bronson - Well, the obvious: Kafka leader election would fail if only one Zookeeper stops responding. Your consumers and producers wouldn't be able to determine which topic partition should serve any requests. Hardware fails for a variety of reasons, and it would be better if you converted two of the 160 available worker nodes into dedicated Zookeeper servers.
... View more
07-31-2018
10:23 PM
Load balancers would help in cases where you want a friendlier name than raw DNS records, or where IPs are dynamic. Besides that, remembering one address is easier than remembering a long list of 3-5 servers.
... View more
07-30-2018
06:53 PM
@Michael Bronson - The terms "master/worker" don't really mean anything in Kafka terms. 17 Kafka brokers seems like a lot (we have about that many brokers in AWS handling about 2 million messages per day), but yes, a minimum of 5 ZKs is encouraged to account for maintenance and hardware failure, as mentioned.
... View more