Member since
11-19-2015
158
Posts
25
Kudos Received
21
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 14816 | 09-01-2018 01:27 AM
| 1824 | 09-01-2018 01:18 AM
| 5413 | 08-20-2018 09:39 PM
| 922 | 07-20-2018 04:51 PM
| 2409 | 07-16-2018 09:41 PM
09-30-2018
01:31 AM
You only need to use a Schema Registry if you plan on using Confluent's AvroConverter.

Note: NiFi can also be used to do CDC from MySQL: https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html
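For illustration, the relevant Kafka Connect worker settings might look like this (the registry URL is a placeholder):

```properties
# Confluent's AvroConverter stores schemas in a Schema Registry:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081

# The built-in JsonConverter needs no registry:
# value.converter=org.apache.kafka.connect.json.JsonConverter
# value.converter.schemas.enable=false
```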
09-30-2018
01:27 AM
On termination, brokers remove themselves from Zookeeper.
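That works because broker registrations under /brokers/ids are ephemeral znodes, which Zookeeper deletes automatically when the broker's session ends. A quick way to observe this, assuming the zookeeper-shell tool that ships with Kafka and a hypothetical quorum address:

```shell
# Lists currently registered broker IDs; entries disappear as brokers terminate
zookeeper-shell.sh zk1.example.com:2181 ls /brokers/ids
```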
09-17-2018
07:48 PM
@ssarkar Is it not possible to use Ambari to install a separate Zookeeper host group, then configure a Kafka host group to use that secondary Zookeeper quorum?
09-04-2018
06:58 PM
@Manish Tiwari, perhaps you can look at https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.1/content/data-lake/index.html
Otherwise, you can search https://docs.hortonworks.com/ for the keywords you are looking for.
09-01-2018
01:27 AM
- Nagios / OpsView / Sensu are popular options I've seen.
- StatsD / CollectD / Metricbeat are daemon metric collectors that run on each server (Metricbeat is somewhat tied to an Elasticsearch cluster, though).
- Prometheus is a popular option nowadays that scrapes metrics exposed by each local service.
- I have played around a bit with Netdata, though I'm not sure if it can be applied to Hadoop monitoring use cases.
- Datadog is a vendor that offers lots of integrations, such as Hadoop, YARN, Kafka, Zookeeper, etc.

Realistically, you need some JMX + system monitoring tool, and a bunch exist.
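As a sketch of the Prometheus route: run the Prometheus JMX exporter as a Java agent on each service and point Prometheus at it. The job name, host names, and port 7071 below are assumptions, not defaults:

```yaml
# prometheus.yml fragment scraping JMX-exporter endpoints on two brokers
scrape_configs:
  - job_name: 'kafka-jmx'
    static_configs:
      - targets: ['kafka1:7071', 'kafka2:7071']
```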
09-01-2018
01:18 AM
1 Kudo
A Data Lake is not tied to a platform or technology; Hadoop is not a requirement for a data lake either. IMO, a "data lake project" should not be a project description or the end goal: you can say you got your data from "source X", using "code Y", transformed and analyzed using "framework Z", but the combinations of tools on the market that support such statements are so broad that it really depends on what business use cases you are trying to solve.

For example, S3 is replaceable with HDFS, GCS, or Azure Storage. Redshift is replaceable with Postgres (and you really should use Athena anyway if the data you want to query is in S3, where Athena is replaceable by PrestoDB), and those can be compared to Google BigQuery.

My suggestion would be not to tie yourself to a certain toolset, but if you are on AWS, their own documentation pages are very extensive. Since you are not asking a Hortonworks-specific question, I'm not sure what information you are looking for from this site.
08-21-2018
07:53 PM
@Shobhna Dhami If it is not listed under "available connectors", then you have not set up the classpath correctly, as I linked to. On Kafka 0.10, you need to run:

```shell
# Replace with the real path to the extracted Debezium archive
export CLASSPATH=/path/to/extracted-debezium-folder/*.jar
connect-distributed ...   # start the Connect server
```

You can also make a request to the /connector-plugins URL before sending any configuration, to verify the Debezium connector was installed correctly.
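That check can be done with curl, assuming the Connect worker runs locally on its default REST port 8083 (both assumptions):

```shell
# Returns a JSON array of connector classes the worker can load;
# Debezium's MySQL connector should appear if the classpath is correct
curl -s http://localhost:8083/connector-plugins
```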
08-20-2018
09:39 PM
@Shobhna Dhami Somewhere under /usr/hdp/current/kafka there is a connect-distributed script. You run this and provide a connect-distributed.properties file. Assuming you are running a recent Kafka version (0.11.0 or above), you would add a "plugin.path" line to the properties file that points to a directory containing the extracted package of the Debezium connector. As mentioned in the Debezium documentation: "Simply download the connector’s plugin archive, extract the JARs into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect’s classpath. Restart your Kafka Connect process to pick up the new JARs."

Kafka documentation - http://kafka.apache.org/documentation/#connect
Confluent documentation - https://docs.confluent.io/current/connect/index.html (note: Confluent is not a "custom version" of Kafka; they just provide a stronger ecosystem around it)
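As a concrete sketch, a minimal connect-distributed.properties could look like this (the broker address and plugin directory are placeholders; plugin.path requires Kafka 0.11.0 or newer):

```properties
bootstrap.servers=broker1.example.com:6667
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Directory containing the extracted Debezium connector JARs
plugin.path=/opt/kafka-connect-plugins
```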
07-31-2018
10:26 PM
@Michael Bronson - Well, the obvious: Kafka leader election would fail if your only Zookeeper stops responding. Your consumers and producers wouldn't be able to determine which topic partition should serve their requests. Hardware fails for a variety of reasons, so it would be better to convert two of the 160 available worker nodes into dedicated Zookeeper servers.
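The arithmetic behind that advice, assuming Zookeeper's standard majority quorum: an ensemble of n servers stays available only while floor(n/2) + 1 of them are up, so a lone server tolerates no failures while three servers tolerate one.

```shell
# Failures an n-server Zookeeper ensemble can survive: n - (floor(n/2) + 1)
zk_fault_tolerance() {
  local n=$1
  echo $(( n - (n / 2 + 1) ))
}

zk_fault_tolerance 1   # prints 0: a lone Zookeeper tolerates no failures
zk_fault_tolerance 3   # prints 1: three servers survive one failure
```

Note that even-sized ensembles buy nothing: four servers still only tolerate one failure, which is why odd sizes are recommended.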
07-31-2018
10:23 PM
Load balancers would help in the case where you want a friendlier name than some DNS records, or where the IPs are dynamic. Besides that, remembering one address is easier than a long list of 3-5 servers.