Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11897 | 09-01-2018 01:27 AM |
| | 1143 | 09-01-2018 01:18 AM |
| | 3798 | 08-20-2018 09:39 PM |
| | 509 | 07-20-2018 04:51 PM |
| | 1521 | 07-16-2018 09:41 PM |
08-21-2018
08:07 PM
This is a very broad topic, and it might make sense to use a vendor-supported tool like EMR or Qubole. Cloudbreak and Hortonworks themselves don't offer well-defined backup tools; for example, Hadoop DistCp, mysqldump/pg_dump, and Hive/HBase Export only get you so far.
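For context, a minimal sketch of what those tools look like in practice (the paths, database names, and table names below are placeholders, not from this thread):
# Copy HDFS data to a second cluster with DistCp
hadoop distcp hdfs://prod-nn:8020/apps/hive/warehouse hdfs://backup-nn:8020/backups/warehouse
# Dump a MySQL-backed Hive Metastore database
mysqldump -u hive -p hive > hive_metastore_backup.sql
# Export a single Hive table (data + metadata) to HDFS for a later IMPORT
hive -e "EXPORT TABLE mydb.mytable TO '/backups/mytable';"
Each of these covers only one component in isolation, which is why they only get you so far for a full-platform backup.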
08-21-2018
08:05 PM
Hive Streaming tables need to be ORC, right? Do the Avro records automatically get converted?
08-21-2018
07:53 PM
@Shobhna Dhami If the "available connectors" output does not list it, then you have not set up the classpath correctly, as I linked to. In Kafka 0.10, you need to run:
$ export CLASSPATH=/path/to/extracted-debezium-folder/*.jar   # Replace with the real path
$ connect-distributed ...   # Start the Connect server
You can also make a request to the /connector-plugins endpoint before sending any configuration, to verify the Debezium connector was installed correctly.
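For a quick check, something like this works against the Connect REST API (assuming the default port 8083; adjust the host to your Connect server):
# List the installed connector plugins
curl http://localhost:8083/connector-plugins
# The JSON response should include the Debezium connector class you installed,
# e.g. io.debezium.connector.mysql.MySqlConnector for the MySQL connector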
08-21-2018
07:46 PM
@Vamshi Reddy
Yes, "Confluent" is not some custom version of Apache Kafka. In fact, this process is repeatable for all other Kafka Connect plugins:
1. Download the code
2. Build it against the Kafka version you run
3. Move the package to the Connect server
4. Extract the JAR files onto the Connect server CLASSPATH
5. Run/Restart Connect
08-20-2018
11:26 PM
From a non-Hadoop machine, install Java+Maven+Git
git clone https://github.com/confluentinc/kafka-connect-hdfs
cd kafka-connect-hdfs
git fetch --all --tags --prune
git checkout tags/v4.1.2 # This is a Confluent Release number, which corresponds to a Kafka release number
mvn clean install -DskipTests
This should generate some files under the target folder in that directory.
So, using the 4.1.2 example, I would ZIP up the "target/kafka-connect-hdfs-4.1.2-package/share/java/" folder that was built, then copy that archive to every HDP server I want to run Kafka Connect on and extract it there, for example to /opt/kafka-connect-hdfs/share/java.
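A rough sketch of those copy/extract steps (the hostnames are placeholders, and this assumes you have write access to /opt):
# On the build machine
cd target/kafka-connect-hdfs-4.1.2-package
zip -r kafka-connect-hdfs-4.1.2.zip share/java
# Copy to a Connect host and extract, ending up with /opt/kafka-connect-hdfs/share/java
scp kafka-connect-hdfs-4.1.2.zip connect-host:/tmp/
ssh connect-host "mkdir -p /opt/kafka-connect-hdfs && unzip /tmp/kafka-connect-hdfs-4.1.2.zip -d /opt/kafka-connect-hdfs"
Repeat the copy/extract on every server that will run a Connect worker.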
From there, you would find your "connect-distributed.properties" file and add a line for
plugin.path=/opt/kafka-connect-hdfs/share/java
Now, run something like this (I don't know the full location of the property files)
connect-distributed /usr/hdp/current/kafka/.../connect-distributed.properties
Once that starts, you can attempt to hit http://connect-server:8083/connector-plugins, and you should see an item for "io.confluent.connect.hdfs.HdfsSinkConnector".
If successful, continue to read the HDFS Connector documentation, then POST the JSON configuration body to the Connect Server endpoint. (or use Landoop's Connect UI tool)
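For reference, a minimal sketch of such a POST (the connector name, topic, HDFS URL, and flush size below are placeholder values, not taken from this thread):
curl -X POST -H "Content-Type: application/json" http://connect-server:8083/connectors -d '{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "3"
  }
}'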
08-20-2018
09:39 PM
@Shobhna Dhami Somewhere under /usr/hdp/current/kafka there is a connect-distributed script. You run it and provide a connect-distributed.properties file. Assuming you are running a recent Kafka version (above 0.11.0), you would add a "plugin.path" line to the properties file that points to the directory containing the extracted package of the Debezium connector. As mentioned in the Debezium documentation: "Simply download the connector's plugin archive, extract the JARs into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect's classpath. Restart your Kafka Connect process to pick up the new JARs."
Kafka Documentation - http://kafka.apache.org/documentation/#connect
Confluent Documentation - https://docs.confluent.io/current/connect/index.html (note: Confluent is not a "custom version" of Kafka, they just provide a stronger ecosystem around it)
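As a minimal sketch, the relevant parts of a connect-distributed.properties might look like this (the broker address, converters, topic names, and plugin directory are assumptions, not values from this thread):
bootstrap.servers=broker1:6667
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Directory containing the extracted Debezium connector JARs
plugin.path=/opt/debezium-connector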
08-19-2018
05:37 AM
If the end goal is to transfer data from Kafka to Postgres, you have access to NiFi for that. Otherwise, exposing the internal Postgres server used by Hive, Ambari, Oozie, and other services is probably not a good idea. It would be better to run a standalone Postgres server to minimize the blast radius of a failure and maintain service uptime.
07-31-2018
10:26 PM
@Michael Bronson - Well, the obvious one: Kafka leader election would fail if even one ZooKeeper stops responding. Your consumers and producers wouldn't be able to determine which topic partition should serve their requests. Hardware fails for a variety of reasons, and it would be better to convert two of the 160 available worker nodes into dedicated ZooKeeper servers.
07-31-2018
10:23 PM
Load balancers help when you want a friendlier name than a set of DNS records, or when the IPs are dynamic. Besides that, remembering one address is easier than a list of 3-5 servers.
07-30-2018
06:53 PM
@Michael Bronson - The terms "master/worker" don't really mean anything in Kafka. 17 Kafka brokers seems like a lot (we have about that many brokers in AWS handling about 2 million messages per day), but yes, a minimum of 5 ZooKeepers is encouraged to account for maintenance and hardware failure, as mentioned.