Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 14780 | 09-01-2018 01:27 AM |
| | 1809 | 09-01-2018 01:18 AM |
| | 5391 | 08-20-2018 09:39 PM |
| | 913 | 07-20-2018 04:51 PM |
| | 2402 | 07-16-2018 09:41 PM |
11-30-2017
09:06 PM
@Michael Bronson Topics are never automatically deleted. Log segments are retained up to a configured total size (log.retention.bytes) or for a configured period of time (log.retention.{hours, minutes, ms}); after that, the segments are deleted or compacted, which is governed by another Kafka setting (log.cleanup.policy). All of the configurations you are looking for are defined in the Kafka documentation, and you should take these tunables into account when setting up a production Kafka cluster.
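For example, a broker's server.properties might set retention like this (a sketch with illustrative values, not recommendations):

```properties
# Illustrative broker settings -- tune for your workload
log.retention.hours=168       # time-based retention (7 days here)
log.retention.bytes=-1        # size-based retention per partition (-1 = no limit)
log.cleanup.policy=delete     # or "compact" for log-compacted topics
```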
11-27-2017
05:28 PM
1 Kudo
Connect to which part of Hadoop: HDFS, Hive, or HBase? For HDFS, you can use WebHDFS from any programming language with an HTTP client, or you can include the hadoop-common library in your code (via Maven, for example). For HBase, there are Java client libraries available. For Hive, you can use JDBC.
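As a minimal sketch of the WebHDFS route (the NameNode host, port, and path below are placeholders):

```python
import requests

# List a directory via the WebHDFS REST API's LISTSTATUS operation.
# Host "namenode", port 50070 (Hadoop 2.x default), and path /tmp are assumptions.
r = requests.get('http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS')
for status in r.json()['FileStatuses']['FileStatus']:
    print(status['pathSuffix'])
```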
11-21-2017
09:48 PM
You can't clear HDFS on a single host because HDFS is a filesystem abstraction over the entire cluster. You can clear the DataNode directories of a particular host (or format its disks), but the HDFS balancer will fill them back up, depending on the cluster's other data-ingestion processes and the need to maintain three replicas of each file.
11-15-2017
07:29 PM
Confluent was founded by the original creators of Kafka and provides commercial support for it. I personally would trust their code more than someone else's.
11-14-2017
07:31 PM
1 Kudo
@Swaapnika Guntaka You could use Spark Streaming in PySpark to consume the topic and write the data to HDFS. You could also use HDF with NiFi and skip Python entirely. Note that the library below is a Python client from Confluent; it is not related to Kafka Connect. https://github.com/confluentinc/confluent-kafka-python
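A minimal sketch of that approach using Structured Streaming (the broker address, topic name, and HDFS paths are assumptions, and the spark-sql-kafka package must be on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to the topic; bootstrap servers and topic name are placeholders.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "my_topic")
          .load())

# Write the raw message values to HDFS as text; paths are illustrative.
query = (stream.selectExpr("CAST(value AS STRING)")
         .writeStream
         .format("text")
         .option("path", "hdfs:///data/my_topic")
         .option("checkpointLocation", "hdfs:///checkpoints/my_topic")
         .start())

query.awaitTermination()
```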
10-30-2017
04:18 AM
1 Kudo
MirrorMaker does not impose any limitation on remote vs. local clusters. It is designed for remote clusters because there is rarely a need to mirror within a single cluster. If you mirror a topic locally, you must rename it, and if you rename it, you end up with consumers/producers working against data in both topics. You would be replicating data within the same cluster for little gain, when your consumers/producers could simply be configured to use the correct topic(s).
10-27-2017
08:57 PM
At its simplest, you would write a process that consumes from one topic and produces to another; see the sketch below. MirrorMaker is what you are looking for. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_kafka-component-guide/content/ch_kafka_mirrormaker.html
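A minimal sketch of that consume-and-produce loop using the confluent-kafka-python client mentioned earlier (broker addresses, group id, and topic names are placeholders):

```python
from confluent_kafka import Consumer, Producer

# Broker addresses, group id, and topic names below are assumptions.
consumer = Consumer({'bootstrap.servers': 'source-broker:9092',
                     'group.id': 'mirror-demo',
                     'auto.offset.reset': 'earliest'})
producer = Producer({'bootstrap.servers': 'target-broker:9092'})

consumer.subscribe(['source_topic'])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Forward each record to the destination topic.
        producer.produce('destination_topic', key=msg.key(), value=msg.value())
        producer.poll(0)  # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()
```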
10-10-2017
08:52 PM
@CaselChen Again, Spark connects directly to the Hive metastore; using JDBC requires you to go through HiveServer2.
08-24-2017
06:29 PM
Spark connects to the Hive metastore directly via a HiveContext. It does not (nor, in my opinion, should it) use JDBC. First, you must compile Spark with Hive support; then you need to explicitly call enableHiveSupport() on the SparkSession builder. Additionally, Spark 2 requires you to provide either:
1. a hive-site.xml file on the classpath, or
2. the hive.metastore.uris setting.
Refer to: https://stackoverflow.com/questions/31980584/how-to-connect-to-a-hive-metastore-programmatically-in-sparksql
Additional resources:
- https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
- https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-hive-integration.html
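A minimal PySpark sketch of the second option (the metastore URI below is a placeholder for your metastore host):

```python
from pyspark.sql import SparkSession

# hive.metastore.uris points at your Hive metastore; this host is an assumption.
spark = (SparkSession.builder
         .appName("hive-example")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
```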
08-22-2017
06:53 PM
You can get the JSON response from the Ambari REST API:
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/hosts.md
http://ambari-server:8080/api/v1/clusters/:clusterName/hosts
To extract the hostnames more easily, you could try JSONPath:
$.items[*].Hosts.host_name
Or Python with the Requests library:

```python
import requests

# '...' stands for the Ambari hosts endpoint shown above
r = requests.get('...')
hosts = ','.join(x['Hosts']['host_name'] for x in r.json()['items'])
```