Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 14780 | 09-01-2018 01:27 AM |
| | 1809 | 09-01-2018 01:18 AM |
| | 5391 | 08-20-2018 09:39 PM |
| | 913 | 07-20-2018 04:51 PM |
| | 2402 | 07-16-2018 09:41 PM |
11-30-2017
09:06 PM
@Michael Bronson Topics are never automatically deleted. Log segments are retained up to a configured total size (log.retention.bytes) or for a configured period of time (log.retention.{hours, minutes, ms}); after that, the segments are deleted or compacted, which is governed by another Kafka setting (log.cleanup.policy). All of the configurations you are looking for are defined in the Kafka documentation, and you should take these tunables into account when setting up a production Kafka cluster.
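For example, a broker's server.properties might set retention like this (a sketch with illustrative values, not recommendations):

```properties
# Illustrative broker settings -- tune for your workload
log.retention.hours=168       # time-based retention (7 days here)
log.retention.bytes=-1        # size-based retention per partition (-1 = no limit)
log.cleanup.policy=delete     # or "compact" for log-compacted topics
```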
11-27-2017
05:28 PM
1 Kudo
Connect to which part of Hadoop: HDFS, Hive, or HBase? For HDFS, you can use WebHDFS from any programming language with an HTTP client, or you can include the hadoop-common library in your code (via Maven, for example). For HBase, there are Java client libraries available. For Hive, you can use JDBC.
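As a minimal sketch of the WebHDFS route (the NameNode host, port, and path below are placeholders):

```python
import requests

# List a directory via the WebHDFS REST API's LISTSTATUS operation.
# Host "namenode", port 50070 (Hadoop 2.x default), and path /tmp are assumptions.
r = requests.get('http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS')
for status in r.json()['FileStatuses']['FileStatus']:
    print(status['pathSuffix'])
```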
11-21-2017
09:48 PM
You can't clear HDFS on a single host because HDFS is a filesystem abstraction over the entire cluster. You can clear the DataNode directories of a particular host (or format its disks), but the HDFS balancer will fill them back up, depending on the cluster's other data-ingestion processes and the need to maintain three replicas of each file.
11-15-2017
07:29 PM
Confluent was founded by the original creators of Kafka and provides commercial support for it. I personally would trust their code more than someone else's.
11-14-2017
07:31 PM
1 Kudo
@Swaapnika Guntaka You could use Spark Streaming in PySpark to consume the topic and write the data to HDFS. You could also use HDF with NiFi and skip Python entirely. Note that the library below is a Python client from Confluent; it is not related to Kafka Connect. https://github.com/confluentinc/confluent-kafka-python
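A minimal sketch of that approach using Structured Streaming (the broker address, topic name, and HDFS paths are assumptions, and the spark-sql-kafka package must be on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to the topic; bootstrap servers and topic name are placeholders.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "my_topic")
          .load())

# Write the raw message values to HDFS as text; paths are illustrative.
query = (stream.selectExpr("CAST(value AS STRING)")
         .writeStream
         .format("text")
         .option("path", "hdfs:///data/my_topic")
         .option("checkpointLocation", "hdfs:///checkpoints/my_topic")
         .start())

query.awaitTermination()
```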
10-30-2017
04:18 AM
1 Kudo
MirrorMaker does not impose any limitation on remote vs. local clusters. It is designed for remote clusters because there is rarely a need to mirror within a single cluster. If you mirror a topic locally, you must rename it, and if you rename it, you end up with consumers/producers working against data in both topics. You would be replicating data within the same cluster for little gain, when your consumers/producers could simply be configured to use the correct topic(s).
10-27-2017
08:57 PM
At its simplest, you would write a process that consumes from one topic and produces to another; see the sketch below. MirrorMaker is what you are looking for. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_kafka-component-guide/content/ch_kafka_mirrormaker.html
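A minimal sketch of that consume-and-produce loop using the confluent-kafka-python client mentioned earlier (broker addresses, group id, and topic names are placeholders):

```python
from confluent_kafka import Consumer, Producer

# Broker addresses, group id, and topic names below are assumptions.
consumer = Consumer({'bootstrap.servers': 'source-broker:9092',
                     'group.id': 'mirror-demo',
                     'auto.offset.reset': 'earliest'})
producer = Producer({'bootstrap.servers': 'target-broker:9092'})

consumer.subscribe(['source_topic'])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Forward each record to the destination topic.
        producer.produce('destination_topic', key=msg.key(), value=msg.value())
        producer.poll(0)  # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()
```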
10-10-2017
08:52 PM
@CaselChen Again, Spark connects directly to the Hive metastore; using JDBC requires you to go through HiveServer2.
08-24-2017
06:29 PM
Spark connects to the Hive metastore directly via a HiveContext. It does not (nor, in my opinion, should it) use JDBC. First, you must compile Spark with Hive support; then you need to explicitly call enableHiveSupport() on the SparkSession builder. Additionally, Spark 2 requires you to provide either:
1. a hive-site.xml file on the classpath, or
2. the hive.metastore.uris setting.
Refer to: https://stackoverflow.com/questions/31980584/how-to-connect-to-a-hive-metastore-programmatically-in-sparksql
Additional resources:
- https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
- https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-hive-integration.html
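A minimal PySpark sketch of the second option (the metastore URI below is a placeholder for your metastore host):

```python
from pyspark.sql import SparkSession

# hive.metastore.uris points at your Hive metastore; this host is an assumption.
spark = (SparkSession.builder
         .appName("hive-example")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
```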
08-22-2017
06:53 PM
You can get the JSON response from the Ambari REST API:
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/hosts.md
http://ambari-server:8080/api/v1/clusters/:clusterName/hosts
To extract the hostnames more easily, you could try JSONPath:
$.items[*].Hosts.host_name
Or Python with the Requests library:

```python
import requests

# '...' stands for the Ambari hosts endpoint shown above
r = requests.get('...')
hosts = ','.join(x['Hosts']['host_name'] for x in r.json()['items'])
```