Member since 11-19-2015

158 Posts · 25 Kudos Received · 21 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 16573 | 09-01-2018 01:27 AM |
| | 2478 | 09-01-2018 01:18 AM |
| | 6961 | 08-20-2018 09:39 PM |
| | 1284 | 07-20-2018 04:51 PM |
| | 3073 | 07-16-2018 09:41 PM |

11-30-2017 09:06 PM

@Michael Bronson Topics are never deleted automatically. Log segments are retained up to a configured size (log.retention.bytes) or age (log.retention.{hours, minutes, ms}), after which they are purged or compacted; which of the two happens is controlled by another broker setting (log.cleanup.policy). All of the configurations you are looking for are defined in the Kafka documentation, and you should take these tunables into consideration when installing a production Kafka cluster. An illustrative excerpt follows.
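As a minimal sketch, this is what those settings might look like in a broker's server.properties; the values are examples only, not recommendations:

```
# Illustrative server.properties excerpt; example values, not recommendations.
log.retention.hours=168          # delete segments older than 7 days...
log.retention.bytes=1073741824   # ...or once a partition's log exceeds ~1 GiB
log.cleanup.policy=delete        # "delete" purges old segments; "compact" keeps the latest record per key
```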
						
					
11-27-2017 05:28 PM · 1 Kudo

Connect to which part of Hadoop: HDFS, Hive, or HBase? For HDFS, you can use WebHDFS from any programming language with an HTTP client, or you can pull the hadoop-common library into your code via Maven, for example. For HBase, there are Java clients available. For Hive, you can use JDBC. A WebHDFS example is sketched below.
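A minimal sketch of the WebHDFS route using Python's requests library; the NameNode host and the path are placeholders (50070 is the default NameNode HTTP port):

```python
import requests

# List the contents of /tmp via the WebHDFS REST API.
r = requests.get("http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS")
for status in r.json()["FileStatuses"]["FileStatus"]:
    print(status["pathSuffix"], status["type"])
```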
						
					
11-21-2017 09:48 PM

You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster. You can clear the DataNode directories of a particular host (or format its disks), but the HDFS balancer will fill them back in, depending on the cluster's other data-ingestion processes and on keeping three replicas of every file (the default replication factor). The commands below are the usual starting points.
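For reference, the standard CLI for inspecting DataNode usage and rebalancing; the threshold value is illustrative:

```
hdfs dfsadmin -report        # per-DataNode capacity and usage
hdfs balancer -threshold 10  # rebalance DataNodes to within 10% of the cluster average
```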
						
					
11-15-2017 07:29 PM

Confluent was founded by Kafka's original creators and provides commercial support for it. I personally would trust their code more than someone else's.
						
					
11-14-2017 07:31 PM · 1 Kudo

@Swaapnika Guntaka You could use Spark Streaming in PySpark to consume a topic and write the data to HDFS (a sketch follows). You could also use HDF with NiFi and skip Python entirely. Also, note that this is a Python client by Confluent and is not related to Kafka Connect: https://github.com/confluentinc/confluent-kafka-python
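A minimal PySpark sketch of the Spark Streaming approach, assuming the direct-stream API from the spark-streaming-kafka-0-8 package; the broker, topic, and output path are placeholders:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs")
ssc = StreamingContext(sc, 30)  # 30-second micro-batches

# Records arrive as (key, value) pairs; keep the value and write each batch to HDFS.
stream = KafkaUtils.createDirectStream(
    ssc, ["my-topic"], {"metadata.broker.list": "broker1:6667"})
stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///tmp/my-topic/batch")

ssc.start()
ssc.awaitTermination()
```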
						
					
10-30-2017 04:18 AM · 1 Kudo

Yes: MirrorMaker places no restriction on remote vs. local clusters. It is designed for remote clusters because there is almost never a need to mirror locally. If you mirror a topic within the same cluster, you must rename it, and if you rename it, you end up with consumers and producers working against both topics. You would be replicating data within the same cluster for little gain, when your consumers and producers could simply be configured to use the correct topic(s).
						
					
10-27-2017 08:57 PM

At its most basic, you would write a client that consumes from one topic and produces to another (a minimal sketch follows). MirrorMaker is what you are looking for: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_kafka-component-guide/content/ch_kafka_mirrormaker.html
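The consume-then-produce pattern, sketched here with the kafka-python library (an assumption; any Kafka client works) and placeholder broker and topic names:

```python
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("source-topic", bootstrap_servers="broker1:6667")
producer = KafkaProducer(bootstrap_servers="broker1:6667")

# Forward each record's key and value from the source topic to the destination topic.
for message in consumer:
    producer.send("destination-topic", key=message.key, value=message.value)
```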
						
					
10-10-2017 08:52 PM

@CaselChen Again, Spark connects directly to the Hive metastore; using JDBC requires you to go through HiveServer2.
						
					
08-24-2017 06:29 PM

Spark connects to the Hive metastore directly via a HiveContext. It does not (nor should it, in my opinion) use JDBC. First, you must compile Spark with Hive support, then you need to explicitly call enableHiveSupport() on the SparkSession builder. Additionally, Spark 2 needs you to provide either:

1. A hive-site.xml file on the classpath, or
2. The hive.metastore.uris setting (sketched below).

Refer to https://stackoverflow.com/questions/31980584/how-to-connect-to-a-hive-metastore-programmatically-in-sparksql

Additional resources:

- https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
- https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-hive-integration.html
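A minimal PySpark sketch of option 2, with a placeholder metastore URI (9083 is the usual Hive metastore Thrift port):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metastore-example")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()   # requires a Spark build with Hive support
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
```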
						
					
08-22-2017 06:53 PM

							 
You can get the JSON response from the Ambari REST API:

https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/hosts.md

GET http://ambari-server:8080/api/v1/clusters/:clusterName/hosts

To extract the hostnames more easily, you could try JSONPath:

$.items[*].Hosts.host_name

Or Python with the Requests library (the credentials below are Ambari's defaults, shown as placeholders):

```python
import requests

r = requests.get("http://ambari-server:8080/api/v1/clusters/:clusterName/hosts",
                 auth=("admin", "admin"))
hosts = ",".join(x["Hosts"]["host_name"] for x in r.json()["items"])
```
						
					