Member since 11-19-2015

- 158 Posts
- 25 Kudos Received
- 21 Solutions

My Accepted Solutions

| Views | Posted |
|---|---|
| 16570 | 09-01-2018 01:27 AM |
| 2472 | 09-01-2018 01:18 AM |
| 6959 | 08-20-2018 09:39 PM |
| 1283 | 07-20-2018 04:51 PM |
| 3072 | 07-16-2018 09:41 PM |
			
    
	
		
		
09-30-2018 01:31 AM

You only need a Schema Registry if you plan on using Confluent's AvroConverter.

Note: NiFi can also be used to do CDC from MySQL: https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html
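
To make the distinction concrete, here is a minimal Python sketch that builds the converter section of a Kafka Connect worker configuration. The property names (`key.converter`, `value.converter`, `*.converter.schemas.enable`, `*.converter.schema.registry.url`) and the two converter classes are the standard Kafka/Confluent ones; the helper functions and the `http://localhost:8081` registry URL are hypothetical. Only the AvroConverter branch needs a Schema Registry.

```python
from typing import Dict, Optional

JSON_CONVERTER = "org.apache.kafka.connect.json.JsonConverter"
AVRO_CONVERTER = "io.confluent.connect.avro.AvroConverter"


def converter_properties(converter: str, registry_url: Optional[str] = None) -> Dict[str, str]:
    """Return key/value converter settings for a Connect worker (hypothetical helper)."""
    props: Dict[str, str] = {}
    for side in ("key", "value"):
        props[f"{side}.converter"] = converter
        if converter == AVRO_CONVERTER:
            # AvroConverter registers and looks up schemas in a Schema Registry.
            if not registry_url:
                raise ValueError("AvroConverter requires a Schema Registry URL")
            props[f"{side}.converter.schema.registry.url"] = registry_url
        elif converter == JSON_CONVERTER:
            # JsonConverter can embed the schema in each message; no registry involved.
            props[f"{side}.converter.schemas.enable"] = "true"
    return props


def to_properties(props: Dict[str, str]) -> str:
    """Render the settings in Java .properties syntax."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(to_properties(converter_properties(JSON_CONVERTER)))
    print(to_properties(converter_properties(AVRO_CONVERTER, "http://localhost:8081")))
```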
						
					
09-30-2018 01:27 AM

When brokers terminate, they remove themselves from Zookeeper.
						
					
09-17-2018 07:48 PM

@ssarkar Is it not possible to use Ambari to install a separate Zookeeper host group, then configure a Kafka host group to use that secondary Zookeeper quorum?
						
					
09-04-2018 06:58 PM

@Manish Tiwari, perhaps you can look at https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.1/content/data-lake/index.html

Otherwise, you can search https://docs.hortonworks.com/ for the keywords you are looking for.
						
					
09-01-2018 01:27 AM

- Nagios / OpsView / Sensu are popular options I've seen.
- StatsD / CollectD / MetricBeat are metric collection daemons that run on each server (MetricBeat is somewhat tied to an Elasticsearch cluster, though).
- Prometheus is a popular option nowadays; it scrapes metrics exposed by local services.
- I have played around a bit with netdata, though I'm not sure it can be applied to Hadoop monitoring use cases.
- DataDog is a vendor that offers lots of integrations, such as Hadoop, YARN, Kafka, Zookeeper, etc.

Realistically, you need some JMX + system monitoring tool, and plenty of them exist.
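
As a rough illustration of the "JMX + Prometheus" combination, here is a minimal Python sketch that converts the JSON a Hadoop daemon serves at its `/jmx` HTTP endpoint (a `{"beans": [...]}` document) into Prometheus text exposition lines. The `hadoop_` metric prefix, the `bean` label, and the URL you would pass to `fetch_jmx` are assumptions for illustration; in practice an existing exporter such as Prometheus's jmx_exporter is the more likely choice.

```python
import json
import re
import urllib.request
from typing import Any, Dict, List

_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def metric_name(attr: str, prefix: str = "hadoop") -> str:
    """Turn a JMX attribute name into a valid Prometheus metric name."""
    return f"{prefix}_{_INVALID.sub('_', attr)}".lower()


def escape_label(value: str) -> str:
    """Escape backslashes, quotes and newlines as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def jmx_to_prometheus(payload: Dict[str, Any]) -> List[str]:
    """Emit one line per numeric bean attribute; strings, lists and booleans are skipped."""
    lines = []
    for bean in payload.get("beans", []):
        bean_name = escape_label(str(bean.get("name", "")))
        for attr, value in bean.items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            lines.append(f'{metric_name(attr)}{{bean="{bean_name}"}} {value}')
    return lines


def fetch_jmx(url: str) -> Dict[str, Any]:
    """Fetch /jmx JSON from a daemon's web UI (host and port are up to you)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    sample = {"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystem",
                         "CapacityTotal": 1000, "MissingBlocks": 0,
                         "tag.HAState": "active"}]}
    print("\n".join(jmx_to_prometheus(sample)))
```

A scraper would then serve these lines over HTTP; the sketch only covers the translation step.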
						
					
09-01-2018 01:18 AM (1 Kudo)

A data lake is not tied to a platform or technology. Hadoop is not a requirement for a data lake either.

IMO, a "data lake project" should not be the project description or the end goal. You can say you got your data from "source X" using "code Y", then transformed and analyzed it using "framework Z", but the combinations of tools on the market that support such statements are so broad and vague that it really depends on which business use cases you are trying to solve.

For example, S3 is replaceable with HDFS, GCS, or Azure Storage. Redshift is replaceable with Postgres (and you really should use Athena anyway if the data you want to query is in S3, where Athena is replaceable by PrestoDB), and those can be compared to Google BigQuery.

My suggestion would be not to tie yourself to a certain toolset, but if you are in AWS, their own documentation is very extensive. Since you are not asking a Hortonworks-specific question, I'm not sure what information you are looking for from this site.
						
					
08-21-2018 07:53 PM

@Shobhna Dhami The Debezium connector is not listed after "available connectors", so you have not set up the classpath correctly, as described in the link I posted.

In Kafka 0.10, you need to run:

```
# Replace with the real path. Quoting keeps the shell from expanding the *;
# Java then expands dir/* to every JAR in that directory.
$ export CLASSPATH='/path/to/extracted-debezium-folder/*'
$ connect-distributed ...  # Start the Connect server
```

You can also send a request to the /connector-plugins endpoint before submitting any connector configuration to verify that the Debezium connector was installed correctly.
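
The /connector-plugins check can be scripted. Below is a minimal Python sketch, assuming the Connect worker's REST API listens on the default port 8083 on localhost and that you are looking for the Debezium MySQL connector class (`io.debezium.connector.mysql.MySqlConnector`); adjust both for your setup. The endpoint returns a JSON list of objects, each with a `class` field.

```python
import json
import urllib.request
from typing import Any, Dict, List

CONNECT_URL = "http://localhost:8083"  # assumption: default Connect REST port
DEBEZIUM_CLASS = "io.debezium.connector.mysql.MySqlConnector"  # assumption: MySQL connector


def list_plugins(base_url: str = CONNECT_URL) -> List[Dict[str, Any]]:
    """GET /connector-plugins from a running Connect worker."""
    with urllib.request.urlopen(f"{base_url}/connector-plugins", timeout=10) as resp:
        return json.load(resp)


def has_plugin(plugins: List[Dict[str, Any]], class_name: str) -> bool:
    """True if the worker reports a plugin with exactly this class name."""
    return any(p.get("class") == class_name for p in plugins)


if __name__ == "__main__":
    try:
        found = has_plugin(list_plugins(), DEBEZIUM_CLASS)
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Could not reach the Connect REST API at {CONNECT_URL}: {exc}")
    else:
        print("Debezium connector installed" if found
              else "Debezium connector not found; check the classpath and restart Connect")
```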
						
					
08-20-2018 09:39 PM

@Shobhna Dhami Somewhere under /usr/hdp/current/kafka there is a connect-distributed script. You run it and provide a connect-distributed.properties file.

Assuming you are running a recent Kafka version (0.11.0 or later), add a "plugin.path" line to the properties file that points to a directory containing the extracted package of the Debezium connector.

As mentioned in the Debezium documentation:

> Simply download the connector’s plugin archive, extract the JARs into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect’s classpath. Restart your Kafka Connect process to pick up the new JARs.

- Kafka Documentation - http://kafka.apache.org/documentation/#connect
- Confluent Documentation - https://docs.confluent.io/current/connect/index.html (note: Confluent is not a "custom version" of Kafka, they just provide a stronger ecosystem around it)
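
For the plugin.path step, here is a minimal Python sketch that sets (or replaces) the `plugin.path` entry in the text of a connect-distributed.properties file. The directory `/opt/kafka-connect-plugins` is a hypothetical location; per the Kafka documentation, `plugin.path` takes a comma-separated list of top-level directories, each holding plugin directories (such as the extracted Debezium folder) or uber-JARs.

```python
import re
from typing import List

# Matches an active (uncommented) plugin.path entry, with '=' or ':' as separator.
_PLUGIN_PATH = re.compile(r"^\s*plugin\.path\s*[=:]")


def set_plugin_path(properties_text: str, dirs: List[str]) -> str:
    """Return properties text with exactly one plugin.path line pointing at dirs."""
    new_line = "plugin.path=" + ",".join(dirs)
    out, replaced = [], False
    for line in properties_text.splitlines():
        if _PLUGIN_PATH.match(line):
            # Replace the first existing setting in place and drop any duplicates.
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "bootstrap.servers=localhost:9092\n#plugin.path=\n"
    print(set_plugin_path(original, ["/opt/kafka-connect-plugins"]), end="")
```

After editing the file, restart the Connect worker so it rescans the plugin directories, as the Debezium documentation notes.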
						
					
07-31-2018 10:26 PM

@Michael Bronson - Well, the obvious: Kafka leader election would fail if just one Zookeeper server stops responding. Your consumers and producers wouldn't be able to determine which broker should serve requests for each topic partition.

Hardware fails for a variety of reasons, and it would be better to convert two of the 160 available worker nodes into dedicated Zookeeper servers.
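
The failure tolerance comes from Zookeeper's majority-quorum rule: an ensemble of n servers stays available only while a strict majority (n // 2 + 1) of them is reachable, so it tolerates (n - 1) // 2 failures. A small Python sketch of that arithmetic follows; it assumes, for illustration only, that the cluster runs a single Zookeeper server today, so converting two worker nodes would produce a three-server ensemble.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server Zookeeper ensemble."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

One and two servers both tolerate zero failures, while three tolerate one. Going from three to four servers adds no fault tolerance, which is why odd ensemble sizes are the norm.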
						
					
07-31-2018 10:23 PM

Load balancers help when you want a friendlier name than a set of DNS records, or when IPs are dynamic.

Besides that, remembering one address is easier than a long list of 3-5 servers.
						
					