Member since: 04-08-2019
Posts: 115
Kudos Received: 97
Solutions: 9
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5083 | 04-16-2016 03:39 AM
 | 2953 | 04-14-2016 11:13 AM
 | 5341 | 04-13-2016 12:31 PM
 | 6417 | 04-08-2016 03:47 AM
 | 5153 | 04-07-2016 05:05 PM
10-21-2015
08:43 PM
There are two aspects to the question. The first is whether replication can be confined to a region so that a user's data lives only inside that region. This is possible, in theory, in a couple of different ways. If we can partition the users by region into different tables and set up replication across all datacenters within the region, then we have achieved the boundary requirement: some tables are replicated only to datacenters within the region, while other tables are replicated across regions. HBase's replication model is pretty flexible in the sense that we can do cyclic replication, etc. (please read https://hbase.apache.org/book.html#_cluster_replication). If we cannot partition by table, we can still use the same table but partition by column family (as noted above). Otherwise, we can still respect boundaries using a recent feature called WALEntryFilters. The basic idea would be to implement a custom WALEntryFilter which either (a) understands the data and selects which edits (mutations) to send to the receiving side (another geo-region), or (b) relies on every edit being tagged with the regions it should reach, with the WALEntryFilter respecting the tags on the mutations.

The second aspect is whether you can query the whole data set from any region. Of course, if some data never leaves its particular geo-region, you cannot have all the data aggregated in a single DC. So the only way to access the data as a whole would be to dynamically send the query to all affected geo-regions and merge the results back.
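Returning to option (a): below is a minimal sketch of a custom WALEntryFilter, assuming a hypothetical row-key convention where a prefix marks rows that must never leave the local geo-region. The class name and prefix are illustrative only, and the exact package locations of the WAL classes (and how the filter is wired in, typically through a custom ReplicationEndpoint) vary a bit between HBase releases.

import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.replication.WALEntryFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.wal.WAL.Entry;

public class GeoBoundaryWALEntryFilter implements WALEntryFilter {

  // Hypothetical convention: rows prefixed with "EU_ONLY:" must never be
  // replicated outside the local geo-region.
  private static final byte[] LOCAL_ONLY_PREFIX = Bytes.toBytes("EU_ONLY:");

  @Override
  public Entry filter(Entry entry) {
    List<Cell> cells = entry.getEdit().getCells();
    // Strip the cells that must stay local; everything else is replicated.
    cells.removeIf(cell -> Bytes.startsWith(CellUtil.cloneRow(cell), LOCAL_ONLY_PREFIX));
    // Returning null tells the replication source to skip this entry entirely.
    return cells.isEmpty() ? null : entry;
  }
}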
10-16-2015
11:27 PM
1 Kudo
Make sure you set the following config in the KafkaSpout's SpoutConfig (see https://github.com/apache/storm/tree/master/external/storm-kafka; a full SpoutConfig sketch is included after the steps below):

spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();

Apart from that:
1. Make sure log.retention.hours is long enough to retain the topic data.
2. Check the Kafka topic offsets:
   bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hostname:6667 --topic topic_name --time -1
   The above command gives you the latest offset in the Kafka topic; now you need to check whether the Storm KafkaSpout is catching up.
   2.1 Log into the ZooKeeper shell.
   2.2 ls /zkroot/id (zkroot is the one configured in the SpoutConfig, and so is id).
   2.3 get /zkroot/id/topic_name/part_0 will give you a JSON structure with the key "offset". This tells you how far you have read into the topic and how far you are behind the latest data.
If the two offsets are too far apart and log.retention.hours has kicked in, the KafkaSpout might be requesting an older offset that has already been deleted.
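For reference, a minimal sketch of wiring the KafkaSpout with that setting (Storm 0.10.x package names; in Storm 1.x the classes move under org.apache.storm.*; the hostnames, topic, zkRoot, and spout id are placeholders):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaSpoutFromEarliest {

  public static TopologyBuilder buildTopology() {
    // ZooKeeper quorum used by Kafka (placeholder hosts).
    BrokerHosts hosts = new ZkHosts("zkhost1:2181,zkhost2:2181");

    // zkRoot ("/zkroot") and id ("id") are where the spout stores its offsets,
    // i.e. the same paths referenced in steps 2.2 and 2.3 above.
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "topic_name", "/zkroot", "id");

    // Start from the earliest retained offset instead of the latest one.
    spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
    return builder;
  }
}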
10-17-2015
09:51 PM
Thanks @schintalapani@hortonworks.com. So I see this as a change in the way Kafka works from 0.7 to 0.8.
10-17-2015
05:12 PM
2 Kudos
The Kafka producer doesn't need to know about the ZooKeeper cluster. It takes a broker list as a config option, which it then uses to send topic metadata requests and determine who the leader of each topic partition is.
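A minimal sketch with the Kafka Java producer client (available from Kafka 0.8.2 on; the older Scala producer takes the equivalent metadata.broker.list setting). Broker hostnames and the topic name below are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BrokerListProducer {

  public static void main(String[] args) {
    Properties props = new Properties();
    // Only the broker list is needed; the producer sends topic metadata
    // requests to these brokers to find each partition's leader.
    // No ZooKeeper connection string anywhere.
    props.put("bootstrap.servers", "broker1:6667,broker2:6667");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("topic_name", "some-key", "some-value"));
    producer.close();
  }
}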
06-20-2016
04:42 AM
Using Ambari for a high-availability setup for Flume, is there any complete step-by-step installation documentation somewhere that I can read? Please let me know the link. Thanks once again.
10-09-2015
02:10 PM
2 Kudos
The following reasons are cited by Siddharth Wagle in AMBARI-5707 (https://issues.apache.org/jira/browse/AMBARI-5707) as motivation for the Ambari Metrics proposal. Problems with the current system:
- Ganglia has limited capabilities for analyzing historic data, and new plugins are not easy to write.
- Horizontal scale-out for large clusters.
- No support for ad-hoc queries.
- Not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time-series databases like OpenTSDB to store raw data forever.
There were also a number of customers already using Nagios and/or Ganglia in their infrastructure, and there were version incompatibilities between those installations and the versions shipped with HDP.
03-28-2016
06:06 AM
It's all about Maven dependencies: exclude the conflicting slf4j-log4j12 and log4j artifacts from each dependency that pulls them in.

<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.10.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.4.6</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hdfs</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.2</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.1</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
11-22-2016
11:46 PM
Thanks for the reply, @Josh Elser. I will create a separate post with all the information I have gathered.
02-23-2018
07:59 AM
Thanks Brandon. I used the above approach to retrieve the history of alerts of a specific type. Adding a small trick for anyone using this approach: the Alert_History table stores the Alert_Timestamp column as a BigInt. For readability, use PostgreSQL's cast functions. In my example, the SQL identifies when Ambari triggered the DataNode Web UI alert across all DataNodes:

SELECT TO_CHAR(TO_TIMESTAMP(Alert_Timestamp / 1000), 'MM/DD/YYYY HH24:MI:SS')
FROM Alert_History
WHERE Alert_Label LIKE '%DataNode%UI%'
  AND Alert_State = 'CRITICAL'
ORDER BY 1 ASC;
01-15-2016
01:57 PM
Keep in mind also that there is no upgrade path from Sqoop 1 (the version included in HDP) to Sqoop 2.