Member since: 04-08-2019
Posts: 115
Kudos Received: 97
Solutions: 9
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5083 | 04-16-2016 03:39 AM
 | 2953 | 04-14-2016 11:13 AM
 | 5341 | 04-13-2016 12:31 PM
 | 6417 | 04-08-2016 03:47 AM
 | 5153 | 04-07-2016 05:05 PM
10-21-2015
08:43 PM
There are two aspects to the question. The first is whether replication can be confined to a region so that a user's data lives only inside that region. This is possible, in theory, in a couple of different ways. If we can partition the users by region into different tables and set up replication across all datacenters within the region, then we have achieved the boundary requirement: some tables are replicated only to datacenters within the region, while other tables are replicated across regions. HBase's replication model is pretty flexible in the sense that we can do cyclic replication, etc. (please read https://hbase.apache.org/book.html#_cluster_replication). If we cannot partition by table, we can still use the same table but partition by column family (as noted above). Otherwise, we can still respect boundaries using a recent feature called WALEntryFilters. The basic idea would be to implement a custom WALEntryFilter which either (a) understands the data and selects which edits (mutations) to send to the receiving side (another geo-region), or (b) relies on every edit being tagged with the regions it should reach, with the WALEntryFilter respecting the tags on the mutations.

The second aspect is whether you can query the whole data set from any region. Of course, if some data never leaves its particular geo-region, you cannot have all the data aggregated in a single DC. So the only way to access the data as a whole would be to dynamically send the query to all affected geo-regions and merge the results back.
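Returning to option (a): below is a minimal sketch of a custom WALEntryFilter, assuming a hypothetical row-key convention where a prefix marks rows that must never leave the local geo-region. The class name and prefix are illustrative only, and the exact package locations of the WAL classes (and how the filter is wired in, typically through a custom ReplicationEndpoint) vary a bit between HBase releases.

import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.replication.WALEntryFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.wal.WAL.Entry;

public class GeoBoundaryWALEntryFilter implements WALEntryFilter {

  // Hypothetical convention: rows prefixed with "EU_ONLY:" must never be
  // replicated outside the local geo-region.
  private static final byte[] LOCAL_ONLY_PREFIX = Bytes.toBytes("EU_ONLY:");

  @Override
  public Entry filter(Entry entry) {
    List<Cell> cells = entry.getEdit().getCells();
    // Strip the cells that must stay local; everything else is replicated.
    cells.removeIf(cell -> Bytes.startsWith(CellUtil.cloneRow(cell), LOCAL_ONLY_PREFIX));
    // Returning null tells the replication source to skip this entry entirely.
    return cells.isEmpty() ? null : entry;
  }
}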
10-16-2015
11:27 PM
1 Kudo
Make sure you set the following config in the KafkaSpout's SpoutConfig (see https://github.com/apache/storm/tree/master/external/storm-kafka; a full SpoutConfig sketch is included after the steps below):

spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();

Apart from that:
1. Make sure log.retention.hours is long enough to retain the topic data.
2. Check the Kafka topic offsets:
   bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hostname:6667 --topic topic_name --time -1
   The above command gives you the latest offset in the Kafka topic; now you need to check whether the Storm KafkaSpout is catching up.
   2.1 Log into the ZooKeeper shell.
   2.2 ls /zkroot/id (zkroot is the one configured in the SpoutConfig, and so is id).
   2.3 get /zkroot/id/topic_name/part_0 will give you a JSON structure with the key "offset". This tells you how far you have read into the topic and how far you are behind the latest data.
If the two offsets are too far apart and log.retention.hours has kicked in, the KafkaSpout might be requesting an older offset that has already been deleted.
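For reference, a minimal sketch of wiring the KafkaSpout with that setting (Storm 0.10.x package names; in Storm 1.x the classes move under org.apache.storm.*; the hostnames, topic, zkRoot, and spout id are placeholders):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaSpoutFromEarliest {

  public static TopologyBuilder buildTopology() {
    // ZooKeeper quorum used by Kafka (placeholder hosts).
    BrokerHosts hosts = new ZkHosts("zkhost1:2181,zkhost2:2181");

    // zkRoot ("/zkroot") and id ("id") are where the spout stores its offsets,
    // i.e. the same paths referenced in steps 2.2 and 2.3 above.
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "topic_name", "/zkroot", "id");

    // Start from the earliest retained offset instead of the latest one.
    spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
    return builder;
  }
}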
10-17-2015
09:51 PM
Thanks @schintalapani@hortonworks.com. So I see this as a change in the way Kafka works from 0.7 to 0.8.
10-17-2015
05:12 PM
2 Kudos
The Kafka producer doesn't need to know about the ZooKeeper cluster. It takes a broker list as a config option, which it then uses to send topic metadata requests and determine who the leader of each topic partition is.
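A minimal sketch with the Kafka Java producer client (available from Kafka 0.8.2 on; the older Scala producer takes the equivalent metadata.broker.list setting). Broker hostnames and the topic name below are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BrokerListProducer {

  public static void main(String[] args) {
    Properties props = new Properties();
    // Only the broker list is needed; the producer sends topic metadata
    // requests to these brokers to find each partition's leader.
    // No ZooKeeper connection string anywhere.
    props.put("bootstrap.servers", "broker1:6667,broker2:6667");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("topic_name", "some-key", "some-value"));
    producer.close();
  }
}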
06-20-2016
04:42 AM
Using Ambari for a high-availability setup for Flume, is there any complete step-by-step installation documentation somewhere that I can read? Please let me know the link. Thanks once again.
10-09-2015
02:10 PM
2 Kudos
The following reasons are cited by Siddharth Wagle in AMBARI-5707 (https://issues.apache.org/jira/browse/AMBARI-5707) as motivation for the Ambari Metrics proposal. Problems with the current system:
- Ganglia has limited capabilities for analyzing historic data, and new plugins are not easy to write.
- Horizontal scale-out for large clusters.
- No support for ad-hoc queries.
- Not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time-series databases like OpenTSDB to store raw data forever.
There were also a number of customers already using Nagios and/or Ganglia in their infrastructure, and there were version incompatibilities between those installations and the versions shipped with HDP.
03-28-2016
06:06 AM
It's all about Maven dependencies: exclude the conflicting slf4j-log4j12 and log4j artifacts from each dependency that pulls them in.

<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.10.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.4.6</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hdfs</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.2</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.1</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
11-22-2016
11:46 PM
Thanks for the reply, @Josh Elser. I will create a separate post with all the information I have gathered.
02-23-2018
07:59 AM
Thanks Brandon. I used the above approach to retrieve the history of alerts of a specific type. Adding a small trick for anyone using this approach: the Alert_History table stores the Alert_Timestamp column as a BigInt. For readability, use PostgreSQL's cast functions. In my example, the SQL identifies when Ambari triggered the DataNode Web UI alert across all DataNodes:

SELECT TO_CHAR(TO_TIMESTAMP(Alert_Timestamp / 1000), 'MM/DD/YYYY HH24:MI:SS')
FROM Alert_History
WHERE Alert_Label LIKE '%DataNode%UI%'
  AND Alert_State = 'CRITICAL'
ORDER BY 1 ASC;
01-15-2016
01:57 PM
Keep in mind also that there is no upgrade path from Sqoop 1 (the version included in HDP) to Sqoop 2.