About cstanca

rsanthosh_devop · ‎02-13-2019

I installed GeoMesa 1.3.5 on a 10-node cluster. We are using Kerberos to secure the cluster. Does GeoMesa work with Kerberos?

cstanca · ‎07-19-2017

Overview

The following versions of Apache Kafka have been incorporated in HDP 2.2.0 to 2.6.1: 0.8.1, 0.8.2, 0.9.0, 0.10.0, 0.10.1. Apache Kafka is now at 0.11.

Hortonworks is working to make Kafka easier for enterprises to use. New focus areas include a Kafka Admin Panel to create and delete topics and manage user permissions, easier and safer distribution of security tokens, and support for multiple ways of publishing and consuming data via a Kafka REST server/API.

Here are a few areas of strong contribution:

Operations:
- Rack awareness for increased resilience and availability: replicas are isolated so that they are guaranteed to span multiple racks or availability zones.
- Automated replica leader election for even distribution of leaders in a cluster: it detects uneven distribution, where some brokers serve more data than others, and makes adjustments.
- Message timestamps: every message in Kafka now has a timestamp field that indicates the time at which the message was produced.
- SASL improvements, including external authentication servers and support for multiple types of SASL authentication on one server.
- Ambari Views for visualization of Kafka operational metrics.

Security:
Kafka security encompasses multiple needs: encrypting the data flowing through Kafka, preventing rogue agents from publishing data to Kafka, and managing access to specific topics at the individual or group level. As a result, the latest updates in Kafka support wire encryption via SSL, Kerberos-based authentication, and granular authorization options via Apache Ranger or another pluggable authorization system.

This article lists new features beyond Hortonworks' contributions. At a high level, the following have been added by the overall community:
- Kafka Streams API
- Kafka Connect API
- New unified Consumer API
- Transport encryption using TLS/SSL
- Kerberos/SASL authentication support
- Access Control Lists
- Timestamps on messages
- Reduced client dependence on ZooKeeper (offsets stored in a Kafka topic)
- Client interceptors

New Features Since HDP 2.2

Here is the list of new features as they have been included in the release notes.

Kafka 0.8.1: https://archive.apache.org/dist/kafka/0.8.1/RELEASE_NOTES.html
[KAFKA-330] - Add delete topic support
[KAFKA-554] - Move all per-topic configuration into ZK and add to the CreateTopicCommand
[KAFKA-615] - Avoid fsync on log segment roll
[KAFKA-657] - Add an API to commit offsets
[KAFKA-925] - Add optional partition key override in producer
[KAFKA-1092] - Add server config parameter to separate bind address and ZK hostname
[KAFKA-1117] - tool for checking the consistency among replicas

Kafka 0.8.2: https://archive.apache.org/dist/kafka/0.8.2.0/RELEASE_NOTES.html
[KAFKA-1000] - Inbuilt consumer offset management feature for kakfa
[KAFKA-1227] - Code dump of new producer
[KAFKA-1384] - Log Broker state
[KAFKA-1443] - Add delete topic to topic commands and update DeleteTopicCommand
[KAFKA-1512] - Limit the maximum number of connections per ip address
[KAFKA-1597] - New metrics: ResponseQueueSize and BeingSentResponses
[KAFKA-1784] - Implement a ConsumerOffsetClient library

Kafka 0.9.0: https://archive.apache.org/dist/kafka/0.9.0.0/RELEASE_NOTES.html
[KAFKA-1499] - Broker-side compression configuration
[KAFKA-1785] - Consumer offset checker should show the offset manager and offsets partition
[KAFKA-2120] - Add a request timeout to NetworkClient
[KAFKA-2187] - Introduce merge-kafka-pr.py script

Kafka 0.10.0: https://archive.apache.org/dist/kafka/0.10.0.0/RELEASE_NOTES.html
[KAFKA-2832] - support exclude.internal.topics in new consumer
[KAFKA-3046] - add ByteBuffer Serializer&Deserializer
[KAFKA-3490] - Multiple version support for ducktape performance tests

Kafka 0.10.0.1:
https://archive.apache.org/dist/kafka/0.10.0.1/RELEASE_NOTES.html
[KAFKA-3538] - Abstract the creation/retrieval of Producer for stream sinks for unit testing

Kafka 0.10.1: https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
[KAFKA-1464] - Add a throttling option to the Kafka replication tool
[KAFKA-3176] - Allow console consumer to consume from particular partitions when new consumer is used.
[KAFKA-3492] - support quota based on authenticated user name
[KAFKA-3776] - Unify store and downstream caching in streams
[KAFKA-3858] - Add functions to print stream topologies
[KAFKA-3909] - Queryable state for Kafka Streams
[KAFKA-4015] - Change cleanup.policy config to accept a list of valid policies
[KAFKA-4093] - Cluster id

Final Notes

Apache Kafka shines in use cases like:
- replacement for a more traditional message broker
- user activity tracking pipeline as a set of real-time publish-subscribe feeds (the original use case)
- operational monitoring data
- log aggregation
- stream processing
- event sourcing
- commit log

Apache Kafka continues to be a dynamic and extremely popular project with more and more adoption.
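To make the security features mentioned in the article (wire encryption via SSL and Kerberos-based authentication) a little more concrete, here is a minimal sketch of the client-side properties involved. The property keys are standard Apache Kafka client settings, but the helper names and the truststore path are hypothetical placeholders, and the exact values depend on your distribution and version, so treat this as an illustration and not as a reference configuration.

```python
# Illustrative sketch only: render Kafka client properties for Kerberos + TLS.
# The helper names and the truststore path are hypothetical; the property keys
# are standard Apache Kafka client settings.


def kerberos_ssl_client_properties(truststore_path, service_name="kafka"):
    """Return client properties (as a dict) for SASL/Kerberos over TLS."""
    return {
        # Authenticate with SASL and encrypt traffic with TLS.
        "security.protocol": "SASL_SSL",
        # GSSAPI is the SASL mechanism used for Kerberos.
        "sasl.mechanism": "GSSAPI",
        # Should match the primary of the brokers' Kerberos principal.
        "sasl.kerberos.service.name": service_name,
        # Truststore holding the CA certificate that signed the broker certs.
        "ssl.truststore.location": truststore_path,
    }


def to_properties_text(props):
    """Serialize a dict as Java .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    # Placeholder path; replace with the location of your own truststore.
    props = kerberos_ssl_client_properties("/path/to/client.truststore.jks")
    print(to_properties_text(props))
```

The resulting text could be saved to a file and passed to a Kafka client or command-line tool as its configuration. A Kerberos ticket or JAAS configuration is still needed separately.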

cstanca · ‎07-26-2017

Thanks. Setting doAs=true should do the trick.

cstanca · ‎06-07-2017

@Matt Burgess Thank you so much.

Ravi_G · ‎04-18-2017

@Constantin Stanca NameNode HA is NOT mandatory for standalone views.

Anilbagga08 · ‎04-05-2017

Running kdestroy and then generating a new ticket and krb5cache worked.

cstanca · ‎04-03-2017

@Rohan Pednekar This is also true for any scan that requires evaluation before retrieving anything. I am not sure why this would be an HCC article. This is merely one paragraph of what could have been a well-written article about tips and tricks for dealing with HBase. I recommend looking at some of the featured articles in HCC and writing to that standard. The section you published could be very useful in a larger article. Thanks for your efforts.

bhopp · ‎04-20-2017

It includes Spark 1.6 and 2.1, and HBase 1.1.2: https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2017/03/Asparagus-2.6.png

cstanca · ‎03-20-2017

@P D That is the usual QA step. Pick and choose from here: https://github.com/aengusrooneyhortonworks/HadoopBenchmarks If you use HDFS, Hive, and HBase, choose the benchmarks applicable to those. At a minimum you could run the Hive test-bench and teragen/terasort, and maybe one benchmark for HBase. You could do those, but it may take time. Alternatively, you could just log in to Hive and run some queries, then log in to HBase and perform the usual commands using hbase-shell; you could also run SQL via Phoenix. This is a smoke test suite that you could build for the upgrades. You may have to include tests for all the tools in the ecosystem: there will be Storm topologies that you have to handle, Spark jobs that you have to test, and so on. A test plan for each tool is a good thing.
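As a rough illustration of the smoke test suite described above, here is a minimal sketch of a runner that executes a list of commands and reports which ones passed. The function name run_smoke_tests and the example checks are assumptions made for illustration; the two commands shown (hdfs dfs -ls / and hive -e) are only placeholders for whatever queries and commands matter in your cluster.

```python
import subprocess
import sys


def run_smoke_tests(checks, timeout_seconds=600):
    """Run each (name, argv) check and return a dict of {name: passed}.

    A check passes when its command exits with status 0.
    """
    results = {}
    for name, argv in checks:
        try:
            completed = subprocess.run(
                argv,
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
            results[name] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung command counts as a failure.
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical example checks; replace them with your own per-tool tests.
    example_checks = [
        ("python-available", [sys.executable, "-c", "print('ok')"]),
        ("hdfs-list-root", ["hdfs", "dfs", "-ls", "/"]),
        ("hive-show-databases", ["hive", "-e", "SHOW DATABASES;"]),
    ]
    for check_name, passed in run_smoke_tests(example_checks).items():
        print(f"{'PASS' if passed else 'FAIL'}  {check_name}")
```

Keeping each check as a plain command makes it easy to add one entry per tool (Storm, Spark, Phoenix, and so on) as the test plan grows.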