Member since 09-24-2015
816 Posts
488 Kudos Received
189 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3124 | 12-25-2018 10:42 PM |
| | 14040 | 10-09-2018 03:52 AM |
| | 4701 | 02-23-2018 11:46 PM |
| | 2420 | 09-02-2017 01:49 AM |
| | 2838 | 06-21-2017 12:06 AM |
09-13-2016
10:26 PM
2 Kudos
You are right: because Ambari supports only one ZK quorum, if you have 6 ZKs in your blueprint you will end up with a single 6-node ZK quorum, and you cannot change that using zkcli. Instead you can try:

- 1 cluster (a): Install the cluster from your blueprint without Kafka and with only 3 ZK nodes. When the cluster is up and running, install the second, Kafka-only ZK quorum manually; you can find instructions here. Finally, add Kafka using Ambari and set its ZK quorum to the one you installed manually (see the config sketch below).
- 1 cluster (b): Install the cluster from the blueprint including Kafka and 3 ZK nodes. Then install another ZK manually using the above link, and change the Kafka settings to use the new ZooKeeper. I'd avoid this solution because the first ZK quorum will be "polluted" by Kafka's ZK directories.
- 1 cluster (c), your solution with 6 ZK nodes: Remove 3 ZK nodes using Ambari, then install another ZK manually and change the Kafka settings as in 1(b).
- 2 clusters: Install one cluster with all services but without Kafka and its ZooKeeper, plus a second, Kafka-only cluster with just Kafka and its ZooKeeper. This is the easiest solution because you can automate cluster deployment and monitor all your components using Ambari, but you will have 2 clusters.
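Regarding "set its ZK quorum": on the Kafka side this is the broker's zookeeper.connect property. A minimal sketch, assuming hypothetical hostnames zk-kafka1..3 for the dedicated quorum and the default client port:

# Kafka broker setting (server.properties), pointing the broker at the
# dedicated 3-node ZooKeeper quorum installed manually;
# zk-kafka1..3 are placeholders for your own hosts
zookeeper.connect=zk-kafka1:2181,zk-kafka2:2181,zk-kafka3:2181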
09-13-2016
09:37 PM
Hi @Wing Lo, if facet pivot is exactly what you need, how about accepting and upvoting Matt's answer? Thanks!
09-08-2016
09:21 AM
The column list specification "INSERT INTO wkf107422_12_1_0 ( spersid )" is available starting with Hive 1.2, but you are most likely using an older version of Hive that doesn't support this feature (added by HIVE-9481). In older versions your SELECT statement has to provide all schema columns, in your case all 4. Regarding your "complementary information": it works, but the resulting table wkf107422_12_1_1 contains only one column, corresponding to spersid. To confirm, try "describe wkf107422_12_1_1".
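For illustration, a minimal sketch of both variants; the source table some_source and the string types of the padded columns are hypothetical:
hive> -- Hive 1.2+ (HIVE-9481): a column list is allowed, unlisted columns are filled with NULL
hive> insert into wkf107422_12_1_0 (spersid) select spersid from some_source;
hive> -- Older Hive: the select must cover all 4 target columns, e.g. padding with typed NULLs
hive> insert into table wkf107422_12_1_0 select spersid, cast(null as string), cast(null as string), cast(null as string) from some_source;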
09-08-2016
06:30 AM
The more nodes in a ZK ensemble (quorum), the faster the reads but the slower the writes. That's because a read can be served by any node, but a write does not complete until a majority of the nodes have acknowledged it. On top of that, early versions of Kafka (0.8.2 and older) keep Kafka offsets on ZK. Therefore, as already suggested by @mqureshi, it's best to start by creating a dedicated ZK for Kafka, I'd go for 3 nodes, and keep the 5-node ZK for everything else. Beefing up the number of ZK nodes to 7 or more is a resounding no. Regarding the installation and management of the new Kafka ZK, it's pretty straightforward to install it manually: just follow the steps in one of the "Non-Ambari cluster installation guides" like this one (see the zoo.cfg sketch below). You can also try to create a cluster composed of only Kafka and ZK and manage it with its own Ambari instance.
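For reference, a minimal zoo.cfg sketch for the dedicated 3-node Kafka ensemble; the hostnames kafka-zk1..3 and the dataDir path are placeholders for your own values:

# zoo.cfg for a dedicated 3-node Kafka ZooKeeper ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper    # each node also needs a myid file (1, 2 or 3) in this directory
clientPort=2181
server.1=kafka-zk1:2888:3888
server.2=kafka-zk2:2888:3888
server.3=kafka-zk3:2888:3888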
09-07-2016
11:36 AM
1 Kudo
You need an additional, temporary table to read your input file, and then some date conversion:
hive> create table tmp(a string, b string) row format delimited fields terminated by ',';
hive> load data local inpath 'a.txt' overwrite into table tmp;
hive> create table mytime(a string, b timestamp);
hive> insert into table mytime select a, from_unixtime(unix_timestamp(b, 'dd-MM-yyyy HH:mm')) from tmp;
hive> select * from mytime;
a 2015-11-20 22:07:00
b 2015-08-17 09:45:00
09-07-2016
06:13 AM
Interesting, so the JIRA removed the "empty regions are not merged away" clause. If so, I'd not enable normalization of pre-split tables.
09-06-2016
09:46 AM
scala> val a = sc.textFile("/user/.../path/to/your/file").map(x => x.split("\t")).filter(x => x(0) != x(1))
scala> a.take(4)
res2: Array[Array[String]] = Array(Array(1, 4), Array(2, 5), Array(1, 5))
Try the snippet above: it splits each line on tabs and keeps only the rows where the first and second fields differ; just insert the path to your file on HDFS.
09-06-2016
09:21 AM
In your example, 2 zero-size regions have been merged, while the logic page says: "empty" regions (less than 1MB, with the previous note) are not merged away. This is by design to prevent normalization from undoing the pre-splitting of a table. Can you kindly explain why?
09-04-2016
04:50 AM
It doesn't work because, to quote the related wiki page: "When using group by clause, the select statement can only include columns included in the group by clause, and aggregate functions on other columns." So your query will work if you remove the "group by" and the "min".
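A minimal sketch of the rule, using a hypothetical table t(a, b, c):
hive> -- fails: c is neither in the group by clause nor aggregated
hive> select a, c, min(b) from t group by a;
hive> -- works: every selected column is either grouped or aggregated
hive> select a, min(b), min(c) from t group by a;
hive> -- or, as suggested above, drop both the group by and the min
hive> select a, b, c from t;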
09-02-2016
09:06 AM
Hi @swathi thukkaraju, I see that you are using this solution in another question, so I guess it worked. If so, can you please accept & up-vote my answer to help us manage resolved questions? Thanks!