Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3173 | 12-25-2018 10:42 PM |
| | 14192 | 10-09-2018 03:52 AM |
| | 4763 | 02-23-2018 11:46 PM |
| | 2481 | 09-02-2017 01:49 AM |
| | 2912 | 06-21-2017 12:06 AM |
04-29-2016
06:35 AM
3 Kudos
Hi @simran kaur, to answer your questions: there is no limit to the "size" or capacity of a DataNode; it is only bound by the number of hard disk slots and the capacity of your individual disks. If you have 12 slots and 6T per disk, then it's 72T per node. A DataNode is a process managing HDFS files on a machine, and you run only one DataNode on the same machine. You specify your DataNode directories, typically the mount points of your disks, in dfs.datanode.data.dir; that's all, HDFS will take care of organizing the data there. You configure the block size as the dfs.blocksize property in HDFS. The default is 134217728, i.e. 128M, which is considered an optimal size for general-purpose clusters; if you keep many large files it can be increased, for example to 256M. And finally, your DataNode capacity of only 991M indicates that either something is wrong or you are running a Sandbox on a machine with small capacity. The capacity of my Sandbox is 45G.
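If you want to verify these settings on a running cluster, you can query them from the command line (a minimal sketch; the sample outputs are illustrative only):
hdfs getconf -confKey dfs.blocksize          # e.g. 134217728 (128M)
hdfs getconf -confKey dfs.datanode.data.dir  # comma-separated list of DataNode data directories
hdfs dfsadmin -report | head -20             # shows configured capacity per DataNode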
04-28-2016
05:40 PM
Hi @vijaya inturi, you can add multiple HS2 servers and access them using ZooKeeper discovery; on each call ZK will connect you to a random, live HS2 instance. In a Kerberos environment the user who runs beeline needs a valid ticket and has to include the hive principal in the connect string. You can find beeline connection string samples here.
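For reference, a ZooKeeper-discovery connect string with Kerberos typically looks like this (the ZK hosts and realm are placeholders, and hiveserver2 is the usual default namespace, so adjust them to your cluster):
beeline -u "jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@EXAMPLE.COM"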
04-28-2016
05:25 PM
IIRC what's required is to have the correct requested output in the right place in HDFS, so you don't need to save the code itself. However, you will save a lot of time if you keep your Hive code in a script so that you can easily re-run it if needed. Good luck!
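For example, keep the HiveQL in a file and re-run it with a single command (the file name is just a placeholder):
hive -f my_solution.hql    # or: beeline -u <jdbc-url> -f my_solution.hql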
04-27-2016
12:59 PM
2 Kudos
For a manual upgrade, with the disclaimer provided by Emil, you can follow these steps (a command sketch follows the list):
1. Download the .repo file of HDP-2.4 for your OS (if you have no Internet access you need to set up a local repo); links can be found here.
2. Stop your Kafka broker.
3. Back up the config files in /etc/kafka/conf.
4. Run "yum upgrade kafka" (or zypper if you are on SUSE).
5. Follow the steps here to configure and start Kafka. You can reuse your old configs, but will need some new properties like "listeners"; be sure to set log.dirs and zookeeper.connect to the values used before the upgrade.
6. Start Kafka, either from the command line (link above); it might also work from Ambari.
If you face issues, I've listed 4 known issues here: https://community.hortonworks.com/content/kbentry/29224/troubleshooting-kafka-upgrade.html
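A rough sketch of steps 2-4 on a yum-based OS (the backup location is just an example; stop the broker from Ambari or with your usual stop script first):
cp -r /etc/kafka/conf /etc/kafka/conf.backup-hdp22   # back up the broker configs
yum upgrade kafka                                    # or: zypper update kafka on SUSE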
04-26-2016
05:31 AM
Not sure, but I guess restarting the ambari-agents did it.
04-26-2016
05:22 AM
1 Kudo
From my experience, in the case of Spark, "Running containers" denotes the requested number of executors plus one (for the driver), while "Allocated Memory MB" denotes the memory allocated to satisfy the request, rounded up to a multiple of "minimum-allocation-mb" per container. My example: minimum-allocation-mb=4096
num-executors: 100
executor-memory: 7G
spark.driver.memory=7G
---- Display ----
Running containers: 101
Allocated Mem MB=827392 (= 202 * 4096). Two units of minimum-allocation-mb are used per container to accommodate 7G plus the memory overhead, which in the latest versions of Spark is max(384, executor/driver-memory*0.1), about 700M in my case.
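A quick shell check of that arithmetic, using the values from the example above:
min_alloc=4096; exec_mem=7168; overhead=700; containers=101
per_container=$(( ((exec_mem + overhead + min_alloc - 1) / min_alloc) * min_alloc ))  # rounds up to 8192
echo $(( containers * per_container ))                                                # prints 827392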
04-26-2016
02:43 AM
Since you plan dedicated Kafka nodes in your "cluster for everything", Kafka performance will be the same as in a stand-alone Kafka cluster. However, it's good to have a dedicated ZooKeeper quorum for Kafka, and in the first option Ambari currently doesn't support two ZK quorums per cluster, so you would need to install the ZK for Kafka manually. That's not so complicated, but if you go for a stand-alone Kafka cluster, you can use Ambari to install and manage its ZK. So, my recommendation is to go for a stand-alone Kafka cluster.
04-26-2016
02:32 AM
1 Kudo
Your BIND DN is empty. I did an Ambari sync with FreeIPA a few months ago: I created a system account for binding to LDAP using ldapmodify, as explained here, and used that as my BIND DN. Also check the other properties set during "ambari-server setup-ldap" and make sure they are in sync with the ones set by IPA; you can use "ipa user-find" to inspect the structure of your users. To change properties in Ambari you can re-run setup-ldap, or set them directly in /etc/ambari-server/conf/ambari.properties and restart ambari-server.
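For example, you can compare both sides like this (the user name is a placeholder, and the authentication.ldap prefix is what my Ambari version used, so check it against your own ambari.properties):
ipa user-find jdoe                                                    # inspect how IPA stores the user entry
grep ^authentication.ldap /etc/ambari-server/conf/ambari.properties  # current Ambari LDAP settings
ambari-server setup-ldap                                              # re-run to change them, then:
ambari-server restart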
04-25-2016
09:53 AM
1 Kudo
Hi @omkar pathallapalli, a single Sqoop command can import only a single table from a given DB server. So, to import multiple tables from multiple servers you need to wrap the command, for example in a Bash script like this:
for tbl in $(cat $4); do
  sqoop import --connect "jdbc:sqlserver://${1}:3464;databaseName=${2}" --username ${3} -P --table ${tbl} --target-dir sqimport/${tbl}  # one target dir per table so imports don't collide
done
Call it once per DB server, providing the DB server FQDN, the database name, the user name, and a file listing the tables you want to import (one table per line); the script will prompt you for the password. You can modify target-dir and/or add more Sqoop properties, including the number of mappers used for the import (4 by default).
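A hypothetical invocation, assuming the script is saved as multi_import.sh (the host, database, user, and table-list file are placeholders):
./multi_import.sh sqlserver01.example.com SalesDB sqoop_user sales_tables.txt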
04-24-2016
08:30 AM
3 Kudos
After a rolling or express upgrade of HDP from 2.2.x to 2.4 (and I'm told also to 2.3.2 and 2.3.4) you may face some issues with your Kafka as I did. In my case HDP was upgraded from HDP-2.2.6 (Kafka-0.8.1) to HDP-2.4 (Kafka-0.9.0) using Ambari-2.2.1.1. Here are 4 items to check and fix if needed.
1. After the upgrade and Kafka restart, broker IDs are set to 1001, 1002, ... However, topics created before the upgrade point to brokers numbered 0, 1, ... For example (k81c was created before the upgrade, k9a after):
$ ./kafka-topics.sh --zookeeper zk1.example.com:2181 --describe --topic k81c
Topic:k81c PartitionCount:2 ReplicationFactor:2 Configs:
Topic: k81c Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0
Topic: k81c Partition: 1 Leader: 0 Replicas: 1,0 Isr: 0
$ ./kafka-topics.sh --zookeeper zk1.example.com:2181 --describe --topic k9a
Topic:k9a PartitionCount:2 ReplicationFactor:1 Configs:
Topic: k9a Partition: 0 Leader: 1002 Replicas: 1002 Isr: 1002
Topic: k9a Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
Newly created topics work, but old ones don't. A solution which worked for me was to change broker.id in the newly created kafka-logs/meta.properties back to the old values. This has to be done on all volumes of all brokers. If your Kafka log volumes are, for example, /data-01, ..., /data-06, you can change them by running:
$ sed -i 's/1001/0/' /data-*/kafka-logs/meta.properties # on each broker, map its new ID back to its old one (here 1001 -> 0)
$ grep broker.id /data-*/kafka-logs/meta.properties # to confirm they changed
It's a good idea to note the original broker IDs before the upgrade; they can be found in /etc/kafka/conf/server.properties.
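To record the original IDs ahead of time, something like this on each broker is enough:
grep broker.id /etc/kafka/conf/server.properties   # e.g. broker.id=0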
2. If you are running Kafka on a custom port different from the default 6667, make sure the "listeners" property is set to your port; in the new version the "port" property is deprecated. If your port is, for example, 9092, set "listeners" to "PLAINTEXT://localhost:9092".
3. If your Kafka is installed on dedicated nodes running only Kafka (but not a DataNode) in a full-scale cluster which includes HDFS, you may get an error saying that hadoop-client/conf cannot be found. This is possibly a bug in the old Ambari used to install the original HDP and Kafka, as /etc/hadoop/conf was present on those broker nodes. A solution which worked for me was to create the hadoop-client/conf structure by running the script below. By the way, on Kafka brokers running in a stand-alone cluster without HDFS I didn't have this error.
hdpver=2.4.0.0-169 # set your HDP target version
mkdir -p /usr/hdp/$hdpver/hadoop
mkdir -p /etc/hadoop/$hdpver/0
ln -s /etc/hadoop/$hdpver/0 /usr/hdp/$hdpver/hadoop/conf
hdp-select set hadoop-client $hdpver # /usr/hdp/current/hadoop-client -> /usr/hdp/$hdpver/hadoop
cp /etc/hadoop/conf/* /etc/hadoop/$hdpver/0 # copy conf files from the previous location
ln -sfn /usr/hdp/current/hadoop-client/conf /etc/hadoop/conf # update /etc/hadoop/conf symlink
4. If your Kafka is not kerberized, some Kafka scripts located in /usr/hdp/current/kafka-broker/bin/ won't work. To fix them, comment out the Kerberos-related commands in those scripts on all brokers:
sed -i '/^export KAFKA_CLIENT_KERBEROS_PARAMS/s/^/# /' /usr/hdp/current/kafka-broker/bin/*.sh
grep "export KAFKA_CLIENT_KERBEROS" /usr/hdp/current/kafka-broker/bin/*.sh # to confirm