Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3173 | 12-25-2018 10:42 PM |
| | 14192 | 10-09-2018 03:52 AM |
| | 4763 | 02-23-2018 11:46 PM |
| | 2481 | 09-02-2017 01:49 AM |
| | 2912 | 06-21-2017 12:06 AM |
04-29-2016
06:35 AM
3 Kudos
Hi @simran kaur, to answer your questions: there is no limit to the "size" or capacity of a DataNode; it is only bound by the number of hard disk slots and the capacity of your individual disks. If you have 12 slots and 6T per disk, then it's 72T per node. A DataNode is a process managing HDFS files on a machine, and you run only one DataNode on the same machine. You specify your DataNode directories, typically the mount points of your disks, in dfs.datanode.data.dir; that's all, HDFS will take care of organizing the data there. You configure the block size as the dfs.blocksize property in HDFS. The default is 134217728, i.e. 128M, which is considered an optimal size for general-purpose clusters; if you keep many large files it can be increased, for example to 256M. And finally, your DataNode capacity of only 991M indicates that either something is wrong or you are running a Sandbox on a machine with small capacity. The capacity of my Sandbox is 45G.
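If you want to verify these settings on a running cluster, you can query them from the command line (a minimal sketch; the sample outputs are illustrative only):
hdfs getconf -confKey dfs.blocksize          # e.g. 134217728 (128M)
hdfs getconf -confKey dfs.datanode.data.dir  # comma-separated list of DataNode data directories
hdfs dfsadmin -report | head -20             # shows configured capacity per DataNode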
04-28-2016
05:40 PM
Hi @vijaya inturi, you can add multiple HS2 servers and access them using ZooKeeper discovery; on each call ZK will connect you to a random, live HS2 instance. In a Kerberos environment the user who runs beeline needs a valid ticket and has to include the hive principal in the connect string. You can find beeline connection string samples here.
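For reference, a ZooKeeper-discovery connect string with Kerberos typically looks like this (the ZK hosts and realm are placeholders, and hiveserver2 is the usual default namespace, so adjust them to your cluster):
beeline -u "jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@EXAMPLE.COM"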
04-28-2016
05:25 PM
IIRC what's required is to have the correct requested output in the right place in HDFS, so you don't need to save the code itself. However, you will save a lot of time if you keep your Hive code in a script so that you can easily re-run it if needed. Good luck!
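For example, keep the HiveQL in a file and re-run it with a single command (the file name is just a placeholder):
hive -f my_solution.hql    # or: beeline -u <jdbc-url> -f my_solution.hql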
04-27-2016
12:59 PM
2 Kudos
For a manual upgrade, with the disclaimer provided by Emil, you can follow these steps (a command sketch follows the list):
1. Download the .repo file of HDP-2.4 for your OS (if you have no Internet access you need to set up a local repo); links can be found here.
2. Stop your Kafka broker.
3. Back up the config files in /etc/kafka/conf.
4. Run "yum upgrade kafka" (or zypper if you are on SUSE).
5. Follow the steps here to configure and start Kafka. You can reuse your old configs, but will need some new properties like "listeners"; be sure to set log.dirs and zookeeper.connect to the values used before the upgrade.
6. Start Kafka, either from the command line (link above); it might also work from Ambari.
If you face issues, I've listed 4 known issues here: https://community.hortonworks.com/content/kbentry/29224/troubleshooting-kafka-upgrade.html
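A rough sketch of steps 2-4 on a yum-based OS (the backup location is just an example; stop the broker from Ambari or with your usual stop script first):
cp -r /etc/kafka/conf /etc/kafka/conf.backup-hdp22   # back up the broker configs
yum upgrade kafka                                    # or: zypper update kafka on SUSE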
04-26-2016
05:31 AM
Not sure, but I guess restarting the ambari-agents did it.
04-26-2016
05:22 AM
1 Kudo
From my experience, in the case of Spark, "Running containers" denotes the requested number of executors plus one (for the driver), while "Allocated Memory MB" denotes the memory allocated to satisfy the request, rounded up to a multiple of "minimum-allocation-mb" per container. My example: minimum-allocation-mb=4096
num-executors: 100
executor-memory: 7G
spark.driver.memory=7G
---- Display ----
Running containers: 101
Allocated Mem MB=827392 (= 202 * 4096). Two units of minimum-allocation-mb are used per container to accommodate 7G plus the memory overhead, which in the latest versions of Spark is max(384, executor/driver-memory*0.1), about 700M in my case.
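A quick shell check of that arithmetic, using the values from the example above:
min_alloc=4096; exec_mem=7168; overhead=700; containers=101
per_container=$(( ((exec_mem + overhead + min_alloc - 1) / min_alloc) * min_alloc ))  # rounds up to 8192
echo $(( containers * per_container ))                                                # prints 827392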
04-26-2016
02:43 AM
Since you plan dedicated Kafka nodes in your "cluster for everything", Kafka performance will be the same as in a stand-alone Kafka cluster. However, it's good to have a dedicated ZooKeeper quorum for Kafka, and in the first option Ambari currently doesn't support two ZK quorums per cluster, so you would need to install the ZK for Kafka manually. That's not so complicated, but if you go for a stand-alone Kafka cluster, you can use Ambari to install and manage its ZK. So, my recommendation is to go for a stand-alone Kafka cluster.
04-26-2016
02:32 AM
1 Kudo
Your BIND DN is empty. I did an Ambari sync with FreeIPA a few months ago: I created a system account for binding to LDAP using ldapmodify, as explained here, and used that as my BIND DN. Also check the other properties set during "ambari-server setup-ldap" and make sure they are in sync with the ones set by IPA; you can use "ipa user-find" to inspect the structure of your users. To change properties in Ambari you can re-run setup-ldap, or set them directly in /etc/ambari-server/conf/ambari.properties and restart ambari-server.
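For example, you can compare both sides like this (the user name is a placeholder, and the authentication.ldap prefix is what my Ambari version used, so check it against your own ambari.properties):
ipa user-find jdoe                                                    # inspect how IPA stores the user entry
grep ^authentication.ldap /etc/ambari-server/conf/ambari.properties  # current Ambari LDAP settings
ambari-server setup-ldap                                              # re-run to change them, then:
ambari-server restart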
04-25-2016
09:53 AM
1 Kudo
Hi @omkar pathallapalli, a single Sqoop command can import only a single table from a given DB server. So, to import multiple tables from multiple servers you need to wrap the command, for example in a Bash script like this:
for tbl in $(cat $4); do
  sqoop import --connect "jdbc:sqlserver://${1}:3464;databaseName=${2}" --username ${3} -P --table ${tbl} --target-dir sqimport/${tbl}  # one target dir per table so imports don't collide
done
Call it once per DB server, providing the DB server FQDN, the database name, the user name, and a file listing the tables you want to import (one table per line); the script will prompt you for the password. You can modify target-dir and/or add more Sqoop properties, including the number of mappers used for the import (4 by default).
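A hypothetical invocation, assuming the script is saved as multi_import.sh (the host, database, user, and table-list file are placeholders):
./multi_import.sh sqlserver01.example.com SalesDB sqoop_user sales_tables.txt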
04-24-2016
08:30 AM
3 Kudos
After a rolling or express upgrade of HDP from 2.2.x to 2.4 (and I'm told also to 2.3.2 and 2.3.4) you may face some issues with your Kafka as I did. In my case HDP was upgraded from HDP-2.2.6 (Kafka-0.8.1) to HDP-2.4 (Kafka-0.9.0) using Ambari-2.2.1.1. Here are 4 items to check and fix if needed.
1. After the upgrade and Kafka restart, broker IDs are set to 1001, 1002, ... However, topics created before the upgrade point to brokers numbered 0, 1, ... For example (k81c was created before the upgrade, k9a after):
$ ./kafka-topics.sh --zookeeper zk1.example.com:2181 --describe --topic k81c
Topic:k81c PartitionCount:2 ReplicationFactor:2 Configs:
Topic: k81c Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0
Topic: k81c Partition: 1 Leader: 0 Replicas: 1,0 Isr: 0
$ ./kafka-topics.sh --zookeeper zk1.example.com:2181 --describe --topic k9a
Topic:k9a PartitionCount:2 ReplicationFactor:1 Configs:
Topic: k9a Partition: 0 Leader: 1002 Replicas: 1002 Isr: 1002
Topic: k9a Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
Newly created topics work, but old ones don't. A solution which worked for me was to change broker.id in the newly created kafka-logs/meta.properties back to the old values. This has to be done on all volumes of all brokers. If your Kafka log volumes are, for example, /data-01, ..., /data-06, you can change them by running:
$ sed -i 's/1001/0/' /data-*/kafka-logs/meta.properties # on each broker, map its new ID back to its old one (here 1001 -> 0)
$ grep broker.id /data-*/kafka-logs/meta.properties # to confirm they changed
It's a good idea to note the original broker IDs before the upgrade; they can be found in /etc/kafka/conf/server.properties.
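To record the original IDs ahead of time, something like this on each broker is enough:
grep broker.id /etc/kafka/conf/server.properties   # e.g. broker.id=0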
2. If you are running Kafka on a custom port different from the default 6667, make sure the "listeners" property is set to your port; in the new version the "port" property is deprecated. If your port is, for example, 9092, set "listeners" to "PLAINTEXT://localhost:9092".
3. If your Kafka is installed on dedicated nodes running only Kafka (but not a DataNode) in a full-scale cluster which includes HDFS, you may get an error saying that hadoop-client/conf cannot be found. This is possibly a bug in the old Ambari used to install the original HDP and Kafka, as /etc/hadoop/conf was present on those broker nodes. A solution which worked for me was to create the hadoop-client/conf structure by running the script below. By the way, on Kafka brokers running in a stand-alone cluster without HDFS I didn't have this error.
hdpver=2.4.0.0-169 # set your HDP target version
mkdir -p /usr/hdp/$hdpver/hadoop
mkdir -p /etc/hadoop/$hdpver/0
ln -s /etc/hadoop/$hdpver/0 /usr/hdp/$hdpver/hadoop/conf
hdp-select set hadoop-client $hdpver # /usr/hdp/current/hadoop-client -> /usr/hdp/$hdpver/hadoop
cp /etc/hadoop/conf/* /etc/hadoop/$hdpver/0 # copy conf files from the previous location
ln -sfn /usr/hdp/current/hadoop-client/conf /etc/hadoop/conf # update /etc/hadoop/conf symlink
4. If your Kafka is not kerberized, some Kafka scripts located in /usr/hdp/current/kafka-broker/bin/ won't work. To fix them, comment out the Kerberos-related commands in those scripts on all brokers:
sed -i '/^export KAFKA_CLIENT_KERBEROS_PARAMS/s/^/# /' /usr/hdp/current/kafka-broker/bin/*.sh
grep "export KAFKA_CLIENT_KERBEROS" /usr/hdp/current/kafka-broker/bin/*.sh # to confirm