10:42 PM
1 Kudo
You are missing the "m" suffix, as in 8192m. Without "m" the unit is bytes, so you get only 8 kilobytes! Also note that 4G is usually enough for the DN heap.
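To make the unit issue concrete, here is a minimal sketch of how a JVM-style size string is read, assuming the value ends up in a heap option that follows the usual k/m/g suffix convention (as with -Xmx). The helper name `heap_bytes` is hypothetical and only illustrates the arithmetic.

```python
# Sketch only: mirrors the k/m/g suffix convention of JVM size options.
# heap_bytes is a hypothetical helper, not part of Hadoop or Ambari.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(value: str) -> int:
    """Return the number of bytes a size such as '8192' or '8192m' denotes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # no suffix: the number is taken as plain bytes


for setting in ("8192", "8192m", "4g"):
    print(setting, "->", heap_bytes(setting), "bytes")
```

The first value comes out as 8192 bytes (8 KB), while the second is 8 GB, which is the difference the missing suffix makes.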
09:06 AM
Sorry for the hard-to-understand message. Try this:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/
Note that you don't need a port when using the NN service name. I also suggest first copying a small file or directory in /tmp, like /tmp/mydir1: just create that directory and put a few files inside. Also remove '-update -p' during initial tests. Once it starts working you can add those options back.
03:52 AM
Use this option, together with the server name and port if you are running distcp directly against the active NN on the remote cluster: -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2
10:21 AM
Have you resolved this? We configured Zeppelin ver. 0.7.0 using LdapRealm and roles are populated. The key properties are ldapRealm.groupObjectClass=group
These are the defaults, and I don't have an IPA server handy, so please run ldapsearch for one of your groups to confirm. You will see multiple object classes for the group in the output; select the one used for users' groups. "memberAttribute" is the attribute name that appears on the left side in the list of group members. "GroupIdAttribute" is what LdapRealm will return as the group name instead of the "long" LDAP name, without any OUs, DCs etc. Use those group names in your "rolesByGroup" and keep the capitals, if any. Also restrict your "groupSearchBase" as much as possible; you can also try increasing ldapRealm.pageSize from the default 100 to 200 or 300. If you still get no roles, post your LdapRealm settings and a few lines from your ldapsearch output.
04:52 AM
I was able to set the Livy queue just by setting livy.spark.yarn.queue=mylivyqueue in the Livy interpreter in Zeppelin. After restarting the interpreter, Livy notebooks start running on that queue. By the way, my spark.yarn.queue=default.
01:03 AM
Generally speaking, the Hive view works. There are some boundary cases, especially when SQL comments are used, where Hive fails with "cannot recognize input near '<EOF>' '<EOF>' '<EOF>'", for example:
select * from TMP_TBL1 LIMIT 20;
-- comment
OTOH, the following works:
select * from TMP_TBL1
-- comment
This is in Hive-1.2, packaged for example with HDP-2.6.0.
11:46 PM
1 Kudo
Your connection string looks good. Go to your Knox server and open /etc/knox/conf/topologies/default.xml to check that Ambari did the right substitution. Make sure that in your topology file you have something like this: <service>
and that your HS2 has the corresponding properties set:
hive.server2.transport.mode=http
hive.server2.thrift.http.port=10001
Also check that "mypass" is the correct password for your gateway.jks.
08:29 AM
Hi Vaidya, if the Ambari ZooKeeper summary page shows 6 ZK servers, then all 6 are running in the same ensemble, with one leader and 5 followers. You can check that by using your favorite tool to run the same shell command on all 6 master nodes, with "echo stat | nc localhost 2181 | grep Mode" as the command.
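If you prefer a script to a parallel shell tool, the same check can be sketched in Python. This is only an illustration: it sends the same "stat" command over a plain socket and picks out the "Mode:" line. The function names and the sample text are assumptions made for this example, and the sample is a shortened imitation of real "stat" output.

```python
import socket
from collections import Counter


def parse_mode(stat_output: str) -> str:
    """Return the value of the 'Mode:' line in 'stat' output, or 'unknown'."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def query_mode(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send 'stat' to one ZooKeeper server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stat")
        chunks = []
        while True:  # read until the server closes the connection
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def summarize(modes):
    """Count servers per mode, e.g. one leader and five followers."""
    return Counter(modes)


# Shortened, illustrative sample; real output has more lines.
sample = "Zookeeper version: (elided)\nMode: follower\nNode count: 4\n"
print(parse_mode(sample))
```

To check a real ensemble you would call `summarize(query_mode(h) for h in hosts)` with your own list of 6 host names and expect one "leader" and five "follower" entries.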
03:27 AM
I just tried, and this also works on HDP-2.6.0 and, I believe, other 2.6.x versions. Instead of the jar in the article I used the latest version. Regarding the copy targets, it's enough to copy only the assembly jar to /usr/hdp/current/spark-client/lib on nodes where this directory already exists. I guess it can also be placed on HDFS, under /hdp, but I haven't tried.
02:17 AM
1 Kudo
The best way to learn about the various pe options is to run "hbase pe" without any options or commands:
$ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation <OPTIONS> [-D<property=value>]* <command> <nclients>
...
About nclients, I already replied to you in another question: this is the level of parallelism used to run the specified command. In the default MapReduce mode it means that 10*nclients mappers will be started. Here are the other options you asked about, and a few others I use:
rows Rows each client runs. Default: One million
columns Columns to write per row. Default: 1
presplit Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
compress Compression type to use (GZ, LZO, ...). Default: 'NONE'
table Alternate table name. Default: 'TestTable'
bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize Pass value size to use: Default: 1024
Example:
hbase pe --table=TestTable2 --compress=GZ --presplit=4 randomWrite 5
And of course, first run one of the write commands, followed by some reads. For the results, look for the following lines in the output of the MR job:
HBase Performance Evaluation
Elapsed time in milliseconds=492463
Row count=1048560
You can also prepend "time" and run it as "time hbase pe ...". For more details search the web, though the results are scattered.
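To turn those two counters into a single number, a rough sketch like the one below can help. It assumes the job output contains the two lines exactly as shown above; the function names are made up for this example. With several clients the elapsed counter may be aggregated across tasks, so treat the result as a rough figure for comparing runs, not as wall-clock throughput.

```python
import re

# Matches the two counter lines shown above in the MR job output.
_COUNTER = re.compile(r"(Elapsed time in milliseconds|Row count)=(\d+)")


def parse_pe_counters(job_output: str) -> dict:
    """Collect the elapsed-time and row-count counters as integers."""
    return {name: int(value) for name, value in _COUNTER.findall(job_output)}


def rows_per_second(counters: dict) -> float:
    """Divide the row count by the elapsed counter converted to seconds."""
    elapsed_s = counters["Elapsed time in milliseconds"] / 1000.0
    return counters["Row count"] / elapsed_s


sample = """HBase Performance Evaluation
        Elapsed time in milliseconds=492463
        Row count=1048560"""

counters = parse_pe_counters(sample)
print(f"{rows_per_second(counters):.1f} rows/s")
```

Running the same calculation for each write and read command gives numbers you can compare between runs, for example with and without --presplit or --compress.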