Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4058 | 08-20-2018 08:26 PM |
| | 1954 | 08-15-2018 01:59 PM |
| | 2380 | 08-13-2018 02:20 PM |
| | 4116 | 07-23-2018 04:37 PM |
| | 5026 | 07-19-2018 12:52 PM |
09-13-2016
07:08 PM
3 Kudos
I read the Atlas HA doc here: http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-Trunk/bk_Ambari_Users_Guide/content/apache_atlas_high_availability.html

Does Atlas fail over to a secondary metastore? How many metastores are allowed?
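For context, the HA setup described in that doc is driven by a handful of properties in atlas-application.properties, along these lines (the server ids and hostnames below are placeholders I made up, not values from the doc):

```properties
# Enable Atlas server HA and list the participating server instances
atlas.server.ha.enabled=true
atlas.server.ids=id1,id2
atlas.server.address.id1=host1.example.com:21000
atlas.server.address.id2=host2.example.com:21000
# ZooKeeper ensemble used for active/passive leader election
atlas.server.ha.zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```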
Labels:
- Apache Atlas
09-13-2016
03:31 PM
Never mind, I found the issue: HBase on the sandbox must be up and running.
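In case it helps anyone else: besides the Ambari UI, you can check and start HBase on the sandbox through the Ambari REST API. A rough sketch, assuming the HDP sandbox defaults (admin/admin credentials, cluster name "Sandbox", Ambari on port 8080); adjust if yours differ:

```bash
# Check the current state of the HBase service
curl -u admin:admin \
  http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/HBASE

# Ask Ambari to start HBase if it is stopped
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start HBase"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/HBASE
```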
09-13-2016
03:29 PM
Atlas UI is not coming up on my HDP 2.5 sandbox. I have started Atlas, Log Search, and Kafka. I see the following error in the Atlas log:

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Aug 24 22:48:43 UTC 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68415: row 'atlas_titan,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=sandbox.hortonworks.com,16020,1470279629513, seqNum=0

Does HBase have to be up and running? Does Atlas use an embedded version or the one I start from Ambari?
09-13-2016
08:52 AM
4 Kudos
I often hear stories of people wanting faster performance from Hadoop and Spark without knowing the basic statistics of their own environment. One of the first questions I ask is whether the hardware can perform at the level being expected. The software is still bound by the physics of the hardware: if your disk IO speed is 10 MB per second, neither Hadoop, Spark, nor any other software will magically make that disk faster. We are bound to the physical limits of the hardware we choose. What makes Hadoop and other distributed processing engines amazing is the ability to add more "cheap" nodes to the cluster to increase performance. However, we should be aware of the maximum throughput per node. This helps level-set expectations before committing to any SLA bound to performance.

Typically I love to use the sysbench tool. SysBench is a modular, multi-threaded benchmark tool for evaluating OS parameters, i.e. CPU, RAM, IO, and mutex performance. I run sysbench before installing any software outside the kernel, and again pre/post Hadoop/Spark upgrades. Upgrades should not have any impact on your OS benchmarks, but my neck is on the line when I commit to an SLA, so I would rather play it safe. I generally wrap the tests below in a shell script for ease of execution (a sketch of such a wrapper is at the end of this article); here I call out each test for clarity.

RAM test

I start with testing RAM performance. This test can benchmark sequential memory reads or writes; I test both. To test read performance, I set the memory block size to the HDFS block size, the number of threads to the approximate concurrency I expect on the cluster, and the total memory size to the average size of each workload:

sysbench --test=memory --memory-block-size=128M --memory-oper=read --num-threads=4 --memory-total-size=10G run

To test write performance, I use the same settings with the operation flipped to write:

sysbench --test=memory --memory-block-size=128M --memory-oper=write --num-threads=4 --memory-total-size=10G run

CPU test

Next I grab the CPU performance numbers. This test calculates prime numbers up to the value specified by the --cpu-max-prime option. Again, I set the number of threads to the approximate concurrency I expect on the cluster:

sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run

IO test

Lastly I fetch the IO performance numbers. When using fileio, you need to create a set of test files to work on. It is recommended that the total size be larger than the available memory to ensure that file caching does not influence the workload too much (https://wiki.gentoo.org/wiki/Sysbench#Using_the_fileio_workload). Run this command to prepare test files whose total size is larger than the available RAM on the box. In this example my box has 128 GB of RAM, so I set the file total size to 150G:

sysbench --test=fileio --file-total-size=150G prepare

Next I run the IO test against the files I just prepared. file-test-mode is the type of workload to produce. Possible values:
- seqwr: sequential write
- seqrewr: sequential rewrite
- seqrd: sequential read
- rndrd: random read
- rndwr: random write
- rndrw: combined random read/write
A few other options used in the run command:

- init-rng: specifies whether the random number generator should be initialized from the timer before the test starts (http://imysql.com/wp-content/uploads/2014/10/sysbench-manual.pdf).
- max-time: the limit for total execution time in seconds. 0 means unlimited; be careful and set a limit.
- max-requests: the limit for the total number of requests. 0 means unlimited.

sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
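As mentioned above, I generally wrap these tests in a shell script. Here is a minimal sketch of such a wrapper, assuming the same sysbench syntax used throughout this article; the thread count, block size, and file size are placeholders to tune for your own hardware:

```bash
#!/usr/bin/env bash
# node_bench.sh: minimal sysbench wrapper sketch (values are placeholders; tune per node)
set -e

THREADS=4        # approximate concurrency expected on the cluster
BLOCK=128M       # HDFS block size
MEM_TOTAL=10G    # average workload size
FILE_TOTAL=150G  # must exceed RAM so the page cache doesn't skew IO numbers

echo "== RAM: sequential read =="
sysbench --test=memory --memory-block-size=$BLOCK --memory-oper=read \
         --num-threads=$THREADS --memory-total-size=$MEM_TOTAL run

echo "== RAM: sequential write =="
sysbench --test=memory --memory-block-size=$BLOCK --memory-oper=write \
         --num-threads=$THREADS --memory-total-size=$MEM_TOTAL run

echo "== CPU: primes up to 20000 =="
sysbench --test=cpu --cpu-max-prime=20000 --num-threads=$THREADS run

echo "== File IO: combined random read/write =="
sysbench --test=fileio --file-total-size=$FILE_TOTAL prepare
sysbench --test=fileio --file-total-size=$FILE_TOTAL --file-test-mode=rndrw \
         --init-rng=on --max-time=300 --max-requests=0 run

# prepare leaves large test files behind; cleanup removes them
sysbench --test=fileio --file-total-size=$FILE_TOTAL cleanup
```

The fileio cleanup step at the end removes the test files that prepare created, so the 150 GB does not linger on disk.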
09-06-2016
11:51 PM
@Rendiyono Wahyu Saputro I recommend looking at Storm vs. Spark in a different manner: if your stream response can handle some latency (as little as half a second), then Spark may be the way to go. This is just my opinion, as Spark Streaming is so darn easy. Storm is a POWERFUL engine with virtually zero latency; it has been clocked at millions of tuples per node per second. So you have to ask yourself whether your use case needs zero latency, or whether you can handle micro-batching (Spark Streaming).
09-02-2016
07:57 PM
@Randy Gelhausen and @ssoldatov thank you for your responses.
09-02-2016
04:41 PM
Does Phoenix update a global index during a bulk load? I am curious whether this is supported and how it works.
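For reference, the kind of bulk load I am asking about is the MapReduce CSV loader, invoked along these lines (the jar path, table name, and input path are placeholders, not my actual values):

```bash
# Hypothetical invocation of the Phoenix CSV bulk load tool; the question is
# whether this also updates EXAMPLE_TABLE's global secondary indexes.
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE_TABLE \
  --input /tmp/example.csv
```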
Labels:
- Apache Phoenix
09-02-2016
05:23 AM
@ARUN If this is for production, then add an HBase Master HA node. For PQS, you can start with an install on one node; once the load increases, you can add more nodes for load balancing.
08-29-2016
05:06 PM
Are Ranger policies enforced during HCatalog API calls? Does it make any difference if I am using an embedded or a remote metastore?
Labels:
- Apache HCatalog
- Apache Ranger
08-29-2016
05:05 PM
Are Ranger policies enforced for the HCatalog embedded metastore? I have one metastore for HiveServer2 and another for HCatalog. If I create a Hive policy via Ranger, will the policy be enforced on both metastores?
Labels:
- Apache HCatalog
- Apache Ranger