Member since: 09-24-2015
Posts: 527
Kudos Received: 136
Solutions: 19

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2847 | 06-30-2017 03:15 PM |
| | 4243 | 10-14-2016 10:08 AM |
| | 9492 | 09-07-2016 06:04 AM |
| | 11532 | 08-26-2016 11:27 AM |
| | 1882 | 08-23-2016 02:09 PM |
05-26-2016
05:41 PM
Hi: I changed my decimal(9,2) in Hive to Double, and now it's working.
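For illustration, a minimal sketch of that kind of type change done over Hive JDBC; the host, the my_table table and the price column are placeholders, and the ALTER TABLE statement is the part that matters:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ChangeColumnType {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver; host, database, table and column names are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Swap the DECIMAL(9,2) column for a DOUBLE column with the same name.
            stmt.execute("ALTER TABLE my_table CHANGE price price DOUBLE");
        }
    }
}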
05-27-2016
08:18 AM
Fixed it. I removed the atlas.pid file and restarted the Atlas server:
rm /var/run/atlas/atlas.pid
python /usr/hdp/current/atlas-server/bin/atlas_start.py
10-16-2016
02:34 PM
Hi: I have edited /usr/hdp/current/zeppelin-server/lib/conf/shiro.ini:
[urls]
/api/version = anon
#/** = anon
/** = authcBasic
[users]
admin = admin
hdfs = hdfs
and restarted Zeppelin, but the login page doesn't appear, just the anonymous user. Do I need anything else?
05-20-2016
09:30 AM
1 Kudo
While Hive is perfect for analytical queries and amazing for highly parallel workloads with lots of concurrent queries, it is not yet as fast for small queries as traditional databases. You will not get queries faster than 2-3 seconds in total even under perfect circumstances; this is due to the architecture. Rule of thumb:
- If Tez has to create a new session (application master), i.e. a query on a cold system, you can expect 10-15s of pre-execution time. You can avoid this by pre-creating sessions, but that takes a bit of the cluster even when you don't need it.
- If Tez has to create task containers, you can expect 2-3s extra. Tez can reuse containers, and there is also prewarm to pre-create containers, but its tuning depends a lot on your use case. If you don't know what you are doing, you can make it worse.
- In general, HiveServer2 has a bit of overhead (around 1s) for plan compilation, communication with the metastore, etc.
So yes, at the moment you will not get faster than 2-3s, realistically 4-5s. If you need sub-second responses, look at Phoenix for example. However, things will soon get better for these short queries:
- LLAP is already available as a tech preview (long-running processes that have an ORC data cache and remove the startup overhead).
- Hive will get an HBase-backed metastore, which should speed up HiveServer2 and more.
In short, look out for this space.
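To make the Phoenix suggestion concrete, a minimal sketch of a point lookup through the Phoenix JDBC driver; the ZooKeeper quorum, the /hbase-unsecure znode (the HDP default for unsecured clusters), and the items table with its item_id and description columns are all assumptions for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixPointLookup {
    public static void main(String[] args) throws Exception {
        // Phoenix connects through the HBase ZooKeeper quorum; adjust host, port and znode.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181:/hbase-unsecure");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT description FROM items WHERE item_id = ?")) {
            ps.setLong(1, 42L);                      // primary-key lookup, served by HBase directly
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

Because the WHERE clause hits the primary key, Phoenix can turn this into an HBase point get, which is what gives it the sub-second behaviour mentioned above.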
05-13-2016
01:51 PM
Hi: finally, the problem was the directory permissions on /var/run/ambari-server on the namenode. I ran:
chown -R ambari:ambari /var/run/ambari-server
05-05-2016
05:38 PM
Spark Thrift Server is similar to HiveServer2: you install it on a node and use that node in the JDBC URL. You could have more than one in HA and load-balancing scenarios, but other than that you will only have one of them.
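A minimal sketch of what that JDBC connection can look like from Java; spark-thrift-host and port 10015 (a common HDP default for the Spark Thrift Server) are assumptions, and since the Thrift Server speaks the HiveServer2 protocol the ordinary Hive JDBC driver is enough:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");              // same driver as for HiveServer2
        String url = "jdbc:hive2://spark-thrift-host:10015/default";   // placeholder host and port
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

In an HA setup you would point this URL (or a load balancer in front of it) at whichever Thrift Server instance you want.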
05-05-2016
07:08 PM
Hi: finally it's working with this code:
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","lib"),.libPaths()))
library(SparkR)
#sparkR.stop()
sparkR.stop()
sc <- SparkR::sparkR.init(master = "yarn-client", sparkEnvir = list(spark.driver.memory="4g"))
hiveContext <- sparkRHive.init(sc)
05-05-2016
07:14 AM
HBase is great for random key lookups; I've worked on a project where a word cloud powered by HBase worked just fine. If you have a dashboard, HBase or perhaps Phoenix works pretty well behind it.
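For illustration, a sketch of that kind of random key lookup with the HBase Java client; the word_counts table, the cf:count column and the long-encoded counter are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WordCloudLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // picks up hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("word_counts"))) {
            Get get = new Get(Bytes.toBytes("hadoop"));       // row key = the word to look up
            Result result = table.get(get);
            byte[] count = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count"));
            if (count != null) {
                System.out.println("count = " + Bytes.toLong(count));
            }
        }
    }
}

A single Get like this is answered directly by the region holding the row, which is why it stays fast even behind an interactive dashboard.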