Member since: 07-19-2016
Posts: 26
Kudos Received: 7
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2983 | 12-19-2017 08:44 PM
10-03-2019
10:51 AM
Can you check the value of hive.llap.io.enabled? It should be checked (enabled). hive.llap.io.enabled and hive.llap.io.memory.mode are the only two settings that determine whether the cache is used. If both are set, make sure the queries you are running are actually hitting data and that pruning isn't skipping everything (e.g., if the query is select col from tbl where date="2019-01-01" and there isn't any data for 2019-01-01, then you won't ever get anything into your cache). James
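A quick way to confirm both settings from the client (a minimal sketch; the JDBC URL is a placeholder, and SET with no value simply echoes the current setting):

```bash
# Echo the two settings that control the LLAP IO cache; for caching you
# want hive.llap.io.enabled=true and hive.llap.io.memory.mode=cache:
beeline -u "jdbc:hive2://llap-host:10500/" \
  -e "SET hive.llap.io.enabled; SET hive.llap.io.memory.mode;"
```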
05-17-2019
02:11 AM
@Rajkumar Singh - this is a useful article and I've probably passed it on to 5-6 customers. One thing to note, though: this is no longer an issue with HDP 3. 🙂 It is still very applicable and necessary for HDP 2.x.
11-16-2018
12:42 AM
@carol elliott You need to set the client option for an unsupported terminal before launching via nohup:

export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
nohup beeline -f foo.sql -u ${jdbcurl} >> nohup_beelineoutput.out &
07-03-2018
11:34 AM
We had to add this to custom hive-site (custom hive-env does not work). Once added to custom hive-site, this worked fine. James
05-28-2018
04:02 PM
@Sergey Shelukhin The "Set Recommended" picks 12288 MB for overhead. My Xmx is 32 GB and the cache is 16 GB. 6% would be 1.92 GB, so I was going to go with 2 GB based on this article. Maybe the overhead requirements have changed in 2.6.4 relative to this article? Thanks!!
12-19-2017
11:12 PM
You should be able to use show table extended ... partition to get info on a partition and avoid trying to open any that are zero bytes. Like this:

scala> var sqlCmd="show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')"
sqlCmd: String = show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')

scala> var partitionsList=sqlContext.sql(sqlCmd).collectAsList
partitionsList: java.util.List[org.apache.spark.sql.Row] = [[mydb,mytable,false,
Partition Values: [date_time_date=2017-01-01]
Location: hdfs://mycluster/apps/hive/warehouse/mydb.db/mytable/date_time_date=2017-01-01
Serde Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Storage Properties: [serialization.format=1]
Partition Parameters: {rawDataSize=441433136, numFiles=1, transient_lastDdlTime=1513597358, totalSize=4897483, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numRows=37825} ]]

Note the totalSize and numFiles partition parameters in the output: you can use those to skip the zero-byte partitions. Let me know if that works and you can avoid them, or if you still get the null pointer. James
12-19-2017
08:44 PM
2 Kudos
This is a way, if you don't have Ranger, to set permissions at the HDFS level for a database (and have them carry to all tables and files in that db) or a directory (and have them carry to all subdirectories and files in that directory). For example, after setting that parameter, you can do:

hadoop fs -chown myuser /apps/hive/warehouse/mydb.db
hadoop fs -chmod 700 /apps/hive/warehouse/mydb.db

and now myuser is the only one who can see or do anything with that db, because hive.warehouse.subdir.inherit.perms=true causes anything created underneath it to inherit the same permissions and ownership as the parent. While this works, Ranger is still the way to go.
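A minimal end-to-end sketch of the inheritance behavior described above (the database, user, and JDBC URL are illustrative):

```bash
# Lock the database directory down to one user:
hadoop fs -chown myuser /apps/hive/warehouse/mydb.db
hadoop fs -chmod 700 /apps/hive/warehouse/mydb.db

# With hive.warehouse.subdir.inherit.perms=true in hive-site.xml, a table
# created afterwards should show the same owner and mode (drwx------):
beeline -u "$jdbcurl" -e "CREATE TABLE mydb.t1 (id INT);"
hadoop fs -ls /apps/hive/warehouse/mydb.db
```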
12-19-2017
03:38 AM
@Benakaraj KS Go to 2.6.3 and set spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.enabled=true. You'll get a 3x speedup 🙂
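A hedged sketch of passing those two settings at launch time (spark-shell shown; any submit mechanism with --conf works the same way):

```bash
spark-shell \
  --conf spark.sql.hive.convertMetastoreOrc=true \
  --conf spark.sql.orc.enabled=true
```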
07-24-2017
10:28 PM
Hi Matt. With an SMB join, though, I believe we have to have both tables bucketed, and I want to partition first. Actually, I found something even better than sorting in the insert sub-select: I created the table partitioned by (date_time_date) clustered by (key) sorted by (key), and that looks to have shaved ~200 ms (I will run some more tests to confirm). This would be even more preferable because with hive.enforce.bucketing=true and hive.enforce.sorting=true you don't have to do a sort on your inserts, and we still retain the partitioning on the date. I will update the thread if it proves out with the additional testing; a sketch of the DDL follows.
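A minimal sketch of the table layout described above (the column names and bucket count are invented for illustration; the post doesn't give them):

```bash
beeline -u "$jdbcurl" -e "
  SET hive.enforce.bucketing=true;
  SET hive.enforce.sorting=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  CREATE TABLE orc_table (key BIGINT, col1 STRING)
    PARTITIONED BY (date_time_date DATE)
    CLUSTERED BY (key) SORTED BY (key) INTO 32 BUCKETS
    STORED AS ORC;
  -- With enforce.bucketing/sorting on, no ORDER BY is needed on the insert:
  INSERT INTO TABLE orc_table PARTITION (date_time_date)
    SELECT key, col1, date_time_date FROM text_table;"
```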
07-24-2017
03:23 AM
@gopal, a colleague of yours and one of mine were discussing potential ways to reduce response time by using an order by (or sort by) in the select part of the insert sub-select when populating the ORC table. I.e., I partition by date, but I order by a column that's commonly used for joins. Is there a benefit to doing this? insert into orc_table select col1, col2, ..., coln, partition_col from text_table order by joinKey; Thanks.
Labels:
- Apache Hive
06-26-2017
04:42 PM
Thank you - very useful!
06-08-2017
01:14 PM
4 Kudos
I forgot to update this. Actually, LLAP and the cache were set up correctly. In the queries taken from Teradata, each of the 200k queries performs several joins, and one of the joins was on a column that was null, so the result set was null. A side benefit, though, was a nice gain in knowledge of all the knobs and levers of this product. And it is a very nice product. My experience is in Teradata/Netezza/HAWQ, and I've found LLAP to be a clear winner in replacing those. Very fast, very robust. A few notes on the things that mattered:
- Use ORC (duh)
- Get partitioned by / clustered by right (use hive --orcfiledump, as shown in Sergey's slides, to make sure you get ~100k records per ORC file; see the sketch below)
- Get the number of nodes / app masters right
- Use a local metastore
- Get the heaps right
- Get the YARN capacity scheduler settings right
- Increase executors
- (Probably not supported) Clone the IS2 JVMs for more throughput if needed

Pull up the Hive UI or Grafana, sit back, smile, and enjoy watching the transactions fly.
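A hedged sketch of the orcfiledump check from the notes above (the file path is illustrative; look at the per-stripe row counts in the output):

```bash
# Dump ORC metadata for one partition file; each "Stripe:" entry reports
# its row count, which per the notes above should be around 100k:
hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/date_time_date=2017-01-01/000000_0
```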
06-08-2017
01:04 PM
I agree. HDP 2.6 is very solid. We have thoroughly tested it at 200k queries/hr (out of the box it's about 40k/hr; you need to go to a local metastore, increase app masters, etc. to raise it). HDP 2.5.3 was solid too, though technically LLAP was a tech preview in that release.
03-25-2017
06:18 AM
@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have ~3,500 records in the stripe and was just wondering where I should look. Thanks!
02-21-2017
11:13 PM
Yes, it is in Advanced hive-interactive-site. Also notice there is no Set Recommended button (the blue 3/4-circle arrow). Thanks again, Constantin.
02-21-2017
10:18 PM
I should also add that everything looks correct, starts correctly, and runs well. It just isn't using the IO cache. I do have millions of these messages in the YARN timeline server log, but from looking at the ORC writer code it doesn't look like the ORC writers use the timeline server:

2017-02-21 00:20:36,581 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(207)) - Error putting entity: dag_1487651061826_0016_921 (TEZ_DAG_ID): 6

Other than that, everything is clean. Thanks again, I appreciate it.
02-21-2017
10:12 PM
hivellapiomemorymode.png

Yeah, I thought so too. It was literally blank (see the attached screenshot from the first Ambari version after enabling Interactive Query). Also odd was hive.llap.daemon.vcpus.per.instance: it was a variable rather than a number, which generated a warning at start:

WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value

So I put 32 in for the vcpus and put cache in for hive.llap.io.memory.mode. They saved correctly to hive-site.xml. The third anomaly is a message that says:

WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist

This comes from Ambari; I can remove it from hive-site.xml, but Ambari will put it back. Everything else was as I would expect. hive.llap.io.enabled is true. According to Grafana, I have a 12 GB cache, but there isn't anything in it. Thanks for your help, I appreciate it.
02-21-2017
02:36 PM
Update for those thinking of an answer (and a cheat sheet of good parameters for those browsing): I have tested each change individually and have response time down to 630 ms and throughput up to 40 queries/sec on a 1 TB dataset, but I need to get the IO cache going to get to the next level and improve throughput (q/sec). The IO cache is created (12 GB) but is never used. I am using out-of-the-box LLAP settings except for the changes below, most of which were positive in driving down response time and increasing throughput (a few were neutral):
- Increased memory for YARN, LLAP heap, and LLAP concurrency
- Disabled CBO (it was adding time during the "creating and submitting plan" portion of the query)
- Set hive.llap.io.memory.mode=cache (out of the box this setting was null)
- Increased /proc/sys/net/core/somaxconn to 4096 (see the sketch below)
- Increased metastore heap
- Increased YARN threads (yarn.resourcemanager.amlauncher.thread-count)
- Increased the YARN Resource Manager heartbeat interval (tez.am-rm.heartbeat.interval-ms.max)
- Set hive.llap.daemon.vcpus.per.instance to 32; previously it was an unrecognized variable (Ambari bug) and gave this message at Ambari startup: "WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value"
- Added an additional HS2 instance
- Added the 4 hiveconf options (hive.execution.mode, hive.llap.execution.mode, hive.llap.io.enabled, hive.llap.daemon.service.hosts) as additional properties in the HS2 custom site
- Increased HDFS heap

Version of HDP is 2.5.3.0-37. Version of Hive is 2.1.0.2.5.3.0-37.
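An illustrative sketch of the somaxconn change from the list above (persisting it via sysctl.d is an assumption; the post only gives the runtime value):

```bash
# Raise the kernel's listen-backlog cap at runtime, then persist it:
sysctl -w net.core.somaxconn=4096
echo "net.core.somaxconn = 4096" > /etc/sysctl.d/99-llap.conf
```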
02-21-2017
07:21 AM
1 Kudo
I've set up LLAP and it is working fine, but it is not using the IO cache. I've set the below in both the CLI and HS2, but Grafana shows no cache used (and the HDFS NameNode is very busy keeping up with the edits). Any ideas on what I might be missing?

--hiveconf hive.execution.mode=llap
--hiveconf hive.llap.execution.mode=all
--hiveconf hive.llap.io.enabled=true
--hiveconf hive.llap.daemon.service.hosts=@llap0
Tags:
- Data Processing
- llap
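For reference, the flags from the post assembled into one illustrative beeline invocation (the JDBC URL is a placeholder):

```bash
beeline -u "jdbc:hive2://llap-host:10500/" \
  --hiveconf hive.execution.mode=llap \
  --hiveconf hive.llap.execution.mode=all \
  --hiveconf hive.llap.io.enabled=true \
  --hiveconf hive.llap.daemon.service.hosts=@llap0
```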
07-19-2016
06:09 PM
Apologies for the dumb question: is there a command-line way to get the HBase files-local %? Or, alternatively, a way to refresh the widget in Ambari on demand? Thanks.
Labels:
- Apache Ambari
- Apache HBase