Member since: 08-15-2019
Posts: 29
Kudos Received: 111
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 883 | 09-08-2017 11:30 PM |
|  | 1254 | 06-08-2017 07:24 PM |
|  | 2882 | 03-28-2017 05:20 PM |
|  | 2312 | 03-17-2017 04:27 AM |
|  | 1246 | 03-09-2017 11:48 PM |
04-04-2018
06:20 PM
4 Kudos
Starting with Hive 3, we have the following properties to limit the number of connections:
hive.server2.limit.connections.per.ipaddress
hive.server2.limit.connections.per.user
hive.server2.limit.connections.per.user.ipaddress
See https://issues.apache.org/jira/browse/HIVE-16917
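A minimal sketch of how these might look in hive-site.xml (the values here are hypothetical, and HiveServer2 must be restarted for them to take effect):
<property>
  <name>hive.server2.limit.connections.per.user</name>
  <value>25</value>
</property>
<property>
  <name>hive.server2.limit.connections.per.ipaddress</name>
  <value>50</value>
</property>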
03-13-2018
09:49 PM
5 Kudos
For the full stack of the error message from Hive, you could look into the HiveServer2 log, as @Slim mentioned. Since Hive Interactive is enabled, I believe you should look into the hsihiveserver log on the node where HiveServer2 Interactive is running.
01-09-2018
11:44 PM
4 Kudos
By connecting to the web UI over port 10002, one can check the number of live sessions on that instance of HiveServer2.
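For example (the hostname is a placeholder), the HiveServer2 web UI at http://<hiveserver2-host>:10002/ lists the open sessions and active queries.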
09-08-2017
11:30 PM
8 Kudos
Not sure if your upgraded cluster and newly installed cluster have the same resources. Please try disabling and then re-enabling the Interactive Query button on the Hive configs page, and restart the Ambari-recommended components if needed. This will make sure the newly calculated configs take effect; if the memory requirements are met, hive.llap.io.enabled will be set to true.
08-02-2017
06:10 PM
1 Kudo
One hack would be to kill the YARN application for a faster resource reclaim, retrieving the appid from the RM UI:
yarn application -kill <appid>
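As an alternative to the RM UI, the appid can also be retrieved from the command line:
yarn application -list -appStates RUNNING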
07-23-2017
06:50 AM
4 Kudos
Consolidated below are some of the errors thrown by Spark Thrift Server during SQL execution that can be worked around by configuring certain parameters in spark-thrift-sparkconf.conf and hive-site.xml.
Error 1:
Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.;
Resolution: spark.sql.crossJoin.enabled: true
Error 2:
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "eval(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB
Resolution: spark.sql.codegen.wholeStage: false
Error 3:
java.lang.OutOfMemoryError: Java heap space
Resolution: spark.driver.memory: 10g (raise to a higher value)
spark.sql.ui.retainedExecutions: 5 (lower to a smaller value)
Error 4:
org.apache.spark.SparkException: Exception thrown in awaitResult: (state=,code=0)
Resolution: hive.metastore.try.direct.sql: false (in hive-site.xml)
To enable heap dump collection for the Spark driver and executors, for debugging out-of-memory errors:
spark.driver.extraJavaOptions: '-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>'
spark.executor.extraJavaOptions: '-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>'
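Taken together, a hypothetical spark-thrift-sparkconf.conf applying the resolutions above might look like this (the values and dump paths are placeholders to adjust for your workload):
spark.sql.crossJoin.enabled true
spark.sql.codegen.wholeStage false
spark.driver.memory 10g
spark.sql.ui.retainedExecutions 5
spark.driver.extraJavaOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/sts-driver.hprof
spark.executor.extraJavaOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/sts-executor.hprof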
Tags: configs, Data Science & Advanced Analytics, How-To/Tutorial, spark-thriftserver
07-13-2017
10:39 PM
2 Kudos
If the issue is that the number of rows is too high, starting Beeline with the incremental flag will be of help:
beeline --incremental=true
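For example (the JDBC URL is a placeholder):
beeline -u jdbc:hive2://<hiveserver2-host>:10000 --incremental=true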
06-29-2017
09:05 PM
Thanks @Kshitij Badani, will configure a local user as I am about to try some spark queries
06-29-2017
08:29 PM
1 Kudo
Thanks @dhanya, it worked.
06-29-2017
08:22 PM
4 Kudos
I am trying to follow the steps to get Zeppelin running and connect to the UI. The home page seems to have a login button. Is there any default username/password, or should I create one to get to the Main page mentioned here: http://zeppelin.apache.org/docs/0.7.0/quickstart/explorezeppelinui.html
Labels: Apache Zeppelin
06-09-2017
05:55 PM
1 Kudo
Was trying the above queries on HDP-2.6.1.0 with the MR execution engine, and I am getting the error message below:
0: jdbc:hive2://<hostname>:> explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
Error: Error while compiling statement: FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false. (state=42000,code=10135)
With Tez I could see Map Join, not SMB Map Join.
06-08-2017
07:48 PM
1 Kudo
From the Hive manual, it looks like the properties you are setting are for converting an SMB join to a map-join SMB: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-AutoConversiontoSMBMapJoin
06-08-2017
07:24 PM
6 Kudos
If the query is run over the Hive CLI, it should have worked. In case it is Beeline or another client accessing HiveServer2, you have to restart the server for the property to be updated. If the cluster is managed via Ambari, it is optimal to update it there, since Ambari will list the services that need a restart.
05-08-2017
06:19 PM
6 Kudos
@Rakesh Nethi I am assuming you are using HiveServer2 and the Beeline CLI. Try adding the following to hive-env.sh and restarting HiveServer2:
export HADOOP_CLIENT_OPTS=-XX:MaxPermSize=256m
05-02-2017
06:01 PM
1 Kudo
Update is supported for ACID tables in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
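A minimal sketch (the table and column names are hypothetical, and the cluster must have the Hive transaction manager enabled): UPDATE works only on tables that are bucketed, stored as ORC, and marked transactional.
CREATE TABLE accounts (id INT, balance DECIMAL(10,2))
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
UPDATE accounts SET balance = 0 WHERE id = 42;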
04-28-2017
11:12 PM
12 Kudos
Hive has an improved explain plan with hive.explain.user set to true. Now the indentation and arrows clearly depict the execution flow. Retrieving the join order needed a lot of back-and-forth references in the old style; now it is as easy as following the arrows, even with the details of mappers and reducers.
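For example (the table names here are hypothetical), the new-style plan is produced as follows:
set hive.explain.user=true;
explain select count(*) from t1 join t2 on (t1.id = t2.id);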
Let us take a look at one sample explain plan and retrieve the join order
Plan optimized by CBO.
Vertex dependency in root stage
Map 2 <- Map 1 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 3 vectorized, llap
File Output Operator [FS_101]
Group By Operator [GBY_100] (rows=1 width=736)
Output:["_col0","_col1","_col2","_col3"],aggregations:["min(VALUE._col0)","min(VALUE._col1)","min(VALUE._col2)","min(VALUE._col3)"]
<-Map 2 [SIMPLE_EDGE] llap
SHUFFLE [RS_45]
Group By Operator [GBY_44] (rows=1 width=736)
Output:["_col0","_col1","_col2","_col3"],aggregations:["min(_col1)","min(_col3)","min(_col14)","min(_col19)"]
Map Join Operator [MAPJOIN_92] (rows=80573328660391072 width=114)
Conds:MAPJOIN_91._col11, _col7=RS_41._col0, _col0(Inner),Output:["_col1","_col3","_col14","_col19"]
<-Map 9 [BROADCAST_EDGE] llap
BROADCAST [RS_41]
PartitionCols:_col0, _col0
Select Operator [SEL_23] (rows=1960358 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_86] (rows=1960358 width=104)
predicate:id is not null
TableScan [TS_21] (rows=1960358 width=104)
default@title,t,Tbl:COMPLETE,Col:NONE,Output:["id","title"]
<-Map Join Operator [MAPJOIN_91] (rows=73248479012737488 width=125)
Conds:MAPJOIN_90._col10=RS_38._col0(Inner),Output:["_col1","_col3","_col7","_col11","_col14"]
<-Map 8 [BROADCAST_EDGE] llap
BROADCAST [RS_38]
PartitionCols:_col0
Select Operator [SEL_20] (rows=2 width=40)
Output:["_col0"]
Filter Operator [FIL_85] (rows=2 width=40)
predicate:((UDFToString(role) = 'actress') and id is not null)
TableScan [TS_18] (rows=4 width=40)
default@role_type,rt,Tbl:COMPLETE,Col:NONE,Output:["id","role"]
<-Map Join Operator [MAPJOIN_90] (rows=66589524931926816 width=138)
Conds:MAPJOIN_89._col0, _col6=RS_35._col0, _col0(Inner),Output:["_col1","_col3","_col7","_col10","_col11","_col14"]
<-Map 7 [BROADCAST_EDGE] llap
BROADCAST [RS_35]
PartitionCols:_col0, _col0
Select Operator [SEL_17] (rows=764741 width=105)
Output:["_col0","_col1"]
Filter Operator [FIL_84] (rows=764741 width=105)
predicate:((UDFToString(gender) = 'f') and (name like '%An%') and id is not null)
TableScan [TS_15] (rows=3058966 width=105)
default@name,n,Tbl:COMPLETE,Col:NONE,Output:["id","name","gender"]
<-Map Join Operator [MAPJOIN_89] (rows=60535930444216152 width=152)
Conds:MAPJOIN_88._col7, _col4=RS_32._col0, _col1(Inner),Output:["_col0","_col1","_col3","_col6","_col7","_col10","_col11"]
<-Map 6 [BROADCAST_EDGE] llap
BROADCAST [RS_32]
PartitionCols:_col0, _col1
Select Operator [SEL_14] (rows=11636372 width=8)
Output:["_col0","_col1"]
Filter Operator [FIL_83] (rows=11636372 width=8)
predicate:(movie_id is not null and company_id is not null)
TableScan [TS_12] (rows=11636372 width=8)
default@movie_companies,mc,Tbl:COMPLETE,Col:NONE,Output:["movie_id","company_id"]
<-Map Join Operator [MAPJOIN_88] (rows=55032662847395656 width=167)
Conds:MAPJOIN_87._col0, _col2=RS_29._col0, _col2(Inner),Output:["_col0","_col1","_col3","_col4","_col6","_col7","_col10"]
<-Map 4 [BROADCAST_EDGE] llap
BROADCAST [RS_29]
PartitionCols:_col0, _col2
Select Operator [SEL_8] (rows=6110483 width=116)
Output:["_col0","_col1","_col2","_col4"]
Filter Operator [FIL_81] (rows=6110483 width=116)
predicate:((note) IN ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') and person_id is not null and person_role_id is not null and movie_id is not null and role_id is not null)
TableScan [TS_6] (rows=12220966 width=116)
default@cast_info,ci,Tbl:COMPLETE,Col:NONE,Output:["person_id","movie_id","person_role_id","note","role_id"]
<-Map Join Operator [MAPJOIN_87] (rows=50029692413268030 width=184)
Conds:(Inner),(Inner),Output:["_col0","_col1","_col2","_col3","_col4"]
<-Map 1 [BROADCAST_EDGE] llap
BROADCAST [RS_24]
Select Operator [SEL_2] (rows=701938 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_79] (rows=701938 width=104)
predicate:person_id is not null
TableScan [TS_0] (rows=701938 width=104)
default@aka_name,an,Tbl:COMPLETE,Col:NONE,Output:["person_id","name"]
<-Map 5 [BROADCAST_EDGE] llap
BROADCAST [RS_26]
Select Operator [SEL_11] (rows=34365 width=259)
Output:["_col0"]
Filter Operator [FIL_82] (rows=34365 width=259)
predicate:((UDFToString(country_code) = '[us]') and id is not null)
TableScan [TS_9] (rows=68731 width=259)
default@company_name,cn,Tbl:COMPLETE,Col:NONE,Output:["id","country_code"]
<-Select Operator [SEL_5] (rows=2074019 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_80] (rows=2074019 width=104)
predicate:id is not null
TableScan [TS_3] (rows=2074019 width=104)
default@char_name,chn,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
1. Start reading from the bottom.
2. Look for the keyword 'TableScan'; the row below it will have the details of the tables being joined.
3. Look for the keyword 'Join Operator'; the indent of this line will tell us the tables that are part of this join.
4. This joined table group will form a candidate for another join if the line starts with '<-'.
5. Similarly, keep going to the top to get the complete join order.
The join order in the above example boils down to (title(t),(((movie_companies(mc),(cast_info(ci),(company_name(cn),aka_name(an),char_name(chn)))),name(n)),role_type(rt)))
Tags: Data Processing, hive2, How-To/Tutorial, join
04-28-2017
05:39 PM
1 Kudo
@Funamizu Koshi Could you please post the full stack of the error message from hivemetastore.log? The default location is /var/log/hive.
04-28-2017
05:10 PM
7 Kudos
Looks like the data was removed from HDFS, but no drop-partition query was executed on the Hive side. The 'show partitions <table_name>' command retrieves data from the Hive metastore DB, which has the information on the table partitions but not the real data, which is stored in HDFS. About the data missing from HDFS, it might be worth checking hdfs-audit.log to trace the lifecycle of the directory/file.
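If the partition should also be dropped on the Hive side and the HDFS deletion traced, a sketch (the partition spec and path are hypothetical, and the audit log location varies by distribution):
ALTER TABLE <table_name> DROP PARTITION (dt='2017-04-01');
grep 'cmd=delete' /var/log/hadoop/hdfs/hdfs-audit.log | grep '<path-to-partition-dir>'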
03-28-2017
05:34 PM
5 Kudos
@Karan Alang Try setting 'In-Memory Cache per Daemon' under the Hive configs section in the Ambari UI to a value around 20% of the 'Memory per Daemon' setting. From the error, it looks like both are set to the same value now.
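For example (hypothetical numbers): if 'Memory per Daemon' is 32768 MB, 'In-Memory Cache per Daemon' would come to roughly 32768 * 0.2 ≈ 6554 MB.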
03-28-2017
05:20 PM
10 Kudos
@dvt isoft I am not sure which client you are using to connect. It might be worth checking whether the same connection (and therefore the same session) is being used to run successive queries, or whether the client is establishing a new connection for each query. Otherwise, the Hive ODBC driver is able to run select queries on temporary tables, and as you already mentioned, you don't need to enable it explicitly starting with Hive 0.14.
03-17-2017
04:27 AM
9 Kudos
Check "Number of nodes used by Hive's LLAP" in hive configs (Advanced) section. From error looks like it is set 4 , in which case 4 node managers should be running in the cluster. Looks like the cluster has only 3.
03-09-2017
11:48 PM
7 Kudos
In the View config, try setting sslTrustStore=/var/log/ambari-server/hive-next-view/hive.jks;trustStorePassword=******** as the Hive Session param.
03-02-2017
11:09 PM
2 Kudos
To get the number of rows in the result set, you could use this query:
SELECT COUNT(*) FROM (SELECT t1.c1, t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL)) AS table_1;
02-28-2017
05:35 AM
2 Kudos
Another way to back up and restore specific Hive tables would be to use 'show create table' to back up the DDL, which can be used to recreate the table. Then the saved HDFS files can be dropped into the warehouse dir to finish the table restore.
beeline -u jdbc:hive2://<cluster-name>:<port#>/<db-name> --outputformat=csv -e "show create table <table-name>;" > <table_name>.ddl
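A sketch of the restore side under the same assumptions (the paths are placeholders; the warehouse path shown is the HDP default):
beeline -u jdbc:hive2://<cluster-name>:<port#>/<db-name> -f <table_name>.ddl
hdfs dfs -put <saved-table-files>/* /apps/hive/warehouse/<db-name>.db/<table-name>/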
11-09-2015
05:37 PM
1 Kudo
Performing an hdfs cat on the stdout file inside the outputofthejob directory will give the output from Hive:
hdfs dfs -cat <location-of-outputofthejob>/stdout