Member since: 04-30-2017
Posts: 12
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5417 | 11-13-2018 08:09 AM
 | 20277 | 06-22-2017 11:38 PM
05-17-2019
09:12 PM
Greg's answer applies to your case as well for incremental import/export operations. Also, if your source table has a column that is a sequential index, it can be used in the --split-by clause to distribute data across mappers, scale parallelism, and reduce the job's runtime. My understanding is that a column of random numbers, if used as the split key, can cause skew and lead to uneven map task runtimes.
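For illustration only, a sketch of an incremental import split on a sequential ID column; the connection string, table, column names, and paths below are all hypothetical, not from the original thread.
# Hypothetical incremental import: split work across mappers on a sequential numeric column.
sqoop import \
  --connect jdbc:mysql://dbhost/sales_db \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 8 \
  --incremental append \
  --check-column order_id \
  --last-value 1000000 \
  --target-dir /data/orders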
11-20-2018
06:30 PM
Try disabling vectorization for this job alone; I remember this being a bug in Hive 1.2.1. As a workaround, run: set hive.vectorized.execution.enabled=false;
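A minimal sketch of scoping the workaround to a single session (the failing query itself is a placeholder):
-- Disable vectorization only for this session, run the failing query, then restore the default.
set hive.vectorized.execution.enabled=false;
-- <run the failing query here>
set hive.vectorized.execution.enabled=true;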
11-13-2018
06:18 PM
Are you trying to set up the Hive metastore for the first time? Based on the error below, it looks like you either ran the schema init from a Hive client of a different version, or you ran it multiple times and the entry was already committed to the MySQL database on the first run. I faced the problem below when I upgraded Hive from 1.2.x to 2.0 as part of a migration from Apache Hive to Ambari.
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
One thing I figured out is that, rather than running the upgrade scripts individually, it is better to use the approach below.
export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server
[root@myhost ~]# /usr/hdp/current/hive-server2-hive2/bin/schematool -upgradeSchema -dbType mysql -userName hive_user -passWord 'XXXXXXXXXX'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.5.0-157/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.5.0-157/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://XXXXX/hive_db
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive_user
Starting upgrade metastore schema from version 1.2.0 to 2.1.0
Upgrade script upgrade-1.2.0-to-1.2.1000.mysql.sql
Completed upgrade-1.2.0-to-1.2.1000.mysql.sql
Upgrade script upgrade-1.2.1000-to-2.0.0.mysql.sql
Completed upgrade-1.2.1000-to-2.0.0.mysql.sql
Upgrade script upgrade-2.0.0-to-2.1.0.mysql.sql
Completed upgrade-2.0.0-to-2.1.0.mysql.sql
schemaTool completed
Note: If your Hive metadata is important, I strongly suggest taking a backup of it with the mysqldump utility before you do any upgrade.
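A minimal sketch of such a backup; the host, database, and user names are the redacted placeholders from the output above, and the dump file name is hypothetical.
# Dump the Hive metastore database before the schema upgrade; restore from it if the upgrade fails.
mysqldump -h XXXXX -u hive_user -p hive_db > hive_db_backup_$(date +%Y%m%d).sql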
11-13-2018
08:09 AM
@Junfeng dfs.datanode.max.transfer.threads (i.e., dfs.datanode.max.xcievers) and datanode memory go up together. I feel anything in the range of 4096-8192 should be good enough. By the way, this setting won't fix your missing-block issue, but it is important for avoiding exceptions such as the thread limit/quota being exceeded or the datanode running out of memory. In my previous comment, I forgot to mention that you need to tune the ipc.maximum.data.length parameter to ensure your namenode receives the datanode block reports. Based on your error, the report is most likely being rejected because it crosses the default 64 MB limit. Once you tune ipc.maximum.data.length, the missing blocks should most likely go away.
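As a rough sketch, these parameters live in hdfs-site.xml (transfer threads) and core-site.xml (RPC payload limit); the values below are illustrative examples within the range mentioned above, not recommendations from the original post.
<!-- hdfs-site.xml: raise the datanode transfer thread limit (illustrative value) -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
<!-- core-site.xml: raise the maximum RPC payload so large block reports are accepted (128 MB here) -->
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value>
</property>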
11-13-2018
06:21 AM
For your first question, explore the ipc.maximum.data.length parameter; that should help. On the other hand, a value of 65536 for dfs.datanode.max.xcievers seems way too high. Basically, I feel your datanode block reports are not reaching the namenode because of the length limitation, so the namenode is missing too many blocks to exit safemode. That also explains why it reports missing blocks after a forced safemode exit. For namenode heap configuration, visit https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html
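To see what the cluster is currently using, a quick check like the one below may help (assuming a client host that has the cluster configuration on its classpath):
# Print the effective values of the two parameters discussed above.
hdfs getconf -confKey ipc.maximum.data.length
hdfs getconf -confKey dfs.datanode.max.transfer.threads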
11-12-2018
10:53 PM
If you just want to check whether your HiveServer2 port is accepting connections, you can run checks like the ones below.
nc -zv <hiveserver2 hostname> 10000
Connection to localhost 10000 port [tcp/ndmp] succeeded!
To count Hive metastore connections: netstat -all | grep 9083 | wc -l
314
Note that the above doesn't guarantee that your Hive service is actually running any tasks. The best way to figure that out is to enable Hive metrics, which you can poll every few minutes as needed. If you have a Grafana setup, it should help as well. PS: I'm not sure which distribution you are using; however, if you are on HDP, Ambari already has a "run service check" option, which you can also invoke via its REST API.
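For the Ambari route, a request along these lines can trigger a Hive service check; the host, credentials, and cluster name are placeholders, and the exact payload should be verified against your Ambari version.
# Trigger a Hive service check through the Ambari REST API (sketch; adjust for your environment).
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d '{"RequestInfo":{"context":"Hive Service Check","command":"HIVE_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HIVE"}]}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/requests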
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster.
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
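If you would rather not change spark-defaults.conf cluster-wide, the same setting can be passed per job; the paths are the ones above, while the application jar is a placeholder.
# Pass the native library path for a single job instead of setting it globally.
spark-submit \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/ \
  <your-application.jar>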
06-22-2017
11:38 PM
1 Kudo
@Smart Solutions You should probably try the settings below if you are using the Tez engine in Hive, then rewrite the affected table as shown after the list. set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
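A sketch of applying these settings to compact an existing (non-ACID) table's small files; the table name is hypothetical.
-- With the merge settings above enabled, rewriting the table coalesces its small files.
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
INSERT OVERWRITE TABLE sales_data SELECT * FROM sales_data;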
04-30-2017
01:22 PM
As mentioned before, the varchar datatype ships with Hive 0.12. Based on your CDH 4.3.0, it looks like your Hive version is 0.10 (ref: https://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.3.0.releasenotes.html ).
04-30-2017
04:38 AM
Your version of Hive is probably too old. Check below: the varchar datatype was introduced in Hive 0.12.0 (HIVE-4844).
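For reference, on Hive 0.12.0 or later a varchar column can be declared like this; the table and column names are hypothetical.
-- Works on Hive 0.12.0+; earlier versions reject the VARCHAR type.
CREATE TABLE customers (
  id INT,
  name VARCHAR(100)
);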