Member since: 04-30-2017
Posts: 12
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5417 | 11-13-2018 08:09 AM
 | 20277 | 06-22-2017 11:38 PM
05-17-2019
09:12 PM
Greg's answer applies to your case as well for incremental import/export operations. Also, if your source table has a column that is a sequential index, it can be used in the --split-by clause to distribute data across mappers, scale parallelism, and reduce the job's runtime. My understanding is that a column of random numbers, if used as the split key, can cause skew and lead to uneven map task runtimes.
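For illustration only, a sketch of an incremental import split on a sequential ID column; the connection string, table, column names, and paths below are all hypothetical, not from the original thread.
# Hypothetical incremental import: split work across mappers on a sequential numeric column.
sqoop import \
  --connect jdbc:mysql://dbhost/sales_db \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 8 \
  --incremental append \
  --check-column order_id \
  --last-value 1000000 \
  --target-dir /data/orders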
11-20-2018
06:30 PM
Try disabling vectorization for this job alone; I remember this being a bug in Hive 1.2.1. As a workaround, run: set hive.vectorized.execution.enabled=false;
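A minimal sketch of scoping the workaround to a single session (the failing query itself is a placeholder):
-- Disable vectorization only for this session, run the failing query, then restore the default.
set hive.vectorized.execution.enabled=false;
-- <run the failing query here>
set hive.vectorized.execution.enabled=true;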
11-13-2018
06:18 PM
Are you trying to set up the Hive metastore for the first time? Based on the error below, it looks like you either ran the schema init from a Hive client of a different version, or you ran it multiple times and the entry was already committed to the MySQL database on the first run. I faced the problem below when I upgraded Hive from 1.2.x to 2.0 as part of a migration from Apache Hive to Ambari.
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
One thing I figured out is that, rather than running the upgrade scripts individually, it is better to use the approach below.
export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server
[root@myhost ~]# /usr/hdp/current/hive-server2-hive2/bin/schematool -upgradeSchema -dbType mysql -userName hive_user -passWord 'XXXXXXXXXX'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.5.0-157/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.5.0-157/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://XXXXX/hive_db
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive_user
Starting upgrade metastore schema from version 1.2.0 to 2.1.0
Upgrade script upgrade-1.2.0-to-1.2.1000.mysql.sql
Completed upgrade-1.2.0-to-1.2.1000.mysql.sql
Upgrade script upgrade-1.2.1000-to-2.0.0.mysql.sql
Completed upgrade-1.2.1000-to-2.0.0.mysql.sql
Upgrade script upgrade-2.0.0-to-2.1.0.mysql.sql
Completed upgrade-2.0.0-to-2.1.0.mysql.sql
schemaTool completed
Note: If your Hive metadata is important, I strongly suggest taking a backup of it with the mysqldump utility before you do any upgrade.
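A minimal sketch of such a backup; the host, database, and user names are the redacted placeholders from the output above, and the dump file name is hypothetical.
# Dump the Hive metastore database before the schema upgrade; restore from it if the upgrade fails.
mysqldump -h XXXXX -u hive_user -p hive_db > hive_db_backup_$(date +%Y%m%d).sql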
11-13-2018
08:09 AM
@Junfeng dfs.datanode.max.transfer.threads (i.e., dfs.datanode.max.xcievers) and datanode memory go up together. I feel anything in the range of 4096-8192 should be good enough. By the way, this setting won't fix your missing-block issue, but it is important for avoiding exceptions such as the thread limit/quota being exceeded or the datanode running out of memory. In my previous comment, I forgot to mention that you need to tune the ipc.maximum.data.length parameter to ensure your namenode receives the datanode block reports. Based on your error, the report is most likely being rejected because it crosses the default 64 MB limit. Once you tune ipc.maximum.data.length, the missing blocks should most likely go away.
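As a rough sketch, these parameters live in hdfs-site.xml (transfer threads) and core-site.xml (RPC payload limit); the values below are illustrative examples within the range mentioned above, not recommendations from the original post.
<!-- hdfs-site.xml: raise the datanode transfer thread limit (illustrative value) -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
<!-- core-site.xml: raise the maximum RPC payload so large block reports are accepted (128 MB here) -->
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value>
</property>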
11-13-2018
06:21 AM
For your first question, explore the ipc.maximum.data.length parameter; that should help. On the other hand, a value of 65536 for dfs.datanode.max.xcievers seems way too high. Basically, I feel your datanode block reports are not reaching the namenode because of the length limitation, so the namenode is missing too many blocks to exit safemode. That also explains why it reports missing blocks after a forced safemode exit. For namenode heap configuration, visit https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html
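To see what the cluster is currently using, a quick check like the one below may help (assuming a client host that has the cluster configuration on its classpath):
# Print the effective values of the two parameters discussed above.
hdfs getconf -confKey ipc.maximum.data.length
hdfs getconf -confKey dfs.datanode.max.transfer.threads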
11-12-2018
10:53 PM
If you just want to check whether your HiveServer2 port is accepting connections, you can run checks like the ones below.
nc -zv <hiveserver2 hostname> 10000
Connection to localhost 10000 port [tcp/ndmp] succeeded!
To count Hive metastore connections: netstat -all | grep 9083 | wc -l
314
Note that the above doesn't guarantee that your Hive service is actually running any tasks. The best way to figure that out is to enable Hive metrics, which you can poll every few minutes as needed. If you have a Grafana setup, it should help as well. PS: I'm not sure which distribution you are using; however, if you are on HDP, Ambari already has a "run service check" option, which you can also invoke via its REST API.
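For the Ambari route, a request along these lines can trigger a Hive service check; the host, credentials, and cluster name are placeholders, and the exact payload should be verified against your Ambari version.
# Trigger a Hive service check through the Ambari REST API (sketch; adjust for your environment).
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d '{"RequestInfo":{"context":"Hive Service Check","command":"HIVE_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HIVE"}]}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/requests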
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster.
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
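If you would rather not change spark-defaults.conf cluster-wide, the same setting can be passed per job; the paths are the ones above, while the application jar is a placeholder.
# Pass the native library path for a single job instead of setting it globally.
spark-submit \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/ \
  <your-application.jar>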
06-22-2017
11:38 PM
1 Kudo
@Smart Solutions You should probably try the settings below if you are using the Tez engine in Hive, then rewrite the affected table as shown after the list. set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
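A sketch of applying these settings to compact an existing (non-ACID) table's small files; the table name is hypothetical.
-- With the merge settings above enabled, rewriting the table coalesces its small files.
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
INSERT OVERWRITE TABLE sales_data SELECT * FROM sales_data;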
04-30-2017
01:22 PM
As mentioned before, the varchar datatype ships with Hive 0.12. Based on your CDH 4.3.0, it looks like your Hive version is 0.10 (ref: https://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.3.0.releasenotes.html ).
04-30-2017
04:38 AM
Your version of Hive is probably too old. Check below: the varchar datatype was introduced in Hive 0.12.0 (HIVE-4844).
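For reference, on Hive 0.12.0 or later a varchar column can be declared like this; the table and column names are hypothetical.
-- Works on Hive 0.12.0+; earlier versions reject the VARCHAR type.
CREATE TABLE customers (
  id INT,
  name VARCHAR(100)
);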