Member since 07-25-2016 · 28 Posts · 74 Kudos Received · 0 Solutions
09-29-2017 08:33 AM · 2 Kudos
PROBLEM: The Hive metastore statistics for a table are not updated after rows are inserted into it. The symptom is an incorrect 'numRows' value in the output of DESCRIBE FORMATTED <tb_name>. RESOLUTION: Hive stats are auto-gathered correctly until an 'ANALYZE TABLE <tablename> COMPUTE STATISTICS FOR COLUMNS' is run; after that, the stats are no longer auto-updated until the command is run again. This is due to a known issue (https://issues.apache.org/jira/browse/HIVE-12661) and has been fixed in HDP 2.5.0.
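A minimal sketch of the check and the manual refresh, using a hypothetical table sample_tbl (table name and columns are illustrative only):

-- Auto-gathered stats keep numRows current on insert...
CREATE TABLE sample_tbl (id INT, name STRING);
INSERT INTO TABLE sample_tbl VALUES (1, 'a'), (2, 'b');
DESCRIBE FORMATTED sample_tbl;   -- numRows appears under Table Parameters

-- ...but once column statistics have been computed, numRows is no longer
-- auto-updated after new inserts (HIVE-12661); rerun this to refresh it:
ANALYZE TABLE sample_tbl COMPUTE STATISTICS FOR COLUMNS;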
09-29-2017 08:18 AM · 2 Kudos
PROBLEM: While executing a simple PySpark script that selects data from a Hive transactional table stored in ORC format, the customer faces the following exception:
java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:311)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0000045_0000"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
... 50 more
Caused by: java.lang.NumberFormatException: For input string: "0000045_0000"
ROOT CAUSE: Reading Hive ACID (transactional) tables from Spark is unsupported technology. Here is a quick link to the Apache JIRA: https://issues.apache.org/jira/browse/SPARK-15348
RESOLUTION: Currently this can be worked around by using Hive LLAP through the spark-llap connector. The feature is, however, still in Technical Preview and has not been made GA. There is no roadmap available for this issue yet from Hortonworks.
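For context, the tables that trigger this are Hive ACID (transactional) ORC tables. A minimal, hypothetical definition of such a table is shown below; reading a table like this directly from Spark SQL, as the PySpark script above does, hits the unsupported ACID case and fails with the exception shown:

-- Hypothetical Hive ACID table for illustration; not part of the original post
CREATE TABLE acid_tbl (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');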
04-02-2018 03:31 AM
Oracle LONG columns are a nasty business: they work with practically no other SQL or PL/SQL data types. Depending on the actual version of the DBMS, VARCHAR2 values can get treated as LONG and cause errors like this.
09-29-2017 08:01 AM · 2 Kudos
This is an unsupported scenario and an area that has not been explored yet. There is no real modification-time concept in object stores; they only have a creation time, which is the time observed at the far end. If you upload a file to a store in a remote timezone, you may get that timezone's time back. The underlying issue here is not a bug. distcp -update relies on comparing file checksums with HDFS, and not all stores export their checksum through the Hadoop API (WASB does, S3A does not yet). In addition, because the checksum algorithms used by blob stores and HDFS differ, a checksum difference cannot be used as a cue that a file has changed. Note that this also occurs when copying between HDFS encryption zones, as the checksums of the encrypted files will differ.
06-30-2017 09:30 PM · 6 Kudos
SYMPTOM: A Hive query with a GROUP BY clause is stuck in the reducer phase for a very long time on a large amount of data. ROOT CAUSE: This happens when the GROUP BY is not optimized. By default Hive sends the rows with the same group-by keys to the same reducer. If the distinct values of the group-by columns are skewed, one reducer may receive most of the shuffled data and stay stuck for a very long time. WORKAROUND: In this case increasing the Tez container memory will not help. The data skew can be avoided by setting the following properties before running the query (see the sketch below):
set hive.tez.auto.reducer.parallelism=true;
set hive.groupby.skewindata=true;
set hive.optimize.skewjoin=true;
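A minimal usage sketch, assuming a hypothetical table sales with a skewed grouping column region:

set hive.tez.auto.reducer.parallelism=true;
set hive.groupby.skewindata=true;
set hive.optimize.skewjoin=true;

-- With hive.groupby.skewindata=true the aggregation runs in two stages:
-- the first stage spreads the skewed keys randomly across reducers for
-- partial aggregation, and the second stage produces the final result,
-- so no single reducer has to process most of the shuffled data.
SELECT region, COUNT(*) AS cnt
FROM sales
GROUP BY region;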
06-30-2017 03:12 PM · 6 Kudos
SYMPTOM: A SELECT statement on a view fails when the columns are projected in a different order.
FAILING QUERIES:
select id, dept, emp, fname from testview order by id, dept;
select id, emp, dept, fname from testview order by id, dept;
select emp, dept, id, fname from testview order by id, dept;
SUCCESSFUL QUERIES:
select emp, fname, id, dept from testview order by id, dept;
select emp, citystate, fname, dept from testview order by id, dept;
select emp, fname, dept, id from testview order by id, dept;
EXCEPTION:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating VALUE._col1
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:86)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
... 17 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryHiveVarchar.init(LazyBinaryHiveVarchar.java:47)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
... 18 more
2017-05-30 20:12:32,035 [INFO] [TezChild] |exec.FileSinkOperator|: FS[1]: records written - 0
2017-05-30 20:12:32,035 [INFO] [TezChild] |exec.FileSinkOperator|: RECORDS_OUT_0:0,
ROOT CAUSE: The exception is due to a mismatch between serialization and deserialization for a Hive table backed by the SequenceFile input format. The serialization by LazyBinarySerDe in the previous MapReduce job used a different order of columns. When the current MapReduce job deserializes the intermediate sequence file generated by the previous job, LazyBinaryStruct reads corrupted data because it applies the wrong column order. The mismatch between serialization and deserialization is caused by the SelectOperator's column pruning (ColumnPrunerSelectProc). WORKAROUND:
1] Create an ORC table from the sequence table as follows:
create table test_orc stored as orc as select * from testtable;
2] Recreate the view on top of the new ORC table (see the sketch below).
REFERENCE: https://issues.apache.org/jira/browse/HIVE-14564
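A minimal sketch of the workaround end to end; the view definition below is hypothetical and should be replaced with the original definition of testview:

-- 1] Copy the sequence-file table into an ORC table
create table test_orc stored as orc as select * from testtable;

-- 2] Recreate the view on top of the ORC table (illustrative column list)
drop view if exists testview;
create view testview as
select id, dept, emp, fname, citystate from test_orc;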
06-30-2017 07:40 AM · 8 Kudos
SYMPTOM: An external table created over a directory that contains subdirectories does not read the data files placed in those subdirectories. The table was created as follows:
CREATE EXTERNAL TABLE test(
id STRING,
dept STRING)
row format delimited
fields terminated by ','
location '/user/hdfs/testdata/';
ROOT CAUSE: The files under the location provided while creating the table are organized into subdirectories, as follows:
/user/hdfs/testdata/1/test1
/user/hdfs/testdata/2/test2
/user/hdfs/testdata/3/test3
/user/hdfs/testdata/4/test4
RESOLUTION: To make the subdirectories accessible, set the following two properties before executing the CREATE TABLE statement (a full sequence is sketched below):
set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
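Putting it together, a minimal sketch of the working sequence (paths and table as in the example above):

set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;

CREATE EXTERNAL TABLE test(
id STRING,
dept STRING)
row format delimited
fields terminated by ','
location '/user/hdfs/testdata/';

-- Rows from the nested files (e.g. /user/hdfs/testdata/1/test1) are now read
SELECT * FROM test LIMIT 10;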
06-25-2017 02:26 AM · 8 Kudos
SYMPTOM:
=> This problem occurs for a partitioned table without any null partitions that contains approximately 600 or more columns.
=> The following stack trace is observed in the Hive metastore logs:
Nested Throwables StackTrace:
org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO "PARTITION_PARAMS" ("PARAM_VALUE","PART_ID","PARAM_KEY") VALUES (?,?,?)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1056)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.put(JoinMapStore.java:307)
at org.datanucleus.store.types.wrappers.backed.Map.put(Map.java:653)
at org.apache.hadoop.hive.common.StatsSetupConst.setColumnStatsState(StatsSetupConst.java:285)
at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6237)
at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
at com.sun.proxy.$Proxy10.updatePartitionColumnStatistics(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartitonColStats(HiveMetaStore.java:4596)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:5953)
at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:139)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:97)
at com.sun.proxy.$Proxy12.set_aggr_stats_for(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:11062)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:11046)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: value too long for type character varying(4000)
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:393)
at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:431)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1047)
... 30 more
ROOT CAUSE:
=> The ANALYZE TABLE query updates the statistics in the metastore database.
=> The metastore database limits the PARTITION_PARAMS.PARAM_VALUE column to 4000 characters.
=> Hence, a table with too many columns produces a statistics value that exceeds this limit and the update fails.
WORKAROUND: Increase the width of the PARTITION_PARAMS.PARAM_VALUE column in the metastore database.
STEPS:
1] Stop the metastore/HS2.
2] Back up the metastore database.
3] Increase the column width to a reasonable value. For a Postgres metastore database, use the following command:
ALTER TABLE PARTITION_PARAMS ALTER COLUMN PARAM_VALUE TYPE varchar(64000);
4] Start the metastore/HS2 again.
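If needed, the effective column width can be checked before and after the change; a minimal sketch against a Postgres metastore (assuming the metastore stores the table under its upper-case name, as the quoted identifiers in the stack trace above suggest):

SELECT table_name, column_name, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'PARTITION_PARAMS'
  AND column_name = 'PARAM_VALUE';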
06-24-2017 10:06 PM · 7 Kudos
SYMPTOM: Incorrect status is shown for DAGs in the Tez UI. ROOT CAUSE: This is a known issue (https://issues.apache.org/jira/browse/TEZ-3656). It only happens for killed applications or when there was a failure to write into the Application Timeline Server. It should not cause any issues other than the wrong status for the DAG in the Tez UI. RESOLUTION: This is fixed in the HDP 2.6.1 release.
06-24-2017 09:45 PM · 7 Kudos
PROBLEM DEFINITION: CREATE TABLE DT(Dérivation string, Pièce_Générique string); throws a ParseException. ROOT CAUSE / WORKAROUND: Hive database names, table names, and column names cannot contain Unicode characters; Hive supports UTF-8 and Unicode strings only in table data and comments. LINKS: https://cwiki.apache.org/confluence/display/Hive/User+FAQ
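A minimal sketch of a workaround under this limitation: keep the identifiers in ASCII and move the Unicode text into column comments and data (the names below are illustrative):

-- ASCII identifiers; the Unicode labels live in comments instead
CREATE TABLE DT (
derivation STRING COMMENT 'Dérivation',
piece_generique STRING COMMENT 'Pièce Générique'
);

-- UTF-8 data itself is supported
INSERT INTO TABLE DT VALUES ('dérivée', 'pièce générique');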