Member since: 07-25-2016
Posts: 28
Kudos Received: 74
Solutions: 0
09-29-2017
08:18 AM
2 Kudos
PROBLEM: While executing a simple PySpark script that selects data from a Hive transactional table stored in ORC format (an example of such a table definition is sketched below), the following exception is thrown:

java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:311)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
    at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0000045_0000"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 50 more
Caused by: java.lang.NumberFormatException: For input string: "0000045_0000"

ROOT CAUSE: Reading Hive ACID (transactional) tables from Spark is not supported. Spark's ORC split generation does not understand the transactional delta directory layout, which is why it fails with a NumberFormatException on the delta directory suffix ("0000045_0000"). See the Apache JIRA: https://issues.apache.org/jira/browse/SPARK-15348

RESOLUTION: Currently this can be addressed by accessing the table through Hive LLAP via the spark-llap connector. That feature is still in Technical Preview and has not been made GA. There is no roadmap available for this issue yet from Hortonworks.
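For reference, a minimal sketch (hypothetical table and column names) of the kind of Hive transactional table involved; in the Hive versions shipped with HDP 2.x an ACID table must be bucketed, stored as ORC, and flagged transactional:

-- Hypothetical ACID table definition; selecting from such a table through spark.sql() hits the error above.
CREATE TABLE acid_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');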
09-29-2017
08:13 AM
3 Kudos
PROBLEM:
In Hive with an Oracle-backed metastore, the following error is observed during table creation: java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column
ROOT CAUSE:
The issue appears to be caused by the very large number of columns in this table: the metadata value Hive tries to insert grows with the column count until it exceeds what the Oracle metastore column can hold, which Oracle reports as ORA-01461.
RESOLUTION:
Currently Hive stores column statistics in one metastore table while their accuracy is stored in another table (table properties). The proper fix is to store both the column statistics and their accuracy in the same table, which requires a schema modification. As a workaround, the affected metastore column type can be changed to CLOB (see the sketch below).
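A minimal sketch of that workaround, assuming the failing insert targets the PARAM_VALUE column of TABLE_PARAMS in the Oracle metastore schema (the error above does not name the table, so confirm against your metastore logs; stop the metastore and back up the database first). Oracle cannot convert VARCHAR2 to CLOB in place, so a temporary column is used:

-- Assumed target: TABLE_PARAMS.PARAM_VALUE in the Hive metastore schema on Oracle.
ALTER TABLE TABLE_PARAMS ADD (PARAM_VALUE_CLOB CLOB);
UPDATE TABLE_PARAMS SET PARAM_VALUE_CLOB = PARAM_VALUE;
COMMIT;
ALTER TABLE TABLE_PARAMS DROP COLUMN PARAM_VALUE;
ALTER TABLE TABLE_PARAMS RENAME COLUMN PARAM_VALUE_CLOB TO PARAM_VALUE;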
09-29-2017
08:07 AM
2 Kudos
PROBLEM: While trying to import or export data from/to MS Parallel Data Warehouse (PDW), the following error is observed.
Sqoop Command Output:
sqoop export --connect "jdbc:sqlserver://<DB_URL>:<PORT>;database=<DB_NAME>" --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --username <USERNAME> --password *** --table "<TB_NAME>" --input-fields-terminated-by ',' --export-dir <EXPORT_DIR> -m 1

/usr/hdp/2.5.3.0-37//sqoop/conf/sqoop-env.sh: line 23: HADOOP_CLASSPATH=${hcat -classpath}: bad substitution
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.3.0-37/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.3.0-37/hive/lib/phoenix-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/09/12 13:14:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
17/09/12 13:14:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/09/12 13:14:07 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/09/12 13:14:07 INFO manager.SqlManager: Using default fetchSize of 1000
17/09/12 13:14:07 INFO tool.CodeGenTool: Beginning code generation
17/09/12 13:14:08 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Setting IsolationLevel to ReadCommitted is not supported.
ROOT CAUSE:
Databases like PDW do not accept the READ COMMITTED isolation level, so relaxed isolation fails in this case. Sqoop did not make the transaction isolation level used for its metadata queries configurable. This has been fixed in the following Apache JIRA:
https://issues.apache.org/jira/browse/SQOOP-2349
RESOLUTION:
This issue is fixed in HDP 2.5.5 and later; upgrading to one of those versions resolves it.
09-29-2017
08:01 AM
2 Kudos
This is unsupported territory and an area that has not really been explored yet. There is no real modification-time concept in object stores: they only record a creation time, which is the time observed at the far end, so if you upload a file to a store in a remote timezone you may get that remote time back. The underlying issue here is not a bug. It is simply that distcp -update relies on comparing file checksums, and (a) not all stores export their checksum through the Hadoop API (WASB does, S3A does not yet), and (b) because the checksum algorithms of blob stores and HDFS differ, a checksum difference cannot be used as a cue that a file has changed. Note that the same situation occurs when copying between HDFS encryption zones, as the checksums of the encrypted files will differ.
06-30-2017
09:30 PM
6 Kudos
SYMPTOM: A Hive query with a GROUP BY clause is stuck in the reducer phase for a very long time when processing a large amount of data.
ROOT CAUSE: This happens when the GROUP BY clause is not optimized. By default Hive sends rows with the same group-by keys to the same reducer; if the distinct values of the group-by columns are skewed, one reducer may receive most of the shuffled data and remain stuck for a very long time.
WORKAROUND: In this case increasing the Tez container memory will not help. The data skew can be mitigated by setting the following properties before running the query (usage sketch below):
set hive.tez.auto.reducer.parallelism=true;
set hive.groupby.skewindata=true;
set hive.optimize.skewjoin=true;
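As an illustration (hypothetical table and column names), after setting the properties above in the same session, the skewed aggregation is rerun:

-- Hypothetical skewed aggregation: a handful of customer_id values dominate the data.
SELECT customer_id, COUNT(*) AS order_cnt
FROM sales
GROUP BY customer_id;

With hive.groupby.skewindata=true, Hive plans the aggregation in two stages: the first distributes rows randomly and partially aggregates, the second merges the partial results, so no single reducer has to process the entire skewed key.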
06-30-2017
03:12 PM
6 Kudos
SYMPTOM: A SELECT statement on a view fails or succeeds depending on the ordering of columns in the select list.
FAILING QUERIES:
select id, dept, emp, fname from testview order by id, dept;
select id, emp, dept, fname from testview order by id, dept;
select emp, dept, id, fname from testview order by id, dept;
SUCCESSFUL QUERIES:
select emp, fname, id, dept from testview order by id, dept;
select emp, citystate, fname, dept from testview order by id, dept;
select emp, fname, dept, id from testview order by id, dept;
EXCEPTION:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating VALUE._col1
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:86)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
... 17 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryHiveVarchar.init(LazyBinaryHiveVarchar.java:47)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
... 18 more
2017-05-30 20:12:32,035 [INFO] [TezChild] |exec.FileSinkOperator|: FS[1]: records written - 0
2017-05-30 20:12:32,035 [INFO] [TezChild] |exec.FileSinkOperator|: RECORDS_OUT_0:0,
ROOT CAUSE: The exception is due to a mismatch between serialization and deserialization on a Hive view backed by a table using the SequenceFile input/output format. The serialization by LazyBinarySerDe in the previous MapReduce job used a different column order; when the current MapReduce job deserializes the intermediate sequence file, LazyBinaryStruct reads the fields in the wrong order and returns corrupted data. The mismatch in column order between serialization and deserialization is caused by the SelectOperator's column pruning (ColumnPrunerSelectProc).
WORKAROUND:
1] Create an ORC table from the sequence-file table:
create table test_orc stored as orc as select * from testtable;
2] Recreate the view on top of the new ORC table (a sketch follows below).
REFERENCE: https://issues.apache.org/jira/browse/HIVE-14564
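A minimal sketch of step 2, assuming the view is rebuilt on the new ORC table and originally exposed the columns referenced in the queries above (the actual view definition is not shown in the article):

DROP VIEW IF EXISTS testview;
CREATE VIEW testview AS
SELECT id, dept, emp, fname, citystate
FROM test_orc;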
06-30-2017
07:40 AM
8 Kudos
SYMPTOM: An external table created with the DDL below does not pick up the data files under its location.
CREATE EXTERNAL TABLE test(
id STRING,
dept STRING)
row format delimited
fields terminated by ','
location '/user/hdfs/testdata/';
ROOT CAUSE: The data files under the table location are stored in subdirectories, which Hive does not read by default:
/user/hdfs/testdata/1/test1
/user/hdfs/testdata/2/test2
/user/hdfs/testdata/3/test3
/user/hdfs/testdata/4/test4
RESOLUTION: To make the subdirectories accessible, set the following two properties before executing the create table statement (an end-to-end sketch follows below):
set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
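An end-to-end sketch of the resolution, reusing the DDL and path from the symptom above; the final count simply verifies that rows from the nested files are now visible:

set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;

CREATE EXTERNAL TABLE test(
  id STRING,
  dept STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hdfs/testdata/';

-- Should now count records from /user/hdfs/testdata/1/test1 through /user/hdfs/testdata/4/test4.
SELECT COUNT(*) FROM test;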
06-25-2017
02:26 AM
8 Kudos
SYMPTOM:
=> This problem occurs with a partitioned table (without any null partitions) that contains approximately 600 or more columns.
=> The following stack trace is observed in the Hive metastore logs:
Nested Throwables StackTrace:
org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO "PARTITION_PARAMS" ("PARAM_VALUE","PART_ID","PARAM_KEY") VALUES (?,?,?)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1056)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.put(JoinMapStore.java:307)
at org.datanucleus.store.types.wrappers.backed.Map.put(Map.java:653)
at org.apache.hadoop.hive.common.StatsSetupConst.setColumnStatsState(StatsSetupConst.java:285)
at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6237)
at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
at com.sun.proxy.$Proxy10.updatePartitionColumnStatistics(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartitonColStats(HiveMetaStore.java:4596)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:5953)
at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:139)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:97)
at com.sun.proxy.$Proxy12.set_aggr_stats_for(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:11062)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:11046)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: value too long for type character varying(4000)
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:393)
at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:431)
at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1047)
... 30 more
ROOT CAUSE:
=> An ANALYZE TABLE query updates the partition statistics in the metastore database.
=> The metastore database limits PARTITION_PARAMS.PARAM_VALUE to 4000 characters.
=> With a very large number of columns, the statistics value to be written exceeds this limit, so the update fails.
WORKAROUND: Increase the column width of PARTITION_PARAMS.PARAM_VALUE in the metastore database (a verification query is sketched below).
STEPS:
1] Stop the metastore/HS2.
2] Back up the metastore database.
3] Increase the column width to a reasonable value. For a Postgres database, use the following command:
ALTER TABLE PARTITION_PARAMS ALTER COLUMN PARAM_VALUE TYPE varchar(64000);
4] Start the metastore/HS2 again.
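Optionally, the new width can be verified from the metastore database before restarting the services (a PostgreSQL sketch; adjust identifier case/quoting to match how your metastore schema was created):

SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'PARTITION_PARAMS'
  AND column_name = 'PARAM_VALUE';
-- Expect character_maximum_length = 64000 after the ALTER TABLE above.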
06-24-2017
10:06 PM
7 Kudos
SYMPTOM: Incorrect status is shown for DAGs in the Tez UI.
ROOT CAUSE: This is a known issue (https://issues.apache.org/jira/browse/TEZ-3656). It only happens for killed applications or when there was a failure to write to the Application Timeline Server. It should not cause any problems other than the wrong DAG status in the Tez UI.
RESOLUTION: This is fixed in the HDP 2.6.1 release.
06-24-2017
09:45 PM
7 Kudos
PROBLEM DEFINITION:
CREATE TABLE DT(Dérivation string, Pièce_Générique string);
throws a parse error (ParseException).
ROOT CAUSE / WORKAROUND: Hive database, table, and column names cannot contain Unicode characters. Hive supports UTF-8/Unicode strings only in table data and comments (see the sketch below).
LINKS: https://cwiki.apache.org/confluence/display/Hive/User+FAQ
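A minimal sketch of the workaround: keep the identifiers ASCII and carry the Unicode text in column comments, since Hive allows UTF-8 in data and comments (the ASCII names here are hypothetical):

CREATE TABLE DT (
  derivation      STRING COMMENT 'Dérivation',
  piece_generique STRING COMMENT 'Pièce Générique'
);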