Member since: 12-17-2020
Posts: 11
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5261 | 08-29-2022 03:10 PM
08-29-2022
03:10 PM
I solved the problem by turning off the option "spark.sql.adaptive.enabled" (it had been set to "true").
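For anyone hitting the same ExchangeCoordinator assertion, a minimal sketch of how the flag can be turned off when the session is built in PySpark (the app name is just a placeholder; adapt it to however your session is created, e.g. through the Livy session conf):

from pyspark.sql import SparkSession

# Disabling adaptive execution avoids the ExchangeCoordinator assertion from the post below
spark = (SparkSession.builder
         .appName("adaptive-off-example")  # placeholder name
         .config("spark.sql.adaptive.enabled", "false")
         .getOrCreate())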
... View more
08-18-2022
02:08 PM
Now I'm trying to check the session configuration by setting the property spark.logConf = "true". I believe that setting it to "true" makes the session properties get written to the Spark log, and that I'd then be able to check them by issuing the command

yarn logs -applicationId application_1660776720083_9876 > yarn.log

but I can't find the session values in my yarn.log file. How would they be displayed? Am I doing something wrong? I'm using Spark 2.3.4.
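For reference, a sketch of what I mean, assuming the property is set when the session is built in PySpark (the app name is a placeholder). As far as I understand, with spark.logConf enabled the effective SparkConf is logged at INFO level by the driver when the SparkContext starts, so I'd expect to find it in the driver container's log inside the YARN aggregation:

from pyspark.sql import SparkSession

# spark.logConf=true should make the driver log the effective SparkConf (INFO level) at startup
spark = (SparkSession.builder
         .appName("logconf-check")  # placeholder name
         .config("spark.logConf", "true")
         .getOrCreate())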
... View more
08-15-2022
01:30 PM
I have a Spark SQL query that works when I execute it from inside a Jupyter Notebook with a PySpark kernel but fails when I submit it to a Livy session. Usually there's no difference between the two ways of executing my queries. I tried to get the Spark session parameters with the command below to make sure they are the same in both cases:

spark.sparkContext.getConf().getAll()

I'm using Spark 2.3. How can I debug this problem? I know that the query works using Spark directly, but I can't make it work submitting through Livy. Here is the query:

INSERT INTO sbx_xxxxx.dados_auxiliares_mediana_preco
select p.ano, p.cod_cfi, p.preco, p.repeticoes, sum(p2.repeticoes) as qtd_antes
from sbx_operacoes_digitais.dados_auxiliares_moda_preco p
left join sbx_operacoes_digitais.dados_auxiliares_moda_preco p2
  on p.ano = p2.ano and p.cod_cfi = p2.cod_cfi and p2.preco <= p.preco
group by p.ano, p.cod_cfi, p.preco, p.repeticoes

Here is the stack trace of the query returned by Livy:

An error occurred while calling o99.sql.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange(coordinator id: 1868588554) hashpartitioning(ano#100, cod_cfi#101, 2001), coordinator[target post-shuffle partition size: 67108864]
+- *(1) FileScan orc sbx_operacoes_digitais.dados_auxiliares_moda_preco[ano#100,cod_cfi#101,preco#103,repeticoes#104] Batched: true, Format: ORC, Location: InMemoryFileIndex[hdfs://BNDOOP03/corporativo/sbx_operacoes_digitais/dados_auxiliares_moda_preco_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ano:int,cod_cfi:int,preco:decimal(14,2),repeticoes:decimal(14,2)>
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doExecute(SortMergeJoinExec.scala:150)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:180)
... 23 more
Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)
at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:259)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:124)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 58 more

It looks like the important part is a failed assertion, but it does not give me any useful information:

Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)

Any help teaching me how to debug this is greatly appreciated.
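In case it helps to reproduce, a sketch of how a conf override can be passed when the Livy session is created through Livy's REST API; the host, port, and the specific flag are placeholders/assumptions on my side, not something I have confirmed fixes anything:

import json
import requests

livy_url = "http://livy-host:8998"  # placeholder host and port

payload = {
    "kind": "pyspark",
    # Pin the session conf explicitly so the Livy session matches the notebook session;
    # spark.sql.adaptive.enabled is just an example of a property to compare or override.
    "conf": {"spark.sql.adaptive.enabled": "false"},
}

resp = requests.post(livy_url + "/sessions",
                     data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.json())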
... View more
Labels:
- Apache Spark
08-15-2022
10:19 AM
The problem improved after upgrading the driver version. It still happens, but it is rarer.
... View more
07-07-2021
04:35 PM
Sorry, I don't know how to attach files to this thread. The following types were refused: log, txt, and zip. I can send them by email if someone asks.
... View more
07-07-2021
04:25 PM
This is very weird. If I turn the log level to trace, it works. If I set the log level to debug or higher, it fails.

Before sending the log files, here is some context. We don't have admin rights on our Windows machines. The ODBC drivers are installed as Admin by a script from an internal "App Store". After installing the driver, a bat file is executed to create the ODBC data sources. The bat file basically runs this command:

%windir%\syswow64\odbcconf configdsn "Cloudera ODBC Driver for Apache Hive" "DSN=Datalake|DESCRIPTION=Driver datalake|HiveServerType=2|ServiceDiscoveryMode=ZooKeeper|ZKNamespace=hiveserver2|Host=zookeeper01:2181,zookeeper02:2181,zookeeper03:2181|Port=10000|Schema=default|AuthMech=1|KrbRealm=S.NET|KrbServiceName=hive|ServicePrincipalCanonicalization=0|KrbHostFQDN=_HOST|GetTablesWithQuery=1|InvalidSessionAutoRecover=0|AutoReconnect=1"

After the install, I open the Windows ODBC Administrator and try to change the log level in "my datasource" -> Configure button -> Logging Options, but the option isn't persisted: if I open the configuration again, the log level is back to off. I also can't edit the file "C:\Windows\odbc.ini".

The sysadmin gave admin permissions to my user. I edited odbc.ini to add the recommended log configuration:

[Driver]
LogLevel=6
LogPath=C:\temp

but it didn't save any log info. So, with admin rights, I went back to the ODBC Administrator, configured the log level to trace in the "Logging Options" dialog mentioned above, and it worked. After editing the log level, these lines were automatically added to the ODBC.INI file:

[Datalake]
Driver32=C:\Program Files\Cloudera ODBC Driver for Apache Hive\lib\ClouderaHiveODBC64.dll

These lines were not present before. The admin permissions were then removed from my user, but now I can change the "Logging Options" and they are persisted. ¯\_(ツ)_/¯

And now the weird behavior happens: if the log level is TRACE it works fine; if it is DEBUG or higher, it fails. Here is how I test it:

1. I open DBeaver.
2. I expand the Datalake connection so it connects. It opens a tree with a HIVE node below it.
3. I expand the HIVE node and it lists all my database names. So far, so good. This always works.
4. Now, if I open a specific database, expand it, and click to expand the table names, the DEBUG run expands just the first table, while the TRACE run displays all 3 tables.

Here are the logs, one with DEBUG on and one with TRACE on. Each has 2 connections, since DBeaver opens a separate connection for the metadata. Any help is greatly appreciated. I'll try to attach the log files using another browser in a reply.
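Since the Logging Options were not being persisted, one thing I can also do is dump the DSN values straight from the registry to see what was actually written; a rough sketch (the "Datalake" DSN name comes from the bat file above, and whether it lives under HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER, and in which 32/64-bit view, depends on how the DSN was created):

import winreg

# Standard location for ODBC DSN settings on Windows; switch to HKEY_CURRENT_USER for a user DSN
key_path = r"SOFTWARE\ODBC\ODBC.INI\Datalake"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
    index = 0
    while True:
        try:
            name, value, _ = winreg.EnumValue(key, index)
            print(name, "=", value)
            index += 1
        except OSError:
            # No more values under the DSN key
            break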
... View more
07-06-2021
01:34 PM
Thanks, I'm trying to do it. I believe it is the file C:\Windows\ODBC.INI and that I must add these lines at the end of it:

[Driver]
LogLevel=6
LogPath=/tmp

I don't have write permission to this file, but I'll ask someone to do it.
... View more
07-05-2021
12:52 PM
Weird. If I select from a table, some of the values come back with the S1090 error, while the same query also returns some valid values. But if I wait a few minutes and try the query again, it works! Should I try some driver configuration?
... View more
07-05-2021
12:08 PM
I'm connecting to Hive using Cloudera's ODBC driver version 2.6.9 and I'm getting weird behavior that ruins the user experience. Any help to solve or debug this problem would be greatly appreciated.

I'm using DBeaver to connect. All my databases display only a few of their tables. Here is an example. When I open DBeaver, the database shows only the table `localizacao`, but this database has 3 tables. If I run the command show tables from raw_aneel, the DBeaver interface displays an error. The error message in English would be:

org.jkiss.dbeaver.model.exec.DBCException: SQL Error [S1090]: [Microsoft][ODBC Driver Manager] Invalid string or buffer length

Here is the complete stack trace displayed in the Error Log. The error message in it is in Brazilian Portuguese:

org.jkiss.dbeaver.model.exec.DBCException: SQL Error [S1090]: [Microsoft][ODBC Driver Manager] Comprimento inválido de cadeia de caracteres ou de buffer
at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCAbstractValueHandler.fetchValueObject(JDBCAbstractValueHandler.java:55)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetDataReceiver.fetchRow(ResultSetDataReceiver.java:125)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.fetchQueryData(SQLQueryJob.java:718)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:541)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:441)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:428)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:813)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:3280)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:118)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:116)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:4624)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.sql.SQLException: [Microsoft][ODBC Driver Manager] Comprimento inválido de cadeia de caracteres ou de buffer
at sun.jdbc.odbc.JdbcOdbc.createSQLException(JdbcOdbc.java:6964)
at sun.jdbc.odbc.JdbcOdbc.standardError(JdbcOdbc.java:7121)
at sun.jdbc.odbc.JdbcOdbc.SQLGetDataString(JdbcOdbc.java:3914)
at sun.jdbc.odbc.JdbcOdbcResultSet.getDataString(JdbcOdbcResultSet.java:5697)
at sun.jdbc.odbc.JdbcOdbcResultSet.getString(JdbcOdbcResultSet.java:353)
at sun.jdbc.odbc.JdbcOdbcResultSet.getObject(JdbcOdbcResultSet.java:1677)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.getObject(JDBCResultSetImpl.java:627)
at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCStringValueHandler.fetchColumnValue(JDBCStringValueHandler.java:52)
at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCAbstractValueHandler.fetchValueObject(JDBCAbstractValueHandler.java:49)
... 14 more

Right before this stack trace there is a message `Can't read column 'tab_name' value`. Any idea how to debug this?
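To check whether the truncation also happens outside DBeaver, a rough sketch of running the same metadata query through pyodbc; the "Datalake" DSN name matches the data source I use, but this is only an idea for isolating the driver, not something I have confirmed:

import pyodbc

# Connect through the existing ODBC DSN and run the same SHOW TABLES that the DBeaver tree runs
conn = pyodbc.connect("DSN=Datalake", autocommit=True)
cursor = conn.cursor()
cursor.execute("show tables from raw_aneel")
for row in cursor.fetchall():
    # Each row should carry the tab_name column mentioned in the stack trace
    print(row)
conn.close()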
... View more
Labels:
- Apache Hive
12-17-2020
03:34 PM
The command show databases lists all databases in a Hive instance, including databases that I do and do not have access to. When I try to list the tables in a database that I don't have access to, using the command show tables from forbidden_db, it returns an empty list. Which command would list only the databases in which I have access to at least one table?
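If there is no single command, a brute-force sketch of the workaround I have in mind, written in PySpark and relying on the behavior above (a database I can't access simply returns an empty table list); this is an assumption, not a tested solution:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accessible-dbs").getOrCreate()  # placeholder app name

# Keep only the databases where SHOW TABLES returns at least one row
databases = [row.databaseName for row in spark.sql("SHOW DATABASES").collect()]
accessible = [db for db in databases
              if spark.sql("SHOW TABLES FROM {}".format(db)).count() > 0]
print(accessible)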
... View more
Labels:
- Apache Hive