Member since: 12-17-2020
Posts: 11
Kudos Received: 0
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4783 | 08-29-2022 03:10 PM |
08-29-2022
03:10 PM
I solved the problem by turning off this option: ("spark.sql.adaptive.enabled", "true").
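Roughly how I turned it off, as a minimal sketch (the app name below is just a placeholder; with Livy, the same key can presumably go in the session's conf map instead of the builder):

```python
from pyspark.sql import SparkSession

# Build the session with adaptive execution disabled.
# "example-app" is only a placeholder name.
spark = (
    SparkSession.builder
    .appName("example-app")
    .config("spark.sql.adaptive.enabled", "false")
    .getOrCreate()
)

# Confirm the effective value of the setting.
print(spark.conf.get("spark.sql.adaptive.enabled"))
```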
... View more
08-18-2022
02:08 PM
Now I'm trying to check the session configuration by setting the property 'spark.logConf' = "true". I believe that setting it to "true" would make the session properties be written to the Spark log, and that I'd be able to check them by issuing the command yarn logs -applicationId application_1660776720083_9876 > yarn.log, but I can't find the session values in my yarn.log file. How should they be displayed? Am I doing something wrong? I'm using Spark 2.3.4.
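For reference, this is roughly the setup I'm describing, as a sketch (the app name is a placeholder):

```python
from pyspark.sql import SparkSession

# Ask Spark to log the effective SparkConf (at INFO level) when the context starts.
spark = (
    SparkSession.builder
    .appName("example-app")  # placeholder name
    .config("spark.logConf", "true")
    .getOrCreate()
)

# As a cross-check, dump the same properties directly from the running session.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)
```

My understanding is that these values are written by the driver at INFO level when the SparkContext starts, so whether they show up in the aggregated YARN logs probably depends on where the driver runs (cluster vs. client deploy mode).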
... View more
08-15-2022
01:30 PM
I have a Spark SQL query that works when I execute it from inside a Jupyter Notebook that has a PySpark kernel, but fails when I submit it to a Livy session. Usually there's no difference when I execute my queries either way. I tried to get the Spark session parameters with the command below and to guarantee that they are both the same:

spark.sparkContext.getConf().getAll()

I'm using Spark 2.3. How can I debug this problem? I know that the query works using Spark, but I can't make it work when submitting with Livy.

Here is the query:

INSERT INTO sbx_xxxxx.dados_auxiliares_mediana_preco
select p.ano, p.cod_cfi, p.preco, p.repeticoes, sum(p2.repeticoes) as qtd_antes
from sbx_operacoes_digitais.dados_auxiliares_moda_preco p
left join sbx_operacoes_digitais.dados_auxiliares_moda_preco p2
  on p.ano = p2.ano and p.cod_cfi = p2.cod_cfi and p2.preco <= p.preco
group by p.ano, p.cod_cfi, p.preco, p.repeticoes

Here is the stack trace of the query returned by Livy:

An error occurred while calling o99.sql.
: org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange(coordinator id: 1868588554) hashpartitioning(ano#100, cod_cfi#101, 2001), coordinator[target post-shuffle partition size: 67108864]
+- *(1) FileScan orc sbx_operacoes_digitais.dados_auxiliares_moda_preco[ano#100,cod_cfi#101,preco#103,repeticoes#104] Batched: true, Format: ORC, Location: InMemoryFileIndex[hdfs://BNDOOP03/corporativo/sbx_operacoes_digitais/dados_auxiliares_moda_preco_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ano:int,cod_cfi:int,preco:decimal(14,2),repeticoes:decimal(14,2)>
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
  at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doExecute(SortMergeJoinExec.scala:150)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:180)
  ... 23 more
Caused by: java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:259)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:124)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 58 more

It looks like the important part is a failed assertion, but it does not give me any useful info:

Caused by: java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)

Any help teaching me how to debug it is greatly appreciated.
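To narrow down which session setting differs, this is essentially the comparison I'm running in both environments (a minimal sketch using the command mentioned above):

```python
# Run the same snippet in the Jupyter (PySpark kernel) session and in the Livy
# session, then compare the two outputs side by side.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(f"{key}={value}")
```

Given the ExchangeCoordinator frame in the trace, the spark.sql.adaptive.* settings look like the first ones worth comparing.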
... View more
Labels:
- Apache Spark
08-15-2022
10:19 AM
The problem improved after upgrading the driver version. It still happens, but more rarely.
... View more
07-07-2021
04:35 PM
Sorry, I don't know how to attach files to this thread. The following types were rejected: log, txt, and zip. I can send them by email if someone asks.
... View more
07-07-2021
04:25 PM
This is very weird. If I turn trace logging on, it works. If I set the log to debug level or higher, it fails. Before sending the log files, here is some context.

We don't have admin rights on our Windows machines. The ODBC drivers are installed as Admin by a script from an internal "App Store". After installing it, a bat file is executed to create the ODBC data sources. The bat file basically executes this command:

%windir%\syswow64\odbcconf configdsn "Cloudera ODBC Driver for Apache Hive" "DSN=Datalake|DESCRIPTION=Driver datalake|HiveServerType=2|ServiceDiscoveryMode=ZooKeeper|ZKNamespace=hiveserver2|Host=zookeeper01:2181,zookeeper02:2181,zookeeper03:2181|Port=10000|Schema=default|AuthMech=1|KrbRealm=S.NET|KrbServiceName=hive|ServicePrincipalCanonicalization=0|KrbHostFQDN=_HOST|GetTablesWithQuery=1|InvalidSessionAutoRecover=0|AutoReconnect=1"

After the install, I open the Windows ODBC Admin and try to change the log level in "my datasource" -> Configure button -> Logging Options, but the option isn't persisted. If I try to configure it again, the log level is off. I also can't edit the file "C:\Windows\odbc.ini". The sysadmin gave admin permissions to my user, and I edited the odbc.ini to add the recommended log config:

[Driver]
LogLevel=6
LogPath=C:\temp

but it didn't save any log info. So, with admin powers, I went to the ODBC Admin, configured the log level to trace in the "Logging Options" above, and it worked. After editing the log level, these lines were automatically added to the ODBC.INI file:

[Datalake]
Driver32=C:\Program Files\Cloudera ODBC Driver for Apache Hive\lib\ClouderaHiveODBC64.dll

These lines were not present before. The admin powers were removed from my user, but now I can change the "Logging Options" and they are persisted. ¯\_(ツ)_/¯

And now the weird behavior happens: if the log level is TRACE it works fine; if it is DEBUG or higher, it fails. Here is how I test it:

- I open DBeaver.
- I expand the Datalake connection so it connects. It opens a tree with a HIVE node below it.
- I expand the HIVE node and it lists all my database names. So far, so good. This always works.
- Now, if I open a specific database, expand it, and click to expand the table names, the DEBUG run expands just the first table, while the TRACE run displays all 3 tables.

Here are the logs, one with DEBUG on and one with TRACE on. Each has 2 connections, since DBeaver opens a separate connection for the metadata. Any help is greatly appreciated. I'll try to attach the log files using another browser in a reply.
... View more
07-06-2021
01:34 PM
Thanks, I'm trying to do it. I believe it is the file C:\Windows\ODBC.INI and that I must add these lines at the end of it:

[Driver]
LogLevel=6
LogPath=/tmp

I don't have write permission to this file, but I'll ask someone to do it.
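Once the lines are in place, a small read-only check like this should confirm whether the keys were actually saved (a sketch, assuming Python is available on the machine; it only reads the file):

```python
import configparser

# Read the system-wide ODBC configuration and report the logging keys, if present.
ODBC_INI = r"C:\Windows\ODBC.INI"

parser = configparser.ConfigParser()
parser.read(ODBC_INI)

if parser.has_section("Driver"):
    for key in ("LogLevel", "LogPath"):
        print(key, "=", parser.get("Driver", key, fallback="<not set>"))
else:
    print("No [Driver] section found in", ODBC_INI)
```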
... View more
07-05-2021
12:52 PM
Weird. If I select a table, some of the values come back with the S1090 error, while the same query also returns some valid values. But if I wait a few minutes and try the query again, it works! Should I try some driver configuration?
... View more
07-05-2021
12:08 PM
I'm connecting to Hive using Cloudera's ODBC driver version 2.6.9 and I'm getting a weird behavior that ruins the user experience. Any help to solve or debug this problem would be greatly appreciated.

I'm using DBeaver to connect to it. All my databases display just a few of their tables. Here is an example: when I open DBeaver, the database shows only the table `localizacao`, although this database has 3 tables. If I run the command show tables from raw_aneel, the DBeaver interface displays an error. The error message in English would be:

org.jkiss.dbeaver.model.exec.DBCException: SQL Error [S1090]: [Microsoft][ODBC Driver Manager] Invalid string or buffer length

Here is the complete stack trace displayed in the Error Log (the error message appears there in Brazilian Portuguese):

org.jkiss.dbeaver.model.exec.DBCException: SQL Error [S1090]: [Microsoft][ODBC Driver Manager] Comprimento inválido de cadeia de caracteres ou de buffer
  at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCAbstractValueHandler.fetchValueObject(JDBCAbstractValueHandler.java:55)
  at org.jkiss.dbeaver.ui.controls.resultset.ResultSetDataReceiver.fetchRow(ResultSetDataReceiver.java:125)
  at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.fetchQueryData(SQLQueryJob.java:718)
  at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:541)
  at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:441)
  at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
  at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:428)
  at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:813)
  at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:3280)
  at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:118)
  at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
  at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:116)
  at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:4624)
  at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
  at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.sql.SQLException: [Microsoft][ODBC Driver Manager] Comprimento inválido de cadeia de caracteres ou de buffer
  at sun.jdbc.odbc.JdbcOdbc.createSQLException(JdbcOdbc.java:6964)
  at sun.jdbc.odbc.JdbcOdbc.standardError(JdbcOdbc.java:7121)
  at sun.jdbc.odbc.JdbcOdbc.SQLGetDataString(JdbcOdbc.java:3914)
  at sun.jdbc.odbc.JdbcOdbcResultSet.getDataString(JdbcOdbcResultSet.java:5697)
  at sun.jdbc.odbc.JdbcOdbcResultSet.getString(JdbcOdbcResultSet.java:353)
  at sun.jdbc.odbc.JdbcOdbcResultSet.getObject(JdbcOdbcResultSet.java:1677)
  at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.getObject(JDBCResultSetImpl.java:627)
  at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCStringValueHandler.fetchColumnValue(JDBCStringValueHandler.java:52)
  at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCAbstractValueHandler.fetchValueObject(JDBCAbstractValueHandler.java:49)
  ... 14 more

Before this stack trace there is a message `Can't read column 'tab_name' value`. Any idea how to debug this?
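To rule DBeaver and its JDBC-ODBC bridge out, a check like this one goes straight through the DSN (a sketch, assuming pyodbc is installed; "Datalake" is my DSN name and "raw_aneel" the database from the example):

```python
import pyodbc

# Connect through the same DSN DBeaver uses and ask the driver for the table
# list directly, so the ODBC metadata call is exercised without DBeaver.
conn = pyodbc.connect("DSN=Datalake", autocommit=True)
cursor = conn.cursor()

# cursor.tables() maps to the ODBC SQLTables call.
for row in cursor.tables(schema="raw_aneel"):
    print(row.table_schem, row.table_name)

conn.close()
```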
... View more
Labels:
- Apache Hive
12-17-2020
03:34 PM
The command show databases lists all databases in a Hive instance. It lists both the databases that I have access to and those that I don't. When I try to list the tables in a DB that I don't have access to, using the command show tables from forbidden_db, it returns an empty list. Which command would list only the databases in which I have access to at least one table?
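Today the only way I see is to brute-force it with the two commands above; a small PySpark sketch of that idea (assuming a session pointed at the same Hive metastore):

```python
# For each database, keep it only if "show tables" returns at least one row.
# A database I can't access shows up with an empty table list, so it is skipped.
accessible = []
for db_row in spark.sql("SHOW DATABASES").collect():
    db = db_row[0]
    if spark.sql(f"SHOW TABLES FROM {db}").count() > 0:
        accessible.append(db)

for db in accessible:
    print(db)
```

This is slow when there are many databases and only approximates "access" the same way the empty list does, so a single proper command would be much better.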
... View more
Labels:
- Apache Hive