Member since: 05-08-2019
Posts: 18
Kudos received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1409 | 02-12-2020 02:25 AM |
02-12-2020
02:25 AM
Figured this out internally. It turns out that Spark's physical-plan optimization can be affected by the configuration setting spark.sql.codegen.maxFields, which caps how many fields (including nested fields) an operator may have before whole-stage code generation is disabled for it. This has implications for how Spark optimizes reads from "fat" (wide) tables. In my case the setting was set low, which meant that the operators reading from the right side of the join (the "fat" table) were not assigned to a whole-stage codegen stage. It is important to note that the read of the Hive data returned the same results in either case; only a different optimization was applied to the physical plan.
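As a rough illustration of the rule described above, here is a simplified pure-Python model (not Spark's own code) of the field-count check: an operator is kept out of whole-stage codegen when its output schema, or any child's schema, has more nested fields than spark.sql.codegen.maxFields (default 100). The type encoding, the column types and the threshold of 4 are assumptions for illustration only; the post does not say what value the setting actually had.

```python
# Simplified model of the spark.sql.codegen.maxFields check (illustration only).
# Types are encoded as plain Python values (an assumption, not Spark's API):
#   "string", "int", ...   -> atomic type, counts as 1 field
#   {"name": type, ...}    -> struct, sum of its fields
#   ("array", elem)        -> array, counts its element type
#   ("map", key, value)    -> map, counts key type + value type

DEFAULT_MAX_FIELDS = 100  # Spark's default for spark.sql.codegen.maxFields


def num_nested_fields(data_type):
    """Count fields recursively, treating every atomic type as one field."""
    if isinstance(data_type, dict):  # struct
        return sum(num_nested_fields(t) for t in data_type.values())
    if isinstance(data_type, tuple) and data_type[0] == "array":
        return num_nested_fields(data_type[1])
    if isinstance(data_type, tuple) and data_type[0] == "map":
        return num_nested_fields(data_type[1]) + num_nested_fields(data_type[2])
    return 1  # atomic type


def supports_whole_stage_codegen(output_schema, child_schemas=(), max_fields=DEFAULT_MAX_FIELDS):
    """True unless the operator's output or any child's output exceeds max_fields."""
    too_many_output = num_nested_fields(output_schema) > max_fields
    too_many_input = any(num_nested_fields(s) > max_fields for s in child_schemas)
    return not (too_many_output or too_many_input)


# Column names from the physical plan in the question; the types are assumed.
left = {"encntr_id": "string", "scores_datetime": "string", "scores": "int"}
right = {
    "encntr_id": "string",
    "dateofbirth": "string",
    "postcode": "string",
    "event_desc": "string",
    "event_performed_dt_tm": "string",
}

if __name__ == "__main__":
    # With a hypothetical low threshold of 4, only the 3-column left side qualifies.
    print(supports_whole_stage_codegen(left, max_fields=4))   # True
    print(supports_whole_stage_codegen(right, max_fields=4))  # False
    # On a live SparkSession the threshold could be raised, e.g.:
    # spark.conf.set("spark.sql.codegen.maxFields", 100)
```

The model only captures the field-count part of the decision; Spark also keeps operators out of codegen for other reasons (for example, expressions that do not support code generation).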
02-10-2020
01:23 PM
Hi community, I am trying to debug a simple Spark SQL query that is returning incorrect data. In this instance the query is a simple join between two Hive tables. The issue seems tied to the fact that the physical plan Spark has generated (with Catalyst optimization) looks corrupted: some of the steps in the physical plan have not been assigned an evaluation order ID, and thus none of the evaluation on the right side of the join is completed in the Spark query. This error is on an HDP 3.1.4 cluster running:
>>> sc.version
u'2.3.2.3.1.4.0-315'
Here is the example query:
from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
filter_1 = hive.executeQuery('select * from 03_score where scores = 5 or scores = 6')
filter_2 = hive.executeQuery('select * from 03_score where scores = 8')
joined_df = filter_1.alias('o').join(filter_2.alias('po'), filter_1.encntr_id == filter_2.encntr_id, how='inner')
joined_df.count()  # shows an incorrect value
joined_df.explain(True)
And here is the physical plan from the output:
== Physical Plan ==
SortMergeJoin [encntr_id#0], [encntr_id#12], Inner
:- *(2) Sort [encntr_id#0 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(encntr_id#0, 200)
:     +- *(1) Filter isnotnull(encntr_id#0)
:        +- *(1) DataSourceV2Scan [encntr_id#0, scores_datetime#1, scores#2], com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader@a6df563
+- Sort [encntr_id#12 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(encntr_id#12, 200)
      +- Filter isnotnull(encntr_id#12)
         +- DataSourceV2Scan [encntr_id#12, dateofbirth#13, postcode#14, event_desc#15, event_performed_dt_tm#16], com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader@60dd22d9
Note that the DataSourceV2Scan, Filter, Exchange and Sort operators on the right side of the join have not been assigned an order ID and are therefore never computed. Can anyone shed some light on this issue for me? Why would a physical plan that otherwise looks correct not be assigned an evaluation order ID?
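For reference: in Spark 2.3 and later, the *(n) prefix in a physical plan marks an operator fused into whole-stage codegen stage n. It is a codegen stage ID rather than an evaluation order, and operators without it still run through the regular, non-fused code path. The hypothetical helper codegen_stages below is a minimal sketch that lists which operators in a plan printed by explain() carry a stage ID; the sample plan is the one from the question, with the reader object references dropped.

```python
import re

# Optional "*(n)" prefix (whole-stage codegen stage id), followed by the operator name.
_OP_LINE = re.compile(r"^[\s:|+-]*(?:\*\((\d+)\)\s+)?([A-Za-z]\w*)")


def codegen_stages(plan_text):
    """Return [(operator, stage_id or None)] for each operator line of a physical plan."""
    result = []
    for line in plan_text.splitlines():
        m = _OP_LINE.match(line)
        if m:  # header lines such as "== Physical Plan ==" do not match
            stage = int(m.group(1)) if m.group(1) else None
            result.append((m.group(2), stage))
    return result


PLAN = """\
== Physical Plan ==
SortMergeJoin [encntr_id#0], [encntr_id#12], Inner
:- *(2) Sort [encntr_id#0 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(encntr_id#0, 200)
:     +- *(1) Filter isnotnull(encntr_id#0)
:        +- *(1) DataSourceV2Scan [encntr_id#0, scores_datetime#1, scores#2]
+- Sort [encntr_id#12 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(encntr_id#12, 200)
      +- Filter isnotnull(encntr_id#12)
         +- DataSourceV2Scan [encntr_id#12, dateofbirth#13, postcode#14, event_desc#15, event_performed_dt_tm#16]
"""

if __name__ == "__main__":
    for op, stage in codegen_stages(PLAN):
        where = f"codegen stage {stage}" if stage is not None else "not fused (still executed)"
        print(f"{op:<18}{where}")
```

Exchange never appears with a stage ID, because a shuffle always ends a codegen stage; its absence is expected on both sides of the join.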
12-20-2019
01:15 PM
I have the following issue with this setup. I define the Livy service in a Knox topology with an authentication provider enabled. When I request a Livy session over the Knox URL, Knox requests the Livy session with doAs=myuser. So far so good: the Livy session is started with owner=knox and proxyUser=myuser. The problem comes when we attempt to POST to the Livy statements API over the Knox URL. If we use the Knox URL to post to the running Livy session, Knox adds doAs=myuser, but now we get a 403 Forbidden response. Basically, because the Livy session is owned by Knox, we cannot post a statement into the session over the Knox URL with doAs=myuser. In my setup, at least, only the Knox user may post a statement to a Livy session owned by Knox.
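To make the two calls concrete, the sketch below only builds (it does not send) the requests Livy receives through Knox, using Livy's REST endpoints POST /sessions and POST /sessions/{id}/statements. The gateway host, the topology name default, the livy/v1 path and the helper name livy_request are assumptions for illustration; the doAs parameter is added explicitly only to mirror what the post says Knox appends.

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical Knox gateway host and topology; Livy is assumed to be exposed
# under /gateway/<topology>/livy/v1/.
KNOX_LIVY_BASE = "https://knox.example.com:8443/gateway/default/livy/v1/"


def livy_request(path, body, do_as=None):
    """Build (url, json_body) for a Livy REST call routed through Knox.

    Knox adds doAs=<authenticated user> when it forwards the call; it is added
    here by hand only to show what Livy ends up receiving.
    """
    url = urljoin(KNOX_LIVY_BASE, path)
    if do_as:
        url += "?" + urlencode({"doAs": do_as})
    return url, json.dumps(body)


# 1. Create a session: per the post, Livy records owner=knox, proxyUser=myuser.
create = livy_request("sessions", {"kind": "pyspark"}, do_as="myuser")

# 2. Submit a statement to (hypothetical) session 0: Livy now sees myuser as the
#    requesting user, which differs from the session owner (knox); this is the
#    request the post reports as rejected with 403 Forbidden.
submit = livy_request("sessions/0/statements", {"code": "1 + 1"}, do_as="myuser")

if __name__ == "__main__":
    for url, body in (create, submit):
        print("POST", url, body)
```

Laid out this way, the mismatch is visible: the session is created on behalf of one principal (knox) but the statement request arrives on behalf of another (myuser).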
08-07-2019
05:42 PM
With the advent of heterogeneous storage for HDFS, can we now look at NAS in a new light? Potentially, we could label NAS mounts on the DataNodes as ARCHIVE storage and have HDFS move data there when it becomes cold. I would like to hear opinions on this.
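The configuration this idea would rely on can be sketched as follows: DataNode volumes are tagged with a storage type in dfs.datanode.data.dir, a directory is given the COLD storage policy (all replicas on ARCHIVE), and the HDFS Mover migrates the existing blocks. The mount paths and helper names are hypothetical, and nothing here settles whether a NAS mount is a sensible DataNode volume.

```python
# Classic HDFS storage types usable as a per-volume prefix in dfs.datanode.data.dir.
STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def datanode_data_dirs(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, local_path) pairs.

    Volumes without a [TYPE] prefix default to DISK; every volume is tagged
    explicitly here so the ARCHIVE (NAS) mounts stand out.
    """
    parts = []
    for storage_type, path in volumes:
        if storage_type not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        parts.append(f"[{storage_type}]file://{path}")
    return ",".join(parts)


def cold_data_commands(hdfs_path):
    """Mark a directory COLD (all replicas on ARCHIVE), then let the Mover migrate blocks."""
    return [
        f"hdfs storagepolicies -setStoragePolicy -path {hdfs_path} -policy COLD",
        f"hdfs mover -p {hdfs_path}",
    ]


if __name__ == "__main__":
    # Hypothetical layout: one local disk plus one NAS mount on a DataNode.
    print(datanode_data_dirs([("DISK", "/data/1/dfs/dn"), ("ARCHIVE", "/mnt/nas/dfs/dn")]))
    for cmd in cold_data_commands("/data/cold"):
        print(cmd)
```

Note that setting a policy alone does not move existing data; blocks written before the policy change stay where they are until the Mover runs.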
05-10-2019
01:13 AM
Hi Li, I appreciate the response. I whitelisted the java.sun.com and oracle.com FQDNs here, and Navigator is now coming online. Thank you very much.
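The Navigator stack trace in the 503 question further down shows the web-app descriptor parser timing out on an outbound HTTP connection, which fits the machine being isolated from the internet. A minimal sketch for checking whether the whitelisted FQDNs are reachable from a host, assuming plain HTTP on port 80 (can_reach is a hypothetical helper name):

```python
import socket


def can_reach(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False


if __name__ == "__main__":
    # FQDNs from the post; port 80 is assumed because the failing request in the
    # stack trace went through a plain HTTP connection.
    for fqdn in ("java.sun.com", "oracle.com"):
        print(fqdn, "reachable" if can_reach(fqdn) else "NOT reachable")
```

A TCP check like this only confirms that the firewall or whitelist lets the connection through; it does not verify any particular resource on those hosts.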
05-09-2019
12:49 PM
Hi Li, you are correct: the machine is isolated from the internet. I appreciate the response and help. Unfortunately, I cannot access the knowledge article. Is there a public resource I can read it from?
Regards,
P
05-09-2019
08:09 AM
Hi All.
We have an issue with the Cloudera Navigator UI on a 6.2 manual installation. Background:
- Installed the Navigator Audit Server and Navigator Metadata Server roles on a cluster backed by an external PostgreSQL database.
- CDH is using Auto-TLS.
- The Navigator Metadata Server and Audit Server roles show green status in the Cloudera Manager UI.
However, when I browse to the Navigator UI, we get a 503 Service Unavailable message.
We have confirmed the Navigator role is up and running and listening on port 7187:
[root@cmhdgateway01 hue]# lsof -i:7187
COMMAND     PID USER         FD   TYPE DEVICE SIZE/OFF NODE NAME
java      12766 cloudera-scm 532u IPv4 814278      0t0  TCP *:7187 (LISTEN)
python2.7 17534 hue          31u  IPv4 955050      0t0  TCP cmhdgateway01.cso.ie:50144->cmhdgateway01.cso.ie:7187 (CLOSE_WAIT)
[root@cmhdgateway01 hue]#
The following is captured in the Navigator startup logs:
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.vm.specification.version=1.8
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.arch.data.model=64
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.home=/usr/java/jdk1.8.0_181-cloudera/jre
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.java.command=com.cloudera.nav.server.NavServer /run/cloudera-scm-agent/process/420-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator.properties /run/cloudera-scm-agent/process/420-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator-cm-auth.properties /run/cloudera-scm-agent/process/420-cloudera-mgmt-NAVIGATORMETASERVER/db.navms.properties /run/cloudera-scm-agent/process/420-cloudera-mgmt-NAVIGATORMETASERVER/cm-ext-accounts.properties
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.specification.vendor=Oracle Corporation
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: user.language=en
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: awt.toolkit=sun.awt.X11.XToolkit
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.vm.info=mixed mode
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.version=1.8.0_181
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.ext.dirs=/usr/java/jdk1.8.0_181-cloudera/jre/lib/ext:/usr/java/packages/lib/ext
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.boot.class.path=/usr/java/jdk1.8.0_181-cloudera/jre/lib/resources.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/rt.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/jsse.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/jce.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/charsets.jar:/usr/java/jdk1.8.0_181-cloudera/jre/lib/jfr.jar:/usr/java/jdk1.8.0_181-cloudera/jre/classes
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.vendor=Oracle Corporation
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.awt.headless=true
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: file.separator=/
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: java.vendor.url.bug=http://bugreport.sun.com/bugreport/
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: mgmt.log.file=mgmt-cmf-mgmt-NAVIGATORMETASERVER-cmhdgateway01.cso.ie.log.out
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.cpu.endian=little
2019-05-09 12:14:42,970 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.io.unicode.encoding=UnicodeLittle
2019-05-09 12:14:42,971 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: javax.net.ssl.trustStorePassword=******
2019-05-09 12:14:42,971 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: sun.cpu.isalist=
2019-05-09 12:14:42,971 INFO com.cloudera.nav.actions.LogEnvironmentValues [main]: Maximum Memory allocated for JVM: 1.9675293 GB
2019-05-09 12:14:42,971 INFO com.cloudera.nav.server.NavServer [main]: Skipping the database initialization routine com.cloudera.nav.actions.MySQLEngineTypeVerifier as it is disabled.
2019-05-09 12:14:42,971 INFO com.cloudera.nav.server.NavServer [main]: Executing the database initialization routine com.cloudera.nav.actions.CharacterSetEncodingVerifier
2019-05-09 12:14:42,971 INFO com.cloudera.nav.actions.CharacterSetEncodingVerifier [main]: JDBC Driver class org.postgresql.Driver
2019-05-09 12:14:43,196 INFO com.cloudera.nav.actions.CharacterSetEncodingVerifier [main]: Character set is: UTF8
2019-05-09 12:14:43,197 INFO com.cloudera.nav.server.NavServer [main]: Executing the database initialization routine com.cloudera.nav.server.UpgradeSchema
2019-05-09 12:14:43,297 INFO com.cloudera.enterprise.dbutil.DbUtil [main]: Schema version table already exists.
2019-05-09 12:14:43,300 INFO com.cloudera.enterprise.dbutil.DbUtil [main]: DB Schema version 60000.
2019-05-09 12:14:43,300 INFO com.cloudera.enterprise.dbutil.DbUtil [main]: Current database schema version: 60000
2019-05-09 12:14:43,468 INFO com.cloudera.nav.server.NavServer [main]: Skipping the database initialization routine com.cloudera.nav.server.UpgradeOrdinalVerifier as it is disabled.
2019-05-09 12:14:43,469 INFO com.cloudera.nav.server.NavServer [main]: Completed all database initialization routines successfully.
2019-05-09 12:14:43,792 INFO com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource [main]: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> u0778a217d9g7lynla0g|15043a2f, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> u0778a217d9g7lynla0g|15043a2f, idleConnectionTestPeriod -> 300, initialPoolSize -> 2, jdbcUrl -> jdbc:postgresql://cmhdgateway01.cso.ie/navms, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 50, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 5, numHelperThreads -> 3, preferredTestQuery -> null, privilegeSpawnedThreads -> false, properties -> {user=******, password=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
2019-05-09 12:14:43,826 WARN com.mchange.v2.resourcepool.BasicResourcePool [main]: Bad pool size config, start 2 < min 5. Using 5 as start.
2019-05-09 12:14:43,914 INFO com.cloudera.nav.server.NavServer [main]: Enabling SSL
2019-05-09 12:17:00,867 WARN org.eclipse.jetty.webapp.WebAppContext [main]: Failed startup of context o.e.j.w.WebAppContext@180da663{/,[file:///var/lib/cloudera-scm-navigator/temp/jetty-0.0.0.0-7187-nav-core-webapp-6.2.0.war-_-any-3567577415138419929.dir/webinf/, file:///var/lib/cloudera-scm-navigator/temp/jetty-0.0.0.0-7187-nav-core-webapp-6.2.0.war-_-any-3567577415138419929.dir/webapp/],UNAVAILABLE}{/opt/cloudera/cm/cloudera-navigator-server/wars/nav-core-webapp-6.2.0.war}
java.net.ConnectException: Connection timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:647)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1304)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1270)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:264)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1161)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1045)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:959)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
at org.eclipse.jetty.xml.XmlParser.parse(XmlParser.java:255)
at org.eclipse.jetty.webapp.Descriptor.parse(Descriptor.java:55)
at org.eclipse.jetty.webapp.WebDescriptor.parse(WebDescriptor.java:212)
at org.eclipse.jetty.webapp.MetaData.setWebXml(MetaData.java:194)
at org.eclipse.jetty.webapp.WebXmlConfiguration.preConfigure(WebXmlConfiguration.java:60)
at org.eclipse.jetty.webapp.WebAppContext.preConfigure(WebAppContext.java:506)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:544)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:138)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:168)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:138)
at org.eclipse.jetty.server.Server.start(Server.java:415)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:108)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:382)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at com.cloudera.nav.server.NavServer.run(NavServer.java:425)
at com.cloudera.nav.server.NavServer.main(NavServer.java:123)
2019-05-09 12:17:04,387 WARN org.eclipse.jetty.security.SecurityHandler [main]: ServletContext@o.e.j.w.WebAppContext@710b18a6{/solr,[file:///var/lib/cloudera-scm-navigator/temp/jetty-0.0.0.0-7187-solr-4.10.3.war-_solr-any-1169072435507539588.dir/webinf/, file:///var/lib/cloudera-scm-navigator/temp/jetty-0.0.0.0-7187-solr-4.10.3.war-_solr-any-1169072435507539588.dir/webapp/],STARTING}{/opt/cloudera/cm/cloudera-navigator-server/wars/solr-4.10.3.war} has uncovered http methods for path: /*
2019-05-09 12:17:07,333 WARN org.apache.solr.schema.IndexSchema [coreLoadExecutor-5-thread-1]: Field user is not multivalued and destination for multiple copyFields (2)
2019-05-09 12:17:07,969 WARN org.eclipse.jetty.util.ssl.SslContextFactory.config [main]: No Client EndPointIdentificationAlgorithm configured for SslContextFactory@585ac855[provider=null,keyStore=file:///run/cloudera-scm-agent/process/420-cloudera-mgmt-NAVIGATORMETASERVER/cm-auto-host_keystore.jks,trustStore=null]
2019-05-09 12:17:10,241 INFO com.cloudera.nav.server.NavServerUtil [main]: No nav_elements entities exist with id field in solr
2019-05-09 12:17:10,279 INFO com.cloudera.nav.server.NavServerUtil [main]: No nav_relations entities exist with id field in solr
2019-05-09 12:17:10,525 INFO com.cloudera.nav.server.NavServerUtil [main]: Found 0 documents in solr core nav_elements
2019-05-09 12:17:10,534 INFO com.cloudera.nav.server.NavServerUtil [main]: Found 0 documents in solr core nav_relations
2019-05-09 12:17:10,534 INFO com.cloudera.nav.server.SolrSchemaUpgrade [main]: Checking if solr schema upgrade is needed.
2019-05-09 12:17:10,550 INFO com.cloudera.nav.server.SolrSchemaUpgrade [main]: Schema version 2900 already up to date with latest schema 2900.
2019-05-09 12:17:10,559 INFO com.cloudera.nav.server.SolrSchemaUpgrade [main]: Current database upgrade ordinal is 28, latest upgrade step has ordinal 28
2019-05-09 12:17:10,733 INFO com.cloudera.nav.server.NavServer [main]: Cleaning up maintenance history of any previously running jobs.
2019-05-09 12:17:10,752 INFO com.cloudera.nav.server.NavServer [main]: Navigator Metadata Server listening on https://3.1.2.116:7187
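In the log above, Jetty reports "Failed startup of context" for nav-core-webapp-6.2.0.war (state UNAVAILABLE) right before the ConnectException, while the server itself goes on to listen on port 7187. That pattern would plausibly explain an open port answering with 503. The hypothetical helper failed_contexts below is a small sketch for pulling such failures out of a Navigator log; it assumes the exception is logged on the line immediately after the warning, as it is here, and the sample is abbreviated from the log above.

```python
import re

# A "Failed startup of context" warning ends with {<path to the .war>}.
_FAILED = re.compile(r"Failed startup of context .*\{([^{}]+\.war)\}")
# An exception line such as "java.net.ConnectException: Connection timed out ...".
_EXC = re.compile(r"^[\w.$]+(?:Exception|Error): ")


def failed_contexts(log_text):
    """Return [(war_path, exception_line or None)] for each failed Jetty context."""
    lines = log_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        m = _FAILED.search(line)
        if not m:
            continue
        # Assumes the cause is logged on the very next line, as in the post.
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        found.append((m.group(1), nxt if _EXC.match(nxt) else None))
    return found


SAMPLE = (
    "2019-05-09 12:17:00,867 WARN org.eclipse.jetty.webapp.WebAppContext [main]: "
    "Failed startup of context o.e.j.w.WebAppContext@180da663{/,[...],UNAVAILABLE}"
    "{/opt/cloudera/cm/cloudera-navigator-server/wars/nav-core-webapp-6.2.0.war}\n"
    "java.net.ConnectException: Connection timed out (Connection timed out)\n"
    "at java.net.PlainSocketImpl.socketConnect(Native Method)\n"
)

if __name__ == "__main__":
    for war, cause in failed_contexts(SAMPLE):
        print(war, "->", cause)
```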
Labels:
- Cloudera Manager
- Cloudera Navigator