Member since 06-09-2016 · 529 Posts · 129 Kudos Received · 104 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1788 | 09-11-2019 10:19 AM |
| | 9427 | 11-26-2018 07:04 PM |
| | 2560 | 11-14-2018 12:10 PM |
| | 5562 | 11-14-2018 12:09 PM |
| | 3244 | 11-12-2018 01:19 PM |
07-10-2018
12:30 PM
1 Kudo
@Michael Bronson This usually happens with long-running applications, such as streaming applications, which are very verbose. I suggest you check the application ID, find out who launched the application, and reach out to them to ask that they:
1. Reduce the amount of logging to stdout for this application.
2. Alternatively, configure rolling logs on YARN - that way they can keep logging verbose, but you can at least restrict the size of the logs and how many to keep.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
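As a sketch of option 2, rolling log aggregation for long-running applications can be enabled in yarn-site.xml. The property below exists in Hadoop 2.6+, but the interval value is only illustrative; check the documentation for your specific YARN version.

```xml
<!-- Hedged example: enables periodic ("rolling") log aggregation for
     long-running YARN applications instead of aggregating only on
     application completion. The 3600-second interval is illustrative. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```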
07-10-2018
12:22 PM
@kanna k So based on the logs you added above, I see:

2018-07-10 09:55:07,592 ERROR hadoop.gateway (DefaultTopologyService.java:loadTopologies(252)) - Failed to load topology /usr/hdp/2.6.1.0-129/knox/bin/../conf/topologies/sample.xml: org.xml.sax.SAXParseException; lineNumber: 41; columnNumber: 76; The reference to entity "ServiceAccounts" must end with the ';' delimiter.

This error points to a sample.xml topology file which has errors. Perhaps you can move it out of the topologies directory, or fix the problem on the aforementioned line. As far as memory goes, I already provided the link to calculate the memory for Knox in a previous post. But as I mentioned, unless you are seeing OOM exceptions I don't think you have memory problems. I would suggest you check the logs and see what is causing the Knox server to go down.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
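For context, that SAXParseException is typically caused by a raw "&" inside an XML value: in XML, "&" starts an entity reference, which must end with ";". A hypothetical illustration (the element and value here are made up, not taken from the actual sample.xml):

```xml
<!-- Broken: the parser reads "&ServiceAccounts" as the start of an
     entity reference and expects a closing ';' -->
<value>group=Admins&ServiceAccounts</value>

<!-- Fixed: escape the ampersand -->
<value>group=Admins&amp;ServiceAccounts</value>
```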
07-09-2018
12:54 PM
@arjun more Check whether the configuration under /etc/hive2/conf is exactly the same on both servers. Also, perhaps an environment variable on server2 is set at the hive user level? Have you tried starting from the command line to check whether you still have the same debug level on?
07-09-2018
12:46 PM
@mehul godhaniya The error indicates Spark is not able to write __spark_conf__.zip to HDFS. Have you checked that your DataNodes are up and running, and that you can access HDFS correctly from your machine?
07-07-2018
01:39 PM
@dalin qin This type of error is due to multiple versions of the same jar being present on the classpath. Could you run `lsof -P -p <pid> | grep lz4`? This will hopefully show the locations from which the lz4 jar is being loaded, and probably reveal that an incorrect version is being picked up. Note: pid is the Spark shell pid.
HTH
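Complementary to lsof, you can ask the JVM itself which classpath entry a class was loaded from, using the standard `Class.getProtectionDomain().getCodeSource()` API. The class and method names below are my own illustration; in the Spark shell you would pass the conflicting lz4 class instead of the sample classes used here.

```java
// Hedged sketch: prints the jar or directory a class was loaded from,
// which helps diagnose duplicate-jar conflicts like the lz4 case.
public class WhichJar {
    public static String locationOf(Class<?> cls) {
        java.security.CodeSource cs = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap class loader (e.g. java.lang.String)
        // have no code source.
        return cs == null ? "bootstrap classloader" : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(String.class));   // bootstrap classloader
        System.out.println(locationOf(WhichJar.class)); // this class's jar/dir
    }
}
```

In the Spark shell the equivalent one-liner would target the lz4 class directly, e.g. `classOf[net.jpountz.lz4.LZ4BlockInputStream].getProtectionDomain.getCodeSource.getLocation` (assuming that is the conflicting class).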
07-07-2018
11:15 AM
@mehul godhaniya You need to copy the configuration files hdfs-site.xml, core-site.xml, yarn-site.xml, and mapred-site.xml from the Azure VMs to your machine and place them in the Eclipse project's resources directory so that they are added to the classpath. From the classpath those files will be read automatically as configuration for the application. Once that is done you can simplify your code, especially the SparkConf creation, and avoid any missing configuration:

```scala
val conf = new SparkConf()
  .setAppName("SparkApp")
  .setMaster("yarn-client")
val sc = new SparkContext(conf)
val file = sc.textFile("/user/hdfs/file.txt")
val words = file.flatMap { line => line.split(" ") }
val wordsmap = words.map { word => (word, 1) }
val wordcount = wordsmap.reduceByKey((x, y) => x + y)
wordcount.collect.foreach(println)
```

HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-06-2018
08:44 PM
@Guru Prateek Pinnadhari Please review the code for the format() function at the following link: https://github.com/apache/knox/blob/master/gateway-util-common/src/main/java/org/apache/knox/gateway/audit/log4j/layout/AuditLayout.java

```java
@Override
public String format( LoggingEvent event ) {
  sb.setLength( 0 );
  dateFormat( sb, event );
  CorrelationContext cc = (CorrelationContext)event.getMDC( Log4jCorrelationService.MDC_CORRELATION_CONTEXT_KEY );
  AuditContext ac = (AuditContext)event.getMDC( Log4jAuditService.MDC_AUDIT_CONTEXT_KEY );
  appendParameter( cc == null ? null : cc.getRootRequestId() );
  appendParameter( cc == null ? null : cc.getParentRequestId() );
  appendParameter( cc == null ? null : cc.getRequestId() );
  appendParameter( event.getLoggerName() );
  appendParameter( ac == null ? null : ac.getRemoteIp() );
  appendParameter( ac == null ? null : ac.getTargetServiceName() );
  appendParameter( ac == null ? null : ac.getUsername() );
  appendParameter( ac == null ? null : ac.getProxyUsername() );
  appendParameter( ac == null ? null : ac.getSystemUsername() );
  appendParameter( (String)event.getMDC( AuditConstants.MDC_ACTION_KEY ) );
  appendParameter( (String)event.getMDC( AuditConstants.MDC_RESOURCE_TYPE_KEY ) );
  appendParameter( (String)event.getMDC( AuditConstants.MDC_RESOURCE_NAME_KEY ) );
  appendParameter( (String)event.getMDC( AuditConstants.MDC_OUTCOME_KEY ) );
  String message = event.getRenderedMessage();
  sb.append( message == null ? "" : message ).append( LINE_SEP );
  return sb.toString();
}
```
AFAIK the above class function is what generates the audit line you would like more information on. Hopefully this helps if you can read Java 🙂 Otherwise, here are the answers inline:

1. As seen, there is an IP address in the log that doesn't seem to have an equivalent/documented field name. What is this address supposed to be?
> This is the remote IP.

2. Also, is it intentional to squish "EVENT_PUBLISHING_TIME" and "ROOT_REQUEST_ID" into one column? I understand ROOT_REQUEST_ID is a reserved/empty field but I am not sure why it is merged with EVENT_PUBLISHING_TIME in the actual log entry. Looks like a bug.
> No. The event time is not followed by a "|"; this is the reason why the two fields appear squished together.

3. The outcome field is misleading. Does it say "succeeded" because a response was sent back to the client? If so, then while technically correct, it does not accurately reflect the status of the operation, as it actually failed, as seen by the 401/unauthorized response status.
> Correct, it says "succeeded" because Knox got a response and forwarded it back to the client. I don't think Knox will try to interpret the actual response; moreover, a 401 code is not always a failure (for example, with Kerberos a 401 is expected prior to Kerberos auth).

4. Why could the USER_NAME be missing in the audit entry?
> As you can see, this username is taken from the AuditContext, so for some reason the username was empty in the context of this audit event.

HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
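To make the field order concrete, here is a hedged sketch (the parser and the sample line are my own illustration, not Knox code) that maps a pipe-delimited audit line to field names in the exact order appendParameter() is called above. It assumes, per point 2, that the timestamp is not followed by its own "|", so the first token is the timestamp with ROOT_REQUEST_ID merged onto it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: names the audit-line fields in appendParameter() order.
public class AuditFields {
    static final String[] FIELDS = {
        "parent_request_id", "request_id", "logger", "remote_ip",
        "target_service", "username", "proxy_username", "system_username",
        "action", "resource_type", "resource_name", "outcome", "message"
    };

    public static Map<String, String> parse(String line) {
        // -1 keeps trailing empty fields instead of discarding them.
        String[] parts = line.split("\\|", -1);
        Map<String, String> out = new LinkedHashMap<>();
        // First token: timestamp with the (usually empty) root request id
        // merged onto it, per the discussion above.
        out.put("timestamp_and_root_request_id", parts[0]);
        for (int i = 0; i < FIELDS.length && i + 1 < parts.length; i++) {
            out.put(FIELDS[i], parts[i + 1]);
        }
        return out;
    }
}
```

Running parse() on a made-up line such as `18/07/06 20:00:00||req-1|audit|10.0.0.1|WEBHDFS|guest|||access|uri|/webhdfs/v1/tmp|success|Request method: GET` yields remote_ip=10.0.0.1, username=guest, outcome=success, which matches the positions discussed in the answers above.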
07-06-2018
01:06 PM
@arjun more Is the hive-log4j2.properties file the same on both servers? Also, by any chance is the Atlas/HBase service also installed on server2 - or what other services are collocated on server2 with Hive?
07-05-2018
06:29 PM
Don't run it as root; that should not be required. Are you using Ambari to manage the cluster? If yes, make sure you make all these changes through the Ambari UI.
HTH
07-05-2018
05:14 PM
Yes, when configuring the cluster for the first time that service will be automatically selected for you. But as Dominika mentioned below, you can proceed without entering the values.