Interpreting knox's gateway-audit.log

Cloudera Employee

Trying to interpret the contents of knox's gateway-audit.log, per https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/audit_log_files.html

However, there are some gaps, and either the doc or the code/logger seems to be lacking.

Sample audit entry:
    18/07/06 10:55:56 ||a33a162d-2343-4943-8b7c-f899c230f5a1|audit|10.12.103.228|WEBHDFS||||access|uri|/gateway/XYZ/webhdfs/v1/tmp?op=LISTSTATUS|success|Response status: 401

Same audit entry with fields mapped, per doc:
    EVENT_PUBLISHING_TIME   |18/07/06 10:55:56 
**  ROOT_REQUEST_ID        
    PARENT_REQUEST_ID       |
    REQUEST_ID              |a33a162d-2343-4943-8b7c-f899c230f5a1
    LOGGER_NAME             |audit
**  ???                     |10.12.103.228
    TARGET_SERVICE_NAME     |WEBHDFS
>>  USER_NAME               | 
    PROXY_USER_NAME         |
    SYSTEM_USER_NAME        |
    ACTION                  |access
    RESOURCE_TYPE           |uri
    RESOURCE_NAME           |/gateway/XYZ/webhdfs/v1/tmp?op=LISTSTATUS
    OUTCOME                 |success
    LOGGING_MESSAGE         |Response status: 401

1. As seen, there is an IP address in the log that doesn't seem to have an equivalent documented field name. What is this address supposed to be?

2. Also, is it intentional to squish "EVENT_PUBLISHING_TIME" and "ROOT_REQUEST_ID" into one column? I understand ROOT_REQUEST_ID is a reserved/empty field but not sure why it is merged with EVENT_PUBLISHING_TIME in the actual log entry. Looks like a bug.

3. The outcome field is misleading. Does it say "success" because a response was sent back to the client? If so, then while technically correct, it does not accurately reflect the status of the operation, which actually failed, as shown by the 401 / unauthorized response status.

4. Why could the USER_NAME be missing in the audit entry? I see that when a curl request is sent directly to a Knox server, the entry shows a tagged username and the response status is 200 / OK. But when the same request is sent to a LB fronting the Knox server, the USER_NAME is unpopulated and the request fails.

1 ACCEPTED SOLUTION


Re: Interpreting knox's gateway-audit.log

@Guru Prateek Pinnadhari

Please review the code of the format() function at the following link:

https://github.com/apache/knox/blob/master/gateway-util-common/src/main/java/org/apache/knox/gateway...

@Override
  public String format( LoggingEvent event ) {
    sb.setLength( 0 );
    dateFormat( sb, event );
    CorrelationContext cc = (CorrelationContext)event.getMDC( Log4jCorrelationService.MDC_CORRELATION_CONTEXT_KEY );
    AuditContext ac = (AuditContext)event.getMDC( Log4jAuditService.MDC_AUDIT_CONTEXT_KEY );
    appendParameter( cc == null ? null : cc.getRootRequestId() );
    appendParameter( cc == null ? null : cc.getParentRequestId() );
    appendParameter( cc == null ? null : cc.getRequestId() );
    appendParameter( event.getLoggerName() );
    appendParameter( ac == null ? null : ac.getRemoteIp() );
    appendParameter( ac == null ? null : ac.getTargetServiceName() );
    appendParameter( ac == null ? null : ac.getUsername() );
    appendParameter( ac == null ? null : ac.getProxyUsername() );
    appendParameter( ac == null ? null : ac.getSystemUsername() );
    appendParameter( (String)event.getMDC( AuditConstants.MDC_ACTION_KEY ) );
    appendParameter( (String)event.getMDC( AuditConstants.MDC_RESOURCE_TYPE_KEY ) );
    appendParameter( (String)event.getMDC( AuditConstants.MDC_RESOURCE_NAME_KEY ) );
    appendParameter( (String)event.getMDC( AuditConstants.MDC_OUTCOME_KEY ) );
    String message = event.getRenderedMessage();
    sb.append( message == null ? "" : message ).append( LINE_SEP );
    return sb.toString();
  }

AFAIK the method above is what generates the audit line you would like more information on. Hopefully this helps if you can read Java :)
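To make the field order concrete, here is a minimal sketch of a parser that splits one audit line in the same order as format() above. It is not part of Knox: the class name AuditLineParser and the REMOTE_IP label are made up for illustration. It assumes that each parameter is written followed by a single |, that the timestamp has the `yy/MM/dd HH:mm:ss` shape seen in the sample entry, and that no field before the message contains a |.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: splits one gateway-audit.log line into named fields. */
class AuditLineParser {

    // Order follows the appendParameter(...) calls in format(); REMOTE_IP is a
    // hypothetical label for the value written by ac.getRemoteIp().
    private static final String[] FIELDS_AFTER_FIRST = {
        "PARENT_REQUEST_ID", "REQUEST_ID", "LOGGER_NAME", "REMOTE_IP",
        "TARGET_SERVICE_NAME", "USER_NAME", "PROXY_USER_NAME", "SYSTEM_USER_NAME",
        "ACTION", "RESOURCE_TYPE", "RESOURCE_NAME", "OUTCOME", "LOGGING_MESSAGE"
    };

    static Map<String, String> parse(String line) {
        // Limit the split so that any '|' inside the trailing message is preserved.
        String[] tokens = line.split("\\|", FIELDS_AFTER_FIRST.length + 1);
        if (tokens.length != FIELDS_AFTER_FIRST.length + 1) {
            throw new IllegalArgumentException("Unexpected field count: " + tokens.length);
        }

        Map<String, String> fields = new LinkedHashMap<>();

        // The first token is "<date> <time> <root request id>" because no '|'
        // separates the timestamp from ROOT_REQUEST_ID (assumed timestamp shape).
        String[] head = tokens[0].split(" ", 3);
        fields.put("EVENT_PUBLISHING_TIME",
                head.length >= 2 ? head[0] + " " + head[1] : tokens[0]);
        fields.put("ROOT_REQUEST_ID", head.length == 3 ? head[2] : "");

        for (int i = 0; i < FIELDS_AFTER_FIRST.length; i++) {
            fields.put(FIELDS_AFTER_FIRST[i], tokens[i + 1]);
        }
        return fields;
    }
}
```

Calling `AuditLineParser.parse(...)` on the sample entry from the question yields `10.12.103.228` for REMOTE_IP and empty strings for ROOT_REQUEST_ID and USER_NAME.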

In any case, here are the answers inline:

1. As seen, there is an IP address in the log that doesn't seem to have an equivalent documented field name. What is this address supposed to be?

> This is the remote IP (the value returned by ac.getRemoteIp() in the code above).

2. Also, is it intentional to squish "EVENT_PUBLISHING_TIME" and "ROOT_REQUEST_ID" into one column? I understand ROOT_REQUEST_ID is a reserved/empty field but not sure why it is merged with EVENT_PUBLISHING_TIME in the actual log entry. Looks like a bug.

> No. The event time is not followed by a |, which is why the two fields appear to be squished together.

3. The outcome field is misleading. Does it say "succeeded" because a response was sent back to the client? If so, then while technically correct, it does not accurately reflect the status of the operation as it actually failed as seen by the 401 / unauthorized response status.

> Correct, it says success because Knox got a response and forwarded it back to the client. I don't think Knox tries to interpret the actual response. Moreover, a 401 is not always a failure: with Kerberos, for example, a 401 is expected before the Kerberos authentication takes place.

4. Why could the USER_NAME be missing in the audit entry?

> As you can see, the username is taken from the AuditContext, so for some reason the username was empty in the context of this audit event.
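Since OUTCOME alone does not tell you whether the request was authorized, one option is to look at the status code in the message as well. The sketch below is only an illustration, not Knox code: the class AuditStatus and its methods are hypothetical, and the `Response status: NNN` message shape is assumed from the single sample entry in the question. It flags entries for review rather than calling them failures, because a 401 can be a normal step of Kerberos authentication.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: reads the HTTP status out of an audit LOGGING_MESSAGE. */
class AuditStatus {

    // Message shape assumed from the sample entry: "Response status: 401".
    private static final Pattern STATUS = Pattern.compile("Response status: (\\d{3})");

    /** Returns the status code found in the message, or empty if there is none. */
    static OptionalInt responseStatus(String loggingMessage) {
        Matcher m = STATUS.matcher(loggingMessage);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    /** True when OUTCOME is "success" but the recorded status is 4xx or 5xx. */
    static boolean needsReview(String outcome, String loggingMessage) {
        OptionalInt status = responseStatus(loggingMessage);
        return "success".equals(outcome)
                && status.isPresent()
                && status.getAsInt() >= 400;
    }
}
```

For the sample entry, `AuditStatus.needsReview("success", "Response status: 401")` returns true, while the same call with `Response status: 200` returns false.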

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Re: Interpreting knox's gateway-audit.log

Cloudera Employee
This is the remote ip

> The address I saw in the log was not the IP of either the curl client or the LB that sits in the path between the client and Knox, so I am not sure how this field is supposed to help in auditing. It could possibly be because the cluster's network is NAT-ed before hitting the LB.

event time is not followed by a |

> So it does seem like a cosmetic bug in the AuditLayout code.

I don't think Knox will try to interpret the actual response

> Thanks for the confirmation Felix. Just a thought, this seems counter-intuitive considering audit's purpose should be to record whether the operation was allowed or denied. Seems like it is lacking some context.

Thanks for your inputs.
