Multiline regex gotcha

Continuing with the regex theme this week, I would like to cover a corner case in regular expression matching that is worth being aware of. It involves a single event that contains multiple lines (that is, newline characters) and the use of the .* regex.

Example

2014-01-30 14:38:12,032 [Thread-57] [iaas-proxy] ERROR com.vmware.vcac.iaas.service.impl.CatalogRequestServiceImpl.failed:384 – Exception during request callback with id a739e6e6-d9fd-411d-a6a7-dd3b4d42a276 for item 8dfc5e04-d6a1-4dbc-91c7-b4418fc0c632. Error Message: [Error code: 42100 ] – [Error Msg: Infrastructure service provider error]
dynamicops.api.client.ApiClientException
     at dynamicops.api.client.ClientResponseHandler.handleResponse(BaseHttpClient.java:291)
     at dynamicops.api.client.BaseHttpClient$1.handleResponse(BaseHttpClient.java:151)
     at org.apache.http.client.fluent.Response.handleResponse(Response.java:85)
     at org.apache.http.client.fluent.Async$ExecRunnable.run(Async.java:80)
     at java.lang.Thread.run(Unknown Source)

To highlight the gotcha, I will use the above event in Log Insight and attempt to extract a field.

Extracting a field using one line

Let’s say I want to extract the exception reason from the end of the event. To do so, I may define the extracted field as follows in Log Insight:

[Screenshot: li-ml-1]

Extracting a field using multiple lines

While the above definition works, it may not match what I want. For example, what if I want to know the exception reason only for ApiClientException? To do this, I could add more context to my extracted field as follows:

[Screenshot: li-ml-2]

Understanding regex

In the second example, you can see that none of the event is highlighted. Looking at the regex you may wonder why. The only change was the addition of:

ApiClientException.*

to the pre-context, which should match ApiClientException followed by anything, right? Well, anything except newline characters. It turns out that, by default, the period (.) matches any character except a newline. Because the example log message spans multiple lines, .* cannot cross the line breaks, and that is the problem.
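The behavior is easy to reproduce outside of Log Insight. Below is a minimal sketch using Python's re module, which is an assumption for illustration only and not Log Insight's own regex engine; the event string is a shortened, hypothetical stand-in for the example above.

```python
import re

# Shortened, hypothetical stand-in for the multiline event shown earlier.
event = (
    "ERROR Exception during request callback\n"
    "dynamicops.api.client.ApiClientException\n"
    "     at java.lang.Thread.run(Unknown Source)"
)

# '.' does not match a newline by default, so '.*' stops at the end of the line.
same_line = re.search(r"ApiClientException.*", event)
print(repr(same_line.group(0)))

# A pattern that needs to reach text on a later line therefore finds nothing.
crossing = re.search(r"ApiClientException.*Thread", event)
print(crossing)
```

The first search matches only the text up to the end of the line, and the second returns no match at all, which mirrors the empty highlight in the second example.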

To resolve the issue, replace the period with:

[\d\D]

which matches any digit or any non-digit, in other words any character at all, including newlines. With this modification, the extracted field works as expected:
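As a rough illustration of the fix, the sketch below again uses Python's re module rather than Log Insight itself (an assumption), with a shortened, hypothetical event string. The capture group and the text it extracts are chosen for demonstration only and are not the field definition from the screenshots.

```python
import re

# Shortened, hypothetical stand-in for the multiline event shown earlier.
event = (
    "ERROR Exception during request callback\n"
    "dynamicops.api.client.ApiClientException\n"
    "     at java.lang.Thread.run(Unknown Source)"
)

# [\d\D] matches any digit or non-digit, so it also consumes newlines.
portable = re.search(r"ApiClientException[\d\D]*Thread\.run\((.*?)\)", event)
print(portable.group(1))

# Python-specific alternative: the DOTALL flag makes '.' match newlines too.
# Whether an equivalent flag is available in Log Insight is not covered here.
dotall = re.search(r"ApiClientException.*Thread\.run\((.*?)\)", event, re.DOTALL)
print(dotall.group(1))
```

Both searches cross the line breaks and capture the same text. The [\d\D] form has the advantage of not depending on any engine-specific flag.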

[Screenshot: li-ml-3]

This is a corner case, since multiline messages are less common and not well handled in syslog, but it can cause a lot of frustration at first. I hope this helps!

© 2014, Steve Flanders. All rights reserved.

2 thoughts on “Multiline regex gotcha”

  1. Hakan says:

    Hi Steve,

    How can I extract Account name (test.user) from following log ?

    An attempt was made to access an object.

    Subject:
    Security ID: S-1-5-21-2584360908-2122736659-2498258442-75107
    Account Name: test.user
    Account Domain: DEMO
    Logon ID: 0x63736D
    Object:
    Object Server: Security
    Object Type: File
    Object Name: D:\asd.txt
    Handle ID: 0x2f18
    Resource Attributes:
    Process Information:
    Process ID: 0x4
    Process Name:

    Access Request Information:
    Accesses: DELETE

    Access Mask: 0x10000

  2. Hakan says:

    I found a predefined field

    ms_win_security_audit_subject_account_name (Microsoft – Windows)

    it resolved my problem
