Multiline regex gotcha

Continuing with the regex theme this week, I would like to cover a corner case in regular expression matching that is worth being aware of. It involves a single event that spans multiple lines (that is, contains new line characters) and the use of the .* regex.

Example

2014-01-30 14:38:12,032 [Thread-57] [iaas-proxy] ERROR com.vmware.vcac.iaas.service.impl.CatalogRequestServiceImpl.failed:384 – Exception during request callback with id a739e6e6-d9fd-411d-a6a7-dd3b4d42a276 for item 8dfc5e04-d6a1-4dbc-91c7-b4418fc0c632. Error Message: [Error code: 42100 ] – [Error Msg: Infrastructure service provider error]
dynamicops.api.client.ApiClientException
at dynamicops.api.client.ClientResponseHandler.handleResponse(BaseHttpClient.java:291)
at dynamicops.api.client.BaseHttpClient$1.handleResponse(BaseHttpClient.java:151)
at org.apache.http.client.fluent.Response.handleResponse(Response.java:85)
at org.apache.http.client.fluent.Async$ExecRunnable.run(Async.java:80)
at java.lang.Thread.run(Unknown Source)

To highlight the gotcha, I will use the above event in Log Insight in an attempt to extract a field.

Extracting a field using one line

Let’s say I want to extract the exception reason from the end of the event. To do so, I may define the extracted field as follows in Log Insight:

Extracting a field using multiple lines

While the above definition works, it may not match what I want. For example, what if I want to know the exception reason only for ApiClientException? To do this, I could add more context to my extracted field as follows:

Understanding regex

In the second example, you can see that none of the event is highlighted. Looking at the regex you may wonder why. The only change was the addition of:

ApiClientException.*

to the pre-context, which should match ApiClientException followed by anything, right? Well, anything except for new line characters. It turns out that, by default, the period (.) matches any character except a new line character. Because the example log message spans multiple lines, .* stops at the end of the line that contains ApiClientException and the rest of the pattern never gets a chance to match.
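The behavior is easy to reproduce outside Log Insight. The snippet below is a minimal sketch using Python's re module, not Log Insight's own engine (which may differ in details), applied to a two-line excerpt of the sample event. It shows where .* actually stops.

```python
import re

# Two-line excerpt of the sample event; the "\n" is the important part.
event = (
    "dynamicops.api.client.ApiClientException\n"
    "at java.lang.Thread.run(Unknown Source)"
)

# By default "." never consumes "\n", so ".*" stops at the end of the line.
match = re.search(r"ApiClientException.*", event)
print(repr(match.group(0)))  # 'ApiClientException'

# Anything that has to appear on a later line is therefore out of reach.
later_line = re.search(r"ApiClientException.*Thread", event)
print(later_line)  # None
```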
To resolve the issue, replace the period with:

[\d\D]

which means any digit or non-digit, and therefore any character at all, including new lines. With the above modification, the extracted field works as expected.
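To see the difference side by side, here is a small sketch in Python's re module. It is an illustration only: the event is abbreviated, the pattern that captures the last stack frame is a hypothetical stand-in for the extracted field definition, and Log Insight's pre-context and post-context fields are not reproduced. Python also offers the re.DOTALL flag, which makes the period match new lines; whether an equivalent option is available in Log Insight is not something this post covers.

```python
import re

# Abbreviated copy of the sample event (first line shortened, two frames kept).
event = (
    "2014-01-30 14:38:12,032 [Thread-57] [iaas-proxy] ERROR "
    "[Error Msg: Infrastructure service provider error]\n"
    "dynamicops.api.client.ApiClientException\n"
    "at dynamicops.api.client.ClientResponseHandler.handleResponse(BaseHttpClient.java:291)\n"
    "at java.lang.Thread.run(Unknown Source)"
)

# Hypothetical field: the method name of a stack frame after ApiClientException.
with_dot = re.compile(r"ApiClientException.*at ([\w.$]+)\(")
with_any = re.compile(r"ApiClientException[\d\D]*at ([\w.$]+)\(")
with_dotall = re.compile(r"ApiClientException.*at ([\w.$]+)\(", re.DOTALL)

def extract(pattern, text):
    """Return the captured group, or None when the pattern does not match."""
    found = pattern.search(text)
    return found.group(1) if found else None

print(extract(with_dot, event))     # None: "." cannot cross the line break
print(extract(with_any, event))     # java.lang.Thread.run
print(extract(with_dotall, event))  # java.lang.Thread.run
```

Because * is greedy, [\d\D]* runs to the end of the event and backtracks, so this particular pattern captures the last stack frame rather than the first.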

While this is a corner case, since multiline messages are less common and not well handled in syslog, it can cause a lot of frustration at first. I hope this helps!

© 2014 – 2021, Steve Flanders. All rights reserved.

2 comments on “Multiline regex gotcha”

Hakan says:

Hi Steve,
How can I extract the account name (test.user) from the following log?
An attempt was made to access an object.
Subject:
Security ID: S-1-5-21-2584360908-2122736659-2498258442-75107
Account Name: test.user
Account Domain: DEMO
Logon ID: 0x63736D
Object:
Object Server: Security
Object Type: File
Object Name: D:\asd.txt
Handle ID: 0x2f18
Resource Attributes:
Process Information:
Process ID: 0x4
Process Name:
Access Request Information:
Accesses: DELETE
Access Mask: 0x10000

Hakan says:

I found a predefined field
ms_win_security_audit_subject_account_name (Microsoft – Windows)
it resolved my problem
