Problem parsing complex HTML attachments

Moderator: crythias

Locked
mfaux
Znuny newbie
Posts: 3
Joined: 17 Nov 2011, 02:14
Znuny Version: 3.0.11
Company: SecureSolutions GmbH

Post by mfaux »

Hello,

We have noticed that our OTRS (3.0.11) has trouble parsing complex HTML attachments. Depending on the complexity of the attachment, parsing can take several minutes (!). We recently received an email containing several HTML tables, span tags, and pre tags (not nested, but one after another) that takes about 5 minutes to be displayed by OTRS.

I've identified Kernel::Output::HTML::Layout::RichTextDocumentServe() (called by Kernel::Modules::AgentTicketAttachment) as the subroutine that causes the huge delay. The following part accounts for most of it (about 90%?):

Code:

# safety check
if ( !$Param{LoadInlineContent} ) {
    $Param{Data}->{Content} = $Self->RichTextDocumentSafetyCheck(
        String => $Param{Data}->{Content},
    );
}

Is this a known problem? Is there anything I can do to speed up this check, or to skip it, without exposing my OTRS to XSS or other risks?
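
For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch using the core Time::HiRes module. The helper name `timed`, the sample HTML, and the stand-in substitution are all made up for illustration; in a real installation the code reference would wrap the RichTextDocumentSafetyCheck() call shown above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper (not part of OTRS): run a code reference once
# and return ( elapsed seconds, return value ).
sub timed {
    my ($Code) = @_;
    my $Start  = [gettimeofday];
    my $Result = $Code->();
    return ( tv_interval($Start), $Result );
}

# Made-up input; a real test would use the content of the slow attachment.
my $Html = '<table><tr><td><span>cell</span></td></tr></table>' x 1000;

# Stand-in for the real safety check (assumption: any substitution
# on the content can be timed the same way).
my ( $Seconds, $Clean ) = timed(
    sub {
        ( my $Copy = $Html ) =~ s{<script.*?</script>}{}sgi;
        return $Copy;
    }
);

printf "stand-in safety check took %.4f s\n", $Seconds;
```

Wrapping each suspicious step this way makes it easier to see which one dominates the total time.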

Thanks.

Kind regards,
Manuel
Last edited by mfaux on 17 Nov 2011, 11:46, edited 1 time in total.

Re: Problem parsing complex HTML attachments

Post by mfaux »

Okay, it seems I've resolved the problem. I think there is a performance issue in the OTRS code in Kernel::System::HTMLUtils::Safety(). I changed the following:

Original:

Code:

# remove style/javascript parts
if ( $Param{NoJavaScript} ) {
    $Safety{Replace} ||= ${$String} =~ s{
        <style.+?javascript(.+?|)>(.*)</style>
    }
    {}sgxim;
}

My changed code:

Code:

# remove javascript parts
if ( $Param{NoJavaScript} ) {
    $Safety{Replace} ||= ${$String} =~ s{
        <javascript.+?>(.*)</style>
    }
    {}sgxim;
}

# remove style parts
if ( $Param{NoJavaScript} ) {
    $Safety{Replace} ||= ${$String} =~ s{
        <style.+?>(.*)</style>
    }
    {}sgxim;
}

Now the same mail renders in a few seconds instead of a few minutes. My question is: are the following RegExes equivalent?

Code:

<style.+?javascript(.+?|)>(.*)</style>

<javascript.+?>(.*)</style>
<style.+?>(.*)</style>

The "|" operator means "or" inside the parentheses, but I am not sure whether it is used correctly in the first (original) RegEx, as nothing follows it. Can some RegEx guru help with that? ;)
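
One way to check the equivalence question is to run the three patterns against a few small inputs. The sketch below is only an illustration: the helper `strip_with` and the sample strings are made up and are not part of OTRS. As far as I can tell, `(.+?|)` means "one or more characters, or nothing", so it accepts the same strings as `(.*?)` and only tries the empty case last. The runs also suggest the patterns are not equivalent: the original only removes a style block whose opening tag mentions javascript, `<style.+?>(.*)</style>` removes every style block, and `<javascript.+?>` looks for a tag literally named javascript.

```perl
use strict;
use warnings;

# The three patterns from the post, with the same s, x, i, m flags.
my $Original  = qr{<style.+?javascript(.+?|)>(.*)</style>}sxim;
my $JsTag     = qr{<javascript.+?>(.*)</style>}sxim;
my $StyleOnly = qr{<style.+?>(.*)</style>}sxim;

# Hypothetical helper (not part of OTRS): remove all matches of a pattern.
sub strip_with {
    my ( $Pattern, $String ) = @_;
    $String =~ s{$Pattern}{}g;
    return $String;
}

# Made-up sample inputs.
my $JsStyle   = '<style type=javascript>x</style><p>hi</p>';
my $CssStyle  = '<style type="text/css">p{color:red}</style><p>hi</p>';
my $TwoStyles = '<style type="a">x</style><p>text</p><style type="b">y</style>';

for my $Input ( $JsStyle, $CssStyle, $TwoStyles ) {
    printf "input:      [%s]\n",   $Input;
    printf "original:   [%s]\n",   strip_with( $Original,  $Input );
    printf "javascript: [%s]\n",   strip_with( $JsTag,     $Input );
    printf "style only: [%s]\n\n", strip_with( $StyleOnly, $Input );
}
```

In the first input the empty alternative is what lets the original match, because `>` follows `javascript` directly. In the last input the greedy `(.*)` spans from the first `<style` to the last `</style>`, so the paragraph between the two style blocks is deleted as well. The split version is therefore a change in behaviour, not only a speed-up.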

Maybe I should open a bug in the bug tracker.

Kind regards,
Manuel
jojo
Znuny guru
Posts: 15020
Joined: 26 Jan 2007, 14:50
Znuny Version: Git Master
Contact:

Re: Problem parsing complex HTML attachments

Post by jojo »

Please create a bug report for this.

Thanks
"Production": OTRS™ 8, OTRS™ 7, STORM powered by OTRS
"Testing": ((OTRS Community Edition)) and git Master

Never change Defaults.pm! :: Blog
Professional Services:: http://www.otrs.com :: enjoy@otrs.com

Re: Problem parsing complex HTML attachments

Post by mfaux »


Re: Problem parsing complex HTML attachments

Post by jojo »

can you also post your fix?