Ticket web page slow after upgrading otrs 5 to 6

Moderator: crythias

Post Reply
giacomos
Znuny newbie
Posts: 10
Joined: 28 Sep 2022, 14:51
Znuny Version: 6.0.38
Real Name: Giacomo

Ticket web page slow after upgrading otrs 5 to 6

Post by giacomos »

I have updated our otrs server from version 5 to 6. Now I notice a
slowdown on loading the web page when I consult the tickets.

After logging in, the web page loads the dashboard quite quickly. I
can see the lists of new tickets and open tickets. Clicking on the menu
items under "Ticket", "Admin", etc. is fast enough.

But clicking on some tickets of these lists is slow and not acceptable, the
opening of the page with the ticket details takes from 4-5 seconds
(best case), up to 9-10 seconds; sometimes it gets stuck and the
loading happens after about a minute (very rare case).
The slow pages I refer to are those with url like
xx.xx.xx.xx/otrs/index.pl?Action=AgentTicketZoom;TicketID=xyxyxyxy

The moment I click to consult a ticket, I notice that the "top" command
on the server machine shows a processor activity spike (which persists
until the page is loaded):

Code: Select all

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                         
   2025 www-data  20   0 1188436 915512  11280 R  98.1   9.0   1:24.91 /opt/otrs/bin/c                                                                                                                                 
The number of tickets in my DB is very large by the way:

- not archived = 11782
- not archived and open = 1.4k
- archived = over a million

I also note that I moved the ticket storage to storageFS.

My system is a virtual machine with Ubuntu 22, 10 GB RAM, and mariadb 10.6.

I have tested the same installation with another (very tiny) database
and opening article details is very fast.

So my question is: does the slowness of opening the ticket details web page
(index.pl?Action=AgentTicketZoom..) depend on the
size of database (i.e., the number of tickets)?
Johannes
Moderator
Posts: 391
Joined: 30 Jan 2008, 02:26
Znuny Version: All of them ^^
Real Name: Hannes
Company: Znuny|OTTERHUB

Re: Ticket web page slow after upgrading otrs 5 to 6

Post by Johannes »

Hi,

the number of tickets does not matter for the performance, when you open a single ticket - ATZoom View.
The number of open Tickets is in the "regular" range. Nothing special.
The number of articles on the other hand is relevant+the article view type the user has selecte - all expanded or single selection.

In regards to your other thread, I suspect that there is a performance / config error.
Articles in the FS is a good choice.

I would start using the regular questions:
- OTRS 5 was used on this exact server with this exact Software version (OS / Database...)?
- What is the MariaDB slow log showing?
- When the load is high, what is your MariaDB (full) process list showing at the time?
- I would use htop top, enabled detailed CPU stats and check for IO wait during the run
- Next check, if enabled, should be the external backends (Customer User backends)

Regards
Johannes
giacomos
Znuny newbie
Posts: 10
Joined: 28 Sep 2022, 14:51
Znuny Version: 6.0.38
Real Name: Giacomo

Re: Ticket web page slow after upgrading otrs 5 to 6

Post by giacomos »

Johannes, thanks a lot for your suggestions!
> I would start using the regular questions:
> - OTRS 5 was used on this exact server with this exact Software version (OS / Database...)?
They're not actually the same server.
Otrs 5 runs on ubuntu 10.04(!), with znuny 6.0.38 on ubuntu 22.04;
although they are on a VM with same amount of memory.
> - What is the MariaDB slow log showing? - When the load is high, what is your MariaDB (full) process list showing at the time?
I didn't enable slow query log, although I grabbed the full process list of mariadb and saved it into a .sql file (you can find it below for reference).
I executed the script in a shell like this `cat a2.sql | mysql -u root -p otrs` and verified that it took no time to execute. To be sure, I also ran every single query in a sql shell and they are indeed instantly fast.

I tried to test the writing speed of the disk. I tested like this:

Code: Select all

dd if=/dev/zero of=./test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.15582 s, 150 MB/s
(for reference, the same command gave ~750MB/s with my dell i7 laptop, with ssd drive)

`lscpu` says that the processor is quad core @2.0GHz
- CPU(s):4 -- Model name: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
> - I would use htop top, enabled detailed CPU stats and check for IO wait during the run
I tried both htop and top and when I click for a ticket-ATZoomView I do not notice a high %wa value (actually it is almost always 0%).
> - Next check, if enabled, should be the external backends (Customer User backends)
Mmh, I do not know if there are any (i guess not).


Can it be the slow disk-writing time the problem? or maybe the slow processor?


**PS**: full mariadb process list .sql file:

Code: Select all

SET NAMES utf8  ;
SELECT id, data_key, data_value, serialized FROM sessions WHERE session_id = 'NvtIEvhPUssxnh8UWkgVfykRnTpIu5tm' ORDER BY id ASC  ;
SELECT id, value_text, value_date, value_int, field_id FROM dynamic_field_value WHERE object_id = '1183083' ORDER BY id  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st WHERE 1=1 AND st.ticket_state_id IN ( 1, 12, 13, 18 ) AND ( st.responsible_user_id IN (1) ) AND archive_flag = 0 LIMIT 10000  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st INNER JOIN ticket_flag tf1 ON st.id = tf1.ticket_id WHERE 1=1 AND st.ticket_state_id IN ( 1, 12, 13, 18 ) AND ( st.responsible_user_id IN (1) ) AND tf1.ticket_key = 'Seen' AND tf1.ticket_value = '1' AND tf1.create_by = 1 AND archive_flag = 0 LIMIT 10000  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st WHERE 1=1 AND st.ticket_state_id IN ( 12 ) AND ( st.responsible_user_id IN (1) ) AND st.ticket_state_id IN (12) AND st.until_time <= 1664263904 AND archive_flag = 0 LIMIT 10000  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st WHERE 1=1 AND ( st.ticket_lock_id IN (2, 3) ) AND ( st.user_id IN (1) ) AND archive_flag = 0 LIMIT 10000  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st INNER JOIN ticket_flag tf1 ON st.id = tf1.ticket_id WHERE 1=1 AND ( st.ticket_lock_id IN (2, 3) ) AND ( st.user_id IN (1) ) AND tf1.ticket_key = 'Seen' AND tf1.ticket_value = '1' AND tf1.create_by = 1 AND archive_flag = 0 LIMIT 10000  ;
SELECT COUNT(DISTINCT(st.id)) FROM ticket st WHERE 1=1 AND st.ticket_state_id IN ( 12 ) AND ( st.ticket_lock_id IN (2, 3) ) AND ( st.user_id IN (1) ) AND st.ticket_state_id IN (12) AND st.until_time <= 1664263904 AND archive_flag = 0 LIMIT 10000  ;
SELECT id FROM system_maintenance WHERE start_date <= 1664263964 and stop_date >= 1664263964 AND valid_id IN (1) ORDER BY id  ;
SELECT start_date FROM system_maintenance WHERE start_date > 1664263964 and start_date <= 1664371964 AND valid_id IN (1) ORDER BY id  ;
SELECT article.id, article_flag.article_key, article_flag.article_value FROM article_flag, article WHERE article.id = article_flag.article_id AND article.ticket_id = '1183083' AND article_flag.create_by = '1' LIMIT 1500  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT id, value_text, value_date, value_int, field_id FROM dynamic_field_value WHERE object_id = '2473697' ORDER BY id  ;
SELECT time_unit FROM time_accounting WHERE ticket_id = '1183083' ;
SET NAMES utf8 ;
SELECT id, data_key, data_value, serialized FROM sessions WHERE session_id = 'NvtIEvhPUssxnh8UWkgVfykRnTpIu5tm' ORDER BY id ASC ;
SELECT DISTINCT(session_id) FROM sessions  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT article_id, message_id, log_message, create_time FROM article_data_mime_send_error WHERE article_id = '2473697'  ;
SELECT article_id, create_time, attempts, due_time FROM mail_queue WHERE article_id = '2473697'  ;
SELECT article_key, article_value FROM article_flag WHERE article_id = '2473697' AND create_by = '1' LIMIT 1500  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT queue_id FROM system_address WHERE valid_id IN ( 1 ) AND value0 = 'm......g......@g.....com' LIMIT 1  ;
SELECT clo.communication_id, clol.communication_log_object_id, clol.object_type ,clol.object_id FROM communication_log_obj_lookup clol JOIN communication_log_object clo ON clol.communication_log_object_id = clo.id WHERE (clol.object_id = '2473697') AND (clol.object_type = 'Article')  ;
SELECT article_key, article_value FROM article_flag WHERE article_id = '2473697' AND create_by = '1' LIMIT 1500  ;
SELECT article_id, message_id, log_message, create_time FROM article_data_mime_send_error WHERE article_id = '2473697'  ;
SELECT article_id, create_time, attempts, due_time FROM mail_queue WHERE article_id = '2473697'  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT sadm.a_from, sadm.a_reply_to, sadm.a_to, sadm.a_cc, sadm.a_bcc, sadm.a_subject, sadm.a_message_id, sadm.a_in_reply_to, sadm.a_references, sadm.a_content_type, sadm.a_body, sadm.incoming_time FROM article_data_mime sadm WHERE sadm.article_id = '2473697' LIMIT 1  ;
SELECT article_id, message_id, log_message, create_time FROM article_data_mime_send_error WHERE article_id = '2473697'  ;
SELECT article_id, create_time, attempts, due_time FROM mail_queue WHERE article_id = '2473697'  ;
INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ('NvtIEvhPUssxnh8UWkgVfykRnTpIu5tm','UserLastRequest','1664263963','0')  ;
DELETE FROM sessions WHERE session_id = 'NvtIEvhPUssxnh8UWkgVfykRnTpIu5tm' AND data_key = 'UserLastRequest' AND id <= '2286' ;
Johannes
Moderator
Posts: 391
Joined: 30 Jan 2008, 02:26
Znuny Version: All of them ^^
Real Name: Hannes
Company: Znuny|OTTERHUB

Re: Ticket web page slow after upgrading otrs 5 to 6

Post by Johannes »

Hi,

ok. Without access a bit jiggling in the dark.
If IO Wait in HTOP goes up and beyond, it indicates that your FS is slow when it comes to parsing multiple smaller files.
The "large" file test is nice to know, but reading / writing thousands of small files really shows the weakness. And my guess is, that this is the case. Often resources in VM environments tend to assign resources based on load. If you create on large files, resources are shifted until it is done. On smaller files, even for reading, the time they are actually read is not enough for a VM host to actually assign the needed resources. Especially on the "modern" ones who base their assignment on historical information of the guest. If the guest is new -> no infos -> slower then existing ones.
It would be interesting to see an update of the existing instance 5 to 6 and then compare the results. I would bet they are better.

We have seen it before, but I'm no expert here. I only can help to identify the symptoms.The code in this area is mainly untouched.


Additional question:
Which FS type / format do you use on the old and the new one?

Regards
Johannes

Another small side note: we tend to use postgresql on larger instances. The overall performance (without tweaking) is better in our opinion. Especially compared to current maria db releases. 10.15+
giacomos
Znuny newbie
Posts: 10
Joined: 28 Sep 2022, 14:51
Znuny Version: 6.0.38
Real Name: Giacomo

Re: Ticket web page slow after upgrading otrs 5 to 6

Post by giacomos »

Thanks again.

Based on what you tell me then I understand that, when clicking on
article zoom view, many (small) files (maybe in
/opt/otrs/var/..cache..?) are read/written.

I tested the performance of the DB, and I deem it quite fast, so it
may not affect the problem very much. Nonetheless, I will remember
that postgres may be faster in such cases (unfortunately, I see it
difficult to try it in this installation).

I also suspect the problem is lying in poor VM performance,
but top's `%wa` value does not go above 0% when clicking the link, so
maybe it is difficult to address the problem to iowait.

FS type is ext4 for both new (ubuntu 22) and old (10.04) servers.

**PS**: btw, I would like to test the system with my laptop (not a VM
and quite a modern hardware. I will see if I can get it installed
locally and try it.

Could it be simply a problem of slow cpu? I see that clicking a link
only one cpu is used (so having quad core is not helping here).
Johannes
Moderator
Posts: 391
Joined: 30 Jan 2008, 02:26
Znuny Version: All of them ^^
Real Name: Hannes
Company: Znuny|OTTERHUB

Re: Ticket web page slow after upgrading otrs 5 to 6

Post by Johannes »

Hi,

you wrote earlier that you switched to ArticleStorageFS.
This results in "Article Data" = (Attachments, Plain Mails, HTML Body) are stored in a hierarchal folder structure: var/article/YEAR/MONTH/DAY/ARTICLEID
so lets say you have a ticket with 10 articles and 3 Attachments, all HTML.
For every article it opens the FS path and (at worst) performs
- 10 x HTML body
- 3 Attachments+ maybe some signature images (which also count as attachment)
- 10 x Plain Content

From you db dump I can see the session information are stored in the FS, so this is not accountable.
But the cache maybe. The cache consists of multiple folders for the object and the binary representation of what is stored.

The high CPU is only a symptom, not the reason. CPU is high, because the thread is not able to get all the information in time (usually).
The same happens when you have a customer backend, lets say LDAP, in use. The backend is not reachable for whatever reason and your timeout is at 10 seconds.
The webserver process will wait, tries to gather all the infos, which he will not get and until the timeout is reached CPU usage|time will be higher than usual.

It can be a combination of users, parallel access and many other things. Hard to tell without access and some analysis.

Regarding the FS. I would suggest to switch to XFS, it can deal with smaller files much better.

Have a good weekend

EDIT:
If you want to try you could mount a RAMDISK for the cache. It does not get any faster than this.
Post Reply