we're having some issues with migrating our OTRS instance.
Some background: We're currently using OTRS Community v5.0.28 with up to 150 agents and (currently) around 25000 open tickets. The server and DB are hosted on a dedicated Debian 8 machine with SSD storage, 32 GiB RAM and 8 cores. The DB is hosted using MySQL 5.5. There have been no tangible performance issues in the past.
We would now like to migrate everything to inhouse hosting. For that, we using a baseline setup of three servers - one frontend, and two backends in DB replication, with everything on SSD storage as well. The frontend is using 20 GiB RAM and 8 cores, the two backends are spec'd with 32 GiB RAM and 8 cores each. Everything is running an up-to-date Debian 10 with Perl 5.28.1 (frontend) and MariaDB 10.3 (backends). The replication is Master-Master, meaning that OTRS is only talking to the 1st DB server, with the 2nd DB server acting as failover in case of maintenance etc.
We have received a dump of the OTRS application directory (1,8 TiB) and the DB (130 GiB) from our current hoster (no root access being one of the reasons we want to migrate). We were able to set up the application to the point of having a few of our agents verify basic functionality and acceptable responsiveness.
But there lies our issue: it seems that some - though not all - agents are experiencing very high waiting times while navigating ticket queues, with up to 5 minutes until the requests are being executed. During those waits, we see high load on our DB server, MariaDB using 100% CPU with SELECT queries such as these:
Code: Select all
SELECT title, first_name, last_name, login, pw, email, customer_id, phone, fax, mobile, street, zip, city, country, comments, valid_id, create_time, create_by, change_time, change_by, login FROM customer_user WHERE LOWER(login) = LOWER('user-address1@domain.com33735607') LIMIT 1
SELECT title, first_name, last_name FROM customer_user WHERE LOWER(login) = LOWER('user-address2@domain.com33228184') LIMIT 1
What's confusing to us is the fact that the DB only behaves this way with certain users. From the frontend side, there doesn't seem to be any distinction between responsive accounts and non-responsive accounts. Identical frontend settings, same access rights to different roles and groups.
The backends are not swapping. The frontend is mostly idle at this point.
We've checked DB consistency as well as possible warnings in the Support Data Collector in the OTRS admin area and application logs, with no obvious pointers. We've made sure permissions on the application directory are set correctly, deleted cache and loader cache files.
We've also made sure to set CaseSensitive and SearchCaseSensitive to 0.
What else should we be looking out for?
Thank you
EDIT: We've made sure the DB collation is consistent (UTF-8). We needed to execute Maint::Database::MySQL::InnoDBMigration after the initial DB import, which converted around 20 tables.
Thankfully the new setup is not PROD yet so we don't have to worry about breaking the new setup during debugging.