Otrs6 with periods of CPU at 100%

Moderator: crythias

Post Reply
jlima
Znuny newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
Znuny Version: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Otrs6 with periods of CPU at 100%

Post by jlima »

Hello

I have the same problem as described here (viewtopic.php?f=53&p=166163#p166163) , sudden 100% CPU spikes on all four CPUs that can last for 15min.
As you can see, I have several Daemon.pl as well as /opt/otrs/bin/c -DFOREGROUND processes choking the server.

I don't know what they're doing nor how to dig it out.

/var/log/httpd/error_log has several entries like these but I don't know if there's any connection
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/AgentTicketSearch.pm line 1306.
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/CustomerTicketZoom.pm line 586.
Use of uninitialized value $CustomerData{"UserFullname"} in concatenation (.) or string at /opt/otrs//Kernel/Output/PDF/Ticket.pm line 258.
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/AgentTicketQueue.pm line 190.

Any help would be greatly appreciated.

Thanks,
Jorge

Image
OTRS 6.0.18 (public/testing) on CentOS with Postgres database
root
Administrator
Posts: 3934
Joined: 18 Dec 2007, 12:23
Znuny Version: Znuny and Znuny LTS
Real Name: Roy Kaldung
Company: Znuny
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root »

Hi,

Next time when the load is about 100% check what the output of bin/otrs.Console.pl Maint::Daemon::Summary is and let us know.

- Roy
Znuny and Znuny LTS running on CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Use a test system - always.

Do you need professional services? Check out https://www.znuny.com/

Do you want to contribute or want to know where it goes ?
jlima
Znuny newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
Znuny Version: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima »

Hi Roy
thanks for your reply
Here follows the Daemon Summary plus screen captures of htop and top
Hope they provide any light on this
Thanks
Jorge


Gathering summary for all daemons...

System configuration sync:
Daemon is active.

Tasks to be executed in future:
There are currently no tasks to be executed in future.

Recurrent cron tasks:
+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-03 00:20:00 | Success | 50.3 Minute(s) | 2019-11-10 00:20:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-04 09:05:00 | Success | < 1 Second | 2019-11-04 10:05:00 |
| LoaderCacheDelete | 2019-11-03 00:30:00 | Success | 2.0 Second(s) | 2019-11-10 00:30:00 |
| MailAccountFetch | 2019-11-04 09:20:00 | Success | 2.2 Minute(s) | 2019-11-04 09:30:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-04 09:15:00 | Success | < 1 Second | 2019-11-04 09:35:00 |
| OTRSBusinessEntitlementCheck | 2019-11-04 08:45:00 | Success | < 1 Second | 2019-11-04 09:25:00 |
| RegistrationUpdateSend | 2019-11-04 08:30:00 | Success | < 1 Second | 2019-11-04 09:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-04 02:02:00 | Success | < 1 Second | 2019-11-05 02:02:00 |
| SessionDeleteExpired | 2019-11-04 08:55:00 | Success | < 1 Second | 2019-11-04 10:55:00 |
| SpoolMailsReprocess | 2019-11-04 00:10:00 | Success | < 1 Second | 2019-11-05 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-04 09:01:00 | Success | < 1 Second | 2019-11-04 10:01:00 |
| TicketAcceleratorRebuild | 2019-11-04 01:01:00 | Success | 45.0 Second(s) | 2019-11-05 01:01:00 |
| TicketPendingCheck | 2019-11-04 08:45:00 | Success | 5.0 Second(s) | 2019-11-04 10:45:00 |
| TicketUnlockTimeout | 2019-11-04 08:35:00 | Success | < 1 Second | 2019-11-04 09:35:00 |
| ArticleSearchIndexRebuild | 2019-11-04 09:23:00 | Success | 2.9 Minute(s) | 2019-11-04 09:24:00 |
| CommunicationLogDelete | 2019-11-04 03:00:00 | Success | 2.4 Minute(s) | 2019-11-05 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-03 00:40:00 | Success | 19.0 Second(s) | 2019-11-10 00:40:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-04 03:02:00 | Success | < 1 Second | 2019-11-05 03:02:00 |
| MailQueueSend | 2019-11-04 09:23:00 | Success | 2.2 Minute(s) | 2019-11-04 09:24:00 |
| TicketDraftDeleteExpired | 2019-11-04 08:55:00 | Success | < 1 Second | 2019-11-04 09:55:00 |
| TicketNumberCounterCleanup | 2019-11-04 09:20:00 | Success | < 1 Second | 2019-11-04 09:30:00 |
| WebUploadCacheCleanup | 2019-11-04 08:46:00 | Success | < 1 Second | 2019-11-04 09:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+

Recurrent generic agent tasks:
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+
| tickets novos com mais de 3 dias | 2019-11-01 16:50:00 | Success | 1.0 Second(s) | 2019-11-04 16:50:00 |
| ___Admin nao novos | 2019-11-04 05:00:00 | Success | 1.0 Second(s) | 2019-11-05 05:00:00 |
| ___Unlocked tickets | 2019-11-04 05:15:00 | Success | < 1 Second | 2019-11-05 05:15:00 |
| move webrequests com mais de 30 min para entrada | 2019-11-04 09:00:00 | Success | < 1 Second | 2019-11-04 10:00:00 |
| spam qq.com | 2019-11-04 03:30:00 | Success | 16.0 Second(s) | 2019-11-05 03:30:00 |
| apagar spam | 2019-11-04 00:00:00 | Success | 1.0 Second(s) | 2019-11-05 00:00:00 |
| arquivar tickets | 2019-11-02 04:10:00 | Success | 6.0 Minute(s) | 2019-11-09 04:10:00 |
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+

Unhandled Worker Tasks:
+---------------+------+---------------------+
| Name | Type | Create Time |
+---------------+------+---------------------+
| MailQueueSend | Cron | 2019-11-04 09:23:09 |
+---------------+------+---------------------+

Handled Worker Tasks:
+---------------------------+------+-----+----------+---------------+
| Name | Type | NID | PID | Duration |
+---------------------------+------+-----+----------+---------------+
| ArticleSearchIndexRebuild | Cron | 001 | 00002284 | 9.0 Second(s) |
+---------------------------+------+-----+----------+---------------+

Done.

Image
Image
OTRS 6.0.18 (public/testing) on CentOS with Postgres database
root
Administrator
Posts: 3934
Joined: 18 Dec 2007, 12:23
Znuny Version: Znuny and Znuny LTS
Real Name: Roy Kaldung
Company: Znuny
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root »

Hi Jorge,

I would check more on the I/O, even on my largest systems the CoreCacheCleanup never takes more than 3-4 minutes. 50 minutes seems pretty much to me, especially when you consider the date and time when the task is executed (Sunday night). My guess is based on the fact that there are many, small files which were deleted.

- Roy
Znuny and Znuny LTS running on CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Use a test system - always.

Do you need professional services? Check out https://www.znuny.com/

Do you want to contribute or want to know where it goes ?
jlima
Znuny newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
Znuny Version: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima »

Hi Roy
thanks for the hint. Will ask the guy responsible for the infrastructure to check it out.
But, why isn't it permanent? Why does it happen at random times with no specific pattern?
Thanks
jlima
OTRS 6.0.18 (public/testing) on CentOS with Postgres database
jlima
Znuny newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
Znuny Version: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima »

Hi everyone,
Some news here
First, no evidence of IO issues.
What we found was that CoreCacheCleanup was running at the same time of the system backup. We moved it so it won't overlap. Let's wait for this week's cleanup to see the difference.

Additionally, I found an article that induces the reported behavior. So far, it seems identical to every other article except that in article_data_mime_attachment there are attachments that aren't displayed by otrs.
Questions
1: why aren't these attachments (2 jpg and 1 gif) reflected on otrs (neither on the bottom of the article nor on the right side under the clip column?
2: could all that cpu be related to this?
3: has anyone experienced this?
4: what might have happened?
5: any suggestion on what to do?

Hope someone can help
best regards
jorge
OTRS 6.0.18 (public/testing) on CentOS with Postgres database
jlima
Znuny newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
Znuny Version: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima »

Hello everyone

following my previous post, I changed the CacheCleanup schedule.
Regarding the cpu pikes, they are less frequent and last "only" for a minute or two.

Apart from that specific ticket, I still haven't found any trigger for what is happening.
I have just revisited one of the two articles, in that ticket, that hangup everything. It topped the four cpus to 100% for 110 seconds and then almost another 30 seconds to display the contents of the article. The other article took almost 4 minutes of 4x100% CPU and 25 seconds additional time.
Don't really know what it is with these two articles. The other articles are ok.
Then I copied (ArticleCopyMoveDelete extension) these articles to a new test ticket and the behavior still problematic.

After two executions of the cleanup here are the results.
CoreCacheCleanup is still taking too much time by Roy's referential.

+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-09 21:00:00 | Success | 8.3 Minute(s) | 2019-11-16 21:00:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-11 09:05:00 | Success | < 1 Second | 2019-11-11 10:05:00 |
| LoaderCacheDelete | 2019-11-09 20:00:00 | Success | 4.0 Second(s) | 2019-11-16 20:00:00 |
| MailAccountFetch | 2019-11-11 10:00:00 | Success | 41.0 Second(s) | 2019-11-11 10:10:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-11 09:55:00 | Success | < 1 Second | 2019-11-11 10:15:00 |
| OTRSBusinessEntitlementCheck | 2019-11-11 09:45:00 | Success | < 1 Second | 2019-11-11 10:25:00 |
| RegistrationUpdateSend | 2019-11-11 09:30:00 | Success | < 1 Second | 2019-11-11 10:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-11 02:02:00 | Success | < 1 Second | 2019-11-12 02:02:00 |
| SessionDeleteExpired | 2019-11-11 08:55:00 | Success | < 1 Second | 2019-11-11 10:55:00 |
| SpoolMailsReprocess | 2019-11-11 00:10:00 | Success | < 1 Second | 2019-11-12 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-11 10:01:00 | Success | 1.0 Second(s) | 2019-11-11 11:01:00 |
| TicketAcceleratorRebuild | 2019-11-11 01:01:00 | Success | 2.1 Minute(s) | 2019-11-12 01:01:00 |
| TicketPendingCheck | 2019-11-11 08:45:00 | Success | 8.0 Second(s) | 2019-11-11 10:45:00 |
| TicketUnlockTimeout | 2019-11-11 09:35:00 | Success | < 1 Second | 2019-11-11 10:35:00 |
| ArticleSearchIndexRebuild | 2019-11-11 10:02:00 | Success | 1.0 Second(s) | 2019-11-11 10:03:00 |
| CommunicationLogDelete | 2019-11-11 03:00:00 | Success | 1.1 Minute(s) | 2019-11-12 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-09 20:10:00 | Success | 9.0 Second(s) | 2019-11-16 20:10:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-11 03:02:00 | Success | 1.0 Second(s) | 2019-11-12 03:02:00 |
| MailQueueSend | 2019-11-11 10:01:00 | Success | < 1 Second | 2019-11-11 10:03:00 |
| TicketDraftDeleteExpired | 2019-11-11 09:55:00 | Success | < 1 Second | 2019-11-11 10:55:00 |
| TicketNumberCounterCleanup | 2019-11-11 10:00:00 | Success | < 1 Second | 2019-11-11 10:10:00 |
| WebUploadCacheCleanup | 2019-11-11 09:46:00 | Success | < 1 Second | 2019-11-11 10:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+


+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-16 21:00:00 | Success | 28.8 Minute(s) | 2019-11-23 21:00:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-20 15:05:00 | Success | 1.0 Second(s) | 2019-11-20 16:05:00 |
| LoaderCacheDelete | 2019-11-16 20:00:00 | Success | 2.0 Second(s) | 2019-11-23 20:00:00 |
| MailAccountFetch | 2019-11-20 15:30:00 | Success | 7.0 Second(s) | 2019-11-20 15:40:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-20 15:15:00 | Success | < 1 Second | 2019-11-20 15:35:00 |
| OTRSBusinessEntitlementCheck | 2019-11-20 15:25:00 | Success | < 1 Second | 2019-11-20 15:45:00 |
| RegistrationUpdateSend | 2019-11-20 15:30:00 | Success | < 1 Second | 2019-11-20 16:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-20 02:02:00 | Success | < 1 Second | 2019-11-21 02:02:00 |
| SessionDeleteExpired | 2019-11-20 14:55:00 | Success | < 1 Second | 2019-11-20 16:55:00 |
| SpoolMailsReprocess | 2019-11-20 00:10:00 | Success | < 1 Second | 2019-11-21 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-20 15:01:00 | Success | < 1 Second | 2019-11-20 16:01:00 |
| TicketAcceleratorRebuild | 2019-11-20 01:01:00 | Success | 31.0 Second(s) | 2019-11-21 01:01:00 |
| TicketPendingCheck | 2019-11-20 14:45:00 | Success | 2.0 Second(s) | 2019-11-20 16:45:00 |
| TicketUnlockTimeout | 2019-11-20 14:35:00 | Success | < 1 Second | 2019-11-20 15:35:00 |
| ArticleSearchIndexRebuild | 2019-11-20 15:31:00 | Success | < 1 Second | 2019-11-20 15:33:00 |
| CommunicationLogDelete | 2019-11-20 03:00:00 | Success | 1.6 Minute(s) | 2019-11-21 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-16 20:10:00 | Success | 8.0 Second(s) | 2019-11-23 20:10:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-20 03:02:00 | Success | < 1 Second | 2019-11-21 03:02:00 |
| MailQueueSend | 2019-11-20 15:31:00 | Success | 1.0 Second(s) | 2019-11-20 15:33:00 |
| TicketDraftDeleteExpired | 2019-11-20 14:55:00 | Success | < 1 Second | 2019-11-20 15:55:00 |
| TicketNumberCounterCleanup | 2019-11-20 15:30:00 | Success | < 1 Second | 2019-11-20 15:40:00 |
| WebUploadCacheCleanup | 2019-11-20 14:46:00 | Success | < 1 Second | 2019-11-20 15:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+

I really hope someone can shred some light on my path ...
cheers
OTRS 6.0.18 (public/testing) on CentOS with Postgres database
root
Administrator
Posts: 3934
Joined: 18 Dec 2007, 12:23
Znuny Version: Znuny and Znuny LTS
Real Name: Roy Kaldung
Company: Znuny
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root »

Hi,

Anything "uncommon" on your setup? Using NFS or sth like that? What happens if you move the cache into an ram disk?

- Roy
Znuny and Znuny LTS running on CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Use a test system - always.

Do you need professional services? Check out https://www.znuny.com/

Do you want to contribute or want to know where it goes ?
Post Reply