12.04 Troubleshooting Common Issues
Diagnostic playbook for the most common SimpleRisk operational issues — install errors, login problems, cron jobs not firing, slow performance, broken email, encryption-related issues, integration failures. Each issue includes likely causes and the specific commands or settings to check.
Why this is a reference article
This article is a runbook. When something breaks, find the symptom in the table of contents below, jump to the section, work through the diagnostic steps. The patterns recur; once you've debugged each one once, the second time is fast.
Issue 1: Install fails with "Database connection error"
Symptom: the installer can't connect to the database; the install can't proceed.
Likely causes (in order of frequency):
- Wrong credentials in `simplerisk/includes/config.php`. Verify by trying the credentials manually: `mysql -u <user> -p -h <host>`.
- Database server unreachable. Check network: `nc -zv <host> 3306` should succeed.
- Database user doesn't have access from the SimpleRisk server's IP. Check `SHOW GRANTS FOR '<user>'@'<host>';`.
- Database doesn't exist. The configured `simplerisk` database needs to exist. Create it with `CREATE DATABASE simplerisk;`.
- MySQL version too old. SimpleRisk requires MySQL 5.7+. Check with `mysql --version`. (MariaDB is no longer supported; see System Requirements.)
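The network and version checks above can be wrapped in two small helpers. This is a minimal sketch, not part of SimpleRisk: the function names and the `DB_HOST` / `DB_USER` variables are placeholders introduced here, and `sort -V` assumes GNU coreutils. Note that `mysql --version` reports the client's version, so the usage lines ask the server with `SELECT VERSION();` instead.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Succeeds when a TCP connection to host $1, port $2 opens within 5 seconds.
port_reachable() {
    nc -z -w 5 "$1" "$2"
}

# Example usage (DB_HOST and DB_USER are placeholders; uncomment and adjust):
# port_reachable "$DB_HOST" 3306 || echo "cannot reach $DB_HOST:3306"
# server_version=$(mysql -u "$DB_USER" -p -h "$DB_HOST" -N -e 'SELECT VERSION();')
# version_at_least "${server_version%%-*}" 5.7 || echo "MySQL $server_version is older than 5.7"
```

The `${server_version%%-*}` expansion strips distribution suffixes such as `-log` before the comparison.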
Issue 2: User can't log in (correct credentials)
Symptom: user enters correct username and password, login fails.
Likely causes:
- Account is locked (failed-attempts threshold reached). Check `SELECT lockout FROM user WHERE username = '<username>';`. If `1`, unlock via User Management or `UPDATE user SET lockout = 0 WHERE username = '<username>';`.
- Password expired (password policy enforces expiration). Force a password change via the admin reset path.
- MFA misconfigured (user enrolled but lost device). See Multi-Factor Authentication recovery procedure.
- SSO misconfigured (the user is `type='ldap'` or `type='saml'` and the SSO path is broken). Test SSO directly; check the Authentication Extra's settings.
- Session storage is broken (database can't write `sessions` rows). Check database disk space and table integrity.
Issue 3: Cron jobs aren't firing
Symptom: workflows don't fire, notifications don't send, AI jobs queue indefinitely.
Diagnostic steps:
- Check the cron daemon is running: `systemctl status cron` (or your distribution's equivalent).
- Check the SimpleRisk cron entries: `crontab -l` (under the user that runs SimpleRisk's cron).
- Check the `cron_history` table: the most recent entries should be within the expected interval.
- Check the debug log: each cron run writes log entries, and failures are logged as errors. Filter for the specific job's output.
- Run the cron job manually: `php /path/to/simplerisk/cron/cron_notification.php` (or the relevant script). Observe the output for errors.
- Check filesystem permissions: the cron user needs read/write access to `simplerisk/`.
For the cron queue worker specifically: check that cron_queue_worker.php is running. On some installs it runs continuously; on others it's invoked by cron.
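When a crontab is long, it helps to list only the active entries that point at SimpleRisk. The helper below is an illustrative sketch: the function name is made up here, and the `www-data` user in the usage line is an assumption about which account runs the cron jobs on your install.

```shell
# Print active (non-comment) crontab lines that mention a path fragment.
# Reads crontab text on stdin; $1 is the fragment to look for.
active_cron_entries() {
    grep -v '^[[:space:]]*#' | grep -F -- "$1"
}

# Example usage (run as root; replace www-data with your cron user):
# crontab -l -u www-data | active_cron_entries simplerisk/cron
```

An empty result means no active entry references that path, which usually explains why nothing fires.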
Issue 4: Slow page loads
Symptom: pages take > 5 seconds to load.
Diagnostic steps:
- Identify which pages are slow: all pages, or specific ones?
- Check PHP opcache: `php -i | grep opcache.enable` should show `On`. See Performance Tuning.
- Check PHP-FPM worker status: `pm.max_children` saturation produces queueing.
- Check MySQL slow query log: enable, capture queries from a slow page load, identify the culprit.
- Check server resources: `top`, `free -h`, `df -h`. Saturated CPU, OOM, or full disks all produce slowness.
- Check database server resources if separated.
For specific slow operations, see Performance Tuning.
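For the disk part of the resource check, a quick filter over `df` output flags nearly full filesystems. This is a sketch under stated assumptions: the function name and the 90% default are illustrative, and it expects the POSIX `df -P` layout (mount points containing spaces are not handled).

```shell
# Print mount points whose usage is at or above a percentage threshold.
# Reads `df -P` output on stdin; $1 is the threshold (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Example usage:
# df -P | full_filesystems 90
```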
Issue 5: Notifications not arriving
Symptom: SimpleRisk should send an email, the email doesn't arrive.
Diagnostic steps:
- Check SMTP configuration: Configure → Settings → Mail Settings. Send a test email; observe.
- Check the test email succeeded: if not, the SMTP configuration is wrong.
- Check the Notification Extra is active: Configure → Extras → Notification Extra.
- Check the notification cron is running: `cron_notification.php` writes to the debug log on each run.
- Check the notification queue: `notification_sent_log` table for queued-but-not-sent entries.
- Check the recipient's spam folder: deliverability issues sometimes look like non-delivery.
- Check the SMTP service's delivery logs: bounces, suppressions, rate limits.
For SMTP-side issues, see Email and the Notification Extra.
Issue 6: Encrypted data displays as ciphertext
Symptom: risk subjects, descriptions, etc. display as long base64-looking strings instead of readable text.
Likely causes (when the Encryption Extra is active):
- Master key file missing or wrong on the server handling the request. Check `simplerisk/extras/encryption/includes/init.php` exists and contains the expected key.
- Multi-server deployment with key file inconsistency. One server has the right key; others don't. Check all servers.
- Master key file corrupted. Restore from your backup of the key file.
Without the master key, the data can't be decrypted. See Key Management and Rotation.
Issue 7: Workflow / AI jobs queued but not processing
Symptom: workflows fire (workflow_executions row created) but never complete; AI jobs queue but don't return results.
Diagnostic steps:
- Check the cron queue worker is running: `cron_queue_worker.php` should be invoked regularly (or running continuously, depending on the install).
- Check `workflow_executions.status`: stuck `pending` indicates the worker isn't picking up jobs; `failed` indicates execution errors.
- Check the debug log: worker errors log there.
- Check the worker's process: `ps aux | grep '[c]ron_queue_worker'` should show it running (if continuous mode). The bracketed first letter keeps the `grep` process itself out of the results.
- For AI specifically: verify the AI provider's API is reachable and the API key is valid (test via curl).
Issue 8: Database disk filling up
Symptom: monitoring alerts on database disk approaching full.
Diagnostic steps:
- Identify the largest tables: `SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024) AS size_mb FROM information_schema.tables WHERE table_schema = 'simplerisk' ORDER BY size_mb DESC LIMIT 10;`.
- Common offenders: `audit_log`, `debug_log`, `sessions` (if not cleaned up), `notification_sent_log`.
- For audit/debug logs: define and apply retention policy.
- For sessions: ensure session GC is running; manually purge old rows: `DELETE FROM sessions WHERE access < UNIX_TIMESTAMP() - 86400;`.
- Coordinate with operations: increase storage, archive old data, or both.
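Before purging sessions, it is worth counting how many rows the delete would touch. The sketch below only prints the two statements for a chosen retention window; the function name is illustrative, the `sessions` table and `access` column are taken from the purge statement above, and nothing runs against the database until the output is piped into the `mysql` client.

```shell
# Print a count-then-delete pair of statements for sessions older than
# $1 days (default 1). Pipe the output into the mysql client to run it.
session_purge_sql() {
    cutoff=$(( ${1:-1} * 86400 ))   # retention window in seconds
    printf 'SELECT COUNT(*) FROM sessions WHERE access < UNIX_TIMESTAMP() - %s;\n' "$cutoff"
    printf 'DELETE FROM sessions WHERE access < UNIX_TIMESTAMP() - %s;\n' "$cutoff"
}

# Example usage (DB_USER is a placeholder):
# session_purge_sql 7 | head -n 1 | mysql -u "$DB_USER" -p simplerisk   # count only
# session_purge_sql 7 | mysql -u "$DB_USER" -p simplerisk               # count, then delete
```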
Issue 9: Integration receives 401 / 403
Symptom: API integration that worked yesterday returns 401 or 403 today.
Likely causes:
- API key revoked or rotated without updating the integration.
- User account disabled (the user the key belongs to).
- API toggle disabled: Configure → Settings → API → check Enabled.
- Permission revoked for the user (returns 403).
- Hitting a path the user can't see (team filtering returns 403 or 404).
Check the SimpleRisk debug log for authentication failure entries; check the user's status; verify the key against current api_keys.
Issue 10: Upgrade fails partway
Symptom: SimpleRisk upgrade flow fails mid-process; install left in inconsistent state.
Diagnostic steps:
- Read the upgrade error message carefully. Most upgrade failures are specific (database constraint, missing PHP extension, schema change conflict).
- Check the upgrade function in `simplerisk/includes/upgrade.php` for the version that failed; trace the specific operation.
- Restore from backup if data integrity is at risk. See Database Backup and Restore.
- Re-run the upgrade after fixing the cause; SimpleRisk's upgrade flow is idempotent in most cases.
- Contact support if the upgrade can't be made to complete; see When to Contact Support.
Issue 11: SAML / LDAP authentication broken
Symptom: SSO users can't log in; local users still can.
Diagnostic steps:
- Check the Authentication Extra is active: Configure → Extras → Authentication Extra.
- Test SAML / LDAP configuration: the Extra exposes test buttons; use them.
- Check the IdP side: SAML metadata expired? LDAP service account credentials rotated? Network connectivity to the IdP?
- For SAML: certificate expirations are common. Check `SAML_METADATA_URL` returns valid metadata.
- For LDAP: `test_ldap_configuration()` (via the Extra's UI) reports the specific failure.
- Check the debug log for SSO-related error entries.
If users are locked out entirely (SSO broken, no local accounts), the recovery path is to log in as the original local admin (whose credentials predate SSO activation).
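Since SAML certificate expiry is a common cause, a periodic check on the IdP signing certificate can catch it early. The sketch below assumes you have already saved that certificate as a PEM file; the function name and the `idp-cert.pem` filename are hypothetical, and extracting the certificate from the metadata is left out.

```shell
# Succeeds when the PEM certificate in file $1 is still valid $2 days
# from now (default 30). `openssl x509 -checkend` takes seconds.
cert_valid_for_days() {
    openssl x509 -checkend $(( ${2:-30} * 86400 )) -noout -in "$1" >/dev/null
}

# Example usage (idp-cert.pem is a hypothetical filename):
# cert_valid_for_days idp-cert.pem 30 || echo "IdP certificate expires within 30 days"
# openssl x509 -enddate -noout -in idp-cert.pem   # print the exact expiry date
```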
Issue 12: Custom fields disappeared after upgrade
Symptom: after upgrade, custom fields aren't appearing on forms.
Diagnostic steps:
- Check the Customization Extra is still active: upgrades occasionally deactivate Extras; reactivate.
- Check the `custom_fields` table: the field definitions should be present.
- Check the form layout template in `custom_template`: layout may need to be re-saved.
- Reload the page: browser cache may be serving an old form.
Custom fields rarely disappear on upgrade; if they do, restore the Customization Extra's data from backup.
Generic diagnostic checklist
For any issue not covered above:
- What changed recently? Software upgrade, configuration change, infrastructure change, traffic shift.
- What does the debug log say? Filter by time range around when the issue started.
- What does the audit log say? Specific to entity-state issues.
- What does the web server log say? PHP errors, 5xx responses.
- What does the database log say? Slow queries, connection errors, errors.
- Can you reproduce in non-production? If yes, debug there.
- Does the issue affect all users or specific ones? If specific, what's different about them?
- Does the issue happen at specific times? If yes, correlate with cron schedules, backup windows, traffic patterns.
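Filtering a log to the window around the incident is the step that gets repeated most. The helper below is a rough sketch that assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; that layout, the function name, and the `/var/log/app.log` path are assumptions, so adjust them to whatever your debug or web server log actually writes.

```shell
# Print log lines whose leading "YYYY-MM-DD HH:MM:SS" timestamp falls
# within [$1, $2]. Reads the log on stdin. Timestamps in this layout
# sort correctly as plain strings, so no date parsing is needed.
log_window() {
    awk -v start="$1" -v end="$2" '{
        ts = $1 " " $2
        if (ts >= start && ts <= end) print
    }'
}

# Example usage (the log path is a placeholder):
# log_window '2024-05-01 10:00:00' '2024-05-01 11:00:00' < /var/log/app.log
```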
Common pitfalls
A handful of patterns recur with troubleshooting.
- Restarting before debugging. Restart often hides the cause; debug first, restart after.
- Trying multiple fixes simultaneously. When something works, you don't know which fix did it. Fix one thing at a time.
- Trusting user reports over evidence. "It's broken" needs a reproducer. Get specifics before debugging.
- Skipping the logs. The logs usually contain the answer. Always check them first.
- Treating intermittent issues as fixed when they don't repeat for an hour. Intermittent issues recur. Document and continue investigating.
- Not coordinating with the program team. A "just trying to fix this" change in production during a workflow can disrupt users.
- Not having a rollback path. Before any production change, know how to undo it.
- Forgetting to monitor after the fix. A fix that "works" then breaks again two hours later wasn't actually a fix.
- Documenting fixes only in chat. Future operators (and you in 6 months) will need to find this. Document properly.
- Not asking for help when stuck. See When to Contact Support.