Troubleshooting Graylog Problems
Scenario: Journal size keeps growing, no new messages appear, Elasticsearch logs flood-stage warnings
What’s happening
Graylog has a journal, a disk-backed queue where incoming messages wait to be processed before being written to Elasticsearch. By default, messages are kept for at most 12 hours or 5 GB, whichever limit is reached first. Both thresholds can be changed in graylog.conf:
```
message_journal_max_age = 12h
message_journal_max_size = 5gb
```
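To see how close the journal is to these limits, check the size of its directory on disk. The path below is the default for DEB/RPM package installs and is an assumption; adjust it if your data_dir is configured elsewhere:

```shell
# Default journal location for package installs of graylog-server;
# change JOURNAL_DIR if your data_dir points somewhere else.
JOURNAL_DIR=/var/lib/graylog-server/journal

# Print the journal's total on-disk size, skipping quietly if the
# directory does not exist on this machine.
if [ -d "$JOURNAL_DIR" ]; then
  du -sh "$JOURNAL_DIR"
fi
```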
The flood warnings come from Elasticsearch itself. ES monitors disk usage on every node and enforces three watermarks:
| Watermark | Default | Effect |
|---|---|---|
| low | 85% | ES stops allocating new shards to the node |
| high | 90% | ES starts relocating shards away from the node |
| flood stage | 95% | ES applies a read-only block to every index with a shard on the node |
When the flood stage is reached, ES sets index.blocks.read_only_allow_delete = true on every index that has a shard on the affected node. Graylog can no longer write messages, so they pile up in the journal. The journal grows until it hits its own size or age limit, and then it starts discarding the oldest messages.
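You can confirm this diagnosis by asking Elasticsearch which indices currently carry the block. A minimal sketch, assuming ES answers on localhost:9200 (swap in your own host and credentials); any index reporting the setting as true is blocked:

```shell
# Hypothetical endpoint; replace with your Elasticsearch node's address.
ES="http://localhost:9200"

# List the read_only_allow_delete setting for every index; indices
# where it is "true" are the ones Graylog can no longer write to.
# -f makes curl fail quietly instead of printing an HTML error body.
curl -sf "$ES/_all/_settings/index.blocks.read_only_allow_delete?flat_settings=true&pretty" \
  || echo "Elasticsearch not reachable at $ES"
```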
Fix
Step 1. Free up disk space on the Elasticsearch nodes: delete old indices, remove unnecessary files, or expand storage.
To see which indices exist and how large they are:
```shell
curl -X GET "http://localhost:9200/_cat/indices?v&s=store.size:desc"
```
To delete an old index:
```shell
curl -X DELETE "http://localhost:9200/<index-name>"
```
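After deleting indices, it is worth confirming that each node is back below the watermarks before moving on. A quick per-node check, again assuming the localhost:9200 endpoint:

```shell
# Hypothetical endpoint; replace with your Elasticsearch node's address.
ES="http://localhost:9200"

# Show per-node disk usage; disk.percent should be below 90 (the
# high watermark) before you remove the read-only block.
curl -sf "$ES/_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail" \
  || echo "Elasticsearch not reachable at $ES"
```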
Step 2. Once there is enough free space, remove the read-only block from all indices. Elasticsearch versions before 7.4 never lift this block on their own; 7.4 and later release it automatically once disk usage falls back below the high watermark, but clearing it by hand is safe in either case.
```shell
curl -X PUT "http://localhost:9200/_all/_settings" \
  -H "Content-Type: application/json" \
  -d '{"index.blocks.read_only_allow_delete": null}'
```
Step 3. Verify Graylog resumes writing. The journal size should start dropping as the backlog drains. Check the journal state in System → Overview in the Graylog web interface.
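You can also watch the backlog drain from the Elasticsearch side by polling the document count, which should climb steadily once writes resume. This sketch assumes localhost:9200 and the default "graylog" index prefix:

```shell
# Hypothetical endpoint; replace with your Elasticsearch node's address.
ES="http://localhost:9200"

# Poll the total document count across Graylog's indices a few times;
# a rising count means the journal backlog is being written out.
for i in 1 2 3; do
  curl -sf "$ES/_cat/count/graylog_*" \
    || echo "Elasticsearch not reachable at $ES"
  sleep 2
done
```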
If Graylog is still stuck after lifting the read-only block, restart the Graylog service to force it to re-establish its ES connection:
```shell
sudo systemctl restart graylog-server
```