Forum down for hours

Upgrading Discourse is easy via the web UI, and for several months we never ran into an issue, even though updates are published frequently. This time, however, our backups came in handy. So you might need to repeat some questions or answers; I’ll also have a look and restore posts manually if I see any missing.

We fought with Discourse for many hours but could not fix the issue we had with the new version (topics did not open and the requests timed out), despite help from Jeff, the founder, and others :).

So we reverted to the previous version, v1.6.0.beta1, and imported the backup. Meanwhile, I also investigated why our backups are so large. That didn’t solve the problem, but I learned a bit more about Docker and even a Postgres power query. In the end it was just a missing index, maybe a hiccup in the upgrade or a missing migration; I’m not sure.

So it was quite a journey, and I hope this is the end of it :wink:

Sorry for any inconvenience!

Update: the GitHub issue about this: https://github.com/graphhopper/dirigent/issues/339

After a bit of sleep :wink: I investigated this further, and it looks like we didn’t lose any answers or even topics. I would count this as a success and a plus point for Discourse.

If you notice anything different, e.g. your private messages, questions, or answers are no longer present here, or your email inbox contains something that is missing here, please let us know!

Here is what we can do better:

  • do not upgrade on a Friday :wink:. The reason is not the stress of wanting to get into the weekend. Friday is a poor choice because the open source community seems to be a bit more active on weekends (although I did not measure it)
  • the time of day itself (evening) was okay, because fighting with backups overnight is less disruptive for the community
  • move to stable releases of Discourse; currently we are on betas. Still, switching to stable is not straightforward
  • instead of switching versions or doing fresh installs, I should have looked into the logs much earlier (via ./launcher logs app; is there a tail -f version?)
  • periodically verify that backups actually work
  • future: maintain a failover server, or switch to the Discourse cloud service?