SamuZai
Touhou-Project.com

patreon


Matrix Operations

Hey everyone, hope you’ve been well. It’s been a while since the last post, and there are still a few things in the works that aren’t quite production-ready, so I’ll take the opportunity to talk about the Matrix server instead. It’s a long-overdue topic that feels especially timely since we recently passed the one-year mark of having our own self-hosted Matrix instance.

First, a very brief recap in case you haven’t the faintest idea what I’m talking about. Matrix is a communication protocol that allows for distributed communications among dispersed service providers. The user-facing part can be, for the sake of brevity, compared to something like Discord or IRC chatrooms. In other words, for our purposes, it’s a way for the THP community to chat and share images, thoughts and whatever else without being beholden to the terms and rules of other services. There’s a webclient that we offer on the site, but users can connect with whatever client they like best.

On the hosting side of things, the core of it is the homeserver software, which implements the Matrix specifications and provides the endpoints for clients to connect to. THP runs Synapse, which is the most mature in terms of development but is also relatively resource-heavy compared to the alternatives. While things are fine now in terms of resources, migrating to something else is always a possibility, and I check the status of the alternatives every now and again. If we had a large influx of users or decided to federate (more on that later), switching to more efficient software would be an obvious alternative to paying for more server resources.

That said, we don’t just run the homeserver software. The aforementioned webclient (an embedded version of Element), the Matrix<>Discord bridge, various bots (for administration, moderation, webhooks), an email server, automated backups, and a push notification service are among the several other things that also run on the same server. Keeping track of new versions of this and that, and updating things without breaking them, gets trickier the more software you’re managing, so I decided early on to use tools such as Ansible playbooks to keep things manageable. The containerized images and scripts of this approach do add overhead and use more resources, sure, but better modularity and interoperability of services and a more centralized configuration method are very big upsides.

I usually space out updates every 2-3 weeks, after major releases of Synapse and Element, and when things go well it’s a very painless process to get things up-to-date: one command to pull the latest version of the playbook, followed by one that updates the configured services on the server. If all goes well it takes 10 minutes and requires no intervention on my part. If things go well … they typically do, but over the past year I’ve had to deal with my fair share of intervention and outright troubleshooting.
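For the curious, that two-step routine could be sketched roughly as below. This is only an illustration under assumptions: the post doesn’t name the playbook, so the inventory path, the playbook file name, and the tag list are all hypothetical placeholders, and the function defaults to a dry run that only prints what it would do.

```python
import subprocess


def update_commands(inventory="inventory/hosts",
                    playbook="setup.yml",
                    tags="setup-all,start"):
    """Return the two commands of the update routine, in order.

    All three default values are hypothetical placeholders; substitute
    whatever your own playbook checkout actually uses.
    """
    return [
        # Step 1: pull the latest version of the playbook.
        ["git", "pull"],
        # Step 2: re-run the playbook so the configured services get updated.
        ["ansible-playbook", "-i", inventory, playbook, f"--tags={tags}"],
    ]


def run_update(playbook_dir, dry_run=True):
    """Run (or, by default, just print) each update step inside playbook_dir."""
    commands = update_commands()
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True aborts on the first failing step, so a failed pull
            # never leads to a playbook run against a stale checkout.
            subprocess.run(cmd, cwd=playbook_dir, check=True)
    return commands


if __name__ == "__main__":
    run_update("/path/to/playbook")  # hypothetical path; dry run by default
```

Stopping at the first failure is the point of the design: when an update doesn’t go well, you want to find out before anything on the server has been touched.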

Some of these issues have been a matter of newer formatting and changes in variables, either in the playbook or in the various services. For the former, Ansible is helpful in reporting what changed and the documentation is easy to reference, so my local configuration is typically updated in just a couple of minutes and the setup can be re-run successfully afterwards. A large change in the playbook earlier in the year, however, was a bit of a headache. The program that serves as the go-between for requests and the various services, forwarding data around, was changed. While I didn’t have to worry too much about the internal changes, I still had to rework the public-facing server software to accommodate the new format for requests (I had to flesh out a custom reverse proxy system, among other things). Trying to make sure that everything works before rebooting and potentially breaking everything is never a fun experience for a sysadmin; luckily most of the disruption only lasted a few hours, and going forward, even if they change things internally, I shouldn’t have to tinker with that aspect again.

Issues with the various services are a bit more difficult to deal with. Matrix is something of a moving target: the spec is always evolving and being added to, and homeservers add new features and deprecate old ones. Getting features working, like the newer voice chat for Element or support for desired experimental functions, requires tinkering and keeping an eye on changelogs. Some of it is minor, like suppressing notifications for message edits, but some is more important for the user experience, like staggering the loading of history when a user first joins so that the process is quicker. Some of these features eventually get integrated and all is well; sometimes they don’t work out, or a competing proposed spec gets merged instead, and configurations have to be updated accordingly.

Or, to use a very recent example, a component that we are actively using hasn’t been updated in a while and functionality it depends on gets removed. The Matrix<>Discord bridge hasn’t seen much in the way of development in the past year or so for various reasons (more on that later), and a recent Synapse update removed a way that the bridge had of authenticating itself. Unless you happened to know how the bridge worked and had your eye on all the relevant API parts (dozens, if not hundreds of them), you would find yourself with a half-functional bridge after a routine upgrade, as we did. Unhelpfully, program logs did not contain any data that pointed to this deprecated endpoint, and as far as the program was concerned it was sending data correctly; it was just never processed by the homeserver. It took me a full day (that is, actively trying different things from dawn to dusk) of trying to figure out why things weren’t working before I stumbled upon a hint of the problem in a mostly-dead chatroom. Luckily for us, the Synapse devs added a compatibility mode for this deprecated API that can be used by programs. It is considered a small security risk and needs to be manually enabled, but it solved the issue. I contributed to the playbook’s documentation afterwards, to spare any other poor souls the same anguish I experienced.

This is a chance for a good segue into why this happened in the first place: the bridge’s developers simply haven’t had the time and resources to keep it up-to-date and add new features. A lot of the rougher points, like reply formatting and sharing emojis/reactions across the bridge, would be fixed by implementing submitted code and upgrading their dependencies. There is simply not enough manpower for it. The core Matrix team (which also sees to this particular bridge) is, in general, understaffed and in need of more resources. As a result, adoption of things like standards can be excruciatingly slow and some of the more complex issues have taken years to get sorted out. (It took a lot of forcefulness to finally get custom emojis somewhere close to being implemented.)

You might read the above and think that things are bad for Matrix in general, but that’s not really the case either! Large institutions like the German Ministry of Defense and the French Government are adopting Matrix and helping develop it as well. That does skew the priorities for the developers, and it means that their limited money and manpower tend to go into more general, stability-related improvements, security features, and seamless integrations across platforms and for different use cases. Touhou fanfic communities, gamers, and general casual users have much less clout. We can help contribute to the open spec-proposal process, but hurrying things along is something else altogether. The good news is that after the release of v1.8 of the spec last week, the upcoming v1.9 will focus on a lot of things that will be more helpful for mass adoption: things like the aforementioned emojis/sticker packs, role-based access, extensible events, and having rooms send events. Development, when it does actually happen, tends to be quick, and I’m cautiously optimistic that we’ll have those things before the end of the year.

(I can’t speak for the discord bridge with any certainty, but that might also resume development because a number of people have been grumbling about it.)

Let’s circle back to something I said in passing, about federation. Matrix is like email or Mastodon in that it’s federated: different homeservers and their users can communicate with one another in a decentralized fashion. Federating does use a lot more resources, as assets are queried and fetched and databases fill up with information about users and rooms on other servers. That’s a major reason why our little experiment has kept itself apart from everyone else; I didn’t want to potentially strain things while we were starting out. But the resource part of the equation isn’t the whole of it, and there’s another big reason why I wouldn’t currently federate: the subpar quality of administration tools.

Dealing with spam, potential abuse, and whatever else is the bane of any server or website administrator. I’ve gone on and on about my experiences with THP in these posts and, obviously, Matrix won’t be exempt from this concern. To put it succinctly: antispam bots and modules are fairly barebones, and getting robust protection requires a lot of fiddling and time investment. It isn’t just a matter of ticking a few boxes or adding a line or two to a configuration file. THP is a small community and, even with the volunteers that do help out, we’re not exactly in shape to deal with routine or persistent abuse. Maybe the situation will improve in the not-too-distant future and better moderation tools will arise but, for the time being, we’re staying isolated.

The admin-friendliness also needs to extend to a few other areas beyond moderating, as too much of the nitty-gritty of the homeserver is done through POSTs directly to the API. Reading the documentation and figuring out what it is that you want to do, getting the relevant IDs for events or users, and setting the scope of changes is often time-consuming. As a longtime tech nerd I’m more than familiar with cURL, but it’s still a real pain in the ass to type out and adhere to strict syntax for what should be simple operations.
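To give a flavor of what one of those “simple operations” involves, here is a minimal sketch that builds (but does not send) a single Synapse admin API call, the one that deactivates an account. The homeserver URL, user ID, and token are made-up placeholders, and the helper name is purely illustrative; check the endpoint path and body against the Synapse admin API documentation for your version before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_deactivate_request(base_url, user_id, admin_token, erase=False):
    """Build a POST request for Synapse's account-deactivation admin endpoint.

    Nothing is sent here; the caller decides when (and whether) to pass the
    returned object to urllib.request.urlopen().
    """
    # Matrix user IDs contain '@' and ':', which must be percent-encoded
    # when they appear as a path segment.
    encoded_id = urllib.parse.quote(user_id, safe="")
    url = f"{base_url.rstrip('/')}/_synapse/admin/v1/deactivate/{encoded_id}"
    body = json.dumps({"erase": erase}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Admin endpoints expect the access token of a server admin.
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders, not real accounts or credentials.
    req = build_deactivate_request(
        "https://matrix.example.org", "@spammer:example.org", "ADMIN_TOKEN"
    )
    print(req.get_method(), req.full_url)
```

Even in this tiny case there is an ID to look up, an encoding step to get right, and a JSON body whose shape has to come from the documentation, which is roughly the friction described above.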

The API is generally good and interesting things can be done with it. Which leads me back to Touhou and THP again. There are some cool integrations that I have planned with the site as we continue to overhaul things bit by bit. I’m not sure if I’ll go full madman and start working towards my own custom bots and whatever else, but there are certainly things that are plausible even with less effort.

I could gas on much more about Matrix but I think I’ve gone on too long as is, so I’m leaving things here. The next time I write one of these posts, I hope to talk more about THP and the other stuff I’m working on. But still, I hope you enjoyed this writeup and seeing a little behind the scenes of all the Matrix stuff. There was an Element release just today and Synapse is looking to get one later this week or next week at the latest. Things ought to work just fine, as they usually do, but if they don’t, I’ll be on top of things.

Until next time, take it easy!

