Touhou-Project.com

Dirty Work

Added 2022-02-21 00:51:14 +0000 UTC

Hey all, hope you’ve been well. I’ve had my finger in a lot of pies as of late, to the extent that I think that a hungry ghost is haunting me. Bad joke aside, I have been a little all over the place but with good reason.

To get it out of the way, I’ve been working on some changes to the site’s appearance and navigation for some time, with mixed results. I keep going back and forth between wanting to change other things first (that would inevitably have to be changed later), reworking what I already have thanks to other changes, and plain old indecision. I’ll be sure to talk about that at length some other time but that’s not really the focus today.

On that front, only a pair of things have been rolled out recently on the site: some simplified/tweaked CSS and an updated version of our custom THP font. Both are done in preparation for future work under my current plan, since web browsers tend to cache stylistic elements. So, when structural (HTML) changes happen, the browser will already have the new rules cached and the site won’t appear to be “broken” for a returning user who doesn’t do a hard refresh. Of course, more CSS work will be needed, but it’s nice to minimize potential disruption.

And that’s a good segue for the meat of what I’ve been testing and rolling out as of late.

As you may know, along with a moderation staff, THP has several anti-spam and anti-disruption features in its codebase. Some originate in the original Kusaba X code and some have been added by me over the years. It’s not a perfect system, but it does catch a lot of unwanted activity and does make life easier (especially mine).

Sometime last year, I limited the ability of users to bump a story thread. Basically, unless an author comes and marks a post as an update, a thread is de facto locked. For writers who don’t use a trip, there’s a little wiggle room with the cutoff but, eventually, the thread will be locked if it’s not bumped.

A little while ago I extended that soft locking to all threads: once enough time has passed since a thread’s last post, it can no longer be replied to. I toyed with the idea of limiting that only to threads on “story” boards but decided that there’s really very little reason why a thread that hasn’t been bumped in, like, 2 years in /blue/ ought to be revived. A few incidents every now and again with old threads (which I usually end up deleting before anyone notices) are an unnecessary pain to deal with, anyhow.
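As a rough illustration of that kind of soft lock, here is a minimal Python sketch. It is not the site’s actual code: the function name `is_soft_locked` and the one-year `MAX_IDLE` cutoff are assumptions made up for the example, since the real cutoff isn’t stated here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff for illustration only; the real value is not given in the post.
MAX_IDLE = timedelta(days=365)


def is_soft_locked(last_post_at: datetime, now: datetime,
                   max_idle: timedelta = MAX_IDLE) -> bool:
    """Return True when the thread has been idle for longer than max_idle."""
    return now - last_post_at > max_idle


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stale_thread = now - timedelta(days=2 * 365)  # last post about two years ago
    print(is_soft_locked(stale_thread, now))
```

The check only needs the timestamp of the last post, so it can run before any other posting logic and reject the reply early.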

And because I’d like to avoid unnecessary pains if at all possible (I’m the one that usually catches any spam), I got around to doing a few other things in the same vein. Now, if a post with an image is reported by a user, it no longer displays on the front page. Implementing that wasn’t as straightforward as I’d have liked, as it involved mucking about with more complex SQL queries since the tables for posts and reports are separate in the database (and very little is normalized but that’s a gripe for another time!). In order to avoid having too few images show up, I overshot the number requested and offloaded some of the work to the templating part of the page generation. Overall, I was careful not to compromise the performance gains from the recent changes to the front page and kept code complexity to a manageable amount as well.
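To make the “overshoot, then trim” idea concrete, here is a small Python sketch using SQLite. Everything in it is an assumption for illustration: the `posts` and `reports` tables, their columns, the `recent_images` function, and the `overshoot` factor are hypothetical and do not reflect the site’s real schema or queries. The list filtering at the end stands in for the work handed off to the templating step.

```python
import sqlite3


def recent_images(conn: sqlite3.Connection, wanted: int, overshoot: int = 2):
    """Fetch more image posts than needed, then drop reported ones and trim.

    Assumed (hypothetical) schema:
        posts(id INTEGER PRIMARY KEY, image TEXT)
        reports(id INTEGER PRIMARY KEY, post_id INTEGER)
    """
    rows = conn.execute(
        """
        SELECT p.id,
               p.image,
               EXISTS (SELECT 1 FROM reports r WHERE r.post_id = p.id) AS reported
        FROM posts p
        WHERE p.image IS NOT NULL
        ORDER BY p.id DESC
        LIMIT ?
        """,
        (wanted * overshoot,),
    ).fetchall()

    # Stand-in for the templating step: skip reported posts, keep only `wanted`.
    visible = [(post_id, image) for post_id, image, reported in rows if not reported]
    return visible[:wanted]
```

Asking for a few extra rows keeps the query simple while still leaving enough images to fill the front page after reported ones are skipped. If more than the overshoot margin were reported at once, the page would simply show fewer images.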

Continuing on the theme of making my life easier, I also did some work on the “view bans” portion of the site, making some of the more automatic bans more verbose as to why they happened and limiting some of the displayed text of a deleted post. Interestingly, I realized that some of the old code had a bug that would have allowed for a potential exploit due to an unsanitized input. There are probably more subtle things like that in the site’s code and so it’s nice to catch something serendipitously.

I also enforced progressively harsher automatic bans for some cases of spam. This already existed (i.e., you trip the spam filter and you get banned for an hour, then two, etc.) but some of the other ways of catching spam still used fixed values. As the progressive system has been working reliably in most cases, I saw no reason not to extend it to all cases.
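A progressive ban schedule along those lines could be sketched like this in Python. The doubling rule, the one-hour base, and the one-week cap are all assumptions for the example: “an hour, then two” fits either a doubling or a linear schedule, and the actual progression isn’t spelled out here.

```python
from datetime import timedelta

BASE_BAN = timedelta(hours=1)   # first offense, per the "an hour" example
MAX_BAN = timedelta(days=7)     # hypothetical cap, not a value from the post
MAX_EXPONENT = 16               # keeps the multiplication well within timedelta limits


def ban_duration(prior_offenses: int) -> timedelta:
    """Return the ban length for a poster with `prior_offenses` earlier bans.

    Assumes each repeat offense doubles the previous duration, up to MAX_BAN.
    """
    exponent = min(max(prior_offenses, 0), MAX_EXPONENT)
    return min(BASE_BAN * (2 ** exponent), MAX_BAN)


if __name__ == "__main__":
    for offenses in range(5):
        print(offenses, ban_duration(offenses))
```

Routing every spam check through one function like this is what replaces the scattered fixed values: each check only reports an offense, and the duration comes from the shared schedule.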

Lastly, I greatly expanded some of the anti-flooding protections on the site. Even though manual moderation catches most things reasonably quickly, determined spammers can make things unpleasant. So creative solutions are needed. I’m not going to be too specific about actual rates and so forth but suffice it to say that these measures mainly target “new” posters.

I first thought of using a “points” system to help deal with the issue. It’s a pretty simple concept: an amount of points is assigned to a (new) user for every kind of action, say 3 for making a thread, 2 for an image post, 1 for a regular post. If, in a period of x hours, the total exceeds what would be expected of an average user, the user is either banned or made unable to post.
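For reference, the points idea described above might look something like the following Python sketch. The weights (3, 2, 1) come from the example in the text; the six-hour window, the limit of 10, and the names `POINTS`, `score`, and `exceeds_limit` are placeholders chosen for illustration, since the “x hours” and the expected average are left unspecified.

```python
from datetime import datetime, timedelta

# Weights taken from the example in the text.
POINTS = {"thread": 3, "image_post": 2, "post": 1}


def score(actions, now: datetime, window: timedelta) -> int:
    """Sum the points for all (timestamp, kind) actions inside the window."""
    cutoff = now - window
    return sum(POINTS[kind] for timestamp, kind in actions if timestamp >= cutoff)


def exceeds_limit(actions, now: datetime,
                  window: timedelta = timedelta(hours=6),  # placeholder for "x hours"
                  limit: int = 10) -> bool:                # placeholder threshold
    """Return True when recent activity is worth more points than the limit."""
    return score(actions, now, window) > limit


if __name__ == "__main__":
    now = datetime(2022, 2, 21, 12, 0)
    history = [
        (now - timedelta(minutes=50), "thread"),
        (now - timedelta(minutes=40), "thread"),
        (now - timedelta(minutes=30), "image_post"),
        (now - timedelta(minutes=20), "thread"),
        (now - timedelta(hours=9), "post"),  # outside the window, ignored
    ]
    print(score(history, now, timedelta(hours=6)), exceeds_limit(history, now))
```

Note that the history here is keyed to a single poster, which is exactly the weak point: the tally resets whenever the poster shows up under a new address.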

The problem with that is that spammers usually switch their IPs to get around things like that and to make themselves more difficult to ban. Of course, there are third party pages and services that can help by analyzing messages and known bad actors but I’m currently not willing to either trust others or pay for these services.

So, instead, I went with something that feels like more than nothing without risking disruption to normal users. To put it simply, I expanded some of the checks for new and old users and tried to make it only go down the full set of checks if activity is persistently suspicious. Say that, suddenly, several new threads are made on THP. Maybe it’s a contest and those are legit threads. Still, a single new IP won’t create very many of those by itself. Nor will a bunch of other new addresses.

Likewise, it’d be unusual if a new poster suddenly made too many posts in a few hours or uploaded too many images. By taking into account the overall posting rate and past behavior by posters, I’ve set some thresholds that will leave potential abusers unable to post anything for a while. It’s not really an automatic ban as such (though a message will show up to any potential human caught by mistake) but more of a mechanism to give moderators a chance to look over the site and prevent whole board pages or threads from being defaced.
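A tiered check of that general shape could be sketched as below. This is only a guess at the structure, not the site’s logic: the `Activity` fields, the `may_post` function, and every number in it are invented for the example, and the real rates are deliberately not published.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    site_posts_last_hour: int    # posts from everyone, site-wide
    poster_posts_last_hour: int  # posts from this particular address
    poster_is_new: bool          # no meaningful posting history yet


def may_post(activity: Activity,
             site_baseline: int = 20,  # hypothetical "normal" site-wide rate
             new_limit: int = 5,       # hypothetical allowance for new posters
             old_limit: int = 30) -> bool:  # hypothetical allowance for known posters
    """Return False when the poster should be temporarily unable to post."""
    # Cheap check first: when the site as a whole is quiet, skip per-poster scrutiny.
    if activity.site_posts_last_hour <= site_baseline:
        return True
    # Site-wide activity is unusual, so look at the individual poster,
    # holding new addresses to a stricter limit than established ones.
    limit = new_limit if activity.poster_is_new else old_limit
    return activity.poster_posts_last_hour < limit
```

A `False` result here means “can’t post for a while” rather than a ban, which leaves room for a moderator to step in before anything harsher happens.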

I’m still thinking about some of the values I used and whether or not to also turn some cases into bans but for the most part I’m okay with how my limited testing played out. There’s room for other improvements such as whitelists and more scrutiny for “old” users under certain circumstances but that can wait until more pressing issues are dealt with.

So, other than dealing with potential trouble and playing around with structural stuff (and pushing some bugfixes while I was at it), the third, err, pie would have been looking into self-hosting a Matrix instance for THP. The idea would be to have an alternative to Discord that would be fully controlled by the community instead of beholden to Discord’s rules and ToS as it’s hard to say if and when the rug could be pulled out from under us.

I’ll have plenty to say about that in a future post as well but I did encounter a few potential issues that would have to be sorted out before I went ahead and deployed an instance. It’s mostly fine and doesn’t look that complicated to manage, and I look forward to more practical testing as soon as time permits.

Still, I think this post has gone long enough for now. I’m not quite sure what I’ll be working on next, as I did want to work on something else to get it ready in time for the site’s birthday. Until next time, take it easy!