SamuZai
Touhou-Project.com

Step By Step (2)

Hey everyone, hope all’s been well with you! I’ve had something of a mixed track record when it comes to getting things done over the past few weeks, mainly due to life insisting that it can’t be ignored. That said, I’ve still been plugging away during weekends and whenever I’ve found free time to roll out improvements and changes.

It’s been a gradual thing for sure but, like last time, I think it’s helpful if we split up things into categories again.

Metadata

In a previous post some time ago, I mentioned how the site was adopting some of the currently relevant metadata formats like Schema. Basically, it’s information about information that helps things like search engines and other platforms parse the relevance of a page of content and display it to their users.

I’ve since expanded upon that initial foray. Pretty much every page on the site now contains relevant metadata, with story threads reflecting their authors, summary, any associated images and whatever else. I also added other formats like OpenGraph into the mix, to make sure that whichever search engine crawls the site or whatever platform scans its content can make sense of things, which should increase THP’s visibility.

Now, obviously, this isn’t a silver bullet as such and its effects are hard to clearly delineate but, hey, at least it’s neat to get properly-generated thumbnails when passing a link to a friend.

As you might imagine, hacking all of this into place wasn’t exactly the simplest of things. Calling up tags, or the relevant info when it’s a non-story thread or just a board page, required partially rebuilding some of the page generation systems, as well as making sure that the strict syntax demanded by these standards was always observed. Special characters can easily break things, as can a malformed field or one with the wrong scope set. This is certainly the sort of thing that’s easier when you build a website from the ground up anticipating integration with the semantic web.
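To make the special-character problem concrete, here is a minimal Python sketch of emitting OpenGraph tags and a Schema JSON-LD block with escaping applied. It is not the site's actual code: the function names, the choice of fields and the `CreativeWork` type are all assumptions made for illustration.

```python
import html
import json
from typing import Optional


def og_meta_tags(title: str, description: str, url: str,
                 image: Optional[str] = None) -> str:
    """Build OpenGraph <meta> tags with attribute values HTML-escaped."""
    # Hypothetical field selection; a real page may need more properties.
    fields = {"og:title": title, "og:description": description, "og:url": url}
    if image:
        fields["og:image"] = image
    return "\n".join(
        f'<meta property="{prop}" content="{html.escape(value, quote=True)}">'
        for prop, value in fields.items()
    )


def json_ld_script(title: str, author: str, summary: str) -> str:
    """Build a schema.org JSON-LD block that is safe to embed in HTML."""
    data = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",  # assumed type, chosen for illustration
        "name": title,
        "author": {"@type": "Person", "name": author},
        "description": summary,
    }
    # json.dumps handles quotes and backslashes; "<" is escaped separately so
    # that a stray "</script>" in user text cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

The point of the sketch is that each format has its own escaping rules: attribute values need HTML escaping, while JSON-LD needs valid JSON that also survives being embedded in a script tag.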

… I still don’t quite get the difference between certain fields and approaches as documentation is not great and the big indexers don’t really share how their algorithms rank things for obvious reasons. But, well, at least we can now all enjoy machine-targeted fields like itemprops in our source code (and a few instances of missing alt text on images).

Separating data

One of the things that I’ve done since the last post is continue the trend of cleaning up useless data and optimizing the database. While we’re not quite yet at a point where I can say that the organization of things is not stupid, we’re at least slowly heading in that direction.

Part of this effort is exemplified by moving some of the data in the main posts table into its own separate place. A row of data on a post has historically included 30-odd fields, ranging from the logical (such as post number and message) to the more head-scratching (like whether or not a post is stickied or locked even when it’s a reply, which makes that information irrelevant). All that information not only takes up space in a database (even small columns can add up to a few MBs over hundreds of thousands of rows) but also makes understanding the data more difficult.

It’s one of my primary goals with all of this to disentangle as much information as possible wherever I can from these large monoliths. It’s way easier to do more meaningful changes to the site, not to mention build up a new structure, when you don’t have to account for everything but the kitchen sink. The added efficiency in operations is a nice bonus too.

So, I decided to split off the stickied and locked post information into its own normalized table that would only be called in when relevant. Easier said than done! A lot of the site’s code not only assumes the presence of this vestigial data but, in places, uses it as a hacky way to check that other data is present.

It was easy enough to puzzle out a concept that I wanted to implement. The database part was simple too: I created a simple table following a logical normalization scheme. Implementing the management side of things also seemed simple, at first. I redid part of the main function and then realized that the original developers had written a bunch of duplicate code for slightly different cases. So, naturally, I figured I had to refactor the code. What was supposed to only take an hour ended up taking most of a day. But in the end I also did the locked threads functionality, so it was time well spent.
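For readers who want a picture of what such a split might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`posts`, `thread_flags`, `parent_id` and so on) are hypothetical, and SQLite is used only so the example runs on its own; the site's real schema and database engine are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,          -- NULL for a thread's opening post
    message TEXT NOT NULL
);
-- Flags live in their own table; a thread with no flags has no row at all.
CREATE TABLE thread_flags (
    post_id INTEGER PRIMARY KEY REFERENCES posts(id),
    stickied INTEGER NOT NULL DEFAULT 0,
    locked INTEGER NOT NULL DEFAULT 0
);
""")


def set_flags(conn, post_id, *, stickied=False, locked=False):
    """Store flags only when at least one is set; otherwise drop the row."""
    if stickied or locked:
        conn.execute(
            "INSERT OR REPLACE INTO thread_flags (post_id, stickied, locked) "
            "VALUES (?, ?, ?)",
            (post_id, int(stickied), int(locked)),
        )
    else:
        conn.execute("DELETE FROM thread_flags WHERE post_id = ?", (post_id,))


def get_flags(conn, post_id):
    """Return (stickied, locked); a missing row means neither flag is set."""
    row = conn.execute(
        "SELECT stickied, locked FROM thread_flags WHERE post_id = ?",
        (post_id,),
    ).fetchone()
    return (bool(row[0]), bool(row[1])) if row else (False, False)
```

With this layout, replies never carry sticky or lock columns at all, and the main table only needs to be joined against `thread_flags` when a page actually cares about those states.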

The next problem was more complex and had two main thrusts: first, stripping out the old ways of inserting the relevant data and replacing them with a new one and, second, changing the way pages are generated so that an arbitrary combination of stickied and non-stickied threads is properly pulled in for each page as the boards are being generated.

This took a whole weekend to sort out properly as there’s a surprising number of edge cases to consider when pages are built up. My initial simple design proved inadequate in testing and so I had to gradually make it more complex while partially rewriting older parts of the code so that they no longer expect data to be presented in that specific way. I know that this is a fairly boring description of things but the point I’m trying to make is that all of that work was necessary; it’s a net positive to have done all those rewrites and tests as future iterations or features will take less effort to implement.
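As a rough illustration of the page-building problem, the sketch below orders stickied threads first, then everything else by most recent bump, and splits the result into pages. The `Thread` class, its fields and the ordering rule are assumptions for the sake of the example, not a description of how the site actually does it.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    id: int
    last_bump: int          # hypothetical timestamp of the latest bump
    stickied: bool = False


def paginate_threads(threads, per_page=10):
    """Order stickied threads first, then by latest bump, and split into pages."""
    # False sorts before True, so "not stickied" puts stickies at the front.
    ordered = sorted(threads, key=lambda t: (not t.stickied, -t.last_bump))
    pages = [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]
    return pages or [[]]    # an empty board still gets one (empty) page
```

Even in this toy version a couple of the edge cases show up: a board with more stickies than fit on one page spills them onto the next, and an empty board still needs a page to render.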

I didn’t have enough time to extend the process to image handling but that’s an area that I want to get around to at some point. I’ll obviously talk more about that when I get to it but it may not be soon as other bits of code need to be in place as well—my thinking is that I might as well expand on features while I’m at it.

More fixes and miscellanea

As always, I have also worked in a bunch of fixes, including for things that users have pointed out in the various chats. Even simple bugfixes can lead to improvements: an issue with editing posts, for example, saw me rewrite and improve how links are parsed generally. A few CSS touch-ups that fixed awkward-looking threads in certain conditions also followed. This is unsexy stuff for sure but it’s still satisfying to get right. I could go on and on about the minutiae of polishing up the niche RSS generation for special cases but I think I’ve made my point.

Likewise, a few minor things involving the front page were fixed. A couple of edge cases involving bumped/updated threads not showing up as they should were sorted. Images that have been removed from posts should also no longer show up in the front page carousel.

Some moderator-only improvements also made the cut. For example, the menu for administrating the front pages was made less tedious to work with and the underlying code was cleaned up. The eternal war against spam also got some of my attention and I’ve put in a few clever things that already seem to have caught a bit of unwanted activity. I’ll likely talk about that ongoing process more in the near future as there are some partially finished features that I’ll be working on in the coming weeks.

There’s still more to come!

Once more, this post has gotten longer than I first intended. I haven’t even spoken about the Matrix stuff yet and, boy, what a trip that’s been!

I hope to not take as long to finish up some of the work-in-progress stuff and find the energy to write up another one of these posts. I’m optimistic about getting things done before the end of the year and I’ll hopefully share details of all that with you very soon.

Until next time, take it easy!

Comments

Good to see more progress happening. That stuff with the stickies sounded really trying. Looking forward to hearing more about Matrix stuff.

Benjamin Oist

