I’ve continued working hard this month to bring Comic Rocket up to current standards. I also rolled out some subtle new features to improve the quality of comic listings and protect reader privacy. In this post, I’ll describe all that effort in detail.
As a reminder, you can help keep this work going by supporting me on Patreon. This month we passed US$200/month in pledges! It’s been so gratifying to see, in this particularly tangible way, just how many people care about Comic Rocket.
Please consider contributing whatever you can. Even the smallest pledges give me so much joy. Also, I currently have 68 patrons, and the part of me which is too online really wants one more.
Higher quality listings
I did two big things this month to improve your experience with Comic Rocket. Neither one is flashy or obvious but both should help a lot over time.
First, I built tools for finding duplicate comic listings and semi-automatically merging them. This is trickier than it sounds: for example, if two listings have different descriptions, which one should the merged listing keep?
These tools aren’t available to everyone because it’s easy to lose important data if you aren’t careful. But a handful of volunteers and I have cleaned up most of the duplicates. Hopefully going forward we’ll be able to take care of new duplicates quickly.
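Here’s a minimal Python sketch of what “semi-automatic” can mean. It isn’t Comic Rocket’s actual merge tool. Fields that match, or that only one listing fills in, merge on their own. Anything where both listings disagree gets set aside for a person to decide. The function name, the field names, and the dict representation of a listing are all hypothetical.

```python
def merge_listings(a, b):
    """Merge two duplicate listings (hypothetical dicts of field -> value).

    Fields that agree, or that only one listing fills in, merge
    automatically. Fields where both listings have different non-empty
    values are returned as conflicts for a human to resolve.
    """
    merged = {}
    conflicts = {}
    for field in sorted(set(a) | set(b)):  # sorted for a predictable order
        va, vb = a.get(field), b.get(field)
        if va == vb or not vb:
            merged[field] = va          # same value, or only `a` has one
        elif not va:
            merged[field] = vb          # only `b` has a value
        else:
            conflicts[field] = (va, vb)  # both differ: needs a person
    return merged, conflicts


a = {"title": "Example Comic", "description": "A comic about cats.", "author": ""}
b = {"title": "Example Comic", "description": "Cats, every Tuesday.", "author": "J. Doe"}
merged, conflicts = merge_listings(a, b)
print(merged)     # {'author': 'J. Doe', 'title': 'Example Comic'}
print(conflicts)  # {'description': ('A comic about cats.', 'Cats, every Tuesday.')}
```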
The merge tools also helped me fix a long-standing issue. This month I converted all ComicFury listings, which used to come from a special integration, into listings which use the same crawler as everything else. In quite a few cases, people added duplicate crawler-based listings for the same comics. So I needed the merge tool to clean up the last vestiges of that integration.
HTTPS for everyone
The other important but subtle change is that Comic Rocket now sends you to the HTTPS version of a comic’s site whenever that appears to be safe. I explained why HTTPS is important for comics a couple months ago. What’s new comes in two parts: a list of “safe” sites, and the ability to rewrite links to them. You can’t see either one directly; they just work quietly behind the scenes.
tl;dr: In many cases, if you read webcomics using Comic Rocket, it protects your privacy better than if you visit them directly. Also, I’m told that if you have a WordPress site, the Really Simple SSL plugin takes care of all the following issues for you.
The rollout of HTTPS was slow from its invention in 1994 until the Let’s Encrypt project opened to the public in 2016, but it has accelerated dramatically since then. Today, over 80% of web page views use HTTPS, compared to under 30% in 2014! Many sites aren’t using it optimally, though. This is true of webcomics just as much as anyone else.
Many of the webcomics available via HTTPS are also still available without it, and don’t tell you that you could be using a secure version. On these sites, if you start on an insecure page, the rest of your visit will be insecure too. I’ve even seen webcomics where you can start on a secure page and end up on insecure pages after clicking a few links.
Security experts and search-engine optimization experts have long agreed that if you offer HTTPS and somebody lands on an insecure version of your site, you should redirect them to the secure version.
But since not everyone does this, the privacy-focused folks at the DuckDuckGo search engine created their Smarter Encryption list. That’s a list of sites where you can safely rewrite “http://” into “https://” and get a secure version, even if the site doesn’t tell you so. At this point the list covers over 30 million web sites. (!)
Web browsers maintain much smaller “HSTS Preload” lists for related reasons. Mozilla’s list only has about 130 thousand entries, but it includes “wildcard entries” which cover many sites. So for example Mozilla’s list has one entry for everything under tumblr.com, while the Smarter Encryption list approaches a million separate entries to cover each individual Tumblr account.
How Comic Rocket now improves your privacy
Comic Rocket now pulls from Mozilla’s HSTS Preload list, plus a filtered version of the Smarter Encryption list. I filter out anything already covered in Mozilla’s list, like Tumblr, and then keep only sites which correspond to webcomics we have listings for. That currently cuts the Smarter Encryption list down to 1,475 sites out of the original 30 million.
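To make that filtering step concrete, here’s a minimal Python sketch. It isn’t the code Comic Rocket actually runs. It assumes both lists have already been loaded: the HSTS Preload list as a dict mapping each domain to its “include subdomains” flag (which is what makes an entry a wildcard), and the Smarter Encryption list as plain hostnames. The function names and the sample hosts are made up.

```python
def is_covered_by_hsts(host, preload):
    """Return True if the preload list already protects `host`.

    `preload` (assumed format) maps a domain to its include-subdomains
    flag. A host is covered if it appears exactly, or if a parent domain
    appears with that flag set (a "wildcard" entry, like tumblr.com).
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        domain = ".".join(labels[i:])
        if domain in preload and (i == 0 or preload[domain]):
            return True
    return False


def filter_smarter_encryption(se_hosts, preload, comic_hosts):
    """Keep only Smarter Encryption hosts that (1) belong to a comic we
    list and (2) aren't already covered by the HSTS preload list."""
    kept = set()
    for host in se_hosts:
        host = host.lower().rstrip(".")
        if host in comic_hosts and not is_covered_by_hsts(host, preload):
            kept.add(host)
    return kept


# Toy data for illustration only.
preload = {"tumblr.com": True, "example.org": False}
se_hosts = ["somecomic.tumblr.com", "catcomic.example", "unrelated.example",
            "sub.example.org", "example.org"]
comic_hosts = {"somecomic.tumblr.com", "catcomic.example", "sub.example.org"}

print(sorted(filter_smarter_encryption(se_hosts, preload, comic_hosts)))
# ['catcomic.example', 'sub.example.org']
```

The Tumblr account drops out because the wildcard entry already covers it. “sub.example.org” stays because its parent’s entry doesn’t extend to subdomains.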
I’ve also manually added a few hosting sites not adequately covered by either of those lists. Currently ComicFury is the only significant one.
The last part is to actually use these lists. Comic Rocket’s crawler now automatically rewrites links it finds to any of these sites so that they’re always HTTPS links. That protects you whether you’re using Comic Rocket’s default framed navigation, the Android app, or the bookmarklets.
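The real rewriting lives in the Haskell crawler, so this is only a Python sketch of the idea, not its actual code. It assumes the safe sites have been split into exact hostnames and whole domains whose every subdomain is safe; both names are hypothetical. It also leaves links with an explicit port alone, because switching “http://host:80/” to HTTPS on the same port wouldn’t work.

```python
from urllib.parse import urlsplit, urlunsplit


def host_is_safe(host, exact_hosts, wildcard_domains):
    """True if `host` is listed by itself, or falls under a domain whose
    every subdomain is known to support HTTPS (e.g. tumblr.com)."""
    if host in exact_hosts:
        return True
    labels = host.split(".")
    return any(".".join(labels[i:]) in wildcard_domains
               for i in range(len(labels)))


def upgrade_link(url, exact_hosts, wildcard_domains):
    """Rewrite an http:// link to https:// when its host is known safe;
    return every other link unchanged."""
    parts = urlsplit(url)  # scheme and hostname come back lowercased
    if parts.scheme != "http" or not parts.hostname:
        return url
    # Leave explicit ports alone: HTTPS on an HTTP port would break.
    if parts.port is not None:
        return url
    if not host_is_safe(parts.hostname, exact_hosts, wildcard_domains):
        return url
    # Only the scheme changes; host, path, query and fragment are kept.
    return urlunsplit(parts._replace(scheme="https"))


exact = {"catcomic.example"}   # hypothetical safe host
wild = {"tumblr.com"}          # domain covered by a wildcard entry
print(upgrade_link("http://catcomic.example/comic/42?x=1#top", exact, wild))
# https://catcomic.example/comic/42?x=1#top
print(upgrade_link("http://unknown.example/", exact, wild))
# http://unknown.example/
```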
I still need to go back and refresh older pages that the crawler visited before I made these changes, because their links don’t use HTTPS yet. But I was exhausted after doing this much. At least links on newly posted pages are covered now.
I’ve felt embarrassed to state publicly exactly how old Comic Rocket’s software is by now. Still, without those details I can’t really explain how much work I’m doing. So here they are:
My friend Josh Triplett and I created the first version of this site, then called “Serialist”, in the Haskell programming language. (It seemed like a good idea in 2008.) In 2012 I teamed up with the Comic Rocket crew to rebrand Serialist, and I rewrote everything except the crawler using the Python programming language and the Django web framework. At that point the newest Django release was version 1.3.1.
If you’re wondering, the crawler is still mostly the same Haskell program that Josh and I wrote in 2008/2009. It just works really well! Granted, changing it is hard, as the HTTPS work I described in the previous section showed, and I would do a lot of things differently if I were writing it today. But I’m proud of how durable it’s been!
Until a couple months ago, I was still using Django 1.3.1. There have been 14 significant Django releases since that version came out, along with many patch releases, and a 15th, Django 4.0, will probably be out soon. Each upgrade requires checking whether the code I’ve written is compatible with the changes in that version of Django. Worse, I also have to check whether all the third-party software I’m using is compatible, and upgrade everything in lockstep.
By the way, this is why logging in with Facebook has been broken for years. Facebook changed things, and I couldn’t upgrade just the login part without upgrading everything else. Twitter login still works years later only because, apparently, Twitter engineers actually care about stability.
I’ve now reached Django 1.11, the eighth of those fourteen releases. That version is a major milestone because it was a “long-term support” release. The Django project kept providing security fixes for it until April 2020, and Comic Rocket now has every fix they released up to that point. And because its support lasted so long, I can now use fairly recent versions of everything else too.
I enjoyed the visualization video that I made of October’s progress. I don’t know if anybody else did, but I’m doing it again anyway.
The next big challenge isn’t really the next Django upgrade. My new problem is that the Python programming language itself also went through a major change in the last decade. Python 3 looks superficially much like Python 2, but the differences are both subtle and important. It’s hard to be sure I’ve found every place in my software that needs to be adapted. And Django 1.11 was the last release to support Python 2, so I have to finish this transition before I can continue upgrading Django.
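For anyone curious what “subtle” means here, these are a few well-known differences, written as they run under Python 3, with the old Python 2 behavior noted in the comments. They’re generic examples, not code from Comic Rocket. The trouble is that the same line of code runs in both versions but quietly gives a different answer.

```python
# Division: with two ints, Python 2 truncated; Python 3 returns a float.
half = 7 / 2          # Python 2: 3    Python 3: 3.5
floor_half = 7 // 2   # both: 3 ("//" is explicit floor division)

# Indexing bytes: Python 2 gave a one-character string, Python 3 an int.
first_byte = b"abc"[0]   # Python 2: "a"    Python 3: 97


def concat(text, data):
    """In Python 2 both literals below are byte strings, so "+" works.
    In Python 3, "abc" is text and b"def" is bytes, and mixing them
    raises TypeError (caught here so we can see it happen)."""
    try:
        return text + data
    except TypeError:
        return None


mixed = concat("abc", b"def")   # Python 2: "abcdef"    Python 3: None

# dict.keys(): Python 2 returned a list snapshot; Python 3 a live view.
d = {"a": 1}
keys = d.keys()
d["b"] = 2
key_list = sorted(keys)   # Python 2: ["a"]    Python 3: ["a", "b"]
```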
As I get closer to the current versions of each piece of software, the upgrades have been getting easier. Those first few upgrades were a ton of work! I don’t know how painful this transition to Python 3 will turn out to be, but after that I expect the remaining upgrades to go pretty quickly. So I’m probably around the halfway mark in the upgrade process.
I’ll be giving an online talk about this effort at the beginning of February at the PyCascades 2022 conference. I hope I’ll have finished by then? 😅
Like last month, you can’t see most of what I’ve done this month. But I hope this post has helped you understand the scale of the invisible work I’ve been doing.
I’m writing these updates because I need help from Comic Rocket’s fans. I have a vision of how awesome Comic Rocket could be, and a plan for making that vision real. But I want you to understand how much work I have to do to get there. If you can support this work on Patreon, I’d really appreciate it!