A tiny mouse, a hacker.

See here for an introduction, and my link tree for socials.

  • 0 Posts
  • 18 Comments
Joined 2 years ago
Cake day: December 24th, 2023

  • NixOS, because:

    • I can have my entire system be declaratively configured, and not as a yaml soup bolted onto a random distro.
    • I can trivially separate the OS, and the data (thanks, impermanence)
    • it has a buttload of packages and integration modules
    • it is mostly reproducible

    All of these combined mean my backups are simple (just snapshot /persist, with a few dirs excluded, and restic them to N places) and reliable. The systems all have that newly installed feel, because there is zero cruft accumulating.
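
    For illustration, the “restic them to N places” part can be as small as the sketch below. This is not my actual setup: the repository URLs, exclude list, and password file path are placeholders, and the snapshotting of /persist itself is assumed to happen elsewhere.

      # Hypothetical sketch: push /persist to several restic repositories.
      # Repo URLs, excludes and the password file are placeholders.
      import subprocess

      REPOS = [
          "rest:https://backup-a.example/persist",
          "sftp:backup-b.example:/srv/restic",
      ]
      EXCLUDES = ["/persist/var/cache", "/persist/tmp"]

      def backup(repo: str) -> None:
          cmd = ["restic", "-r", repo,
                 "--password-file", "/persist/secrets/restic-pass",
                 "backup", "/persist"]
          for path in EXCLUDES:
              cmd += ["--exclude", path]
          subprocess.run(cmd, check=True)

      for repo in REPOS:
          backup(repo)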

    And with the declarative config being tangled out from a literate Org Roam garden, I have tremendous, up-to-date documentation too. Declarative config + literate programming work really well together, and give me immense power.

    I use it on my desktop and in my homelab, and I built and maintain a NixOS desktop for my wife and my mom, too.



  • “A human using a browser feature/extension you personally disapprove of does not make them a bot”

    So…? It is my site. If I see visitors engaging in behaviour I deem disrespectful or harmful, I’ll show them the boot, bot or human. If someone comes to my party, and starts behaving badly, I will kick them out. If someone shows up at work, and starts harassing people, they will be dealt with (hopefully!). If someone starts trying to DoS my services, I will block them.

    Blocking unwanted behaviour is normal. I don’t like anything AI near my stuff, so I will block them. If anyone thinks they’re entitled to my work regardless, that’s their problem, not mine. If they leave because of my hard stance on AI, that’s a win.

    “Once your content is inside my browser I have the right to disrespect it as I see fit.”

    Then I have the right to tell you in advance to fuck off, and serve you garbage! Good, we’re on the same page then!


  • “you disallow access to your website”

    I do. Any legit visitor is free to roam around. I keep the baddies away, much like a firewall would. You do use a firewall, right?

    “when the user agent is a little unusual”

    Nope. I disallow them when the user agent is very obviously fake. No one in 2025 is going to browse the web with “Firefox 3.8pre5”, or “Mozilla/4.0”, or a decade-old Opera, or Microsoft Internet Explorer 5.0. None of those would be able to connect anyway, because they do not support the modern TLS ciphers required. The rest are similarly unrealistic.
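
    As a rough illustration (this is not my actual ruleset, just its shape), catching these takes little more than a handful of patterns:

      # Illustrative only: patterns for user agents nobody legitimately uses in 2025.
      import re

      FAKE_UA_PATTERNS = [
          re.compile(r"Firefox/3\.8pre5"),   # long-dead pre-release build
          re.compile(r"^Mozilla/4\.0$"),     # bare legacy token, no details at all
          re.compile(r"MSIE [1-5]\."),       # Internet Explorer 5.0 and older
          re.compile(r"^Opera/"),            # Presto-era Opera (modern Opera sends OPR/)
      ]

      def is_sketchy(user_agent: str) -> bool:
          return any(p.search(user_agent) for p in FAKE_UA_PATTERNS)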

    “nepenthes. make them regret it”

    What do you think happens when a bad agent is caught by my rules? They end up in an infinite maze of garbage, much like the one generated by nepenthes. I use my own generator (iocaine), for reasons, but it is very similar to nepenthes. But… I’m puzzled now. Just a few lines above, you argued that I am disallowing access to my website, and now you’re telling me to use an infinite maze of garbage to serve them instead?

    That is precisely what I am doing.
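
    Conceptually, the maze is tiny. This is not iocaine itself (which does a lot more), just a minimal sketch of the idea: every URL under the trap deterministically produces a page of nonsense whose links lead deeper into the trap.

      # Minimal "infinite maze" sketch in the spirit of nepenthes/iocaine.
      # Not the real thing; the word list and page layout are made up.
      import hashlib
      import random

      WORDS = ["ample", "brine", "cobalt", "drizzle", "ember", "fathom", "gossamer", "hollow"]

      def maze_page(path: str) -> str:
          # The same path always yields the same garbage, so the crawler sees a
          # stable, real-looking site it can keep crawling forever.
          rng = random.Random(hashlib.sha256(path.encode()).digest())
          paragraphs = [" ".join(rng.choices(WORDS, k=40)) for _ in range(5)]
          links = [f'<a href="{path.rstrip("/")}/{rng.randrange(10**6)}">more</a>'
                   for _ in range(10)]
          body = "".join(f"<p>{p}</p>" for p in paragraphs) + " ".join(links)
          return f"<html><body>{body}</body></html>"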

    By the way, nepenthes/iocaine/etc alone does not do jack shit against these sketchy agents. I can guide them into the maze, but as long as they can access content outside of it, they’ll keep bombarding my backend, and will keep training on my work. There are two ways to stop them: passive identification, like my sketchy agents ruleset, or proof-of-work solutions like Anubis. Anubis has the huge downside that it is very disruptive to legit visitors. So I’m choosing the lesser evil.
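
    For context, a proof-of-work gate in the spirit of Anubis boils down to something like the sketch below (the real thing runs as client-side JavaScript, and the parameters here are invented). Every visitor, human or bot, has to burn CPU in the solve step, which is exactly what makes it disruptive.

      # Generic proof-of-work sketch: the client must find a nonce such that
      # sha256(challenge + nonce) has `difficulty` leading zero bits.
      # Illustrative only; not Anubis' actual scheme.
      import hashlib
      from itertools import count

      def leading_zero_bits(digest: bytes) -> int:
          bits = 0
          for byte in digest:
              if byte == 0:
                  bits += 8
                  continue
              bits += 8 - byte.bit_length()
              break
          return bits

      def solve(challenge: str, difficulty: int = 20) -> int:
          # Expensive for the visitor: about 2**difficulty hashes on average.
          for nonce in count():
              digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
              if leading_zero_bits(digest) >= difficulty:
                  return nonce

      def verify(challenge: str, nonce: int, difficulty: int = 20) -> bool:
          # Cheap for the server: a single hash.
          digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
          return leading_zero_bits(digest) >= difficulty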


  • “This feature will fetch the page and summarize it locally. It’s not being used for training LLMs.”

    And what do you think the local model is trained on?

    “It’s practically like the user opened your website manually and skimmed the content”

    It is not. A human visitor will skim through, and pick out the parts they’re interested in. A human visitor has intelligence. An AI model does not. An AI model has absolutely no clue what the user is looking for, and it is entirely possible (and frequent) that it discards the important bits, and dreams up some bullshit. Yes, even local ones. Yes, I tried, on my own sites. It was bad.

    “It has value to a lot of people including me so it’s not garbage.”

    If it does, please don’t come anywhere near my stuff. I don’t share my work only for an AI to throw away half of it and summarize it badly.

    “But if you make it garbage intentionally then everyone will just believe your website is garbage and not click the link after reading the summary.”

    If people who prefer AI summaries stop visiting, I’ll consider that a win. I write for humans, not for bots. If someone doesn’t like my style, or finds me too verbose, then my content is not for them, simple as that. And that’s ok, too! I have no intention of appealing to everyone.


  • Pray tell, how am I making anyone’s browsing experience worse? I disallow LLM scrapers and AI agents. Human visitors are welcome. You can visit any of my sites with Firefox, even 139 Nightly, and it will Just Work Fine™. It will show garbage if you try to use an AI summary, but AI summaries are garbage anyway, so nothing of value is lost there.

    I’m all for a free and open internet, as long as my visitors act respectfully and don’t try to DDoS me from a thousand IP addresses while trying to train on my work without respecting its license. The LLM scrapers and AI agents do not respect my work, nor its license, so they get a nice dose of garbage. Coincidentally, this greatly reduces the load on my backend, so legit visitors can actually access what they seek. Banning LLM scrapers & AI bots improves the experience of my legit visitors, because my backend doesn’t crumble under the load.


  • Overboard? Because I disallow AI summaries?

    Or are you referring to my “try to detect sketchy user agents” ruleset? Because that had two false positives in the past two months, yet those rules are responsible for stopping about 2.5 million requests per day, none of which were from a human (I’d know; human visitors have very different access patterns, even when they visit the maze).

    If the bots were behaving correctly, and respected my robots.txt, I wouldn’t need to fight them. But when they’re DDoSing my sites from literally thousands of IPs, generating millions of requests a day, I will go to extreme lengths to make them go away.





  • None, because they typically open up a larger attack surface than the system would have without them. It’s been like that for a while now. For reference, I’d recommend this article from Ars Technica, which cites some very knowledgeable people (including Chrome’s Security Chief at the time).

    There was a time when AV software was useful. We’re a decade past that: the world has changed, software has changed, defenses have changed, and AV software has not kept up.



  • “What is stopping someone, say the FSF or some other group championing libre software, from coming up with their own web engine completely different from the incumbent engines?”

    Building a browser engine is hard, especially when the target is moving at a rapid pace, and that target is controlled by Google. Like it or not, the web as it is today is pretty much driven by Google (and to a lesser extent by Apple and Microsoft). They can throw infinite resources into developing the browser engine and the browser itself. The closest competitor we have today is likely Servo, and they scrape by on pennies.

    Developing something from scratch, with even less funding and expertise than Servo, is a non-starter. It’s not going to happen. Sure, sure, there’s LadyBird and some other independent efforts, but I very highly doubt they’ll ever catch up to the three major engines.

    To develop and maintain a browser, you need people, and they need to be paid. Paying open source developers is… quite a big problem in and of itself, even for things considerably easier and smaller in scale than a web browser.

    “surely if Web Devs tell them to go pound sand, or intentionally break the site when using Google Chrome, and put a message saying, ‘Go to Firefox / Safari for a better experience’, that will make Google backtrack.”

    They would not, because for every developer who would do this, there are a hundred who would not, because their livelihood depends on people with Google browsers being able to use their stuff. Google is in a position of power here: they are the #1 search engine, they are the #1 browser, and they’re pretty well positioned on the mobile phone market too. The vast majority of businesses (companies or individuals, doesn’t matter) simply can’t afford to go against Google.

    If the vast majority would, then yeah, Google would backtrack. But that would require a coordinated effort, from the vast majority of the internet. Likely multiple months of protest. That’s not going to happen, people can’t afford it.


  • It’s about 5 times longer than previous releases were maintained for, and is an experiment. If there’s a need for a longer-term support branch, there will be one. It’s pointless to start maintaining a 5+ year branch with 0 users and a handful of volunteers, none of whom are paid for doing the maintenance.

    So yes, in that context, 15 months is long.


  • A lot of people do. Especially on GitHub, where you can just browse a random repository, find a file you want to change, hit the edit button, and edit it right there in the browser (it does the forking for you behind the scenes). For people unfamiliar with git, that’s huge.

    It’s also a great boon when you don’t want to clone the repo locally! For example, when I’m on a slow, metered connection, I have no desire to spend 10+ minutes (and half of my data cap) waiting for a repo to clone, just so I can fix a typo. With the web editor, I can accomplish the same thing with very little network traffic, in about a minute.

    While normally I prefer the comfort of my Emacs, there are situations where a workflow that happens entirely in the browser is simply more practical.




  • Fair bias notice: I am a Forgejo contributor.

    I switched from Gitea to Forgejo when Forgejo was announced, and it was as simple as changing the binary/docker image. It remains that simple today, and will for the foreseeable future, because Forgejo cherry-picks most of the changes in Gitea on a weekly basis. Forgejo will remain a drop-in replacement until the codebases diverge, that is, until we decide not to pick some feature or change; and even then, if you don’t rely on that feature, it’s still a drop-in replacement. (So far, the few things that are implemented differently in Forgejo are still done in a compatible way.)

    Let me offer a few reasons to switch:

    • Forgejo - as of today, and for the foreseeable future - includes everything in Gitea, but with more tests, and more features on top. A few features Forgejo has that Gitea does not:
      • Forgejo makes it possible to let any signed-in user edit wikis (like GitHub does); Gitea restricts editing to collaborators only. (Forgejo defaults to that too, but the default can be changed.) Mind you, this is not in a Forgejo release yet; it will land in the next release, probably in April.
      • Gitea can show an Action status badge. Forgejo has badges for action statuses, stars, forks, issues, and pull requests.
      • …there are numerous other features being developed for Forgejo that will not make it into Gitea unless they cherry-pick them (they don’t do that) or reimplement them (wasting a lot of time, and potentially introducing bugs).
    • Forgejo puts a lot of effort into testing. Every feature developed for Forgejo needs to have a reasonable amount of tests. Most of the things we cherry-pick from Gitea, we write tests for if they don’t have any (we write plenty of tests for stuff originating from Gitea).
    • Forgejo is developed in the open, using free tools: we use Forgejo to host the code, issues and releases, Forgejo Actions for CI, and Weblate for translations. Gitea uses GitHub to host the code, issues and releases, uses GitHub CI, and CrowdIn for translations (all of them proprietary platforms).
    • Forgejo accepts contributions without requiring copyright assignment; Gitea does not.
    • Forgejo routinely cherry-picks from Gitea; Gitea does not cherry-pick from Forgejo (they do tend to reimplement things we’ve done, though, which is a huge waste of time if you ask me).
    • Forgejo isn’t going anywhere anytime soon, see the sustainability repo. There are people committed to working on it, there are people paid to work on it, and there’s a fairly healthy community around it already.