Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on

Joined 4 months ago
Cake day: March 3rd, 2024

  • Especially because seeing the same information in different contexts helps mapping the links between the different contexts and helps dispel incorrect assumptions.

    Yes, but this is exactly the point of deduplication - you don’t want identical inputs, you want variety. If you want the AI to understand the concept of cats you don’t keep showing it the same picture of a cat over and over; all that tells it is that you want exactly that picture. You show it a whole bunch of different pictures whose only commonality is that there’s a cat in them, and then the AI can figure out what “cat” means.

    They need to fundamentally change big parts of how learning happens and how the algorithm learns to fix this conflict.

    Why do you think this?

  • There actually isn’t a downside to deduplicating data sets; overfitting is simply a flaw. Generative models aren’t supposed to “memorize” stuff - if you really want a copy of an existing picture there are far easier and more reliable ways to accomplish that than giant GPU server farms. These models don’t derive any benefit from drilling on the same subset of data over and over. It makes them less creative.

    I want to normalize the notion that copyright isn’t an all-powerful fundamental law of physics like so many people seem to assume these days, and if I can get big companies like Meta to throw their resources behind me in that argument then all the better.
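
As a rough illustration of the deduplication point above, here is a minimal Python sketch. It assumes the samples are text strings and only catches exact duplicates after light normalization (lowercasing and collapsing whitespace). The function name `deduplicate` and the normalization rule are hypothetical choices for this example, not taken from any particular training pipeline, and it makes no attempt at near-duplicate detection.

```python
import hashlib


def deduplicate(samples):
    """Return samples with exact duplicates removed, keeping the first copy of each.

    Two samples count as duplicates when they match after lowercasing and
    collapsing runs of whitespace (an assumption made for this sketch).
    """
    seen = set()
    unique = []
    for sample in samples:
        # Normalize so trivially different copies produce the same key.
        normalized = " ".join(sample.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


if __name__ == "__main__":
    data = ["A cat on a mat", "a  cat on a mat", "A dog in a park"]
    print(deduplicate(data))
```

The repeated sample is dropped and the varied ones are kept, which is the idea: the model sees each input once instead of having one example hammered in many times.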

  • Remember when piracy communities thought that the media companies were wrong to sue switch manufacturers because of that?

    It baffles me that there’s such an anti-AI sentiment going around that it would cause even folks here to go “you know, maybe those litigious copyright cartels had the right idea after all.”

    We should be cheering that we’ve got Meta on the side of fair use for once.

    look up sample recover attacks.

    Look up “overfitting.” It’s a flaw in generative AI training that modern AI trainers have done a great deal to resolve, and even in the cases of overfitting it’s not all of the training data that gets “memorized.” Only the stuff that got hammered into the AI thousands of times in error.

  • One thing that might be nice is if there could be a standard for user IDs that would allow multiple systems to work seamlessly together.

    You could have Mastodon continue to focus solely on being a completely open media aggregator and social network, but also have some other completely independent and secure private messaging system that uses the same user ID system. Then if you want to send a private message to someone who’s made a Mastodon post you can use that and it “just works.”

    Creating a universal user ID system that would work across all of this is challenging, of course.

  • Even with that, being absolutist about this sort of thing is wrong. People undergoing surgery have spent time on heart/lung machines that breathe for them. People sometimes fast for good reasons, or get IV fluids or nutrients provided to them. You don’t see protestors outside of hospitals decrying how humans aren’t meant to be kept alive with such things, though, at least not in most cases (as always there are exceptions, the Terri Schiavo case for example).

    If I want to create an AI substitute for myself it is not anyone’s right to tell me I can’t because they don’t think I was meant to do that.

  • One of the important features of Mastodon is that you can choose what your feed is. Everyone’s feed has an algorithm determining what’s in it even if it’s just a simple “list the posts of everyone I’ve subscribed to in chronological order.”

    If someone else wants to see a feed of content that is curated and sorted in a different way, why get angry at them? They’re not forcing you to see that feed.
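
To make the "every feed has an algorithm" point concrete, here is a minimal Python sketch of the simplest case described above: posts from subscribed accounts, sorted by time. The `Post` type, the `chronological_feed` function, and the `newest_first` flag are hypothetical names for this example and do not reflect how Mastodon actually builds timelines.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    author: str
    text: str
    created_at: datetime


def chronological_feed(posts, subscriptions, newest_first=True):
    """Return posts by subscribed authors, ordered by creation time."""
    followed = set(subscriptions)
    # Filtering by subscription and sorting by time is itself a feed algorithm.
    visible = [p for p in posts if p.author in followed]
    return sorted(visible, key=lambda p: p.created_at, reverse=newest_first)


if __name__ == "__main__":
    posts = [
        Post("alice", "first", datetime(2024, 6, 1, 9, 0)),
        Post("bob", "unfollowed", datetime(2024, 6, 1, 10, 0)),
        Post("carol", "latest", datetime(2024, 6, 1, 11, 0)),
    ]
    for post in chronological_feed(posts, ["alice", "carol"]):
        print(post.created_at, post.author, post.text)
```

A differently curated feed would only swap out the filter and the sort key; it would not change what anyone else sees.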

  • It sounds like they weren’t “being fed into an AI model” as in being used as training material, they were just being evaluated by an AI model. However…

    Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

    Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

    It sounds like Maven wants to play nice, but if the “general attitude” means that playing nice is impossible why should they even bother to try?

  • Looks like it.

    In addition to pulling in posts, the import process seems to be running AI sentiment analysis to add tags and relational data after content reaches Maven’s servers. This is a core part of Maven’s product: instead of follows or likes, a model trains itself on its own data in an attempt to surface unique content algorithmically.

    But of course, that news doesn’t give the reader those lovely rage endorphins or draw clicks.

    This is the Fediverse; having the content we post get spread around to other servers is the whole point of all this. Is this a face-eating leopard situation? People are genuinely surprised and upset that the stuff we post here is ending up being shown in other places?

    There is one thing I see here that raises my eyebrows:

    Even more shocking is the revelation that somehow, even private DMs from Mastodon were mirrored on their public site and searchable. How this is even possible is beyond me, as DM’s are ostensibly only between two parties, and the message itself was sent from two users.

    But that sounds to me like a problem: it shouldn’t be sending out private DMs to begin with.