• 6 Posts
  • 275 Comments
Joined 3 years ago
Cake day: July 2nd, 2023

  • I don’t currently have any sort of notebook. Instead, for general notes, I prefer A3-sized loose sheets of paper, since I don’t really want to use double the table surface to have both verso and recto in front of me, I don’t like writing on spiral or perfect bound notebooks, and I already catalog my papers into 3-ring binders.

    if I’m debugging something, and I’m putting silly print statements to quickly troubleshoot, should I document that?

    My read of the linked post is that each discrete action need not be recorded, but rather the thought process that leads to a series of actions. Rather than “added a printf() in constructor”, the overall thrust of that line of investigation might be “checking the constructor for signs of malformed input parameters”.

    I don’t disagree with the practice of “printf debugging”, but unless you’re adding a printf between every single operative line in a library, there’s always going to be some internal thought that goes into where a print statement is placed, based on certain assumptions and along a specific line of inquiry. Having a record of your thoughts is, I think, the point that the author is making.

    That said, in lieu of a formal notebook, I do make frequent Git commits and fill in the commit message with my thoughts, at every important juncture (eg before compiling, right before logging off or going to lunch).




  • Admittedly, I haven’t finished reflashing my formerly-Meshtastic LoRa radios with MeshCore, so I haven’t been able to play around with it yet. Although both meshes are decently sized near me, I was swayed to MeshCore once I started looking into how the mesh algorithm works for each. No extra license is needed, since MeshCore supports roughly the same hardware as Meshtastic.

    And what I learned – esp from following the #meshtastic and #meshcore hashtags on Mastodon – is that Meshtastic has some awful flooding behavior to send messages. Having worked in computer networks, I can say that’s a recipe for limiting the maximum size and performance of the mesh. MeshCore, by contrast, has a more sensible routing protocol for passing messages along.

    My opinion is that mesh networking’s most important use-case should be reliability, since when everything else (eg fibre, cellular, landlines) stops working, people should be able to self organize and build a working communications system. This includes scenarios where people are sparsely spaced (eg hurricane disaster with people on rooftops awaiting rescue) but also extremely dense scenarios (eg a protest where the authorities intentionally shut off phone towers, or a Taylor Swift concert where data networks are completely congested). Meshtastic’s flooding would struggle in the latter scenario, to send a distress message away from the immediate vicinity. Whereas MeshCore would at least try to intelligently route through nodes that didn’t already receive the initial message.


  • I personally started learning microcontrollers using an Arduino dev kit, and then progressed towards compiling the code myself using GCC and loading it directly onto the ATmega328P (the microcontroller from the original Arduino dev kits).

    But nowadays, I would recommend an MSP430 dev kit (which has excellent documentation for its peripherals) or an STM32 dev kit (because it uses a 32-bit ARM Cortex-M core, which is very popular in the embedded hardware industry, so would look good on your resume).

    Regarding userspace drivers, because these are outside of the kernel, such drivers are not kept in the kernel’s repositories. You won’t find any userspace drivers in the Linux or FreeBSD repos. Instead, such drivers live in their own repos, are maintained separately, and often do unusual things that the kernel folks don’t want to maintain until there is enough interest. For example, if you’ve developed an unproven VPN tunnel similar to Wireguard, you might face resistance to getting that into the Linux kernel. But you could write a userspace driver that implements your VPN tunnel, and others can use that driver without changing their kernel. If it gets popular enough, other developers might put the effort into getting it reimplemented as a mainline kernel driver.

    For userspace driver development, a VM running the specific OS is fine. For kernel driver development, I prefer to run the OS within QEMU, since that allows me to attach a debugger to the VM’s “hardware”, letting me do things like adding breakpoints within my kernel driver.


  • Very interesting! I’m no longer pursuing Meshtastic – I’m changing over my hardware to run MeshCore now – but this is quite a neat thing you’ve done here.

    As an aside, if you later want to have full networking connectivity (Layer 2) using the same style of encoding the data as messages, PPP is what could do that. If transported over Meshtastic, PPP could give you a standard IP network, and on top of that, you could use SSH to securely access your remote machine.

    It would probably be very slow, but PPP was also used for dial-up so it’s very accommodating. The limiting factor would be whether the Meshtastic local mesh would be jammed up from so many messages.


  • This answer is going to go in multiple directions.

    If you’re looking for practice using C to talk to devices and peripherals, the other commenter’s suggestion to start with an SBC (eg Raspberry Pi, Orange Pi) or with a microcontroller dev kit (eg Arduino, MSP430, STM32) is spot-on. That gives you a bunch of attached peripherals and a datasheet documenting their register behavior, so you can then write your own C functions that fill in and read those registers. In actual projects, you would probably use the provided libraries that already do this, but there is educational value in trying it yourself.

    However, just because you write a C function named “put_char_uart0()”, that isn’t enough to prepare for writing full-fledged drivers, such as those in the Linux and FreeBSD kernels. This next step is more about software design, where you structure your C code so that rather than being very hardware-specific (eg for the exact UART peripheral in your microcontroller), you have code which works for a more generic UART (abstracting away the details) and is common to all the UARTs made by the same manufacturer. This is about creating reusable code, creating abstraction layers, and writing extensible code. Not all code can be reusable, not every abstraction layer is desirable, and you don’t necessarily want to make your code super extensible if it starts to impact your core requirements. Good driver design means you don’t ever paint yourself into a corner, and the best way to learn how to avoid this is through sheer experience.

    For when you do want to write a full-and-proper driver for any particular peripheral – maybe one day you’ll create one such device, such as by using an FPGA attached via PCIe to a desktop computer – then you’ll need to work within an existing driver framework. Linux and FreeBSD drivers use a framework so that all drivers have access to what they need (system memory, I/O, helper functions, threads, etc), and then it’s up to the driver author to implement the specific behavior (known in software engineering as “business logic”). It is a learned skill – also through experience – to work within the Linux or FreeBSD kernels. So much so that both kernels have gone to great lengths to enable userspace drivers, meaning the business logic runs as a normal program on the computer, saving the developer from having to learn the strange ways of kernel development.

    And it’s not like user space drivers are “cheating” in any way: they’re simply another framework to write a device driver, and it’s incumbent on the software engineer to learn when a kernel or user space driver is more appropriate for a given situation. I have seen kernel drivers used for sheer computational performance, but have also seen userspace drivers that were developed because nobody on that team was comfortable with kernel debugging. Those are entirely valid reasons, and software engineering is very much about selecting the right tool from a large toolbox.



  • I’ll take a stab at the question. But I’ll need to lay some foundational background information.

    When an adversarial network is blocking connections to the Signal servers, the Signal app will not function. Outbound messages will still be encrypted, but they can’t be delivered to their intended destination. The remedy is to use a proxy, which is a server that isn’t blocked by the adversarial network and which will act as a relay, forwarding all packets to the Signal servers. The proxy cannot decrypt any of the messages, and a malicious proxy is no worse than blocking access to the Signal servers directly. A Signal proxy specifically forwards only to/from the Signal servers; this is not an open proxy.

    The Signal TLS Proxy repo contains a Docker Compose file, which will launch Nginx as a reverse proxy. When a Signal app connects to the proxy at port 80 or 443, the proxy will – in the background – open a connection to the Signal servers. That’s basically all it does. They presumably shipped the proxy as a Docker Compose file because that’s fairly easy to set up for most people.

    But now, in your situation, you already have a reverse proxy for your selfhosting stack. While you could run Signal’s reverse proxy in the background and then have your main reverse proxy forward to that one, it would make more sense to configure your main reverse proxy to directly do what the Signal reverse proxy would do.

    That is, when your main proxy sees one of the dozen subdomains for the Signal server, it should perform reverse proxying to those subdomains. Normally, for the rest of your self hosting arrangement, the reverse proxy would target some container that is running on your LAN. But in this specific case, the target is actually out on the public Internet. So the original connection comes in from the Internet, and the target is somewhere out there too. Your reverse proxy is simply a relay station.

    There is nothing particularly special about Signal choosing to use Nginx in reverse proxy mode, in that repo. But it happens to be that you are already using Nginx Proxy Manager. So it’s reasonable to try porting Signal’s configuration file so that it runs natively with your Nginx Proxy Manager.

    What happens if Signal updates that repo to include a new subdomain? Well, you wouldn’t receive that update unless you specifically check for it and then update your proxy configuration yourself. So that’s one downside.

    But seeing as the Signal app demands port 80 and 443, and you already use those ports for your reverse proxy, there is no way to avoid programming your reverse proxy to know the dozen subdomains. Your main reverse proxy cannot send the packets to the Signal reverse proxy if your main proxy cannot even identify that traffic.



  • There can be, although some parts may still need to be written in assembly (which is imperative, because that’s ultimately what most CPUs execute), for parts like a kernel’s context-switching logic. But C has similar restrictions, like how it is impossible to start a C function without first initializing the stack. Exception: some CPUs (eg Cortex-M) initialize the stack in hardware, by loading the initial stack pointer from the vector table at reset.

    As for why C, it’s a low-level language that maps well to most CPUs’ native assembly language. If instead we had stack-based CPUs – eg Lisp Machines or a real Java Machine – then we’d probably be using other languages to write an OS for those systems.


  • The other commenters correctly opined that encryption at rest should mean you could avoid encryption in memory.

    But I wanted to expand on this:

    I really don’t see a way around this, to make the string searchable the hashing needs to be predictable.

    I mean, there are probabilistic data structures, where something like a Bloom filter will produce one of two answers: definitely not in the set, or possibly in the set. In the context of search tokens, if you had a Bloom filter, you could quickly assess if a message does not contain a search keyword, or if it might contain the keyword.

    A suitably sized Bloom filter – possibly different lengths based on the associated message size – would provide search coverage for that message, at least until you have to actually access and decrypt the message to fully search it. But it’s certainly a valid technique to get a quick, cursory result.

    Though I think perhaps just having the messages in memory unencrypted would be easier, so long as that’s not part of the attack surface.


  • Upvoting because the FAQ genuinely is worthwhile to read, and answers the question I had in mind:

    7.9 Why not just use a subset of HTTP and HTML?

    I don’t agree with their answer though, since if the rough, overall Gemini experience:

    is roughly equivalent to HTTP where the only request method is “GET”, the only request header is “Host” and the only response header is “Content-type”, plus HTML where the only tags are <p>, <pre>, <a>, <h1> through <h3>, <ul> and <li> and <blockquote>

    Then it stands to reason – per https://xkcd.com/927/ – to do exactly that, rather than devise new protocol, client, and server software. As it stands, some of their points have few or no legs to stand on.

    The problem is that deciding upon a strictly limited subset of HTTP and HTML, slapping a label on it and calling it a day would do almost nothing to create a clearly demarcated space where people can go to consume only that kind of content in only that kind of way.

    Initially, my reply was going to make a comparison to the impossibility of judging a book by its cover, since that’s what users already do when faced with visiting a sketchy looking URL. But I actually think their assertion is a strawman, because no one has suggested that we should immediately stop right after such a protocol has been decided. Very clearly, the Gemini project also has client software, to go with their protocol.

    But the challenge of identifying a space is, quite frankly, still a problem with no general solution. Yes, sure, here on the Fediverse, we also have the ActivityPub protocol which necessarily constrains what interactions can exist, in the same way that ATProto also constrains what can exist. But even the most set-in-stone protocol (eg DICT) can be used in new and interesting ways, so I find it deeply flawed that they believe they have categorically enumerated all possible ways to use the Gemini protocol. The implication is that users will never be surprised in future about what the protocol enables, and that just sounds ahistorical.

    It’s very tedious to verify that a website claiming to use only the subset actually does, as many of the features we want to avoid are invisible (but not harmless!) to the user.

    I’m failing to see how this pans out, because seeing as the web is predominantly client-side (barring server-side tracking of IP addresses, etc), it should be fairly obvious when a non-subset website is doing something that the subset protocol does not allow. Even if it’s a lay-in-wait function, why would a subset-compliant client honor it?

    When it becomes obvious that a website is not compliant with the subset, a well-behaved client should stop interacting with the website, because it has violated the protocol and cannot be trusted going forward. Add it to an internal list of do-not-connect and inform the user.

    It’s difficult or even impossible to deactivate support for all the unwanted features in mainstream browsers, so if somebody breaks the rules you’ll pay the consequences.

    And yet, Firefox forks are spawning left and right due to Mozilla’s AI ambitions.

    Ok, that’s a bit blithe, but I do recognize that the web engines within browsers are now incredibly complex. Even still though, the idea that we cannot extricate the unneeded sections of a rendering engine and leave behind the functionality needed to display a subset of HTML via HTTP, I just can’t accept that until someone shows why that is the case.

    Complexity begets complexity, whereas this would be an exercise in removing complexity. It should be easier than writing new code for a new protocol.

    Writing a dumbed down web browser which gracefully ignores all the unwanted features is much harder than writing a Gemini client from scratch.

    Once again, don’t do that! If a subset browser finds even one violation of the subset protocol, it should halt. That server is being malicious. Why would any client try to continue?

    The error handling of a privacy-respecting protocol that is a subset of HTML and HTTP would – in almost all cases – assume the server is malicious, and to disconnect. It is a betrayal of the highest order. There is no such thing as a “graceful” betrayal, so we don’t try to handle that situation.

    Even if you did it, you’d have a very difficult time discovering the minuscule fraction of websites it could render.

    Is this about using the subset browser to look at regular port-80 web servers? Or is this about content discovery? Only the latter has a semblance of logic behind it, but that too is an unsolved problem to this day.

    Famously, YouTube and Spotify are drivers of content discovery, based in part on algorithms that optimize for keeping users on those platforms. The Fediverse, by contrast, eschews centralized algorithms and simply doesn’t have one. And in spite of that, people find communities. They find people, hashtags, images, and media. Is it probably slower than if an algorithm could find these for the user’s convenience? Yes, very likely.

    But that’s the rub: no one knows what they don’t know. They cannot discover what they don’t even imagine could exist. That remains the case, whether the Gemini protocol is there or not. So I’m still not seeing why this is a disadvantage against an HTTP/HTML subset.

    Alternative, simple-by-design protocols like Gopher and Gemini create alternative, simple-by-design spaces with obvious boundaries and hard restrictions.

    ActivityPub does the same, but is constructed atop HTTP, while being extensible to like-for-like replace any existing social media platform that exists today – and some we haven’t even thought of yet – while also creating hard and obvious boundaries which foment a unique community unlike any other social media platform.

    The assertion that only simple protocols can foster community spaces is belied by ActivityPub’s success; ActivityPub is not exactly a simple protocol either. And this does not address why stripping down HTML/HTTP wouldn’t also do the same.

    You can do all this with a client you wrote yourself, so you know you can trust it.

    I sure as heck do not trust the TFTP client I wrote at uni, and that didn’t even have an encryption layer. The idea that every user will write their own encryption layer to implement the mandatory encryption for Gemini protocol is farcical.

    It’s a very different, much more liberating and much more empowering experience than trying to carve out a tiny, invisible sub-sub-sub-sub-space of the web.

    So too would browsing a subset of HTML/HTTP using a browser that only implements that subset. We know this because if you’re reading this right now, you’re either viewing this comment through a web browser frontend for Lemmy, or using an ActivityPub client of some description. And it is liberating! Here we all are, on this sub-sub-sub-sub space of the Internet, hanging out and commenting about protocols and design.

    But that doesn’t mean we can’t adapt already-proven, well-defined protocols into a subset that matches an earlier vision of the internet, while achieving the same.


  • companies are purposely designing them wrong to shorten their service life

    This 100%. And specifically for readers unfamiliar with how product R&D works, the malice doesn’t even have to metastasize throughout a whole company in order to design inferior products. The following summarized, fictional exchange should depict the problem:

    Management: we see Competitor X released a new light bulb that lasts 800 hours and costs $1. We need our own light bulb product, with at least 40% gross margin.

    Marketing: OK, we can be competitive if we make a 1000 hour light bulb and consumers are willing to buy it for $1.10. We can maintain 40% gross margin if our cost per unit is less than 25 cents.

    Engineering: OK, we’ll go work on that

    [3 months later]

    Engineering: right, we’ve built this light bulb that lasts 1500 hours avg (std dev of 100 hours) and only uses bog-standard tungsten from our long-term supplier, so the cost is 20 cents per unit

    Marketing: nice, but we don’t need 1500 hours. Can we reduce the cost per unit further?

    Engineering: What? But we’re already below 25 cents.

    Marketing: No, you see, management wants at least 40% gross margin. More margin, more profit, more better.

    Engineering: No, you don’t see. We’re already using the thinnest tungsten possible. We can’t change an element’s melting point, we can’t draw it any narrower, we can’t do … [insert ten other reasons why pursuing further savings is of diminishing return]

    Marketing: This is an ultimatum: we cannot accept this product into production unless it meets exactly the specification we wrote. We will cancel the project and outsource R&D if you cannot achieve this.

    Engineering: WTF??

    [a month later]

    Engineering: Per your insane request, we have produced a light bulb that lasts 1000 hours (std dev of 400 hours), by taking the earlier design and nicking the tungsten filament every few millimeters, so that those thin points will eventually fatigue and break. The nicking adds an extra 1 cent, but the material savings is 2 cents, so we are now at 19 cents per unit. But the std deviation shows wildly varying behavior for when any particular bulb will fail. /exasperated sigh

    Marketing: Excellent! We’ll ship it!

    Engineering: …

    When the incentive structure is bad, all sorts of perverse results will occur, even for well-meaning participants. I leave it to you, dear reader, whether R&D’s complicity with such perverse, capitalistic goals is morally damning, but the fact remains that if a company does develop a superior product, it might just never see the light of day, or will be intentionally delayed/deferred until needed for “competitive” reasons. This is one possible mode, but the other would be to chronically underfund R&D, so that superior products cannot possibly be developed, barring a spontaneous and unplanned stroke of genius.

    To that end, whichever the cause, the result is all the same: the guise of “competition” is actually a local minimum, where the threadbare minimum is accepted as a maximum, where maximum profit can be extracted. Why put money into innovation if no one else is? Why compete when all the other competitors know the game as well: all will save costs by maintaining the mediocrity. It’s like monopolies and trusts from a hundred years ago, but they don’t even need to say a word to each other in order to collude.


  • The full-blown solution would be to have your own recursive DNS server on your local network, to block or redirect any other DNS server to your own, and possibly to block all known DoH servers.

    This would solve the DNS leakage issue, since your recursive server would learn the authoritative NS for your domain, and so would contact that NS directly when processing any queries for any of your subdomains. This cuts out the possibility of any espionage by your ISP/Google/Quad9’s DNS servers, because they’re now uninvolved. That said, your ISP could still spy on the raw traffic to the authoritative NS, but from your experiment, they don’t seem to be doing that.

    Is a recursive DNS server at home a tad extreme? I used to think so, but we now have people running Pi-hole and similar software, which can be run in recursive mode when paired with Unbound (the recursive DNS server software).

    <minor nitpick>

    “It was DNS” typically means that name resolution failed or did not propagate per its specification. Whereas I’m of the opinion that if DNS is working as expected, then it’s hard to pin the blame on DNS. For example, forgetting to renew a domain is not a DNS problem. And setting a bad TTL or a bad record is not a DNS problem (but may be a problem with your DNS software). And so too do I think that DNS leakage is not a DNS problem, because the protocol itself is functioning as documented.

    It’s just that the operators of the upstream servers see dollar-signs by selling their user’s data. Not DNS, but rather a capitalism problem, IMO.

    </minor nitpick>




  • I loaded True Nas onto the internal SSD and swapped out the HDD drive that came with it for a 10tb drive.

    Do I understand that you currently have a SATA SSD and a 10TB SATA HDD plugged into this machine?

    If so, it seems like a SATA power splitter on the SSD’s power lead would suffice, in spite of the computer store’s admonition. The reason to split from the SSD’s lead is that an SSD draws much less power than spinning rust, so that lead has the most headroom to spare.

    Can it still go wrong? Yes, but that’s the inherent risk when pushing beyond the design criteria of what this machine was originally built for. That said, “going wrong” typically means “won’t turn on”, not “halt and catch fire”.




  • IMO, circular buffers with two advancing pointers are an awesome data structure for high performance compute. They’re used in virtualized network hardware (see virtio) and minimizing Linux syscalls (see io_uring). Each ring implements a single producer, single consumer queue, so two rings are usually used for bidirectional data transfer.

    It’s kinda obscure because the need for asynchronous-transfer queues doesn’t show up that often unless you’re dealing with hardware or crossing outside of a single CPU. But it’s becoming more relevant due to coprocessors (ie small ARM CPUs attached to a main CPU) that process offloaded requests and then quickly return the result when ready.