A guide to passing GPUs through to Proxmox, XCP-ng VMs

How does this analogy work at all? The modifier chooses a low rank for LoRA to fit some desktop/workstation memory constraint, not because the other weights are “very hard” to modify if you happen to have the necessary compute and I/O. Development of LoRA has also been driven largely by storage reduction (hence not too many layers modified) and by preserving generalizability (since training generalizable models is hard). The Hadamard product variant (LoHa), in particular, was first developed in the context of federated learning, not for desktop/workstation fine-tuning. (Also, LoRA is fully capable of modifying all weights; it is rather a technique for doing so in a correlated fashion to reduce the size of the gradient update.) And much of LoRA's development happened in the context of otherwise fully open datasets (e.g. LAION) that are simply not manageable in desktop/workstation settings.
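The point that LoRA can modify all weights, just in a correlated fashion, can be illustrated with a minimal pure-Python sketch. The function names and the 4096 x 4096 layer size are illustrative assumptions. `scale` stands in for the α/r factor used in the LoRA paper.

```python
# Sketch of a LoRA-style update: W' = W + scale * (B @ A).
# W (d x k) stays frozen; only B (d x r) and A (r x k) would be trained.
# Pure Python for illustration; real code would use a tensor library.

def matmul(x, y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_delta(B, A, scale=1.0):
    """Dense update scale * (B @ A); its rank is at most r = len(A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def apply_update(W, delta):
    """Add the low-rank update to every entry of W."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def trainable_params(d, k, r=None):
    """Full fine-tuning trains d*k values; LoRA with rank r trains r*(d + k)."""
    return d * k if r is None else r * (d + k)


if __name__ == "__main__":
    W = [[0.0] * 4 for _ in range(4)]
    B = [[1.0], [2.0], [3.0], [4.0]]  # d x r, with r = 1
    A = [[1.0, 1.0, 1.0, 1.0]]        # r x k
    W_new = apply_update(W, lora_delta(B, A))
    print(all(v != 0.0 for row in W_new for v in row))  # True: every entry changed
    print(trainable_params(4096, 4096), trainable_params(4096, 4096, r=8))  # 16777216 65536
```

The update B @ A touches every entry of W. However, it is determined by only r(d + k) trained values, which is where the savings in storage and gradient size come from.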

This narrow view of “source” ignores the actual value of the compute/training here. Datasets from e.g. LAION to Common Crawl have been available for some time, along with training code (sometimes independently reproduced) for the Imagen diffusion model or GPT. Only when e.g. GPT-J came along, and somebody invested in the compute (including working out how to scale it to their specific cluster), did the result become useful.

ylai@lemmy.ml · 2 months ago

This is a very shallow analogy. Fine-tuning is the standard technical approach to reducing compute, even if you have access to the code and all the training data. Hence there has always been a rich and established ecosystem for fine-tuning, regardless of “source.” Patching closed-source binaries is not a standard approach, since compilation is far less computationally intensive than today’s large-scale training.

Java bytecode is a far-fetched example. The JVM does assume an architecture specific to the CPU-dominated world in which it was developed, and Java bytecode cannot be executed trivially (let alone efficiently) on a GPU or FPGA, for instance.

And by the way, weight portability matters far more than the forced comparison to (simple) code suggests. Today’s large-scale training code is usually highly specific to a particular cluster (or TPU, WSE), whereas the resulting weights are not. Even if you got hold of somebody’s training code, you would often have to reinvent the wheel to scale it to your own compute hardware, interconnect, I/O pipeline, etc. This is not commodity open source on your home PC or workstation.

ylai@lemmy.ml · 2 months ago

The situation is somewhat different and more nuanced. With weights there are tools for fine-tuning (LoRA/LoHa, PEFT, etc.), which makes the situation different from that of program binaries. You can see that despite e.g. LLaMA being “compiled”, others can build on it substantially to make models that surpass the previous iteration (see e.g. the recent WizardLM 2 in relation to LLaMA 2). Weights are also far more hardware-independent than binaries (you can usually cross-train or run inference on GPUs, Google TPUs, Cerebras WSEs, etc. with the same weights).

ylai@lemmy.ml · 2 months ago

Open Source Initiative tries to define Open Source AI

ylai@lemmy.ml · 2 months ago

Germany's Sovereign Tech Fund Now Supporting FFmpeg

ylai@lemmy.ml · 2 months ago

AMD Aims For AMF Decode In FFmpeg, Questioned Over Vulkan Video Commitment

ylai@lemmy.ml · 2 months ago

iFixit hails replaceable LPCAMM2 laptop memory as a 'big deal'

ylai@lemmy.ml · 2 months ago

Compiling MS-DOS 4.0 from DOS 4.0, on a PS/2!

ylai@lemmy.ml · 2 months ago

Raspberry Pi adds more memory to the Compute Module 4S

ylai@lemmy.ml · 2 months ago

The hyper-clouds are open source's friends

ylai@lemmy.ml · 2 months ago

There is even a sentence in README.md that makes it explicit:

The source files in this repo are for historical reference and will be kept static, so please don’t send Pull Requests suggesting any modifications to the source files […]

ylai@lemmy.ml · 2 months ago

IBM Buys HashiCorp To Control The Alternative To Red Hat Kubernetes

ylai@lemmy.ml · 3 months ago

Raspberry Pi OS Now Shipping With Vulkan Support By Default

ylai@lemmy.ml · 3 months ago

The Xz Backdoor Highlights the Vulnerability of Open Source Software—and Its Strengths

ylai@lemmy.ml · 4 months ago

Open-Source 4WD AI Robot Kit Compatible with Raspberry Pi Models 4 and 5

ylai@lemmy.ml · 4 months ago

See https://knowyourmeme.com/memes/u-mad, in particular: “Due to the agitating nature of the phrase, it is often considered a form of trolling.”

ylai@lemmy.ml · 4 months ago

The “you mad bro” appears in internal Valve communication (Valve COO Scott Lynch to Erik Johnson and Newell, i.e. in the sense of Johnson/Newell being “mad”, not Sweeney). In particular, it was not sent as a response to Sweeney. Another outlet was already tripped up by this and had to issue a correction: https://www.gamingonlinux.com/2024/03/valve-coo-on-epics-tim-sweeney-you-mad-bro-when-launching-the-epic-store/

This is not quite as sensational as some people are framing it.

ylai@lemmy.ml · 8 months ago

Yes. But one should also note that only a limited range of Intel GPUs support SR-IOV.
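On Linux, you can check through sysfs whether a PCI device exposes SR-IOV. Physical functions that support it have a `sriov_totalvfs` attribute, and `sriov_numvfs` is used to enable virtual functions. The sketch below lists display-class devices (PCI base class 0x03) that advertise SR-IOV. The function name is hypothetical. Whether an Intel GPU shows up also depends on the kernel driver in use, so an empty result does not prove the hardware lacks the capability.

```python
from pathlib import Path


def sriov_gpus(sysfs_root="/sys/bus/pci/devices", vendor=None):
    """Map PCI address -> sriov_totalvfs for display devices advertising SR-IOV.

    vendor: optional PCI vendor ID string as shown in sysfs, e.g. "0x8086" (Intel).
    """
    found = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            pci_class = (dev / "class").read_text().strip()
            dev_vendor = (dev / "vendor").read_text().strip()
            # Only present for SR-IOV physical functions (driver-dependent).
            total_vfs = int((dev / "sriov_totalvfs").read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable: skip this device
        if not pci_class.startswith("0x03"):
            continue  # base class 0x03 = display controller
        if vendor is not None and dev_vendor != vendor:
            continue
        found[dev.name] = total_vfs
    return found


if __name__ == "__main__":
    print(sriov_gpus(vendor="0x8086"))
```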

ylai@lemmy.ml · 11 months ago

The novel bit of this project is actually the use of GGML quantization from llama.cpp for Stable Diffusion. This can offer lower RAM usage and faster inference on CPU than all previous CPU implementations, which lacked low-bit quantization, the very technique known to have made CPU and low-RAM LLaMA inference feasible.
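As a rough illustration of block-wise quantization, here is a simplified sketch in the spirit of GGML's Q8_0 format: blocks of 32 weights, one scale per block, absmax scaling. It is not GGML's actual code. GGML stores the scale as fp16 and uses its own rounding, whereas this sketch keeps Python floats and uses Python's `round`.

```python
BLOCK_SIZE = 32  # GGML's Q8_0 also groups weights into blocks of 32


def quantize_q8(weights, block_size=BLOCK_SIZE):
    """Return a list of (scale, int8 values) pairs, one per block (absmax scaling)."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / 127.0            # largest magnitude maps to +/-127
        inv = 1.0 / scale if scale else 0.0  # all-zero block -> all-zero ints
        q = [max(-127, min(127, round(w * inv))) for w in block]
        blocks.append((scale, q))
    return blocks


def dequantize_q8(blocks):
    """Recover approximate float weights: w ~= scale * q."""
    return [scale * q for scale, qs in blocks for q in qs]
```

In GGML's Q8_0 layout, each block of 32 weights takes 34 bytes (a 2-byte scale plus 32 int8 values) instead of 128 bytes in fp32. This is the source of the RAM savings. The CPU speed-up comes largely from reduced memory traffic and SIMD-friendly integer dot products over such blocks, which this sketch does not attempt.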

The important long-term implication is that people have been targeting the wrong Stable Diffusion model size, if the goal is quality on commodity hardware (this includes GPUs, too). For example, Stable Diffusion, which Stability AI has boasted so much about for fitting commodity hardware, has slightly fewer than 1 billion parameters. The smallest LLaMA that people nowadays happily run on a commodity GPU or CPU already has 7 billion parameters. And even OpenAI’s DALL·E 2, which many called prohibitive because “you need a 48 GB GPU” (not true with quantization), has just 3.5 billion parameters.
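These parameter counts translate into weight memory with simple arithmetic: bytes ≈ parameters × bits per weight / 8. The sketch below uses the figures from the comment, with Stable Diffusion rounded up to 1B. It also uses the effective bit widths of GGML's Q8_0 (8.5 bits) and Q4_0 (4.5 bits), which include one fp16 scale per 32 weights. It counts weights only, not activations or framework overhead.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1e9 bytes); activations not included."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts as quoted in the comment (Stable Diffusion rounded up to 1B).
MODELS = {
    "Stable Diffusion": 1.0e9,
    "DALL-E 2": 3.5e9,
    "LLaMA 7B": 7.0e9,
}

# Effective bits per weight; Q8_0/Q4_0 include one fp16 scale per 32 weights.
FORMATS = {"fp32": 32, "fp16": 16, "Q8_0": 8.5, "Q4_0": 4.5}

if __name__ == "__main__":
    for name, n in MODELS.items():
        cells = [f"{fmt}: {weight_memory_gb(n, bits):.1f} GB"
                 for fmt, bits in FORMATS.items()]
        print(f"{name:<17}", ", ".join(cells))
```

For the 3.5B-parameter model this gives 7.0 GB in fp16 and roughly 2 GB at Q4_0, both far below 48 GB.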

For additional context, Stable Diffusion on CPU has been done before, though with repurposed frameworks rather than a custom C++ project. Notably, there is the Q-Diffusion paper (https://github.com/Xiuyu-Li/q-diffusion), but its results were obtained by simulating the quantization, and e.g. the GitHub repo does not actually offer an implementation with a real speed-up.
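The difference between simulated and real quantization is easy to see in code. Simulated ("fake") quantization rounds values onto the integer grid but immediately maps them back to floats. The downstream tensors therefore stay full-precision and the matrix multiplications cost the same. The sketch below is a generic symmetric 4-bit version, assumed for illustration. It is not Q-Diffusion's code.

```python
def fake_quantize(values, bits=4):
    """Quantize-then-dequantize: returns floats that lie on a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, using the symmetric range [-7, 7]
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 0.0
    inv = 1.0 / scale if scale else 0.0
    # The rounded integers are multiplied straight back by the scale, so the
    # result is still a list of floats: accuracy changes, memory and speed do not.
    return [scale * max(-qmax, min(qmax, round(v * inv))) for v in values]


if __name__ == "__main__":
    out = fake_quantize([0.1, -0.7, 0.35, 0.7])
    print(out, all(isinstance(v, float) for v in out))
```

A real low-bit implementation would instead keep the rounded integers (plus scales) in memory and run integer kernels on them, as GGML does.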