Maybe try NAD+ boosters next? They seem to reduce addictive behaviors quite a bit. Liposomal NAD+, nicotinamide riboside, or IV NAD+. The theory is an energetic deficit in the brain that drugs/addictions override temporarily but deepen long-term, and NAD+ essentially brings the energy back. Maybe GLP-1s do something similar by flooding the body with broken-down fat?
Strix Halo can only allocate 96GB of RAM to the GPU. So GPT-OSS 120B can be run at Q6 at best (and even then activations would need to be partially stored in CPU memory).
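Back-of-envelope check on why Q6 is roughly the ceiling (illustrative numbers only: ~117B weights for GPT-OSS 120B and ~6.5 bits/weight for a Q6_K-style quant, both assumptions):

```python
# Rough weight-memory estimate. Parameter count and bits/weight
# are ballpark assumptions, not exact figures for any specific quant.
def weight_gb(params: float, bits_per_weight: float) -> float:
    return params * bits_per_weight / 8 / 1e9  # bytes -> decimal GB

q6 = weight_gb(117e9, 6.5)
print(f"Q6 weights: ~{q6:.0f} GB vs the 96 GB GPU allocation limit")
```

The weights alone land just under the 96GB cap, which is why the KV cache and activations end up spilling to CPU memory.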
I bet you're confusing VRAM (the old fixed carve-out) with GTT (dynamic) memory allocation. The Linux amdgpu driver handles GTT just fine; amdgpu_top is one monitoring app that shows the two separately.
That's not really true; the latest autoregressive image models build a codebook of patches that are then encoded as tokens, and the image is assembled out of them.
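For the unfamiliar, the "codebook of patches" idea (VQ-style tokenization) boils down to nearest-neighbor lookup against a learned table; a toy sketch with made-up sizes (512 codes, 16-dim patch embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))  # 512 learned patch embeddings, dim 16
patches = rng.normal(size=(64, 16))    # one image flattened into 64 patches

# Each patch becomes the index of its nearest codebook entry -> a token id.
dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)          # shape (64,), ints in [0, 512)

# "Decoding" reassembles the image from the quantized patches.
recon = codebook[tokens]               # shape (64, 16)
```

The autoregressive model then predicts the next token id in this discrete space, exactly like a language model over text.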
That won't work at elite schools like Stanford, where the average in a hard class can be around 98% and a 94% will get you a B+ once the reverse curve is applied.
Blog author here. Actually, no. The model can be streamed into the DGX Spark, so we can run prefill of models much larger than 128GB (e.g. DeepSeek R1) on the DGX Spark. This feature is coming to EXO 1.0 which will be open-sourced soonTM.
M3 Ultra has a slow GPU and no hardware FP4 support, so its prompt processing (prefill) is going to be slow, practically unusable at 100k+ context sizes. For token generation, which is memory-bound, M3 Ultra would be much faster, but who wants to wait 15 minutes for the context to be processed? Spark will be much faster for initial token processing, giving you a much better time to first token, but then ~3x slower (273 vs 800GB/s) in token generation throughput. You need to decide which matters more to you. Strix Halo is IMO the worst of both worlds at the moment: the worst specs in both dimensions and the least mature software stack.
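To put rough numbers on the decode side: memory-bound generation is approximately tokens/s = bandwidth / bytes of active weights read per token. A sketch with an illustrative workload (hypothetical dense model reading ~35 GB of weights per token):

```python
def decode_tps(bandwidth_gbs: float, active_gb: float) -> float:
    # Memory-bound decode: each generated token streams the active
    # weights through the memory bus once, so bandwidth is the ceiling.
    return bandwidth_gbs / active_gb

# 35 GB/token is an assumption (e.g. a ~70B dense model at ~4 bits).
for name, bw in [("DGX Spark", 273.0), ("M3 Ultra", 800.0)]:
    print(f"{name}: ~{decode_tps(bw, 35.0):.0f} tok/s")
```

The 800/273 bandwidth ratio is where the ~3x generation gap comes from, regardless of the exact model size.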
This is 100% the truth, and I am really puzzled to see people push Strix Halo so hard for local inference. For about $1200 more you can just build a DDR5 + 5090 machine that will crush a Strix Halo on just about every MoE model (equal decode and 10-20x faster prefill for large models, and huge gaps for any MoE that fits in 32GB of VRAM). I'd have a lot more confidence in reselling a 5090 in the future than a Strix Halo machine, too.