
Maybe try NAD+ boosters next? They seem to reduce addictive behaviors quite a bit: liposomal NAD+, nicotinamide riboside, or IV NAD+. The theory is that there is an energetic deficit in the brain which drugs/addictions override temporarily but deepen long-term, and NAD+ essentially restores that energy. Maybe GLPs do something similar by flooding the body with broken-down fat?

Do GLPs flood the body with broken down fat? I thought they just suppressed appetite and the like.

People overstate some of the secondary effects, but in a nutshell that’s more or less what they do.

Metabolic dysfunction is at the root of many diseases, and addiction is one of them.

Strix Halo has awful token prefill speed. Only suitable for very small contexts.

Basic datacenter technicians will be the new astronauts, swapping burnt CPUs and failed hard drives in space.

Strix Halo can only allocate 96GB of RAM to the GPU, so GPT-OSS 120B can be run at Q6 at best (and even then activations would need to be partially stored in CPU memory).

It can only use 96GB of RAM on Windows; on Linux people have allocated up to 120GB. Here's one source: https://www.reddit.com/r/LocalLLaMA/comments/1nmlluu/comment...

GPT-OSS 120B uses a native 4-bit representation, so it fits fine.
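
For a rough sense of why native 4-bit fits in a 96GB GPU allocation while Q6 doesn't, here's a back-of-envelope sketch (the parameter count and bits-per-weight figures are approximations, not exact numbers for GPT-OSS 120B):

    # Rough weight-memory estimate for a ~120B-parameter model at different
    # quantization levels. Real footprints also include the KV cache,
    # activations, and per-block scale/zero-point overhead.
    PARAMS = 120e9

    def weights_gb(bits_per_weight: float) -> float:
        return PARAMS * bits_per_weight / 8 / 1e9

    for name, bits in [("BF16", 16), ("Q8", 8.5), ("Q6", 6.5), ("MXFP4/Q4", 4.25)]:
        print(f"{name:>8}: ~{weights_gb(bits):.0f} GB of weights")

    # ~64 GB at 4-bit leaves headroom in a 96 GB allocation for the KV cache;
    # ~98 GB at Q6 already overflows it before activations are counted.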

I bet you're confusing VRAM (the old fixed carve-out) and GTT (dynamic) memory allocation. Linux amdgpu handles GTT just fine; amdgpu_top is one monitoring app that shows them separately.

More: https://news.ycombinator.com/item?id=44859582
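
If you don't want a full monitoring app, amdgpu also exposes the two pools through sysfs. A minimal sketch, assuming the usual mem_info_* nodes and that the iGPU is card0:

    from pathlib import Path

    # amdgpu reports VRAM (the fixed carve-out) and GTT (the dynamic
    # system-memory pool) separately; adjust card0 to whichever node is the iGPU.
    DEV = Path("/sys/class/drm/card0/device")

    def read_gib(name: str) -> float:
        return int((DEV / name).read_text()) / 2**30

    for pool in ("vram", "gtt"):
        used = read_gib(f"mem_info_{pool}_used")
        total = read_gib(f"mem_info_{pool}_total")
        print(f"{pool.upper():>4}: {used:.1f} / {total:.1f} GiB")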


>Strix Halo can only allocate 96GB RAM to the GPU.

Are you referring to exclusive or shared allocation? I think shared allocation allows using all available memory.


That's not really true; the latest autoregressive image models build a codebook of patches which are then encoded as tokens, and the image is assembled out of them.
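
For anyone unfamiliar with the approach, here's a toy sketch of the codebook step: a minimal nearest-neighbor vector quantizer with made-up codebook and patch sizes, not any particular model's tokenizer:

    import numpy as np

    # Toy VQ-style patch tokenizer: each patch embedding is mapped to the index
    # of its nearest codebook vector. Those indices are the "image tokens" an
    # autoregressive model predicts; decoding starts from the looked-up vectors.
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(1024, 64))   # 1024 code vectors, 64-dim
    patches = rng.normal(size=(256, 64))     # e.g. a 16x16 grid of patch embeddings

    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = dists.argmin(axis=1)            # shape (256,), ints in [0, 1024)

    reconstructed = codebook[tokens]         # what a decoder would reassemble from
    print(tokens[:8], reconstructed.shape)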

That won't work at elite schools like Stanford, where the average in a hard class can be something like 98% and 94% will get you a B+ because the curve is applied in the opposite direction.

I went to Stanford and that was absolutely not the case. I once got an A on a midterm with a 65%.

What I mentioned was the case in some hard CS classes I took there.

Wouldn't this restrict memory to 128GB, wasting the M3 Ultra's potential?


Blog author here. Actually, no. The model can be streamed into the DGX Spark, so we can run prefill for models much larger than 128GB (e.g. DeepSeek R1) on the DGX Spark. This feature is coming in EXO 1.0, which will be open-sourced soon™.
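
The general idea (an illustrative sketch of streaming prefill, not EXO's actual implementation) is that prefill only needs one layer's weights resident at a time, so weights can be streamed in from disk or another host while the activations stay on-device:

    import numpy as np

    # Illustrative streaming prefill: load each layer's weights just before use
    # and drop them afterwards, so the resident working set is one layer plus
    # activations instead of the full model. load_layer_weights is a stand-in
    # (hypothetical) for fetching a layer from disk or over the network.
    def load_layer_weights(i: int) -> np.ndarray:
        return np.random.default_rng(i).normal(size=(4096, 4096)).astype(np.float32)

    def streamed_prefill(prompt_embeddings: np.ndarray, n_layers: int) -> np.ndarray:
        hidden = prompt_embeddings
        for i in range(n_layers):
            w = load_layer_weights(i)      # stream in one layer
            hidden = np.tanh(hidden @ w)   # stand-in for the real transformer block
            del w                          # weights can be freed before the next layer
        return hidden

    out = streamed_prefill(np.zeros((128, 4096), dtype=np.float32), n_layers=4)
    print(out.shape)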

Excellent! Good luck!

M5 is supposed to support FP4 natively, which would explain the speedup on Q4-quantized models (quantized down from BF16).


DGX Spark is not for training, only for inference (FP4).


M3 Ultra has a slow GPU and no HW FP4 support, so its initial prompt processing (prefill) is going to be slow, practically unusable for 100k+ context sizes. For token generation, which is memory-bound, M3 Ultra would be much faster, but who wants to wait 15 minutes for the context to be processed? Spark will be much faster at initial token processing, giving you a much better time to first token, but then ~3x slower (273 vs 800 GB/s) in token-generation throughput. You need to decide which matters more to you. Strix Halo is IMO the worst of both worlds at the moment: it has the weakest specs in both dimensions and the least mature software stack.
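
A back-of-envelope comparison makes the trade-off concrete; the throughput figures below are made-up illustrative numbers, not measured benchmarks:

    # Prefill vs. decode time for a long prompt; tokens/s values are assumptions.
    PROMPT_TOKENS = 100_000
    OUTPUT_TOKENS = 2_000

    machines = {
        # (prefill tokens/s, decode tokens/s) -- hypothetical figures
        "DGX Spark": (1_500, 30),
        "M3 Ultra":  (110, 90),
    }

    for name, (prefill_tps, decode_tps) in machines.items():
        ttft = PROMPT_TOKENS / prefill_tps   # time to first token
        gen = OUTPUT_TOKENS / decode_tps     # time to generate the reply
        print(f"{name:>9}: TTFT ~{ttft/60:.1f} min, generation ~{gen/60:.1f} min")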


This is 100% the truth, and I am really puzzled to see people push Strix Halo so hard for local inference. For about $1200 more you can build a DDR5 + 5090 machine that will crush a Strix Halo on just about every MoE model (equal decode and 10-20x faster prefill for large models, and huge gaps for any MoE that fits in 32GB of VRAM). I'd also have a lot more confidence reselling a 5090 in the future than a Strix Halo machine.

