The nice thing with the Elixir example is that you can easily `tap()` to inspect how the data looks at any point in the pipeline. You can also easily insert steps into the pipeline, or reuse pipeline steps. And due to the way modules are usually organized, it would more realistically read like this, if we were in a BulkEmails module:
The nice thing here is that we can easily log to the console, and also filter out nil expiry emails. In production code, `generate_expiry_email/1` would likely return a Result (a tuple of `{:ok, email}` or `{:error, reason}`), so we could complicate this a bit further and collect the errors to send to a logger, or to update some flag in the db.
It just becomes so easy to incrementally add functionality here.
---
Quick syntax reference for anyone reading:
- Pipelines apply the previous result as the first argument of the next function
- The `/1` after a function name indicates its arity; functions with the same name but different arities are distinct functions in Elixir
Incorrect Pytorch gradients with Apple MPS backend...
Yep, this kind of thing can happen. I found and reported incorrect gradients for Apple's Metal-backed TensorFlow conv2d in 2021 [1].
(Pretty sure I've seen incorrect gradients with another Pytorch backend, but that was a few years ago and I don't seem to have raised an issue to refer to... )
One might think this class of errors would be caught by a test suite. Autodiff can be tested quite comprehensively against numerical differentiation [2]. (Although this example is from a much simpler lib than Pytorch, so I could be missing something.)
Yeah, luckily, you can unit test these and fix them. They are not concurrency bugs (again, luckily).
BTW, testing against numerical differentiation only goes so far (the algorithmic cost blows up when you're working with big matrices). It is much easier / more effective to test against multiple implementations.
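For what it's worth, a minimal sketch of that cross-implementation check, comparing a conv2d input gradient on CPU vs the MPS backend (tensor shapes and the tolerance are illustrative; the MPS branch only runs on Apple Silicon):

```python
import torch
import torch.nn.functional as F

def conv2d_input_grad(x, w, device):
    # Copy the same CPU tensors onto each backend so only the backend differs.
    x = x.to(device).requires_grad_(True)
    w = w.to(device)
    F.conv2d(x, w, padding=1).sum().backward()
    return x.grad.cpu()

torch.manual_seed(0)
x = torch.randn(2, 3, 16, 16)
w = torch.randn(4, 3, 3, 3)

ref = conv2d_input_grad(x, w, "cpu")    # reference: CPU autograd
if torch.backends.mps.is_available():   # only runs on Apple Silicon
    got = conv2d_input_grad(x, w, "mps")
    print("max abs diff:", (ref - got).abs().max().item())
    assert torch.allclose(ref, got, atol=1e-4), "MPS conv2d gradient disagrees with CPU"
```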
Agreed, and its larger context window is fantastic. My workflow:
- Convert the whole codebase into a string
- Paste it into Gemini
- Ask a question
People seem to be very taken with "agentic" approaches where the model selects a few files to look at, but I've found it very effective and convenient just to give the model the whole codebase, and then have a conversation with it, get it to output code, modify a file, etc.
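If it helps, a rough sketch of the "whole codebase into a string" step (the extension list and skipped directories are assumptions; adjust them for your repo):

```python
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}   # assumption: typical noise dirs
EXTS = {".py", ".ex", ".exs", ".js", ".ts", ".md"}            # assumption: adjust to your languages

def codebase_to_string(root="."):
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTS and not set(path.parts) & SKIP_DIRS:
            # Label each file so the model can refer back to paths in its answers.
            chunks.append(f"\n===== {path} =====\n{path.read_text(errors='ignore')}")
    return "".join(chunks)

print(len(codebase_to_string()))  # sanity-check the size before pasting it into Gemini
```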
I usually do that in a two-step process. Instead of giving the full source code to the model, I will ask it to write a comprehensive, detailed description of the architecture, intent, and details (including filenames) of the codebase to a Markdown file.
Then for each subsequent conversation I would ask the model to use this file as reference.
The overall idea is the same, but going through an intermediate file allows for manual amendments in case the model consistently forgets some things. It also gives the model a bit of an easier time finding information and reasoning about the codebase in a pre-summarized format.
It's sort of like giving the model a very rich set of metadata and an index of the codebase instead of dumping the raw data on it.
My special hack on top of what you suggested: ask it to draw the whole codebase in Graphviz-compatible graph markup (DOT). There are various tools out there to render this as an SVG or whatever, to get an actual map of the system. Very helpful when diving into a big new area.
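A small sketch of the rendering step, assuming you saved the model's DOT output to a file and have the `graphviz` Python package plus a local Graphviz install (the filename is just an example):

```python
from pathlib import Path
from graphviz import Source  # pip install graphviz; also needs the Graphviz binaries installed

dot_text = Path("codebase_map.dot").read_text()  # wherever you saved the model's DOT output
Source(dot_text).render("codebase_map", format="svg", cleanup=True)  # writes codebase_map.svg
```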
For anyone wondering how to quickly get your codebase into a good "Gemini" format, check out repomix. Very cool tool and unbelievably easy to get started with. Just type `npx repomix` and it'll go.
Also, use Google AI Studio, not the regular Gemini plan for the best results. You'll have more control over results.
When using the Gemini web app on a desktop system (this could differ depending upon how you consume Gemini): select the + button in the bottom-left of the chat prompt area, select Import code, and then choose the "Upload folder" link at the bottom of the dialog that pops up. That brings up a file dialog letting you choose a directory; it will upload all the files in that directory and all subdirectories (recursively), and you can then prompt it on that code from there.
The upload process for average sized projects is, in my experience, close to instantaneous (obviously your mileage can vary if you have any sort of large asset/resource type files commingled with the code).
If your workflow already works then stick with it, but for projects with a pretty clean directory structure, uploading the code via the Import system is very straightforward and fast.
(Obvious disclaimer: Depending upon your employer, the code base in question, etc, uploading a full directory of code like this to Google or anyone else may not be kosher, be sure any copyright holders of the code are ok with you giving a "cloud" LLM access to the code, etc, etc)
Well, I am not sure Gemini or any other LLM respects `.gitignore`, and pulling in ignored files can immediately push you over the maximum context window.
Tools like repomix[0] do this better, plus you can add your own extra exclusions on top. It also estimates token usage as part of its output, but I found it too optimistic, i.e. it regularly says "40,000 tokens" but when uploading the resulting single XML file to Gemini it's actually, for example, 55k - 65k tokens.
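If you want a second opinion on the count before uploading, something like this gives a ballpark (tiktoken uses OpenAI's encodings, not Gemini's, which is part of why estimates drift; the output filename is an assumption):

```python
import tiktoken  # OpenAI's tokenizer; Gemini's differs, so treat this as a ballpark only

enc = tiktoken.get_encoding("cl100k_base")
text = open("repomix-output.xml", encoding="utf-8").read()  # assumption: repomix's default output name
print(f"~{len(enc.encode(text)):,} tokens")
```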
Since I have only used Gemini Pro 2.5 (free) and Claude on the web (free) and I am thinking of subbing to one service or two, are you saying that:
- Gemini Pro 2.5 is better when you feed it more code and ask it to do a task (or more than one)?
- ...but that GPT Codex and Claude Code are better at iterating on a project?
- ...or something else?
I am looking to gauge my options. Will be grateful for your shared experience.
Idk, I've seen many issues occur because of longer context though. I mean, it makes sense: given there are only so many attention heads, the longer the context, the less chance attention will pick out the relevant tokens.
On the contrary: now might be a good time to get an M1 Max laptop. A second-hand one, ex-corporate, in good condition, with 64GB RAM, is pretty good value compared to new laptops at the same price. It's still a fantastic CPU.
At your own risk: one place is eBay sellers with a large number of positive reviews (and not many negative) who are selling lots of the same type of MacBook Pro. My assumption is they've got a bunch of corporate laptops to sell off.
Honestly the only Apple Silicon e-waste has been their 8GB models. And even those are still perfectly good for most people so long as they use Safari rather than Chrome.
Muon was invented by Keller Jordan (and then optimized by others) for the sake of this speedrunning competition. Even though it was invented less than a year ago, it has already been widely adopted as SOTA for model training.
This is the common belief but not quite correct! The Muon update was proposed by Bernstein as the result of a theoretical paper suggesting concrete realizations of the theory, and Keller implemented it and added practical things to get it to work well (input/output AdamW, aggressive coefficients, post-Nesterov, etc).
Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.
(Source: am experienced speedrunner who's been in these circles for a decent amount of time)
I think it's good to bring up Bernstein & Newhouse as well as Yuchen Jin, Jiacheng You, and the other speedrunners who helped iterate on Muon. But I think it's very fair to call Keller Jordan the main author of Muon in its current form. I'm also in the speedrunning community, though maybe not for as long as you have been.
The most exciting thing about Muon for me is that it requires half the state of Adam while having either equivalent or better performance. That's amazing if you are VRAM limited! And just like Adam, you can also quantize it. I can get it to work relatively well as low as 4-bit, which essentially cuts down the memory requirements from full 32-bit Adam by a factor of 16x! (And by a factor of 4x vs 8-bit Adam).
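Back-of-the-envelope on that 16x figure, assuming Adam keeps two fp32 state tensors per parameter and Muon keeps a single momentum buffer:

```python
adam_fp32 = 2 * 4    # Adam: m and v, 4 bytes each -> 8 bytes of state per parameter
adam_8bit = 2 * 1    # 8-bit Adam: 2 bytes of state per parameter
muon_4bit = 1 * 0.5  # Muon: one momentum buffer at 4 bits -> 0.5 bytes per parameter
print(adam_fp32 / muon_4bit)  # 16.0x
print(adam_8bit / muon_4bit)  # 4.0x
```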
It's for hidden layers and not for every parameter:
From Keller's Muon github page:
"Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard AdamW."
And I just looked into this nanochat repo, and that's also how it's used here.
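For concreteness, a sketch of that split; `Muon` below is a placeholder for whichever implementation you use (its import and constructor arguments are hypothetical), and the name-based filters are a common heuristic rather than nanochat's exact rule:

```python
import torch
from muon import Muon  # hypothetical import; substitute whichever Muon implementation you use

def build_optimizers(model: torch.nn.Module):
    hidden, other = [], []
    for name, p in model.named_parameters():
        # Heuristic per the quote above: 2D+ hidden weight matrices go to Muon;
        # embeddings, the output head, and gains/biases stay on AdamW.
        is_hidden_matrix = p.ndim >= 2 and "embed" not in name and "lm_head" not in name
        (hidden if is_hidden_matrix else other).append(p)
    return [
        Muon(hidden, lr=0.02, momentum=0.95),  # hypothetical constructor arguments
        torch.optim.AdamW(other, lr=3e-4),
    ]
```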
I did the same. I was sad to lose the comments, but the ads were awful and I don't particularly want someone else's ads / tracking on my hobby site. I switched to giscus [1], which is powered by GitHub Discussions, and it seems to be working OK. (The site is hosted on GH Pages, so it seems reasonable to also use GH Discussions for the comments.)
> One of the simplest CRDT strategies is Last-Write-Wins (LWW):
> Each update gets a timestamp (physical or logical).
> When two devices write to the same field, the update with the latest timestamp wins.
Please also have a robust synced undo feature, so it's easy to undo the thing you don't want that gets written. Apps that sync often seem to be stingy about how many "undos" they store/sync (if any).
Editing on phone. Phone dies. Scrounge up a tablet, edit again, but rethink some of the work. Hit save. Plug in cellphone. Cellphone turns on. Now what?
Depends on the granularity of updates. Did the last changes get sent immediately? Are they gated by a save button? Are they periodically pushed?
Some of those don’t need a durable undo, but the rest definitely benefit, and undo has other uses.
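For anyone who wants the quoted LWW strategy in code form, here's a minimal register sketch (wall-clock timestamps plus a node-id tie-breaker; a real app would likely use a logical clock):

```python
import time
from dataclasses import dataclass

@dataclass
class LWWRegister:
    """Last-write-wins register: the update with the newest timestamp wins."""
    value: object = None
    timestamp: float = 0.0
    node_id: str = ""  # tie-breaker so two writes with equal timestamps still converge

    def set(self, value, node_id):
        self.merge(LWWRegister(value, time.time(), node_id))

    def merge(self, other):
        # Keep whichever write is newer; break ties deterministically by node id.
        if (other.timestamp, other.node_id) > (self.timestamp, self.node_id):
            self.value, self.timestamp, self.node_id = other.value, other.timestamp, other.node_id

# Two devices write to the same field, then sync in either order and converge.
phone, tablet = LWWRegister(), LWWRegister()
phone.set("draft from the phone", node_id="phone")
tablet.set("draft from the tablet", node_id="tablet")
phone.merge(tablet); tablet.merge(phone)
assert phone.value == tablet.value
```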
The VMs quickly get expensive if you leave them running though.
The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny (11JN009QGE) has an 8-core 3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB RAM lets you match the 1GB per instance.
If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.
That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.
Granted, not everyone has $800 or even $400 to drop on hobby projects, so renting VMs often does make sense.
I stopped using Apple's Notes app with an iPad pen after it lost 20 minutes of my handwritten notes when trying to sync them. (Which fits the theme of Apple losing people's stuff.)
I don't really get the syncing situation with Apple. And it's really hard to tell when they've resolved bugs in one app or introduced new ones elsewhere.
The Safari reading list can't even sync properly between devices for me. Image Capture ("Keep Originals"??) or AirDrop is a little minimal for such a keystone part of the phone -> computer workflow if you don't want to stay in the Apple ecosystem afterwards... let alone the other, more complicated issues.
You presumably would process the pen inputs, not the resulting image produced by the handwriting. No different from how you handle conflicts in online gaming.
I think OP is suggesting that Apple / AMD / Intel do the work of integrating their NPUs into popular libraries like `llama.cpp`. Which might make sense. My impression is that by the time the vendors support a certain model with their NPUs the model is too old and nobody cares anyway. Whereas llama.cpp keeps up with the latest and greatest.