montebicyclelo's comments | Hacker News

    bulk_send(
        generate_expiry_email(user) 
        for user in db.getUsers() 
        if is_expired(user, date.now())
    )
(...Just another flavour of syntax to look at)

The nice thing with the Elixir example is that you can easily `tap()` to inspect how the data looks at any point in the pipeline. You can also easily insert steps into the pipeline, or reuse pipeline steps. And due to the way modules are usually organized, it would more realistically read like this, if we were in a BulkEmails module:

  Users.all()
  |> Enum.filter(&Users.is_expired?(&1, Date.utc_today()))
  |> Enum.map(&generate_expiry_email/1)
  |> tap(&IO.inspect(&1, label: "Expiry Email"))
  |> Enum.reject(&is_nil/1)
  |> bulk_send()
The nice thing here is that we can easily log to the console, and also filter out nil expiry emails. In production code, `generate_expiry_email/1` would likely return a Result (a tuple of `{:ok, email}` or `{:error, reason}`), so we could complicate this a bit further and collect the errors to send to a logger, or to update some flag in the db.

It just becomes so easy to incrementally add functionality here.

---

Quick syntax reference for anyone reading:

- Pipelines (`|>`) pass the previous result as the first argument to the next function call

- The `/1` after a function name indicates its arity (number of arguments); functions with the same name but different arities are distinct functions in Elixir

- `&fun/1` expands to `fn arg -> fun(arg) end`

- `&fun(&1, "something")` expands to `fn arg -> fun(arg, "something") end`


Not sure I like how the binding works for user in this example, but tbh, I don't really have any better idea.

Writing custom monad syntax is definitely quite a nice benefit of functional languages IMO.


Incorrect Pytorch gradients with Apple MPS backend...

Yep, this kind of thing can happen. I found and reported incorrect gradients in Apple's Metal-backed TensorFlow conv2d in 2021 [1].

(Pretty sure I've seen incorrect gradients with another PyTorch backend, but that was a few years ago and I don't seem to have raised an issue to refer to...)

One might think this class of errors would be caught by a test suite. Autodiff can be tested quite comprehensively against numerical differentiation [2]. (Although this example is from a much simpler lib than PyTorch, so I could be missing something.)

[1] https://github.com/apple/tensorflow_macos/issues/230

[2] https://github.com/sradc/SmallPebble/blob/2cd915c4ba72bf2d92...
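To illustrate the kind of check meant in [2]: a minimal sketch (my own, not taken from PyTorch or SmallPebble) that compares an autodiff gradient against a central-difference estimate, using numpy and a toy f(x) = sum(x^2):

    import numpy as np

    def numerical_grad(f, x, eps=1e-6):
        # Central-difference estimate of df/dx, one element at a time.
        grad = np.zeros_like(x)
        for i in range(x.size):
            h = np.zeros_like(x)
            h.flat[i] = eps
            grad.flat[i] = (f(x + h) - f(x - h)) / (2 * eps)
        return grad

    f = lambda x: np.sum(x**2)
    x = np.random.randn(5)
    autodiff_grad = 2 * x  # stand-in for what your autodiff library returns
    assert np.allclose(numerical_grad(f, x), autodiff_grad, atol=1e-4)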


I’ve also found that some versions of torch get quite different inference results on MPS, ignoring gradient. See https://gist.github.com/gcr/4d8833bb63a85fc8ef1fd77de6622770

Yeah, luckily, you can unit test these and fix them. They are not concurrency bugs (again, luckily).

BTW, testing against numerical differentiation only goes so far (due to the computational cost once you're working with big matrices). It is much easier / more effective to test against multiple implementations.


You can easily test a gradient using only the forward pass, by checking f(x+h) ~ f(x) + dot(g, h) for a small random h.
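A minimal numpy sketch of that check (illustrative only; the helper names are mine), using a toy f(x) = sum(x^2) whose gradient is 2x:

    import numpy as np

    def check_gradient(f, grad_f, x, eps=1e-6, n_trials=10):
        # Verify grad_f via f(x + h) ~ f(x) + dot(g, h) for small random h.
        g = grad_f(x)
        for _ in range(n_trials):
            h = eps * np.random.randn(*x.shape)
            lhs = f(x + h)
            rhs = f(x) + np.dot(g.ravel(), h.ravel())
            assert abs(lhs - rhs) < 1e-8, (lhs, rhs)

    check_gradient(lambda x: np.sum(x**2), lambda x: 2 * x, np.random.randn(4))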

Agreed, and its larger context window is fantastic. My workflow:

- Convert the whole codebase into a string (a rough sketch of this step is below)

- Paste it into Gemini

- Ask a question

People seem to be very taken with "agentic" approaches where the model selects a few files to look at, but I've found it very effective and convenient just to give the model the whole codebase, and then have a conversation with it, get it to output code, modify a file, etc.
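For the "convert the whole codebase into a string" step, here's a rough Python sketch of what I mean (the extensions and excluded directories are just examples; adjust for your project):

    from pathlib import Path

    EXTENSIONS = {".py", ".md", ".toml"}                             # example filter
    EXCLUDE_DIRS = {".git", "node_modules", ".venv", "__pycache__"}  # example exclusions

    def codebase_to_string(root: str) -> str:
        # Concatenate source files into one prompt-friendly string,
        # with a header line marking each file's path.
        chunks = []
        for path in sorted(Path(root).rglob("*")):
            if (path.is_file()
                    and path.suffix in EXTENSIONS
                    and not any(part in EXCLUDE_DIRS for part in path.parts)):
                chunks.append(f"===== {path} =====\n{path.read_text(errors='ignore')}")
        return "\n\n".join(chunks)

    print(codebase_to_string("."))  # paste the output into the model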


I usually do that in a 2-step process. Instead of giving the full source code to the model, I ask it to write a comprehensive, detailed description of the architecture, intent, and details (including filenames) of the codebase to a Markdown file.

Then for each subsequent conversation I would ask the model to use this file as reference.

The overall idea is the same, but going through an intermediate file allows for manual amendments to the file in case the model consistently forgets some things; it also gives the model an easier time finding information and reasoning about the codebase in a pre-summarized format.

It's sort of like giving the model very rich metadata and an index of the codebase instead of dumping the raw data on it.


My special hack on top of what you suggested: ask it to draw the whole codebase in a Graphviz-compatible graph markup language. There are various tools out there to render this as an SVG or whatever, to get an actual map of the system. Very helpful when diving into a big new area.

You can use Mermaid format instead of Graphviz, then paste it into a Markdown file and GitHub will render it inline.

For anyone wondering how to quickly get your codebase into a good "Gemini" format, check out repomix. Very cool tool and unbelievably easy to get started with. Just type `npx repomix` and it'll go.

Also, use Google AI Studio, not the regular Gemini plan for the best results. You'll have more control over results.


> Convert the whole codebase into a string

When using the Gemini web app on a desktop system (this could differ depending upon how you consume Gemini): if you select the + button in the bottom-left of the chat prompt area, select Import code, and then choose the "Upload folder" link at the bottom of the dialog that pops up, it'll pull up a file dialog letting you choose a directory. It will then upload all the files in that directory and all subdirectories (recursively), and you can prompt it on that code from there.

The upload process for average sized projects is, in my experience, close to instantaneous (obviously your mileage can vary if you have any sort of large asset/resource type files commingled with the code).

If your workflow already works then keep with it, but for projects with a pretty clean directory structure, uploading the code via the Import system is very straightforward and fast.

(Obvious disclaimer: Depending upon your employer, the code base in question, etc, uploading a full directory of code like this to Google or anyone else may not be kosher, be sure any copyright holders of the code are ok with you giving a "cloud" LLM access to the code, etc, etc)


Well, I am not sure Gemini or any other LLM respects `.gitignore`, which can immediately push the context window over the maximum.

Tools like repomix [0] do this better, plus you can add your own extra exclusions on top. It also estimates token usage as part of its output, but I found it too optimistic, i.e. it regularly says "40_000 tokens" but when uploading the resulting single XML file to Gemini it's actually e.g. 55k - 65k tokens.

[0] https://github.com/yamadashy/repomix/


I agree. I use repomix with AI Studio extensively and have never found anything (including the CLI agents) that's close.

I sometimes upload codebases that are around 600k tokens and even those work.

Repomix also lets you create a config file so you can give it ignore/include patterns in addition to .gitignore.

It also tells you about the outlier files with exceptionally long content.


Try Codex and Claude Code: game-changing ability to use CLI tools, edit/reorganize multiple files, even interact with git.

Gemini CLI is a thing that exists. Are you saying those specifically are better? Or that CLIs are better?

OpenAI Codex currently seems quite a lot better than Gemini 2.5 and marginally better than Claude.

I'm using all three back-to-back via the VS Code plugins (which I believe are equivalent to the CLI tools).

I can live with either OpenAI Codex or Claude. Gemini 2.5 is useful but it is consistently not quite as good as the other two.

I agree that for non-agentic coding tasks Gemini 2.5 is really good though.


Since I have only used Gemini Pro 2.5 (free) and Claude on the web (free) and I am thinking of subscribing to one or two services, are you saying that:

- Gemini Pro 2.5 is better when you feed it more code and ask it to do a task (or more than one)?

- ...but GPT Codex and Claude Code are better at iterating on a project?

- ...or something else?

I am looking to gauge my options. Will be grateful for your shared experience.


Codex and Claude are better than Gemini in all coding tasks I've tried.

At the "smart autocomplete" level the distinction isn't large but it gets bigger the more agentic you ask for.


Gemini CLI does all this too

I started using Gemini like that as well, but with Gemini CLI. Point it at the directory and then converse with it about the codebase. It's wonderful.

Idk though, I've seen many issues occur because of a longer context. I mean, it makes sense: given there are only so many attention heads, the longer the context, the less chance attention will pick out the relevant tokens.

The CLI tools really are way faster. You can use them the same way if you want; you just don't have to copy-paste stuff around all the time.

On the contrary; now might be a good time to get an M1 Max laptop. A second-hand one, ex-corporate, in good condition, with 64GB RAM, is pretty good value compared to new laptops at the same price. It's still a fantastic CPU.


That's what I did: bought a used one with 64GB and a dent in the back for ~$1k, a year back or so. Some of the best money I've ever spent.


Where would one look for ex-corporate MacBook Pros?


At your own risk: one place is eBay sellers with a large number of positive reviews (and not many negative ones), who are selling lots of the same type of MacBook Pro. My assumption is they've got a bunch of corporate laptops to sell off.

Honestly the only Apple Silicon e-waste has been their 8GB models. And even those are still perfectly good for most people so long as they use Safari rather than Chrome.


Does Safari use less RAM?


The data may be somewhat dated and I haven't measured it myself, but:

“Per his findings, Chrome used 290MB of RAM per open tab, while Safari only used 12MB of RAM per open tab.”

https://www.macrumors.com/2021/02/20/chrome-safari-ram-test/


> nanochat is also inspired by modded-nanoGPT

Nice synergy here; the lineage is: Karpathy's nanoGPT -> Keller Jordan's modded-nanoGPT (a speedrun of training nanoGPT) -> nanochat.

modded-nanoGPT [1] is a great project, well worth checking out; it's all about massively speeding up the training of a small GPT model.

Notably, it uses the author's Muon optimizer [2], rather than AdamW, for the linear layers.

[1] https://github.com/KellerJordan/modded-nanogpt

[2] https://kellerjordan.github.io/posts/muon/


Muon was invented by Keller Jordan (and then optimized by others) for the sake of this speedrunning competition. Even though it was invented less than a year ago, it has already been widely adopted as SOTA for model training.


This is the common belief but not quite correct! The Muon update was proposed by Bernstein as the result of a theoretical paper suggesting concrete realizations of the theory, and Keller implemented it and added practical things to get it to work well (input/output AdamW, aggressive coefficients, post-Nesterov, etc).

Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.

(Source: am experienced speedrunner who's been in these circles for a decent amount of time)


I think it's good to bring up Bernstein & Newhouse as well as Yuchen Jin, Jiacheng You and the other speedrunners who helped iterate on Muon. But I think it's very fair to call Keller Jordan the main author of Muon in its current form. I'm also in the speedrunning community, though maybe not for as long as you have been.


Sharing some useful resources for learning Muon (since I'm also just catching up on it):

- https://x.com/leloykun/status/1846842883967692926

- https://www.yacinemahdid.com/p/muon-optimizer-explained-to-a...


This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

https://www.youtube.com/watch?v=bO5nvE289ec

I found the above video to be a good introduction.


The most exciting thing about Muon for me is that it requires half the state of Adam while having either equivalent or better performance. That's amazing if you are VRAM limited! And just like Adam, you can also quantize it. I can get it to work relatively well as low as 4-bit, which essentially cuts down the memory requirements from full 32-bit Adam by a factor of 16x! (And by a factor of 4x vs 8-bit Adam).
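For anyone wanting to check the arithmetic behind those factors, here's my back-of-the-envelope version (bytes of optimizer state per model parameter; rough assumptions, not exact measurements):

    adam_fp32 = 2 * 4    # Adam keeps two moments at 32-bit: 8 bytes/param
    adam_int8 = 2 * 1    # 8-bit Adam: 2 bytes/param
    muon_4bit = 1 * 0.5  # Muon keeps one momentum buffer; at 4-bit: 0.5 bytes/param

    print(adam_fp32 / muon_4bit)  # 16.0 -> ~16x less state than 32-bit Adam
    print(adam_int8 / muon_4bit)  # 4.0  -> ~4x less state than 8-bit Adam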


I haven't heard of this before. Has Muon dethroned Adam and AdamW as the standard general purpose optimizer for deep learning?


It's for hidden layers and not for every parameter. From Keller's Muon GitHub page:

"Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard AdamW."

And I just looked into this nanochat repo, and that's also how it's used here.

https://github.com/karpathy/nanochat/blob/dd6ff9a1cc23b38ce6...
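For a rough idea of what that split looks like in practice, here's a hedged PyTorch-style sketch (the Muon import, the name-based heuristic, and the hyperparameters are placeholders of mine; real implementations like the ones in modded-nanoGPT / nanochat differ in the details):

    import torch
    from muon import Muon  # hypothetical import; use whichever Muon implementation you have

    def build_optimizers(model: torch.nn.Module):
        hidden, other = [], []
        for name, p in model.named_parameters():
            # Heuristic: 2-D hidden weight matrices go to Muon;
            # embeddings, the output head, and gains/biases go to AdamW.
            if p.ndim == 2 and "embed" not in name and "lm_head" not in name:
                hidden.append(p)
            else:
                other.append(p)
        return [
            Muon(hidden, lr=0.02, momentum=0.95),                 # illustrative hyperparameters
            torch.optim.AdamW(other, lr=3e-4, weight_decay=0.0),  # illustrative hyperparameters
        ]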


8xH100 is pretty wild for a single inference node.

Is this what production frontier LLMs are running inference with, or do they consume even more VRAM/compute?

At ~$8/hr, assuming a request takes 5 seconds to fulfill, you can service roughly 700 requests an hour. That's about $0.01 per request.

Is my math wrong?


This is the spec for a training node. The inference requires 80GB of VRAM, so significantly less compute.


The default model is ~0.5B params right?


As vessenes wrote, that's for training. But an H100 can also process many requests in parallel.


I did the same. I was sad to lose the comments, but the ads were awful and I don't particularly want someone else's ads / tracking on my hobby site. I switched to giscus [1], which is powered by GitHub Discussions, and which seems to be working ok. (The site is hosted on GH Pages, so it seems reasonable to also use GH Discussions for the comments.)

[1] https://giscus.app/


> One of the simplest CRDT strategies is Last-Write-Wins (LWW):

> Each update gets a timestamp (physical or logical).

> When two devices write to the same field, the update with the latest timestamp wins.

Please also have a robust synced undo feature, so it's easy to undo the thing you don't want that gets written. Apps that sync often seem to be stingy about how many undos they store/sync (if any).
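(For reference, a last-write-wins register is tiny to sketch; this is illustrative only, not any particular app's implementation. The undo problem is that whatever loses the timestamp race is silently discarded unless you also keep history:)

    from dataclasses import dataclass

    @dataclass
    class LWWRegister:
        # Minimal last-write-wins register: the highest timestamp wins on merge.
        value: object = None
        timestamp: float = 0.0

        def set(self, value, timestamp):
            if timestamp > self.timestamp:
                self.value, self.timestamp = value, timestamp

        def merge(self, other: "LWWRegister"):
            self.set(other.value, other.timestamp)

    # Two devices write concurrently; after syncing both ways, the later write wins.
    a, b = LWWRegister(), LWWRegister()
    a.set("draft from phone", timestamp=100)
    b.set("draft from tablet", timestamp=105)
    a.merge(b); b.merge(a)
    assert a.value == b.value == "draft from tablet"  # the phone's edit is simply gone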


Editing on phone. Phone dies. Scrounge up tablet, edit again, only to rethink some of the work. Hit save. Plug in cellphone. Cellphone turns on. Now what?

Depends on the granularity of updates. Did the last changes get sent immediately? Are they gated by a save button? Are they periodically pushed?

Some of those don’t need a durable undo, but the rest definitely benefit, and undo has other uses.


Or they save as soon as they launch, so no matter what, local is newer and you'll lose whatever was on remote.


Yeah... Looks like you can get about $1/hr for 10 small VMs ($0.10 per VM).

So for $3000, that's 3000 hours, or 125 days (if you just wastefully leave them on all the time, instead of turning them on when needed).

Say you wanted to play around for a couple of hours, that's like.. $3.

(That's assuming there's no bonus for joining / free tier, too.)


The VMs quickly get expensive if you leave them running though.

The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny 11JN009QGE has an 8-core 3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB RAM allows you to match the 1GB per instance.

If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.

That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.

Granted, not everyone has $800 or even $400 to drop on hobby projects, so renting VMs often does make sense.


You can rent a beefy vm with an H100 for $1.50 / hr

I regularly rent this for a few hours at a time for learning and prototyping


[flagged]


I'll take the H1/200s over a vehicle any day of the week


Are you comparing 10 VMs with 1 shared core each to a 144-core solution?


I stopped using Apple's Notes app with an iPad pencil after it lost 20 minutes of my handwritten notes when trying to sync them. (Which fits the theme of Apple losing people's stuff.)


I don't really get the syncing situation with Apple. And it's really hard to tell when they've resolved bugs in one app or introduced new ones elsewhere.

The Safari Reading List can't even sync properly between devices for me. Image Capture ("Keep Originals"??) or AirDrop is a little minimal for such a keystone part of the phone -> computer workflow if you don't want to use the Apple ecosystem afterwards... let alone the other more complicated issues.


Deleting your data is next-level privacy.


> Deleting your data is next level privacy.

Yes, but not before syncing it with (NSA)iCloud. /s


You should've put an AirTag on them first.


Yes, two or three to make sure.


I'd love to know how they CRDT hand-written notes.


Sounds like they don't.


You presumably would process the pen inputs, not the resulting image produced by the handwriting. No different from how you handle conflicts in online gaming.


I think OP is suggesting that Apple / AMD / Intel do the work of integrating their NPUs into popular libraries like `llama.cpp`. Which might make sense. My impression is that by the time the vendors support a certain model with their NPUs the model is too old and nobody cares anyway. Whereas llama.cpp keeps up with the latest and greatest.

