But isn't improving the tools, and the LLM's integration with them, itself improving the model?
Caveat: we don't fully understand how human intelligence works, but in humans it's generally true that skills are not static or siloed. Improving in one area can pay dividends in others. It's like how some professional football players improve their game by taking ballet lessons. Two very different skills, but incorporating one improves the other, and the whole.
I would argue that narrowly focusing on LLM performance via benchmarks, before tool use is incorporated, is interesting but not particularly relevant to whether LLMs are transformative, or even useful, as products.
The question isn't whether it makes a difference; the question is whether the model you're working with, or the platform you're working with it on, already does that. All of the major commercial models ship with quite detailed system prompts of their own, and the interfaces for using them (Cursor, Claude Code, Codex, Warp, etc.) typically layer on their own system prompts as well.
It's highly likely that if you're working with a commercial model that has been tuned for code tasks, on one of the commercial platforms marketed to SWEs, instructions to the effect of "you're an expert/experienced engineer" are already part of the context window.
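Roughly, the layering looks like this if you sketch it in chat-API terms (all of the prompt text here is invented; real vendor and platform prompts are far longer):

    # Sketch only: how instructions typically stack before your message
    # ever reaches the model. Every string here is made up.
    vendor_prompt = (
        "You are a helpful assistant made by <vendor>. "
        "...hundreds of lines of vendor instructions..."
    )
    platform_prompt = (
        "You are an expert software engineer working inside this IDE. "
        "Prefer minimal diffs and follow the project's existing style."
    )
    messages = [
        # Platforms generally prepend or append their instructions
        # alongside the vendor's system prompt.
        {"role": "system", "content": vendor_prompt + "\n\n" + platform_prompt},
        {"role": "user", "content": "Refactor this function to be iterative."},
    ]

So adding "you are an expert engineer" yourself is often redundant with what's already in there.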
Even those of us who program for a day job can end up in a situation where focus time is hard to come by. The more senior you are in an IC role, the more likely it is that you have more demands on your time than hours in the day. I find that Claude (via Warp) has helped me accomplish things that otherwise wouldn't get done because I'm bouncing from meeting to meeting or answering the nth DM.
If by "our constitution" you mean the U.S. Constitution then no, it says nothing of the sort. The first article of the U.S. Constitution concerns the organization of the legislative branch. You may be referencing the Equal Protection and Due Process clauses, in the Fifth and Fourteenth amendments, but neither of those applies in this situation either since there are no laws or governmental actions at issue here, and random sites on the internet are not universally considered to be public accommodations. Even in the ADA context, the law isn't actually clear, since websites aren't specified anywhere in the text at the federal level and there's no SCOTUS precedent on point.
Some states are more stringent with their own disability regulations or state constitutions, but no state anywhere in the U.S. has a law that says every visitor to a website has to be treated equally.
You can assume it's the USA and that I'm just dead wrong, but the third word of my profile specifies where I'm from, and you'd find that the Dutch constitution matches the comment's contents.
Equal protection is indeed not the same as equal treatment. And no, it really does say that everyone shall be treated equally so long as the circumstances are equal ("gelijke behandeling in gelijke gevallen", i.e. equal treatment in equal cases).
I didn't assume; that's why I started my comment with "If by ... you mean." Good to know that you were referencing a different place, but it's unrealistic to expect people to delve into your account bio to work out what you intended by "our constitution," especially when the parent comment contained no geographic or cultural references either. Perhaps you know the parent commenter and know that they share your geography? If so, that would also have been helpful context.
As an aside, I'm curious how that language in the Dutch constitution actually works in practice. Is it just a game of distinguishing between situations or people to excuse disparate conduct? It seems like it would be unworkable if interpreted literally.
If the original framing was too generous, the response is at least as ungenerous. Table saws aren't deterministic tools either, and anyone who has used one for more than a minute can tell you that getting it to consistently cut the straight line you want takes skill.
As with all uses of current AI (meaning generative LLMs), context is everything. I say this as someone who is both a lawyer and a software engineer. It's not surprising that the general-purpose models aren't great at writing a legal brief: the training data likely doesn't contain much of the relevant case law, because while it's theoretically publicly available, practicing attorneys universally use proprietary databases like Lexis and Westlaw to surface it. The alternative is spelunking through public court websites that look like they were designed in the 90s, or even paying for case records on PACER.
At the same time, even with access to proper context, say if your model can query Lexis or Westlaw via tool use, surfacing appropriate matches from case law requires more than word/token matching. LLMs are statistical models that tend to reduce to the most likely answer. But in the context of a legal brief, a lawyer typically isn't trying to find the most likely answer, or even the objectively correct answer; they're trying to find relevant precedent with which they can make an argument that supports the position they want to advance. An LLM by its nature can't do that without help.
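To make "engage via tool use" concrete, here's a minimal sketch using Anthropic's Python SDK. The search_caselaw tool, its schema, and the model string are all hypothetical; a real Lexis or Westlaw integration would look different, and a lawyer still has to judge whether whatever comes back actually supports the argument:

    import anthropic

    client = anthropic.Anthropic()

    # Hypothetical tool definition. The model decides when to call it;
    # the backing search service here is assumed, not real.
    caselaw_tool = {
        "name": "search_caselaw",
        "description": "Full-text search over a case law database. "
                       "Returns citations and headnotes, not arguments.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "jurisdiction": {"type": "string"},
            },
            "required": ["query"],
        },
    }

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=1024,
        tools=[caselaw_tool],
        messages=[{
            "role": "user",
            "content": "Find precedent supporting a narrow reading of ...",
        }],
    )
    # response.content will include a tool_use block when the model wants
    # to run the search; your code executes it, returns a tool_result
    # message, and loops until the model produces its final text.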
Where you're right, then, is that law and software engineering have a lot in common when it comes to how effective baseline LLMs are. Where you're wrong is in calling them glorified autocomplete.
In the hands of a novice they will, yes, generate answers that are plausible but mostly incorrect, or technically correct but unusable in some way. Properly configured with access to appropriate context, in the hands of an expert who understands how to communicate what they want the tool to produce? That's quite a different matter.
> As with all uses of current AI (meaning generative AI LLMs) context is everything.
But that's the whole point. You can't fit an entire legal database into the context; it's not big enough. The fact that you have to rely on "context is everything" as a cope is precisely why I'm calling them glorified autocomplete.
I recommend the context7 MCP tool for this exact purpose. I've been trying to really push agents lately at work to see where they fall down and whether better context can fix it.
As a recent test, I instructed a Claude-based agent to create a new MCP server in Elixir based on some Python code I provided. I know that, relatively speaking, Python is over-represented in training data and Elixir is under-represented. So when I asked the agent to begin by creating its plan, I told it to reference current Elixir/Phoenix/etc. documentation using context7 and to search the web using the Kagi Search MCP for best practices on implementing MCP servers in Elixir.
It was very interesting to watch how the initial plan evolved after using these tools, and how the model then identified an SDK I wasn't even aware of that perfectly fit the purpose (Hermes-mcp).
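For anyone who wants to try the same setup, here's a sketch of the client-side wiring, written in Python for illustration (the config file itself is plain JSON; the package names for the context7 and Kagi servers are from memory, so check each project's README for the current install commands):

    import json
    import pathlib

    # Assumed package names: @upstash/context7-mcp (npm) and kagimcp
    # (run via uvx). Verify against each server's own docs.
    config = {
        "mcpServers": {
            "context7": {
                "command": "npx",
                "args": ["-y", "@upstash/context7-mcp"],
            },
            "kagi": {
                "command": "uvx",
                "args": ["kagimcp"],
                "env": {"KAGI_API_KEY": "<your key>"},
            },
        }
    }

    # Most MCP clients read a JSON file in this shape; the exact path
    # and filename depend on the client (Claude Desktop, Warp, etc.).
    pathlib.Path("mcp-config.json").write_text(json.dumps(config, indent=2))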