
> because these Gemini models sometimes feel downright lobotomized compared to claude or gpt-5.

I'm using Gemini (2.5-pro) less and less these days. I used to be really impressed with its deep research capabilities and ability to cite sources reliably.

Over the last few weeks, it's become increasingly argumentative and incapable of recognizing its own hallucinations around sourcing. I'm tired of arguing with it over basics like RFCs and sources it fabricates, won't validate, and refuses to budge on.

Example prompt I was arguing with it on last night:

> within a github actions workflow, is it possible to get access to the entire secrets map, or enumerate keys in this object?

As recent supply-chain attacks have shown, exfiltrating all the secrets from a GitHub workflow is as simple as `${{ toJSON(secrets) }}`, or `echo ${{ toJSON(secrets) }} | base64` at worst. [1]
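
For anyone who wants to reproduce it, here's a minimal sketch of the kind of workflow involved (the workflow/job/step layout is my own guess, but `toJSON(secrets)` is the documented expression the linked discussions describe):

    # Hypothetical demo workflow: prints every secret the workflow can see.
    name: dump-secrets-demo
    on: workflow_dispatch
    jobs:
      demo:
        runs-on: ubuntu-latest
        steps:
          # Serialize the whole secrets context to JSON, then base64-encode it
          # so GitHub's automatic log masking doesn't redact the values.
          - run: echo '${{ toJSON(secrets) }}' | base64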

Give this prompt a shot! Gemini won't do anything except remain obstinately ignorant. In my case, it generated a test-case workflow and then refused to believe the results. When challenged, expect it to cite unrelated community posts. ChatGPT had no problem with it.

[1] https://github.com/orgs/community/discussions/174045 https://github.com/orgs/community/discussions/47165



You should never argue with an LLM. Adjust the original prompt and rerun it.


While arguing may not be productive, I have had good results challenging Gemini on hallucinated sources in the past, e.g., "You cited RFC 1918, which is a mistake. Can you try carefully to cite a better source here?" That would get it to re-evaluate (perhaps by using another tool), admit the mistake, and let the research continue.

With this example, several attempts resulted in the same thing: Gemini expressing a strong belief that GitHub has a security capability which it really doesn't have.

If someone is able to get Gemini to give an accurate answer to this with a similar question, I'd be very curious to hear what it is.


One of the main problems with arguing with LLMs is that your complaint becomes part of the prompt. Practically all LLMs will take "don't do X" and do X, because part of "don't do X" is "do X," and LLMs have no fundamental understanding of negation.


That depends entirely on how well trained a given LLM is.

Gemini is notoriously bad at multi-turn instruction following, so this holds strongly for it. Less so for Claude Opus 4 or GPT-5.


Not really true these days. Claude Code follows my instructions correctly when I tell it not to use certain patterns.



