Back before structured outputs were common among model providers, I used to have an “end result” tool the model could call to get the structured response I was looking for. It worked very reliably.
It’s a bit of a hack, but maybe it would work reliably here?
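For anyone who hasn’t seen the pattern, here’s a rough sketch of the idea. The tool name `final_result`, the schema fields, and the `{"name", "arguments"}` shape of the tool call are all made up for illustration and are not any provider’s actual API:

```python
import json

# Hypothetical tool definition: its parameter schema is the structured
# output we actually want. Field names are illustrative only.
FINAL_RESULT_TOOL = {
    "name": "final_result",
    "description": "Call this exactly once with the final answer.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "score": {"type": "number"},
        },
        "required": ["title", "score"],
    },
}

# Minimal mapping from JSON schema types to Python types for the check below.
_JSON_TYPES = {"string": str, "number": (int, float)}


def extract_final_result(tool_call):
    """Pull the structured answer out of an assumed {'name', 'arguments'} dict."""
    if tool_call.get("name") != FINAL_RESULT_TOOL["name"]:
        raise ValueError("model did not call the final_result tool")
    args = tool_call["arguments"]
    # Arguments may arrive as a JSON string or as an already-parsed dict.
    if isinstance(args, str):
        args = json.loads(args)
    schema = FINAL_RESULT_TOOL["parameters"]
    for field in schema["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    for field, spec in schema["properties"].items():
        if field in args and not isinstance(args[field], _JSON_TYPES[spec["type"]]):
            raise ValueError(f"wrong type for field: {field}")
    return args
```

The point is that the “answer” is just the arguments of the tool call, so you get schema-shaped data without relying on a JSON output mode.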
You can definitely build an agent and have it use tools like you mention. That’s the equivalent of making two requests to Gemini: one to get the initial answer/content, then another to get it formatted as proper JSON.
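Roughly, the two-request version looks like the sketch below. `call_model` is a hypothetical stand-in for whatever client you use, and its `use_tools` / `json_mode` parameters are assumptions for illustration, not real Gemini SDK options:

```python
import json


def answer_then_format(call_model, question, schema_hint):
    """Two-step flow: free-form answer first, JSON formatting second.

    call_model is a hypothetical callable (prompt, use_tools, json_mode) -> str.
    """
    # Request 1: tools enabled, plain-text output.
    draft = call_model(question, use_tools=True, json_mode=False)
    # Request 2: no tools, so JSON output can be requested.
    prompt = (
        "Rewrite the following answer as JSON matching this shape: "
        f"{schema_hint}\n\n{draft}"
    )
    raw = call_model(prompt, use_tools=False, json_mode=True)
    return json.loads(raw)


def fake_model(prompt, use_tools, json_mode):
    """Canned stand-in so the flow can be exercised without a real API."""
    return '{"answer": "42"}' if json_mode else "The answer is 42."
```

Calling `answer_then_format(fake_model, "What is the answer?", '{"answer": string}')` runs both steps against the canned model. With a real client you’d pay for two round trips per question.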
The issue here is that Gemini has support for some internal tools (like search and web scraping), and when you ask the model to use those, you can’t also ask it for application/json as the output format (which you normally can when not using tools).
I think this might also have something to do with their very specific output requirements when you do use search (it has to be displayed in a predefined Google format).