Yup. The difference is particularly apparent with o3, which does bursts of web searches on its own whenever it feels it'll be helpful in solving a problem, and uses the results to inform its own next steps (as opposed to just picking out parts to quote in a reply).
(It works surprisingly well, and feels midway between Perplexity's search and OpenAI's Deep Research.)
I asked "What version/model are you running, atm" (I have a free OpenAI account; what I have seen so far does not justify a $20 monthly fee - IN MY CASE).
People often say "I asked ChatGPT something and it was wrong", and then you ask them which model they used and they say "huh?"
The default model is 4o-mini, which is much worse than 4o and much, much worse than o3 at many tasks.