tensor 1 day ago, on: EuroLLM: LLM made in Europe built to support all 2...

You can also deliberately bias your sampling so that, when selecting new training instances, each language is chosen with equal probability. Diversity of data is generally good, unless that data is "wrong", which, ironically, is probably most of the internet, but I digress.
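To make the sampling idea concrete, here is a minimal sketch, not anything EuroLLM is known to do: pick the language uniformly first, then pick an instance uniformly within that language. The function name `sample_balanced`, the `corpus` dict, and its sizes are all made up for illustration.

```python
import random
from collections import Counter


def sample_balanced(corpus_by_lang, n, seed=0):
    """Draw n training instances, choosing the language uniformly first.

    corpus_by_lang is a hypothetical mapping: language code -> list of texts.
    """
    rng = random.Random(seed)
    langs = sorted(corpus_by_lang)  # fixed order so a given seed is reproducible
    batch = []
    for _ in range(n):
        lang = rng.choice(langs)                 # every language equally likely
        text = rng.choice(corpus_by_lang[lang])  # uniform within that language
        batch.append((lang, text))
    return batch


# Toy corpus with invented sizes: English has 100x more data than Maltese.
corpus = {
    "en": [f"en-{i}" for i in range(10_000)],
    "pt": [f"pt-{i}" for i in range(1_000)],
    "mt": [f"mt-{i}" for i in range(100)],
}

batch = sample_balanced(corpus, n=30_000)
counts = Counter(lang for lang, _ in batch)
print(counts)  # roughly equal counts per language despite the size imbalance
```

The flip side is that this samples with replacement, so the small languages get repeated far more often than the large ones. Whether that trade is worth it depends on how much you trust the low-resource data.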