Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can also correctly bias your sampling so that when selecting new training instances each language is chosen equally. Generally the diversity of data is good, unless that data is "wrong" which, ironically, is probably most of the internet, but I digress.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: