Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It seems like they realized the value of user content for AI training too late, and are trying to hamfistedly lock down the golden egg goose for their upcoming IPO.


Pushshift.io used to host a complete repository for Reddit. I think there were archives on the internet archive as well. There are numerous torrents with terabytes of text content for training AIs. Perhaps they might lock it down going forward, but language training horse fled and was eaten by coyotes a decade ago.


More (and more recent) content will be need for further training of the models for them to stay competitive and up to date.


Adding .json to any url still gives you everything in json. This has to do with 3rd party clients not showing ads and making up the revenue lost.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: