Hacker News | dax_'s comments

This is just prompting an LLM and dumping the output on the site (which is clearly what is happening here; all the articles show the same signs of AI output, with no human writing and no style, as far as I can tell).

If this is the level of care that goes into news articles, then we're doomed. What will ultimately happen is that AI summarizes AI articles, which were themselves summarized from other AI articles, which were summarized from yet another AI article, and so on, until after enough rewriting all the facts are gone. I don't care to read this slop, and I'm shocked that people so readily accept this new state of affairs.
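The dynamic described above can be illustrated with a toy simulation (this is a deliberately crude sketch, not a claim about how real models behave): treat an article as a set of facts, and treat each AI rewrite as a lossy pass that keeps each fact only with some probability.

```python
import random

def summarize(facts, retention=0.7, rng=None):
    """Toy 'summarization': each pass keeps each fact with probability
    `retention`. A crude stand-in for lossy AI-on-AI rewriting."""
    rng = rng or random
    return [f for f in facts if rng.random() < retention]

rng = random.Random(42)  # fixed seed so the run is reproducible
facts = [f"fact-{i}" for i in range(100)]

history = [len(facts)]
for generation in range(10):
    facts = summarize(facts, retention=0.7, rng=rng)
    history.append(len(facts))

# Fact count can only shrink with each rewrite; after ten generations
# the expected count is 100 * 0.7**10, i.e. almost nothing survives.
print(history)
```

The 0.7 retention rate and ten generations are arbitrary; the point is that any per-pass loss compounds geometrically across a chain of rewrites.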


That bugged me too, so I started looking at other articles: they all look AI-generated to me. The whole website should be banned.


Normally I would agree, but I've seen this happen too often. Common sense be damned, just make the number look good.


My experience is exactly the opposite (a company with more than 10k employees). Getting anything done in Azure takes me 10x as long, since all of Azure is managed by one team and everything requires approvals and lots of bureaucracy. It also turns out to be extremely expensive: per our guidelines, everything must be isolated within the company intranet (unless it genuinely needs to be external), which often means we need premium-tier services in Azure. Those can be really, really pricey.

On the other hand, if I request a virtual server, it takes less than a week, and I can work with it much more freely.


It's just as possible that they need to invest more and more for negligible improvements to model performance. These companies are burning through money at an astonishing rate.

And as the internet deteriorates under AI slop, finding good training material will become increasingly difficult. It's already happening that incorrect AI-generated information is being cited as a source for new AI answers.


They are burning through money, but their revenue is scaling at a similar rate.

I'm sure most companies have understood the "AI outputs feeding AIs" incest issue for a while and have plenty of methods to avoid it. That's why so much effort has gone into synthetic data pipelines for years.


It's just one of those sites that focuses on one thing, and does that extremely well, without trying to extract as much money from its users as possible. Rare thing nowadays.


With Windows 10 going out of support soon, I suspect there will be an increase in Linux adoption. After all, why throw out perfectly good hardware because of an arbitrary rule that Microsoft made? I know I'll be installing Linux for some relatives myself.


GDPR doesn't stop personal data from being stored. It governs whom the data can be shared with, when it has to be deleted, and that no more data is collected than required. It also gives users transparency about how their data is used.

And if I were to hand over personal information to an AI company, then I'll absolutely prefer a company that actually complies with GDPR.


Yeah, I mean, how would they know how to remove it from 'memory', since they have no way of knowing with 100% accuracy which parts of my chat are PII?


The cautious approach on their part would be to just delete the whole thing on any subject access deletion request.
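At the application layer, that blunt approach is easy to sketch: key every stored conversation by user, and on a deletion request drop everything under that key rather than trying to identify which fragments are PII. (The store and function names below are hypothetical; whether traces survive elsewhere, e.g. in model weights or caches, is the hard part.)

```python
# Hypothetical in-memory store: user id -> list of stored conversations.
store = {
    "user-123": ["chat about invoices", "chat mentioning a home address"],
    "user-456": ["unrelated chat"],
}

def handle_deletion_request(store: dict, user_id: str) -> int:
    """Cautious handler: delete everything for the user, not just
    the parts identified as personal data. Returns items removed."""
    return len(store.pop(user_id, []))

assert handle_deletion_request(store, "user-123") == 2
assert "user-123" not in store
```

The trade-off is obvious: this over-deletes (non-personal data goes too), but it never under-deletes, which is the failure mode a regulator would care about.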


Yes, if they aren't using it to train.


As a metaphor (well, a simile), think of it as if they were providing you with an FTP server or cloud storage. It's your choice what personal data, if any, you put into the system, and it's your responsibility to manage it, not theirs.

As to what to do if you, with a customer's permission, put their PD (PII being the American term) into the system and then get a request to delete it... I'm not sure; sorry, I'm not an expert on LLMs. But it's your responsibility not to put PD into the system unless you're confident that the company providing the service won't spread it beyond your control, and unless you know how to manage it (including deleting it if and when required) going forward.

Hopefully somebody else can come along and fill in my gaps on the options there - perhaps it's as simple as telling it "please remove all traces of X from memory", I don't know.

edit: Of course, you could sign an agreement with an AI provider for them to be a "data controller", giving them responsibility for managing the data in a GDPR-compliant way, but I'm not aware of Mistral offering that option.

edit 2: Given my non-expertise on LLMs, and my experience dealing with GDPR issues, my personal feeling is that I wouldn't be comfortable using any LLM for processing PD that wasn't entirely under my control, privately hosted. If I had something I wanted to do that required using SOTA models and therefore needed to use inference provided by a company like Mistral, I'd want either myself or my colleagues to understand a hell of a lot more about the subject than I currently do before going down that road. Thankfully it's not something I've had to dig into so far.


Well if it continues like this, that's what will happen. And I dread that future.

No one will care to share anything for free anymore, because AI companies profit off their hard work. And there's no way to prevent that from happening, because these crawlers don't identify themselves.
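The "don't identify themselves" part is the crux: a well-behaved crawler announces itself in the User-Agent header, so a server can at least log or block it. A minimal sketch of that check (the bot substrings below are illustrative, not an authoritative list):

```python
# Illustrative markers only; real crawler user-agents vary and change.
KNOWN_BOT_MARKERS = ("GPTBot", "CCBot", "Googlebot", "bingbot")

def looks_like_declared_bot(user_agent: str) -> bool:
    """True if the User-Agent contains a known crawler marker.
    A crawler that spoofs a normal browser UA sails straight past
    this check, which is exactly the problem described above."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in KNOWN_BOT_MARKERS)

assert looks_like_declared_bot("Mozilla/5.0 (compatible; GPTBot/1.0)")
assert not looks_like_declared_bot(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
)
```

This is why UA filtering and robots.txt only work against crawlers that choose to cooperate; against an unannounced crawler there is nothing to match on.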


I'm 99% sure I already saw a product launch on HN for precisely this idea.

