I actually did something like this at some point. I took all the high-ranking items, tokenized them to extract features, and ran them through a Bayesian classifier to do some filtering. I only used the information available on the front page and did no further analysis of the actual content.
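To make the idea concrete, here is a minimal sketch of that kind of filter. It is not the original code. It assumes the only input is an item's title, that the classifier is a multinomial naive Bayes with add-one smoothing, and that there are two hypothetical labels, `keep` and `skip`. The names `tokenize` and `NaiveBayes` and the training titles are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training items
        self.token_counts = defaultdict(Counter)  # label -> token -> occurrences
        self.vocab = set()                        # every token seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for token in tokenize(text):
            self.token_counts[label][token] += 1
            self.vocab.add(token)

    def scores(self, text):
        """Return an unnormalized log-probability for each known label."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        result = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            # Add-one smoothing: unseen tokens get a small nonzero probability.
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(n_docs / total_docs)
            for token in tokens:
                log_p += math.log((counts[token] + 1) / denom)
            result[label] = log_p
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Hypothetical training data: front-page titles labeled by hand.
model = NaiveBayes()
model.train("Show HN: A tiny Bayesian spam filter in Python", "keep")
model.train("Writing a tokenizer from scratch in Rust", "keep")
model.train("Startup raises $50M Series B funding round", "skip")
model.train("Celebrity founder announces new funding", "skip")

label = model.classify("Python tokenizer tutorial")
```

Because the model only ever sees titles it has been trained on, its vocabulary stays narrow, which is one way to see why a filter like this can drift toward a bubble unless it is retrained on fresh labels.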
The results were OK. With a bit more power it might have been more useful, but the results were still hit and miss, and I didn't have a long-term strategy for not filtering myself into a bubble other than continuously re-training the model.