Risk-adjusted scoring is done in some areas where we have the data for it (healthcare-associated infections and antibiotic usage). And this is a place where hospitals and doctors actively want it to work, because there are financial penalties attached to it.
It's still a fairly hard problem. I've had several very clever data scientists on teams who have gone "Oh, this is just an X problem..." and then 9 months later they're still trying to get a model to perform better than "Just take the average".