I'm guessing there's something about how they're doing speaker recognition that would make the cloud-based speech recognition hard to scale. Though I also notice that Alexa can answer more questions with the network down than it used to. Early on it was pretty much only the wake word, and then it'd give an error no matter what you said. Now it "listens" and will handle some requests to some degree without a network connection. For example, if you ask it to set an alarm it'll tell you the network is down and it can't set new alarms, but that your existing alarms will still go off. So it'd seem the full smarts of parsing an alarm request live in the cloud, but the device recognises at least enough locally to know you've asked about an alarm.
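
For what it's worth, here's a rough Python sketch of the kind of split that behaviour suggests: a tiny on-device keyword spotter that can classify the topic of a request, with full parsing deferred to the cloud. This is pure speculation about the architecture, not Amazon's actual implementation; the keyword lists and the is_network_up() helper are made up for illustration.

    from typing import Optional

    # Hypothetical on-device intent vocabulary: enough to spot the
    # topic of a request, nowhere near enough to parse the details.
    OFFLINE_INTENTS = {
        "alarm": ["alarm", "wake me"],
        "timer": ["timer", "countdown"],
    }

    def is_network_up() -> bool:
        # Placeholder connectivity check; a real device would probe
        # its actual link to the cloud service.
        return False

    def spot_intent(utterance: str) -> Optional[str]:
        # Crude keyword spotting: classifies *what* was asked about,
        # without extracting times, dates, or other slot values.
        text = utterance.lower()
        for intent, keywords in OFFLINE_INTENTS.items():
            if any(kw in text for kw in keywords):
                return intent
        return None

    def handle(utterance: str) -> str:
        if is_network_up():
            # Online: ship the full utterance to cloud NLU for real parsing.
            return "Forwarding full utterance to cloud NLU..."
        intent = spot_intent(utterance)
        if intent == "alarm":
            # Offline: the device knows the topic but can't parse details,
            # matching the behaviour described above.
            return ("I'm having trouble connecting, so I can't set new "
                    "alarms, but your existing alarms will still go off.")
        if intent is not None:
            return f"Sorry, I can't handle {intent} requests while offline."
        return "Sorry, I'm having trouble connecting to the internet."

    print(handle("Alexa, set an alarm for 7am"))

Run offline, the alarm request gets the specific "can't set new alarms" response while anything unrecognised gets the generic connectivity error, which would be consistent with what the device appears to do.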