It works well for me. It's probably at least partially a dialect thing. I'm southern US, and I, in particular, tend to mumble occasionally. I find "shoot" to be the easiest since you don't really have to enunciate for it. It sometimes takes a second try for me. I've almost exclusively gone to using voice to take pictures. It probably comes out closer to "shOOdt" with the way I talk.
"Cheese" is probably the next easiest for the phone to recognize. To my way of thinking, the sustained middle vowel sound is probably what it picks up on best. But that one sounds a bit "cheesy" if other people are around. And if you're taking a picture of people, then they're going to think you're wanting them to say "cheese". lol.
"CAP-ture", if enunciated properly, would be next on my list. "Smile", I would think, is the most difficult for the phone to understand.
Of course, background noise doesn't help. But I've actually had it work pretty decently with background noise going on.