To do that you'll need a way of generating similar whispers. This means simulating cover noise overlaid on the DeepSpeech training data.
The obvious flaw in this idea is that the training data is not whispered speech. So you'll also want to train something to classify data as whispers or not.
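For the noise-overlay part, here is a rough sketch of mixing a noise clip into a speech clip at a chosen signal-to-noise ratio. It assumes both clips are already loaded as 1-D float arrays at the same sample rate. The function name `overlay_noise` and the 5 dB figure are made up for illustration; nothing here touches DeepSpeech's own tooling.

```python
import numpy as np

def overlay_noise(speech, noise, snr_db):
    """Mix `noise` into `speech` so the result has roughly `snr_db` dB SNR.

    Hypothetical helper: both inputs are assumed to be 1-D float arrays
    sampled at the same rate.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole speech clip.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        # Nothing sensible to scale against; return the speech unchanged.
        return speech.copy()

    # Pick the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Toy usage with synthetic signals standing in for real audio.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = 0.1 * np.sin(2 * np.pi * 220 * t)   # stand-in for a speech clip
noise = rng.normal(size=4000)                # shorter clip, gets tiled
mixed = overlay_noise(speech, noise, snr_db=5.0)
```

Sweeping `snr_db` over a range gives you a spread of noisy examples. It doesn't turn normal speech into whispers, which is why the classifier is still needed.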