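Or you can render the TTS output to audio files, and then play one file whilst preparing the next. As long as each file takes less time to synthesize than the previous one takes to play, this would reduce the inter-batch delay for TTS processing to a small (possibly zero) and roughly constant duration.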
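A rough sketch of that play-one-while-preparing-the-next idea, in Python. Note that `synthesize` and `play` are hypothetical callables you would supply yourself (for example, one that writes a chunk to a WAV file and returns its path, and one that plays a file and blocks until it finishes); they are not the API of any particular TTS engine. The `lookahead` parameter is also just an assumption of this sketch.

```python
import queue
import threading

def speak_pipelined(chunks, synthesize, play, lookahead=1):
    """Play each chunk while the next one is being synthesized.

    synthesize(text) -> audio and play(audio) are caller-supplied
    (hypothetical) callables; play() is assumed to block until done.
    """
    # Bounded queue: synthesis runs at most `lookahead` chunks ahead of playback.
    ready = queue.Queue(maxsize=lookahead)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for chunk in chunks:
                ready.put(synthesize(chunk))
        finally:
            # Always signal the end, even if synthesize() raised.
            ready.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    while True:
        audio = ready.get()  # waits only if synthesis has fallen behind
        if audio is done:
            break
        play(audio)
    worker.join()
```

With this layout the only unavoidable wait is for the first chunk; after that, playback stalls only when synthesizing a chunk takes longer than playing the previous one.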
Also, if you break on paragraphs rather than sentences, it might be easier to find suitable break/pause points, since most text formats have a newline at the end of a paragraph and/or a blank line between paragraphs.
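For what it's worth, splitting on paragraphs can be as small as the sketch below. It assumes the blank-line-between-paragraphs convention and treats whitespace-only lines as blank; `split_paragraphs` is just an illustrative name. If your text instead has one paragraph per line, `text.splitlines()` would be the simpler choice.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    parts = re.split(r"\n\s*\n", text)  # a blank (or whitespace-only) line ends a paragraph
    # Join hard-wrapped lines within a paragraph and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]
```

Each returned string can then be handed to the TTS engine as one batch, with a pause inserted between batches.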
I have a vague recollection of somebody doing that with a book reader or training program, but can't find the thread.