I prefer to start subtitles about 300 ms before the line is spoken and let them linger long enough for everyone to read them without effort (for an English-speaking audience, that means no more than 21 characters per second). My experience is only with subs for translated dialogue, though, so I'm not sure how well this applies to music videos.
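To make the arithmetic concrete, here is a rough sketch of that rule of thumb in Python. The function and constant names are made up for illustration, and counting every character (spaces and punctuation included) toward the reading speed is my assumption; adjust it if your tool counts differently.

```python
LEAD_IN_MS = 300   # show the subtitle this long before the line is spoken
MAX_CPS = 21       # reading-speed ceiling, in characters per second


def subtitle_window(speech_start_ms, speech_end_ms, text):
    """Return (start_ms, end_ms) for one subtitle (hypothetical helper).

    The subtitle starts LEAD_IN_MS before the speech and stays on screen
    at least long enough that the reading speed does not exceed MAX_CPS.
    """
    # Never start before the beginning of the video.
    start_ms = max(0, speech_start_ms - LEAD_IN_MS)

    # Minimum on-screen time in ms, rounded up (integer ceiling division).
    # Assumption: every character in the text counts, including spaces.
    min_duration_ms = -(-len(text) * 1000 // MAX_CPS)

    # Linger past the end of the speech if the text needs more time.
    end_ms = max(speech_end_ms, start_ms + min_duration_ms)
    return start_ms, end_ms
```

For a 42-character line spoken from 1000 ms to 1500 ms, this gives a start at 700 ms and an end at 2700 ms, since 42 characters at 21 per second need two full seconds on screen. It does not check for overlap with the next subtitle, which a real timing pass would also have to handle.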