Well, this seems to be quite a particular application. There is currently no such module in SoSci Survey, because nobody has ever asked for one, and there are probably few researchers who would actually use such a question type.
What SoSci Survey does have is a module (at extra cost) that allows continuous rating (RTR/CRM) of a video or audio file. However, I assume that would be too much for your application.
That, of course, does not mean that it is impossible with SoSci Survey. It means that you would have to write some JavaScript code and implement part of the question yourself.
Technically, you will need two components. The first is the audio file (this should be easy, you may even use the default "drag mp3 into the questionnaire" solution in SoSci Survey). The second is some JavaScript that listens for clicks or keystrokes and, whenever such an event occurs, reads the current playback position from the audio/video and stores it in an internal variable. I would also recommend giving the respondents a bit of feedback, but none of this should be a challenge, given some basic JavaScript knowledge.
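
To give you an idea, here is a rough sketch of what such a script could look like. It is not tested in SoSci Survey, and all names in it are my assumptions: the function createTimestampRecorder is made up for this example, and the element IDs "stimulus" (the audio element), "timestamps" (a hidden input, e.g. the one belonging to an internal variable) and "feedback" (some text element) must be replaced by whatever your page actually uses. It only reacts to the space bar and to clicks outside the player.

```javascript
// Sketch only: collects playback positions when the respondent presses
// the space bar or clicks. All names and IDs below are hypothetical.
function createTimestampRecorder(media) {
  var stamps = [];
  return {
    // Read the current playback position (seconds, rounded to ms) and keep it.
    mark: function () {
      var t = Math.round(media.currentTime * 1000) / 1000;
      stamps.push(t);
      return t;
    },
    // Join all collected positions into one string for a hidden input.
    serialize: function () {
      return stamps.join(",");
    },
    // Number of positions recorded so far.
    count: function () {
      return stamps.length;
    }
  };
}

// Browser wiring; skipped when there is no DOM (e.g. when run outside a page).
if (typeof document !== "undefined") {
  window.addEventListener("load", function () {
    var media = document.getElementById("stimulus");   // assumed ID of the <audio> element
    var store = document.getElementById("timestamps"); // assumed ID of the hidden input
    var info = document.getElementById("feedback");    // assumed ID of a feedback element
    if (!media || !store) return;

    var recorder = createTimestampRecorder(media);

    function onMark(evt) {
      // Only the space bar counts as a key press in this sketch.
      if (evt.type === "keydown" && evt.code !== "Space") return;
      // Ignore clicks on the player itself (play/pause, seeking).
      if (evt.type === "click" && evt.target === media) return;
      // Keep the space bar from scrolling the page.
      if (evt.type === "keydown") evt.preventDefault();
      // Nothing to record while the file is not playing.
      if (media.paused) return;

      var t = recorder.mark();
      store.value = recorder.serialize();
      if (info) {
        info.textContent = "Marked " + recorder.count() + " time(s), last at " + t + " s";
      }
    }

    document.addEventListener("keydown", onMark);
    document.addEventListener("click", onMark);
  });
}
```

The positions end up as a comma-separated list in the hidden input, so a single variable is enough regardless of how often the respondent reacts. The feedback line is optional, but it tells respondents that their key press was actually registered.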