Augmenting speech transcripts of VR recordings with gaze, pointing, and visual context for multimodal coreference resolution
A system that augments VR speech transcripts by annotating ambiguous references (such as "it" or "there") with textual descriptions of the objects they refer to.
