Qualitative results of our dual-trained VQA and VQG models

Visual Question Generation as Dual Task of Visual Question Answering


We propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance. With the proposed invertible bilinear fusion module and parameter-sharing scheme, iQAN can perform VQA and its dual task, VQG, simultaneously. Jointly trained on the two tasks, the model gains a better understanding of the interactions among images, questions, and answers. Evaluated on the CLEVR and VQA2 datasets, iQAN improves the top-1 accuracy of the baseline MUTAN VQA method by 1.33% and 0.88%, respectively. We also show that the proposed dual training framework consistently improves performance across many popular VQA architectures.
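To make the dual-task idea concrete, here is a minimal sketch, not the paper's implementation: a single bilinear tensor is shared between the two directions, so VQA fuses (question, image) into answer features while the dual VQG task fuses (answer, image) back into question features. The dimensions, the `einsum` formulation, and the function names are illustrative assumptions, not iQAN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (illustrative, not from the paper):
# question (DQ), image (DV), and answer (DA) embeddings.
DQ, DV, DA = 8, 8, 8

# One shared bilinear tensor W serves BOTH directions, sketching the
# parameter-sharing scheme: VQA and its dual task VQG reuse the same weights.
W = rng.standard_normal((DQ, DV, DA)) * 0.1

def fuse_vqa(q, v):
    """VQA direction: fuse question and image features into answer features."""
    return np.einsum('q,qva,v->a', q, W, v)

def fuse_vqg(a, v):
    """Dual VQG direction: fuse answer and image features back into
    question features, reusing the same bilinear weights W."""
    return np.einsum('a,qva,v->q', a, W, v)

q = rng.standard_normal(DQ)   # toy question embedding
v = rng.standard_normal(DV)   # toy image embedding

a_feat = fuse_vqa(q, v)       # answer features for the VQA task
q_feat = fuse_vqg(a_feat, v)  # question features for the dual VQG task
print(a_feat.shape, q_feat.shape)
```

In the full model, both outputs would feed task-specific heads and losses, so gradients from VQG also update the shared fusion weights used by VQA.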

Spotlight, 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018).

If you have any further inquiries, please contact Yikang.