Driven by progress in computer vision and natural language processing, cross-disciplinary research that combines the two, including image captioning, visual question answering, visual storytelling, and visual relationship detection, is drawing increasing attention from both fields. This lecture gives a brief introduction to recent advances in these areas. We will also present our work on visual relationship detection and visual question answering.