Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/115071
Full metadata record
DC Field: Value
dc.contributor.author: Wu, Q.
dc.contributor.author: Shen, C.
dc.contributor.author: Wang, P.
dc.contributor.author: Dick, A.
dc.contributor.author: van den Hengel, A.
dc.date.issued: 2018
dc.identifier.citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018; 40(6):1367-1381
dc.identifier.issn: 0162-8828
dc.identifier.issn: 1939-3539
dc.identifier.uri: http://hdl.handle.net/2440/115071
dc.description.abstract: Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. It particularly allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
dc.description.statementofresponsibility: Qi Wu, Chunhua Shen, Peng Wang, Anthony Dick, and Anton van den Hengel
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers
dc.rights: © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
dc.source.uri: http://dx.doi.org/10.1109/tpami.2017.2708709
dc.subject: Image captioning; visual question answering; concept learning; recurrent neural networks; LSTM
dc.title: Image captioning and visual question answering based on attributes and external knowledge
dc.type: Journal article
dc.identifier.doi: 10.1109/TPAMI.2017.2708709
dc.relation.grant: http://purl.org/au-research/grants/arc/FT120100969
pubs.publication-status: Published
dc.identifier.orcid: Wu, Q. [0000-0003-3631-256X]
dc.identifier.orcid: Dick, A. [0000-0001-9049-7345]
dc.identifier.orcid: van den Hengel, A. [0000-0003-3027-8364]
Appears in Collections: Aurora harvest 3; Computer Science publications

Files in This Item:
File: hdl_115071.pdf | Description: Accepted version | Size: 46.77 MB | Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.