Multi-modal cycle-consistent generalized zero-shot learning

Felix, R.; Vijay Kumar, B.; Reid, I.; Carneiro, G.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/116545

Type:	Conference paper
Title:	Multi-modal cycle-consistent generalized zero-shot learning
Author:	Felix, R.; Vijay Kumar, B.; Reid, I.; Carneiro, G.
Citation:	Lecture Notes in Artificial Intelligence, 2018 / Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (ed./s), vol.11210 LNCS, pp.21-37
Publisher:	Springer
Issue Date:	2018
Series/Report no.:	Lecture Notes in Computer Science
ISBN:	9783030012304
ISSN:	0302-9743; 1611-3349
Conference Name:	15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich)
Editor:	Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y.
Statement of Responsibility:	Rafael Felix, B. G. Vijay Kumar, Ian Reid, and Gustavo Carneiro
Abstract:	In generalized zero-shot learning (GZSL), the set of classes is split into seen and unseen classes. Training relies on the semantic features of the seen and unseen classes and the visual representations of only the seen classes, while testing uses the visual representations of both the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, exploring the assumption that the distribution of classes in the semantic and visual spaces is relatively similar. Such methods tend to transform unseen testing visual representations into one of the seen classes' semantic features instead of the semantic features of the correct unseen class, resulting in low-accuracy GZSL classification. Recently, generative adversarial networks (GANs) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy, but one important constraint is missing: there is no guarantee that synthetic visual representations can generate back their semantic features in a multi-modal cycle-consistent manner. Without this constraint, the synthetic visual representations may not represent their semantic features well, so adding it can improve GAN-based approaches. In this paper, we propose the use of such a constraint, based on a new regularization for the GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic compatibility, we can synthesize more representative visual representations for the seen and, more importantly, for the unseen classes. Our proposed approach shows the best GZSL classification results in the field on several publicly available datasets.
Keywords:	Generalized zero-shot learning; generative adversarial networks; cycle consistency loss
Rights:	© Springer Nature, Switzerland AG 2018
DOI:	10.1007/978-3-030-01231-1_2
Grant ID:	http://purl.org/au-research/grants/arc/CE140100016 http://purl.org/au-research/grants/arc/FL130100102 http://purl.org/au-research/grants/arc/DP180103232
Published version:	http://dx.doi.org/10.1007/978-3-030-01231-1_2
Appears in Collections:	Aurora harvest 3; Australian Institute for Machine Learning publications; Computer Science publications
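
The cycle-consistency idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear generator and regressor, the squared-error penalty, the feature dimensions, and every function and variable name below are assumptions chosen only to show how generated visual features can be asked to reconstruct their original semantic features.

```python
import numpy as np

def generate_visual(semantic, noise, w_gen):
    """Hypothetical linear generator: maps [semantic, noise] to a visual feature."""
    return np.concatenate([semantic, noise], axis=1) @ w_gen

def regress_semantic(visual, w_reg):
    """Hypothetical linear regressor: maps a visual feature back to semantic space."""
    return visual @ w_reg

def cycle_consistency_loss(semantic, noise, w_gen, w_reg):
    """Mean squared distance between semantic features and their reconstruction.

    Assumption: a squared L2 penalty; the abstract does not specify the form.
    """
    synthetic_visual = generate_visual(semantic, noise, w_gen)
    reconstructed = regress_semantic(synthetic_visual, w_reg)
    return float(np.mean(np.sum((semantic - reconstructed) ** 2, axis=1)))

# Toy data with assumed sizes: 4 samples, 5-d semantic, 3-d noise, 8-d visual.
rng = np.random.default_rng(0)
n, d_sem, d_noise, d_vis = 4, 5, 3, 8
semantic = rng.normal(size=(n, d_sem))
noise = rng.normal(size=(n, d_noise))
w_gen = rng.normal(size=(d_sem + d_noise, d_vis))
w_reg = rng.normal(size=(d_vis, d_sem))

# In training, this term would be added to the GAN generator loss as a regularizer.
loss = cycle_consistency_loss(semantic, noise, w_gen, w_reg)
```

With random weights the penalty is large; it drops to zero only when the regressor inverts the generator on the semantic part of its input, which is the behaviour the regularization is meant to encourage.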

Adelaide Research & Scholarship