Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/68230
Type: Journal article
Title: Comparing methods for single paragraph similarity analysis
Author: Stone, B.
Dennis, S.
Kwantes, P.
Citation: Topics in Cognitive Science, 2011; 3(1):92-122
Publisher: John Wiley & Sons Inc
Issue Date: 2011
ISSN: 1756-8757 (print); 1756-8765 (online)
Statement of Responsibility: Benjamin Stone, Simon Dennis, Peter J. Kwantes
Abstract: The focus of this paper is two-fold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets—23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models’ similarity estimates. Removing numeric and single characters, and also truncating document length improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs.
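For context, a minimal sketch (not code from the paper) of the two simple nonreductive measures the abstract names: word overlap and a bag-of-words vector space compared by cosine. The tokenization rule and the raw term-frequency weighting are assumptions; the sketch also applies the preprocessing steps the abstract reports as helpful (dropping numeric and single-character tokens).

# Minimal sketch of paragraph similarity via word overlap and a
# bag-of-words vector space with cosine similarity. Tokenization and
# weighting choices here are assumptions, not the paper's exact setup.
import math
import re
from collections import Counter

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, and drop numeric and single-character tokens
    (the corpus-cleaning steps the abstract reports as helpful)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) > 1 and not t.isdigit()]

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the paragraphs' word types."""
    sa, sb = set(preprocess(a)), set(preprocess(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def cosine_bow(a: str, b: str) -> float:
    """Cosine between raw term-frequency vectors of the two paragraphs."""
    va, vb = Counter(preprocess(a)), Counter(preprocess(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

p1 = "The band announced a world tour starting in Sydney."
p2 = "A world tour was announced by the band, beginning in Sydney."
print(word_overlap(p1, p2), cosine_bow(p1, p2))

Both measures ignore word order and use no dimensionality reduction, which is the sense in which the abstract calls them "nonreductive"; the paper's finding is that at the single-paragraph level such models can outperform reductive models like LSA and the Topic Model.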
Keywords: Semantic models
Paragraph similarity
Corpus preprocessing
Corpus construction
Wikipedia corpora
Rights: Copyright © 2010 Cognitive Science Society, Inc.
DOI: 10.1111/j.1756-8765.2010.01108.x
Published version: http://dx.doi.org/10.1111/j.1756-8765.2010.01108.x
Appears in Collections: Aurora harvest 5
Psychology publications

Files in This Item:
File: RA_hdl_68230.pdf (Restricted Access)
Size: 658.69 kB
Format: Adobe PDF

