Research on the Similarity Assessment of Academic Papers

Graduation Thesis for the Bachelor's Degree
Heilongjiang Institute of Technology
June 2008, Harbin

Abstract

The development of the Internet has given people unprecedented means of sharing information, but it has also made plagiarism easier, so it has become necessary to judge whether an academic paper has been plagiarized. At present there is no corresponding technical means in China for solving this problem. To address this, this thesis proposes and implements a similarity assessment system for academic papers, which judges whether plagiarism has occurred by detecting the similarity between papers.

The system uses n-grams as its index units and applies the same segmentation to both the query document and the documents being searched. It is built on the open-source Lemur Toolkit, with the probabilistic Okapi model as the retrieval model and BM25 as the similarity formula. If one paper in the returned results has a weight markedly higher than that of every other document, the system judges that plagiarism exists, treats that paper as the plagiarized source, and returns it to the user for further evaluation.

Experiments indicate that the similarity assessment system is feasible. Using it can reduce the workload of paper reviewers and improve review efficiency, while also protecting the legitimate rights and interests of authors to a large extent. Its further development is expected to bring users more efficient and convenient service.

Keywords: information retrieval; n-gram; index; Lemur; similarity
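
The pipeline described in the abstract (n-gram index units, BM25 scoring, and a check for one result that stands far above the rest) can be illustrated with a minimal Python sketch. This is not the thesis's implementation and does not use the Lemur Toolkit. The function names, the character-bigram choice, the parameters k1 = 1.2 and b = 0.75, the non-negative IDF variant, and the `ratio` threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams, ignoring whitespace."""
    s = "".join(text.split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def bm25_scores(query, docs, n=2, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25.

    Query and documents are segmented the same way (character n-grams).
    k1, b and the log(1 + ...) IDF form are illustrative assumptions.
    """
    doc_terms = [Counter(char_ngrams(d, n)) for d in docs]
    doc_lens = [sum(c.values()) for c in doc_terms]
    avg_len = sum(doc_lens) / len(docs) if docs else 0.0
    num_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in doc_terms:
        df.update(c.keys())

    query_terms = Counter(char_ngrams(query, n))
    scores = []
    for c, dl in zip(doc_terms, doc_lens):
        score = 0.0
        for term, qtf in query_terms.items():
            tf = c.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * dl / avg_len)
            # Query term frequency is used as a plain multiplier here.
            score += qtf * idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


def flag_suspect(scores, ratio=2.0):
    """Return the index of the top document if it clearly dominates.

    "Clearly dominates" is taken to mean at least `ratio` times the
    runner-up score; this threshold is a hypothetical choice.
    """
    if not scores:
        return None
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[0]
    if scores[top] <= 0:
        return None
    if len(order) == 1 or scores[top] >= ratio * scores[order[1]]:
        return top
    return None


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "information retrieval with probabilistic models",
        "completely unrelated text about cooking pasta",
    ]
    result = bm25_scores("quick brown fox jumps", corpus)
    print(result, flag_suspect(result))
```

A fixed ratio against the runner-up is only one way to read "markedly higher"; a real system would more likely calibrate the threshold on labeled examples and leave the final judgment to a human reviewer, as the abstract itself suggests.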