Published 2016-12-18 in Chongqing

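Shenyang Aerospace University

Algorithm Analysis

Title: Research on the K-Nearest Neighbor Classification Algorithm
School: School of Computer Science
Major: Computer Technology
Name:
Student ID:
Advisor:
January 2015

Abstract

Data mining is a widely studied area within machine learning. It tightly combines artificial intelligence and database technology so that computers can help people extract valuable knowledge patterns from massive data intelligently and automatically, meeting the needs of different applications. The K-nearest neighbor (KNN) algorithm is a statistics-based classification method and one of the more commonly used classification algorithms in data mining. It is intuitive, requires no prior statistical knowledge, and learns without a teacher, and it has become one of the methods used in both theoretical and applied data mining research.

This thesis studies the KNN classification algorithm. It first briefly introduces the classification algorithms used in data mining and describes the basic principle and application areas of KNN in detail. It then identifies the reasons for KNN's slow computation and limited classification accuracy, and proposes two improvements.

To address the heavy computational cost of KNN, a method that combines a clustering algorithm with KNN is constructed: K-means from clustering is combined with KNN from classification, which effectively increases classification speed.

To address classification accuracy, a new way of setting distance weights is proposed. Traditional KNN generally uses the Euclidean distance to measure the distance between two samples. Because each attribute in a real dataset contributes differently to a sample, the weighted Euclidean distance is usually used instead. This thesis proposes a new method for computing the weights. Experiments show that the proposed algorithm effectively improves classification accuracy.

Finally, on the basis of a summary of the whole thesis, directions for further research are pointed out.

Keywords: K-nearest neighbor, clustering algorithm, weight, complexity, accuracy

ABSTRACT

Data mining is a widely studied field of machine learning that integrates artificial intelligence technology and database technology. It helps people extract valuable knowledge from large amounts of data intelligently and automatically to meet the needs of different applications. KNN is a commonly used statistics-based method in data mining. The algorithm has become one of the methods of data mining theory and application because it is intuitive, requires no prior statistical knowledge, and learns without a teacher.

The main work of this thesis concerns the k-nearest neighbor classification algorithm. First, it introduces the main classification algorithms of data mining and describes the theoretical basis and applications of KNN. It then points out the reasons for the algorithm's slow speed and low accuracy and proposes two improvements.

To overcome the disadvantages of traditional KNN, this thesis combines classification and clustering algorithms into an improved KNN classification algorithm. Experiments show that this algorithm speeds up classification with little effect on accuracy.

To address the problem of classification accuracy, the thesis proposes a new way of calculating weights. The traditional KNN method generally uses the Euclidean distance formula to measure the distance between two samples. As the actual sample data c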