Data Mining: Concepts and Techniques
What Is the Problem with PAM?
- PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean is.
- PAM works efficiently for small data sets but does not scale well to large ones: each iteration costs O(k(n-k)^2), where n is the number of data objects and k is the number of clusters.
- Sampling-based method: CLARA (Clustering LARge Applications)

CLARA (Clustering Large Applications) (1990)
- CLARA (Kaufman and Rousseeuw, 1990)
  - Built into statistical analysis packages, such as S-PLUS
- It draws multiple samples of the data set, applies PAM to each sample, and returns the best clustering as the output
- Strength: deals with larger data sets than PAM
- Weakness:
  - Efficiency depends on the sample size
  - A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased

CLARANS ("Randomized" CLARA) (1994)
- CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han, '94)
  - Draws a sample of neighbors dynamically
  - The clustering process can be represented as searching a graph where every node is a potential solution, that is, a set of k medoids
  - If a local optimum is found, it restarts from a new randomly selected node in search of a new local optimum
- Advantages: more efficient and scalable than both PAM and CLARA
- Further improvement: focusing techniques and spatial access structures (Ester et al., '95)

ROCK: Clustering Categorical Data
- ROCK: RObust Clustering using linKs
  - S. Guha, R. Rastogi, K. Shim, ICDE'99
- Major ideas
  - Use links to measure similarity/proximity
  - Not distance-based
- Algorithm: sampling-based clustering
  - Draw a random sample
  - Cluster with links
  - Label data on disk
- Experiments
  - Congressional voting, mushroom data

Similarity Measure in ROCK
- Traditional measures for categorical data may not work well, e.g., the Jaccard coefficient
- Example: Two gro
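The PAM cost problem and the CLARA sampling loop described above can be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes points are numeric tuples compared by Euclidean distance, and the names `pam`, `clara`, `total_cost`, `num_samples` and `sample_size` are hypothetical. The PAM step is the naive best-swap loop: it tries all k(n-k) medoid/non-medoid swaps per iteration and rescores each one over the data, which makes the poor scaling easy to see. CLARA runs PAM only on random samples, but judges each sample's medoids against the whole data set.

```python
import random

def dist(a, b):
    """Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, seed=0):
    """Naive PAM: start from k random medoids, then repeatedly apply the
    single medoid/non-medoid swap that lowers the total cost the most."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    while True:
        best_swap = None
        # k * (n - k) candidate swaps; each one is scored over all n points
        for i in range(k):
            for o in points:
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate)
                if cost < best:
                    best, best_swap = cost, candidate
        if best_swap is None:  # no swap improves: local optimum reached
            return medoids, best
        medoids = best_swap

def clara(points, k, num_samples=5, sample_size=None, seed=0):
    """CLARA sketch: run PAM on several random samples and keep the medoid
    set with the lowest cost measured on the WHOLE data set."""
    rng = random.Random(seed)
    if sample_size is None:
        # Assumed default: 40 + 2k is a commonly quoted CLARA sample size.
        sample_size = min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_samples):
        sample = rng.sample(points, sample_size)
        medoids, _sample_cost = pam(sample, k, seed=rng.randrange(2**30))
        cost = total_cost(points, medoids)  # score on all n objects
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clara(data, k=2)
print(sorted(medoids), cost)  # [(0, 0), (10, 10)] 4.0
```

Scoring every sample's medoids on the full data set is what lets CLARA pick the best of several samples; it cannot, however, recover when every sample misses part of a cluster, which is the sample-bias weakness listed above.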
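The CLARANS search described above can be sketched as a randomized walk over the graph of medoid sets, where two nodes are neighbors if they differ in exactly one medoid. This is an illustrative sketch, not Ng and Han's code: it assumes numeric tuples with Euclidean distance. `numlocal` (number of restarts) and `maxneighbor` (how many random neighbors to try before declaring a local optimum) are the names commonly used for these two knobs, but the default values here are arbitrary assumptions.

```python
import random

def dist(a, b):
    """Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def clarans(points, k, numlocal=2, maxneighbor=200, seed=0):
    """Randomized search over nodes = sets of k medoids.
    Two nodes are neighbors when they differ in exactly one medoid."""
    rng = random.Random(seed)
    best_node, best_cost = None, float("inf")
    for _ in range(numlocal):                      # independent restarts
        current = rng.sample(points, k)            # random starting node
        current_cost = total_cost(points, current)
        tries = 0
        while tries < maxneighbor:
            # Draw ONE random neighbor instead of scanning all k(n-k) of them.
            i = rng.randrange(k)
            o = rng.choice([p for p in points if p not in current])
            neighbor = current[:i] + [o] + current[i + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:                # move and reset the counter
                current, current_cost, tries = neighbor, cost, 0
            else:
                tries += 1
        # maxneighbor failures in a row: treat current as a local optimum
        if current_cost < best_cost:
            best_node, best_cost = current, current_cost
    return best_node, best_cost

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(clarans(data, k=2))
```

Compared with a full PAM iteration, each step scores one randomly drawn neighbor rather than all k(n-k) of them, which is where the efficiency gain comes from; the price is that a "local optimum" is only local with respect to the neighbors that happened to be sampled.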
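To make ROCK's link idea concrete, here is a small sketch. It assumes transactions are sets of items, that two points are neighbors when their Jaccard coefficient is at least a threshold `theta`, and that link(p_i, p_j) is the number of neighbors the two points have in common (a point is not counted as its own neighbor here). The function names and the toy transactions (all 3-item subsets of {a,b,c,d,e} and of {a,b,f,g}) are illustrative assumptions, not data from these slides.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighbor_sets(transactions, theta):
    """nbrs[i] = indices j != i with jaccard(t_i, t_j) >= theta."""
    nbrs = [set() for _ in transactions]
    for i, j in combinations(range(len(transactions)), 2):
        if jaccard(transactions[i], transactions[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs

def link(nbrs, i, j):
    """Number of common neighbors of points i and j."""
    return len(nbrs[i] & nbrs[j])

# Toy data: two groups of 3-item transactions (illustrative only)
group1 = [frozenset(c) for c in combinations("abcde", 3)]
group2 = [frozenset(c) for c in combinations("abfg", 3)]
data = group1 + group2
nbrs = neighbor_sets(data, theta=0.5)

t1, t2, t3 = (data.index(frozenset(s)) for s in ("abc", "cde", "abf"))
print(jaccard(data[t1], data[t2]), link(nbrs, t1, t2))  # 0.2 4
print(jaccard(data[t1], data[t3]), link(nbrs, t1, t3))  # 0.5 3
```

Here {a,b,c} and {c,d,e} come from the same group yet have a lower Jaccard coefficient than {a,b,c} and {a,b,f}, which come from different groups; counting links reverses that ranking, which is the kind of case where links capture cluster membership better than pairwise similarity.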