Diversifying Restricted Boltzmann Machine for Document Modeling
Pengtao Xie
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
pengtaox@
Yuntian Deng
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
yuntiand@
Eric P. Xing
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
epxing@
ABSTRACT
The Restricted Boltzmann Machine (RBM) has shown great effectiveness
in document modeling. It uses hidden units to discover latent topics
and can learn compact semantic representations of documents, which
greatly facilitate document retrieval, clustering and classification.
The popularity (or frequency) of topics in text corpora usually follows
a power-law distribution, where a few dominant topics occur very
frequently while most topics (in the long-tail region) have low
probabilities. Due to this imbalance, RBM tends to learn multiple
redundant hidden units to best represent the dominant topics and to
ignore those in the long-tail region, which renders the learned
representations redundant and non-informative. To solve this problem,
we propose Diversified RBM (DRBM), which diversifies the hidden units
so that they cover not only the dominant topics but also those in the
long-tail region. We define a diversity metric and use it as a
regularizer to encourage the hidden units to be diverse. Since the
diversity metric is hard to optimize directly, we instead optimize its
lower bound and prove that maximizing the lower bound with projected
gradient ascent can increase this diversity metric. Experiments on
document retrieval and clustering demonstrate that diversification
greatly improves the document modeling power of DRBM.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications,
Data Mining
General Terms
Algorithms, Experiments
Keywords
Diversified Restricted Boltzmann Machine, Diversity,
Power-law Distribution, Document Modeling, Topic Modeling