Some stuff on Topic Models-
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing (PLSI), created by Thomas Hofmann in 1999. Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSI developed by David Blei, Andrew Ng, and Michael Jordan in 2003; it allows each document to be a mixture of topics. Other topic models are generally extensions of LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.
In statistics, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups, which account for why some parts of the data are similar. For example, if the observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word’s creation is attributable to one of the document’s topics. LDA is an example of a topic model.
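LDA’s generative story can be sketched in a few lines of code. This is an illustrative toy, not a real implementation: the two topics, their vocabularies, and the probabilities below are invented for the example, and the Dirichlet draw uses the standard normalized-gamma trick.

```python
import random

random.seed(0)

# Two illustrative topics, each a distribution over a tiny vocabulary
# (words and probabilities are invented for this sketch).
topics = [
    {"gene": 0.4, "dna": 0.4, "cell": 0.2},       # a "genetics" topic
    {"stock": 0.5, "market": 0.3, "trade": 0.2},  # a "finance" topic
]

def sample_dirichlet(alphas):
    """Draw topic proportions from a Dirichlet prior (normalized gamma draws)."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alphas, length):
    """LDA's generative story for one document:
    1. draw the document's topic mixture from Dirichlet(alphas);
    2. for each word slot, pick a topic from the mixture,
       then pick a word from that topic's word distribution."""
    mixture = sample_dirichlet(alphas)
    words = []
    for _ in range(length):
        k = random.choices(range(len(topics)), weights=mixture)[0]
        topic = topics[k]
        words.append(random.choices(list(topic), weights=list(topic.values()))[0])
    return words

doc = generate_document(alphas=[0.5, 0.5], length=10)
print(doc)  # a mix of genetics and finance words, per the document's mixture
```

Inference in LDA runs this story in reverse: given only the documents, it recovers the topics and each document’s mixture (typically via Gibbs sampling or variational methods).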
David M. Blei’s page on Topic Models-
- A general introduction to topic modeling.
- A long tutorial about topic modeling given at KDD 2011. The slides are here.
- Slides from a talk on dynamic and correlated topic models applied to the journal Science. (Here is a video of the talk.)
- A more technical review paper about this field.
- David Mimno maintains a bibliography of topic modeling papers and software.
The topic models mailing list is a good forum for discussing topic modeling.
Some resources I compiled on Slideshare based on the above-
I guess a topic model on topic model literature would be a fine example of a “meme”.