Latent Dirichlet allocation (LDA) topic modeling in JavaScript
see http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

Implementation References
Based on original javascript implementation: https://github.com/awaisathar/lda.js
NPM Library: https://www.npmjs.com/package/lda

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<script src="https://cdn.rawgit.com/awaisathar/lda.js/gh-pages/js/stopwords.js"></script>
<script src="https://cdn.rawgit.com/awaisathar/lda.js/gh-pages/js/lda.js"></script>
<script>
// Example document
var text = ["Cats are small.", "Dogs are big.", "Cats like to chase mice.", "Dogs like to eat bones."];

// Extract sentences and build the vocabulary
var documents = new Array();
var sentences = text;
var f = {};
var vocab = new Array();
var docCount = 0;
for (var i = 0; i < sentences.length; i++) {
  if (sentences[i] == "") continue;
  var words = sentences[i].split(/[\s,"]+/);
  if (!words) continue;
  var wordIndices = new Array();
  for (var wc = 0; wc < words.length; wc++) {
    var w = words[wc].toLowerCase().replace(/[^a-z\'A-Z0-9 ]+/g, '');
    // TODO: Add stemming
    if (w == "" || w.length == 1 || stopwords[w] || w.indexOf("http") == 0) continue;
    if (f[w]) {
      f[w] = f[w] + 1;
    } else if (w) {
      f[w] = 1;
      vocab.push(w);
    }
    wordIndices.push(vocab.indexOf(w));
  }
  if (wordIndices && wordIndices.length > 0) {
    documents[docCount++] = wordIndices;
  }
}
console.log("vocab", vocab);
console.log("documents", documents);

// Run LDA to get terms for 2 topics
var V = vocab.length;
var M = documents.length;
var K = 2;        // number of topics
var alpha = 0.1;  // per-document distributions over topics
var beta = .01;   // per-topic distributions over words

lda.configure(documents, V, 10000, 2000, 100, 10);
lda.gibbs(K, alpha, beta);

var theta = lda.getTheta();
var phi = lda.getPhi();
console.log("number of topics", phi.length);

// build topic word lists
let topics = [];   // topics
var topTerms = 20; // max terms per topic
var topicText = new Array();
for (var k = 0; k < phi.length; k++) {
  var tuples = new Array();
  for (var w = 0; w < phi[k].length; w++) {
    tuples.push("" + phi[k][w].toPrecision(2) + "_" + vocab[w]);
  }
  tuples.sort().reverse();
  if (topTerms > vocab.length) topTerms = vocab.length;
  for (var t = 0; t < topTerms; t++) {
    var topicTerm = tuples[t].split("_")[1];
    var prob = parseInt(tuples[t].split("_")[0] * 100);
    if (prob < 0.0001) continue;
    console.log("topic " + k + ": " + topicTerm + " = " + prob + "%");
    if (topics[k] == undefined) topics[k] = [];
    topics[k].push(topicTerm);
  }
}
console.log("topics", topics);
</script>

Shows example of a circle hierarchy using d3

<style>
circle {
  fill: rgb(31, 119, 180);
  fill-opacity: .25;
  stroke: rgb(31, 119, 180);
  stroke-width: 1px;
}
.leaf circle {
  fill: #ff7f0e;
  fill-opacity: 1;
}
text {
  font: 10px sans-serif;
}
</style>
<script src="https://d3js.org/d3.v6.min.js"></script>
<script>
var w = 300;
var h = 300;
var pack = d3.pack()
  .size([w - 4, h - 4])
  .padding(2);
var svg = d3.select("body").append("svg")
  .attr("width", w)
  .attr("height", h);
var json = {
  name: "flare",
  children: [{
    name: "analytics",
    children: [{
      name: "cluster",
      children: [
        { name: "AgglomerativeCluster", size: 3938 },
        { name: "CommunityStructure", size: 3812 },
        { name: "HierarchicalCluster", size: 6714 },
        { name: "MergeEdge", size: 743 }
      ]
    }]
  }]
};
// sum() stores each node's total in node.value, so sort on value
var root = d3.hierarchy(json)
  .sum(function(d) { return d.size; })
  .sort(function(a, b) { return b.value - a.value; });
let g = svg.append("g");
var node = g.selectAll(".node")
  .data(pack(root).descendants())
  .enter().append("g")
  .attr("class", function(d) { return d.children ? "node" : "leaf node"; })
  .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
node.append("circle")
  .attr("r", function(d) { return d.r; });
// label leaves only, truncating the name to fit the circle
node.filter(function(d) { return !d.children; }).append("text")
  .attr("dy", "0.3em")
  .text(function(d) { return d.data.name.substring(0, d.r / 3); });
console.log("done");
document.body.style.height = "600px";
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<script src="https://cdn.rawgit.com/awaisathar/lda.js/gh-pages/js/stopwords.js"></script>
<script src="https://cdn.rawgit.com/awaisathar/lda.js/gh-pages/js/lda.js"></script>
<script src="https://d3js.org/d3.v6.min.js"></script>
<style>
circle {
  fill: rgb(31, 119, 180);
  fill-opacity: .25;
  stroke: rgb(31, 119, 180);
  stroke-width: 1px;
}
.leaf circle {
  fill: #ff7f0e;
  fill-opacity: 1;
}
text {
  font: 10px sans-serif;
}
</style>
<script>
// Example document
var text = ["Cats are small.", "Dogs are big.", "Cats like to chase mice.", "Dogs like to eat bones."];

// Extract sentences and build the vocabulary
var documents = new Array();
var sentences = text;
var f = {};
var vocab = new Array();
var docCount = 0;
for (var i = 0; i < sentences.length; i++) {
  if (sentences[i] == "") continue;
  var words = sentences[i].split(/[\s,"]+/);
  if (!words) continue;
  var wordIndices = new Array();
  for (var wc = 0; wc < words.length; wc++) {
    var w = words[wc].toLowerCase().replace(/[^a-z\'A-Z0-9 ]+/g, '');
    // TODO: Add stemming
    if (w == "" || w.length == 1 || stopwords[w] || w.indexOf("http") == 0) continue;
    if (f[w]) {
      f[w] = f[w] + 1;
    } else if (w) {
      f[w] = 1;
      vocab.push(w);
    }
    wordIndices.push(vocab.indexOf(w));
  }
  if (wordIndices && wordIndices.length > 0) {
    documents[docCount++] = wordIndices;
  }
}
console.log("vocab", vocab);
console.log("documents", documents);

// Run LDA to get terms for 2 topics
var V = vocab.length;
var M = documents.length;
var K = 2;        // number of topics
var alpha = 0.1;  // per-document distributions over topics
var beta = .01;   // per-topic distributions over words

lda.configure(documents, V, 10000, 2000, 100, 10);
lda.gibbs(K, alpha, beta);

var theta = lda.getTheta();
var phi = lda.getPhi();
console.log("number of topics", phi.length);

// build topic word lists
let topics = [];   // topics
var topTerms = 20; // max terms per topic
var topicText = new Array();
for (var k = 0; k < phi.length; k++) {
  var tuples = new Array();
  for (var w = 0; w < phi[k].length; w++) {
    tuples.push("" + phi[k][w].toPrecision(2) + "_" + vocab[w]);
  }
  tuples.sort().reverse();
  if (topTerms > vocab.length) topTerms = vocab.length;
  for (var t = 0; t < topTerms; t++) {
    var topicTerm = tuples[t].split("_")[1];
    var prob = parseInt(tuples[t].split("_")[0] * 100);
    if (prob < 0.0001) continue;
    console.log("topic " + k + ": " + topicTerm + " = " + prob + "%");
    // the top term names the topic; the remaining terms become its leaves
    if (topics[k] == undefined) topics[k] = { name: topicTerm, children: [] };
    else topics[k].children.push({ name: topicTerm, size: prob / 100 });
  }
}
console.log("topics", topics);
let json = { name: "root", children: topics };
// you could save out the processed data & load it for visualizing later, e.g. with JSON.stringify

var w = 300;
var h = 300;
var pack = d3.pack()
  .size([w - 4, h - 4])
  .padding(2);
var svg = d3.select("body").append("svg")
  .attr("width", w)
  .attr("height", h);
// sum() stores each node's total in node.value, so sort on value
var root = d3.hierarchy(json)
  .sum(function(d) { return d.size; })
  .sort(function(a, b) { return b.value - a.value; });
let g = svg.append("g");
var node = g.selectAll(".node")
  .data(pack(root).descendants())
  .enter().append("g")
  .attr("class", function(d) { return d.children ? "node" : "leaf node"; })
  .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
node.append("circle")
  .attr("r", function(d) { return d.r; });
node.filter(function(d) { return !d.children; }).append("text")
  .attr("dy", "0.3em")
  .text(function(d) { return d.data.name.substring(0, d.r / 3); });
console.log("ready");
document.body.style.height = "600px";
</script>
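The vocabulary-building step can also be read as a small standalone function, which makes it easier to check outside the browser. The following is only a sketch under assumptions: `buildCorpus` and `STOPWORDS` are hypothetical names, and the three-word stopword set is a stand-in for whatever the `stopwords` global from stopwords.js actually contains.

```javascript
// Hypothetical stand-in for the `stopwords` global loaded from stopwords.js.
const STOPWORDS = new Set(["are", "to", "like"]);

// Turn an array of sentences into a vocabulary plus, for each non-empty
// document, an array of indices into that vocabulary.
function buildCorpus(sentences, stopwords) {
  const vocab = [];
  const index = new Map(); // word -> position in vocab
  const documents = [];
  for (const sentence of sentences) {
    const wordIndices = [];
    for (const raw of sentence.split(/\s+/)) {
      // lowercase, then drop everything except letters, digits and apostrophes
      const w = raw.toLowerCase().replace(/[^a-z'0-9]+/g, "");
      if (w.length <= 1 || stopwords.has(w) || w.startsWith("http")) continue;
      if (!index.has(w)) {
        index.set(w, vocab.length);
        vocab.push(w);
      }
      wordIndices.push(index.get(w));
    }
    if (wordIndices.length > 0) documents.push(wordIndices);
  }
  return { vocab, documents };
}

const corpus = buildCorpus(
  ["Cats are small.", "Dogs are big.", "Cats like to chase mice.", "Dogs like to eat bones."],
  STOPWORDS
);
console.log(corpus.vocab);
console.log(corpus.documents);
```

Using a `Map` for the word lookup avoids the repeated `vocab.indexOf(w)` scan, which matters only once the vocabulary grows beyond a toy example.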
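One possible variation on the topic-to-hierarchy step is to sort terms numerically and give every term its own leaf, rather than sorting `probability_term` strings and using the top term as the topic's name. The sketch below assumes `phi[k][w]` is the probability of `vocab[w]` under topic `k`; `topicsToHierarchy` is a hypothetical helper, and the `phi` values are made up for illustration rather than produced by LDA.

```javascript
// Build a { name, children } tree suitable for d3.hierarchy from a
// topic-word matrix. Each topic keeps its `topTerms` most probable terms.
function topicsToHierarchy(phi, vocab, topTerms) {
  const children = phi.map((row, k) => {
    const terms = row
      .map((p, w) => ({ name: vocab[w], size: p }))
      .sort((a, b) => b.size - a.size) // numeric sort, highest probability first
      .slice(0, topTerms);
    return { name: "topic " + k, children: terms };
  });
  return { name: "root", children };
}

// Made-up values for illustration only, not real LDA output.
const vocab = ["cats", "mice", "dogs", "bones"];
const phi = [
  [0.5, 0.3, 0.1, 0.1],
  [0.1, 0.1, 0.45, 0.35],
];
const tree = topicsToHierarchy(phi, vocab, 2);
console.log(JSON.stringify(tree, null, 2));
```

The resulting object has the same `name`/`children`/`size` shape as the `json` passed to `d3.hierarchy`, so it could be dropped into the packing code in place of the hand-built structure.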