I made a small Markov chain joke generator during my coffee break sometime last week. This continues the last post, where we did a similar thing. I did this specifically to see how well the idea extends to a language I have not typically used for ML/NLP.
Let me run you guys through it.
First of all, a Markov chain needs a bunch of data to tell it how exactly you want your sentences constructed.
str_arr=[sentence1, sentence2,...]
Next, we create a dictionary of all trigrams present across the sentences. To do this, we use all bigrams as keys, and the succeeding word as the corresponding values. The key-value pairs thus form a trigram.
As an example, consider the sentence: “The man had a dog.”
The dictionary for this sentence will have: [ {[The, man] : [had]}, {[man, had] : [a]}, {[had, a] : [dog]} ]
var trigram_dict = {};
str_arr.forEach(function (sent) {
  // split each sentence into words on single spaces
  var word_tokens = sent.split(" ");
  word_tokens.forEach(function (e, i) {
    // the first word has no previous word, and the last word has no next word
    if (i < 1 || i == word_tokens.length - 1) { return; }
    // key is "previous-word current-word"
    var key = word_tokens[i - 1] + ' ' + e;
    // create the list of successors if the key is not present yet
    if (!trigram_dict[key]) {
      trigram_dict[key] = [];
    }
    // record the next word as a possible successor of this bigram
    trigram_dict[key].push(word_tokens[i + 1]);
  });
});
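As a quick sanity check, here is a minimal sketch of the same dictionary-building step wrapped in a function and run on two toy sentences. The name `buildTrigramDict` and the sample sentences are my own assumptions for illustration, not part of the original gist.

```javascript
// Hypothetical helper (not in the original gist): builds the
// bigram -> [next words] dictionary from an array of sentences.
function buildTrigramDict(sentences) {
  const dict = {};
  sentences.forEach(function (sentence) {
    const words = sentence.split(" ");
    // stop one short of the end so that words[i + 1] always exists
    for (let i = 1; i < words.length - 1; i++) {
      const key = words[i - 1] + " " + words[i];
      if (!dict[key]) { dict[key] = []; }
      dict[key].push(words[i + 1]);
    }
  });
  return dict;
}

// Toy input, assumed for illustration only
const sample = buildTrigramDict(["The man had a dog", "The man had a cat"]);
console.log(JSON.stringify(sample["had a"])); // ["dog","cat"]
```

Note that a bigram seen in several sentences collects every successor, duplicates included, so a more frequent successor is more likely to be picked later.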
function create(trigram_dict) {
  // seed the sentence with its first two words
  var new_string_arr = ["Yo", "mama"];
  // the last two words of the sentence so far form the lookup key
  var prevword = new_string_arr[new_string_arr.length - 2];
  var pprevword = new_string_arr[new_string_arr.length - 1];
  while (typeof (trigram_dict[prevword + ' ' + pprevword]) != "undefined") {
    var candidate_words = trigram_dict[prevword + ' ' + pprevword];
    // select word randomly out of all candidates
    var item = candidate_words[Math.floor(
      Math.random() * candidate_words.length)];
    new_string_arr.push(item);
    // slide the two-word window forward by one word
    prevword = new_string_arr[new_string_arr.length - 2];
    pprevword = new_string_arr[new_string_arr.length - 1];
  }
  var new_str = new_string_arr.join(" ");
  // A brand new sentence
  return new_str;
}
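One thing to watch out for: if the training sentences contain a cycle of bigrams, the loop above may run for a very long time. Below is a rough sketch of the same generation step with a cap on the sentence length. The function name `generate`, the `maxWords` parameter, and the toy dictionaries are assumptions of mine, not part of the original gist.

```javascript
// Hypothetical helper (not in the original gist): generates a sentence
// from a seed of two words, stopping after maxWords words at most.
function generate(dict, seed, maxWords) {
  const words = seed.slice(); // copy so the caller's seed is untouched
  while (words.length < maxWords) {
    const key = words[words.length - 2] + " " + words[words.length - 1];
    const candidates = dict[key];
    // stop when the current bigram has no known successor
    if (!candidates || candidates.length === 0) { break; }
    words.push(candidates[Math.floor(Math.random() * candidates.length)]);
  }
  return words.join(" ");
}

// Toy dictionary with a cycle: "a b" -> "a" and "b a" -> "b"
const toyDict = { "a b": ["a"], "b a": ["b"] };
const looped = generate(toyDict, ["a", "b"], 6);
console.log(looped); // a b a b a b

// Toy dictionary without a cycle: generation stops on its own
const jokeDict = { "Yo mama": ["is"], "mama is": ["so"] };
const joke = generate(jokeDict, ["Yo", "mama"], 20);
console.log(joke); // Yo mama is so
```

Both toy dictionaries have exactly one candidate per key, so the output here is deterministic. With real data the random pick gives a different sentence on each run.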