I made a small Markov Chain joke generator during my coffee break sometime last week. This is a continuation of the last post, where we did a similar thing. I did this specifically to see how well the idea could be extended in a language that I had not typically used for ML/NLP before.
Let me run you guys through it.
First of all, the Markov Chain needs a bunch of data to tell it how exactly you want your sentences constructed.
str_arr = [sentence1, sentence2, ...]
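For instance, with a couple of throwaway sentences (placeholders of mine; the post doesn't show the actual corpus), this could look like:

```python
str_arr = [
    "The man had a dog.",
    "The dog bit the man.",
]
```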
Next, we create a dictionary of all trigrams present across the sentences. To do this, we use the bigrams (pairs of consecutive words) as keys, and the words that succeed them as the corresponding values. Each key-value pair thus forms a trigram.
As an example, consider the sentence: “The man had a dog.”
The dictionary for this sentence will have: { (The, man): [had], (man, had): [a], (had, a): [dog] }
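The post doesn't include the actual code, so here is a minimal Python sketch of this step (the function name `build_trigram_dict` and the naive whitespace tokenization are my assumptions, not necessarily what the original used):

```python
from collections import defaultdict

def build_trigram_dict(str_arr):
    """Map each bigram (pair of consecutive words) to the list of
    words that follow it anywhere in the corpus."""
    trigrams = defaultdict(list)
    for sentence in str_arr:
        words = sentence.split()  # naive tokenization: punctuation stays attached
        for i in range(len(words) - 2):
            trigrams[(words[i], words[i + 1])].append(words[i + 2])
    return trigrams

# For "The man had a dog." this gives (period kept by the naive split):
# {("The", "man"): ["had"], ("man", "had"): ["a"], ("had", "a"): ["dog."]}
```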
Next up, we use the dictionary we just made to create sentences. Here we provide the first two words, and let the function work its magic to complete the sentence. The first two words are used as a key to search the dictionary for a candidate third word, which is appended to them. Then the second and third words are taken as the key, and so on. If there are multiple succession candidates for a particular pair, any one of them becomes the Chosen One at random. The process continues until no succeeding word is found, and the words collected until then form our new sentence.
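A matching Python sketch of the generation step, with one hedge: the `max_words` cap is my addition to guard against cycles in the chain, which the post doesn't mention:

```python
import random

def generate(trigrams, word1, word2, max_words=30):
    """Grow a sentence from a seed bigram until no successor exists
    (or the safety cap is hit)."""
    sentence = [word1, word2]
    while (word1, word2) in trigrams and len(sentence) < max_words:
        # If there are multiple candidates, any one becomes the Chosen One.
        next_word = random.choice(trigrams[(word1, word2)])
        sentence.append(next_word)
        word1, word2 = word2, next_word
    return " ".join(sentence)

print(generate(build_trigram_dict(str_arr), "The", "man"))
# e.g. "The man had a dog."
```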
That’s it! Some observations I would like to make here: One could try to extend the trigrams to n-grams, but the complexity goes up with n. Instead of selecting from the candidate words uniformly at random, one could make a probability-based selection as well. And instead of just sentences as input (and output), one could have paragraphs and even (if we dare dream so high) whole essays.
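On the probability-based selection idea: if every occurrence gets appended to the candidate list (as in the sketch above), a plain `random.choice` is already frequency-weighted. The variant below just makes the weights explicit, so they can be inspected or tweaked; a sketch of mine, not the author's code:

```python
import random
from collections import Counter

def weighted_next(candidates):
    """Pick the next word with probability proportional to how often
    it followed this bigram in the corpus."""
    counts = Counter(candidates)
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]
```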