|
EssaysSubject*Free!!! Subject*free!!! Subject*FREE! Subject*Free! Subject*free! Subject*FREE Subject*Free Subject*free FREE!!! Free!!! free!!! FREE! Free! free! FREE Free free If you do this, be sure to consider versions with initial caps as well as all uppercase and all lowercase. Spams tend to have more sentences in imperative mood, and in those the first word is a verb. So verbs with initial caps have higher spam probabilities than they would in all lowercase. In my filter, the spam probability of ``Act'' is 98% and for ``act'' only 62%. If you increase your filter's vocabulary, you can end up counting the same word multiple times, according to your old definition of ``same''. Logically, they're not the same token anymore. But if this still bothers you, let me add from experience that the words you seem to be counting multiple times tend to be exactly the ones you'd want to. Another effect of a larger vocabulary is that when you look at an incoming mail you find more interesting tokens, meaning those with probabilities far from .5 ...» | Код для вставки книги в блог HTML
phpBB
текст
|
|