cmdTokenizer object | en_us.t[4763] |
cmdTokenizer : Tokenizer
cmdTokenizer
    Tokenizer
        object
patAlphaDashAlpha
patPunct
patSpelledTens
patSpelledUnits
rules_
acceptAbbrTok
buildOrigText
tokCvtAbbr
tokCvtApostropheS
tokCvtPluralApostrophe
tokCvtSpelledNumber
Inherited from Tokenizer:
deleteRule
deleteRuleAt
insertRule
insertRuleAt
tokCvtLower
tokCvtSkip
tokenize
patAlphaDashAlpha | en_us.t[4962] |
patPunct | en_us.t[5081] |
patSpelledTens | en_us.t[5077] |
patSpelledUnits | en_us.t[5079] |
rules_ OVERRIDDEN | en_us.t[4764] |
acceptAbbrTok (txt) | en_us.t[4974] |
buildOrigText (toks) | en_us.t[5013] |
tokCvtAbbr (txt, typ, toks) | en_us.t[4994] |
When we find an abbreviation, we enter it as two tokens: the abbreviated word minus its trailing period, followed by the period as a separate token. We mark the period as an "abbreviation period" so that grammar rules can consider treating it as part of an abbreviation. Because it is also a regular period, grammar rules that treat periods as ordinary punctuation can still try to match it. This ensures that both readings are tried, as an abbreviation and as a word followed by punctuation, and that the one giving the best result is picked.
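The two-token split described above can be illustrated with a short, language-neutral sketch. This is Python, not TADS 3 source, and it is not the library's implementation: the token-type labels (`TOK_WORD`, `TOK_PUNCT`, `TOK_ABBR_PERIOD`), the `(text, type)` tuple layout, and the helper names are all assumptions made for illustration.

```python
# Hypothetical token-type labels; the real library defines its own token types.
TOK_WORD = "word"
TOK_PUNCT = "punct"
TOK_ABBR_PERIOD = "abbrPeriod"


def convert_abbreviation(txt, toks):
    """Append two tokens for an abbreviation such as 'Mr.':
    the word without its trailing period, then the period itself,
    tagged as an abbreviation period."""
    if not txt.endswith("."):
        raise ValueError("expected text ending in a period")
    toks.append((txt[:-1], TOK_WORD))
    toks.append((".", TOK_ABBR_PERIOD))
    return toks


def is_abbr_period(tok):
    """Rule that reads the period as part of an abbreviation."""
    return tok[1] == TOK_ABBR_PERIOD


def is_period(tok):
    """Rule that reads the period as ordinary punctuation.
    It accepts an abbreviation period too, so both readings stay available."""
    text, typ = tok
    return text == "." and typ in (TOK_PUNCT, TOK_ABBR_PERIOD)


tokens = convert_abbreviation("Mr.", [])
print(tokens)
```

Because the second token satisfies both `is_abbr_period` and `is_period` in this sketch, a grammar could match either interpretation and keep whichever parse scores best.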
tokCvtApostropheS (txt, typ, toks) | en_us.t[4900] |
tokCvtPluralApostrophe (txt, typ, toks) | en_us.t[4924] |
tokCvtSpelledNumber (txt, typ, toks) | en_us.t[4948] |