org.apache.uima.conceptMapper.support.tokens
Class TokenNormalizer

java.lang.Object
  extended by org.apache.uima.conceptMapper.support.tokens.TokenNormalizer

public class TokenNormalizer
extends Object


Field Summary
static String PARAM_CASE_MATCH
          Configuration parameter key/label for the case matching string
static String PARAM_STEMMER_CLASS
          Configuration parameter key/label for the stemmer class spec.
static String PARAM_STEMMER_DICT
          Configuration parameter key/label for the stemmer dictionary, passed into the stemmer's initialization method
 
Constructor Summary
TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext, Logger logger)
           
 
Method Summary
 String foldCase(String token)
          If one of the case folding flags is true and the input string matches the character pattern corresponding to that flag, then convert all letters to lowercase.
 Stemmer getStemmer()
           
 boolean isCaseFoldAll()
           
 boolean isCaseFoldDigit()
           
 boolean isCaseFoldInitCap()
           
 String normalize(String token)
           
 void setCaseFoldAll(boolean caseFoldAll)
           
 void setCaseFoldDigit(boolean caseFoldDigit)
           
 void setCaseFoldInitCap(boolean caseFoldInitCap)
           
 void setStemmer(Stemmer stemmer)
           
 boolean shouldFoldCase(String token)
           
 boolean shouldStem()
           
 String stem(String token)
          If the stemming flag is true, then return the stemmed form of the supplied word using the configured stemmer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PARAM_CASE_MATCH

public static final String PARAM_CASE_MATCH
Configuration parameter key/label for the case matching string

See Also:
Constant Field Values

PARAM_STEMMER_CLASS

public static final String PARAM_STEMMER_CLASS
Configuration parameter key/label for the stemmer class spec. If left out, no stemmer is used

See Also:
Constant Field Values

PARAM_STEMMER_DICT

public static final String PARAM_STEMMER_DICT
Configuration parameter key/label for the stemmer dictionary, passed into the stemmer's initialization method

See Also:
Constant Field Values
Constructor Detail

TokenNormalizer

public TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext,
                       Logger logger)
                throws org.apache.uima.analysis_engine.annotator.AnnotatorContextException
Parameters:
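annotatorContext - the annotator context from which the configuration parameters (see PARAM_CASE_MATCH, PARAM_STEMMER_CLASS, PARAM_STEMMER_DICT) are read
logger - the logger used for messages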
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorContextException
Method Detail

getStemmer

public Stemmer getStemmer()
Returns:
the stemmer currently in use

setStemmer

public void setStemmer(Stemmer stemmer)
Parameters:
stemmer - The stemmer to set.

shouldStem

public boolean shouldStem()

isCaseFoldAll

public boolean isCaseFoldAll()
Returns:
the current value of the caseFoldAll flag

setCaseFoldAll

public void setCaseFoldAll(boolean caseFoldAll)
Parameters:
caseFoldAll - the new value of the caseFoldAll flag

isCaseFoldDigit

public boolean isCaseFoldDigit()
Returns:
the current value of the caseFoldDigit flag

setCaseFoldDigit

public void setCaseFoldDigit(boolean caseFoldDigit)
Parameters:
caseFoldDigit - the new value of the caseFoldDigit flag

isCaseFoldInitCap

public boolean isCaseFoldInitCap()
Returns:
the current value of the caseFoldInitCap flag

setCaseFoldInitCap

public void setCaseFoldInitCap(boolean caseFoldInitCap)
Parameters:
caseFoldInitCap - the new value of the caseFoldInitCap flag

shouldFoldCase

public boolean shouldFoldCase(String token)

foldCase

public String foldCase(String token)
If one of the case folding flags is true and the input string matches the character pattern corresponding to that flag, then convert all letters to lowercase.

Parameters:
token - The string to case fold
Returns:
The case folded string
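
The flag-and-pattern rule described for foldCase can be illustrated with a small standalone sketch. It is not the TokenNormalizer source: the class name CaseFoldSketch is hypothetical, and both regular expressions are assumptions about what "the character pattern corresponding to that flag" might mean (an initial capital followed by lowercase letters, and the presence of an ASCII digit). The use of Locale.ROOT for lowercasing is likewise a choice made for this sketch.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the case-folding logic; not the UIMA class. */
final class CaseFoldSketch {
    // Assumed "initial capital" pattern: one uppercase letter, then lowercase letters only.
    private static final Pattern INIT_CAP = Pattern.compile("[A-Z][a-z]+");
    // Assumed "digit" pattern: the token contains at least one ASCII digit.
    private static final Pattern HAS_DIGIT = Pattern.compile("[0-9]");

    boolean caseFoldAll;
    boolean caseFoldInitCap;
    boolean caseFoldDigit;

    /** True if any enabled flag applies to this token. */
    boolean shouldFoldCase(String token) {
        return caseFoldAll
            || (caseFoldInitCap && INIT_CAP.matcher(token).matches())
            || (caseFoldDigit && HAS_DIGIT.matcher(token).find());
    }

    /** Lowercases the token when a flag applies; otherwise returns it unchanged. */
    String foldCase(String token) {
        return shouldFoldCase(token) ? token.toLowerCase(Locale.ROOT) : token;
    }
}
```

Under these assumptions, with only caseFoldInitCap set, "Protein" folds to "protein" while "DNA" is returned unchanged; with only caseFoldDigit set, "P53" folds to "p53".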

stem

public String stem(String token)
If the stemming flag is true (see shouldStem()), then return the stemmed form of the supplied word using the configured stemmer (see getStemmer()).

Parameters:
token - the word to stem
Returns:
the original word if the stemming flag is false, otherwise the stemmed form of the word
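
The pass-through behaviour described for stem can be sketched in isolation as follows. All names here (SketchStemmer, StemSketch, TrailingSStemmer) are hypothetical and stand in for the real Stemmer type, whose methods are not listed on this page. The sketch also assumes that the "stemming flag" simply reflects whether a stemmer has been set, and it uses a toy suffix rule in place of a real stemming algorithm.

```java
/** Hypothetical single-method stemmer contract; the real Stemmer type may differ. */
interface SketchStemmer {
    String stem(String token);
}

/** Hypothetical stand-in for the stemming logic; not the UIMA class. */
final class StemSketch {
    private SketchStemmer stemmer; // null means no stemmer is configured

    void setStemmer(SketchStemmer stemmer) {
        this.stemmer = stemmer;
    }

    // Assumption: the stemming flag is true exactly when a stemmer is present.
    boolean shouldStem() {
        return stemmer != null;
    }

    /** Returns the stemmed form if stemming is enabled, otherwise the original word. */
    String stem(String token) {
        return shouldStem() ? stemmer.stem(token) : token;
    }
}

/** Toy stemmer that strips one trailing "s"; deliberately not the Porter algorithm. */
final class TrailingSStemmer implements SketchStemmer {
    @Override
    public String stem(String token) {
        boolean plural = token.length() > 3 && token.endsWith("s") && !token.endsWith("ss");
        return plural ? token.substring(0, token.length() - 1) : token;
    }
}
```

With no stemmer set, stem("genes") returns "genes" unchanged; after setStemmer(new TrailingSStemmer()), the same call returns "gene".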

normalize

public String normalize(String token)


Copyright © 2006-2011 The Apache Software Foundation. All Rights Reserved.