org.apache.uima.tika
Class FileSystemCollectionReader

java.lang.Object
  extended by org.apache.uima.resource.Resource_ImplBase
      extended by org.apache.uima.resource.ConfigurableResource_ImplBase
          extended by org.apache.uima.collection.CollectionReader_ImplBase
              extended by org.apache.uima.tika.FileSystemCollectionReader
All Implemented Interfaces:
org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource

public class FileSystemCollectionReader
extends org.apache.uima.collection.CollectionReader_ImplBase

A collection reader that reads documents from a directory in the filesystem. Unlike the reader of the same name in the UIMA examples, it uses Apache Tika to extract the text from binary documents and generates annotations that represent the markup.
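
The hasNext/getNext/close contract of a directory-backed reader can be pictured with a small model that uses only the Java standard library. This is a sketch, not the UIMA class: DirectoryReaderSketch is a hypothetical name, it returns file paths where the real reader fills a CAS, and the choices to skip subdirectories and to sort the files are assumptions that this page does not confirm.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

/** Hypothetical stand-in for a directory-backed reader; not the UIMA class. */
class DirectoryReaderSketch {
    private final List<Path> files = new ArrayList<>();
    private int index = 0;

    /** Collects the regular files directly inside inputDir (assumption: no recursion). */
    DirectoryReaderSketch(Path inputDir) throws IOException {
        if (!Files.isDirectory(inputDir)) {
            throw new IOException("Not a directory: " + inputDir);
        }
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).forEach(files::add);
        }
        // Sorting is an assumption made here so that the order is predictable.
        Collections.sort(files);
    }

    /** True while at least one file has not been handed out yet. */
    boolean hasNext() {
        return index < files.size();
    }

    /** Returns the next file; the real reader would populate a CAS instead. */
    Path getNext() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more documents");
        }
        return files.get(index++);
    }

    /** Total number of files found when the reader was created. */
    int getNumberOfDocuments() {
        return files.size();
    }

    /** Marks the reader as exhausted; this model holds no other resources. */
    void close() {
        index = files.size();
    }
}
```

A caller would typically loop with `while (reader.hasNext()) { process(reader.getNext()); }` and call `close()` when done, where `process` is whatever the caller does with each document.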


Field Summary
static String PARAM_INPUTDIR
          Name of configuration parameter that must be set to the path of a directory containing input files.
static String PARAM_LANGUAGE
          Name of optional configuration parameter that contains the language of the documents in the input directory.
 
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_PARAM_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
 
Constructor Summary
FileSystemCollectionReader()
           
 
Method Summary
 void close()
           
 void getNext(org.apache.uima.cas.CAS aCAS)
           
 int getNumberOfDocuments()
          Gets the total number of documents that will be returned by this collection reader.
 org.apache.uima.util.Progress[] getProgress()
           
 boolean hasNext()
           
 void initialize()
           
 
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
 
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
 

Field Detail

PARAM_INPUTDIR

public static final String PARAM_INPUTDIR
Name of configuration parameter that must be set to the path of a directory containing input files.

See Also:
Constant Field Values

PARAM_LANGUAGE

public static final String PARAM_LANGUAGE
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information is added to the CAS.

See Also:
Constant Field Values
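
The difference between the mandatory input directory and the optional language can be sketched as a small validation step. This is a standalone model, not the reader's own code: ReaderConfigSketch and the two key strings are hypothetical (the actual string values of PARAM_INPUTDIR and PARAM_LANGUAGE are not shown on this page), and rejecting a path that is not an existing directory is an assumption about how such a reader would behave.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical model of the two parameters; not the reader's own code. */
class ReaderConfigSketch {
    // Placeholder keys for illustration only. Real code should refer to
    // FileSystemCollectionReader.PARAM_INPUTDIR and PARAM_LANGUAGE instead.
    static final String KEY_INPUTDIR = "inputDirKey";
    static final String KEY_LANGUAGE = "languageKey";

    final File inputDir;
    final String language; // null when the optional parameter is absent

    ReaderConfigSketch(Map<String, String> params) {
        String dir = params.get(KEY_INPUTDIR);
        if (dir == null || dir.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing mandatory parameter: " + KEY_INPUTDIR);
        }
        File candidate = new File(dir.trim());
        // Assumption: a path that is not an existing directory is rejected.
        if (!candidate.isDirectory()) {
            throw new IllegalArgumentException("Not a directory: " + candidate);
        }
        this.inputDir = candidate;
        // The language is optional, so a missing value is simply kept as null.
        this.language = params.get(KEY_LANGUAGE);
    }
}
```

With only the directory key set, `language` stays null and nothing language-related would be added downstream; with both keys set, the language value is carried along unchanged.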
Constructor Detail

FileSystemCollectionReader

public FileSystemCollectionReader()
Method Detail

hasNext

public boolean hasNext()
See Also:
BaseCollectionReader.hasNext()

getNext

public void getNext(org.apache.uima.cas.CAS aCAS)
             throws IOException,
                    org.apache.uima.collection.CollectionException
Throws:
IOException
org.apache.uima.collection.CollectionException
See Also:
CollectionReader.getNext(org.apache.uima.cas.CAS)

close

public void close()
           throws IOException
Throws:
IOException
See Also:
BaseCollectionReader.close()

getProgress

public org.apache.uima.util.Progress[] getProgress()
See Also:
BaseCollectionReader.getProgress()

getNumberOfDocuments

public int getNumberOfDocuments()
Gets the total number of documents that will be returned by this collection reader. This method is not part of the general collection reader interface, so it is available only through a reference of type FileSystemCollectionReader.

Returns:
the number of documents in the collection
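
Because the method is not part of the general interface, code that holds only the general type has to check for the concrete type before asking for the count. The following standard-library sketch models that pattern; GenericReaderSketch, CountingReaderSketch, and DocumentCountSketch are hypothetical names that stand in for the UIMA types, and returning -1 for an unknown count is a convention chosen here.

```java
/** Hypothetical stand-in for the general reader interface. */
interface GenericReaderSketch {
    boolean hasNext();
}

/** Hypothetical stand-in for a reader that also knows its document count. */
class CountingReaderSketch implements GenericReaderSketch {
    private final int total;

    CountingReaderSketch(int total) {
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return total > 0;
    }

    /** Extra method that the general interface does not declare. */
    int getNumberOfDocuments() {
        return total;
    }
}

class DocumentCountSketch {
    /** Returns the document count when the reader exposes one, otherwise -1. */
    static int countOrUnknown(GenericReaderSketch reader) {
        if (reader instanceof CountingReaderSketch) {
            return ((CountingReaderSketch) reader).getNumberOfDocuments();
        }
        return -1;
    }
}
```

The same check-then-cast shape would apply to a reference declared with the general collection reader type that happens to point at a FileSystemCollectionReader.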

initialize

public void initialize()
                throws org.apache.uima.resource.ResourceInitializationException
Overrides:
initialize in class org.apache.uima.collection.CollectionReader_ImplBase
Throws:
org.apache.uima.resource.ResourceInitializationException


Copyright © 2006-2011 The Apache Software Foundation. All Rights Reserved.