Utility Components#
wai-annotations comes with some utility base-classes for common types of components, found in the wai.annotations.core.component.util package. They are listed here.
Input#
LocalFilenameSource#
TODO
AnnotationFileProcessor#
TODO
Output#
LocalFileWriter#
TODO
SeparateFileWriter#
TODO
JSONFileWriter#
TODO
Splitting#
SplitSink#
Specialised sink component which divides the stream of processed items among a group of splits. It takes two options: --split-names, followed by a name for each split; and --split-ratios, followed by an integer for each split giving the relative number of items to add to it. The sink component then alternates which split it is acting on behalf of for each item it receives. The alternation is designed so that, at any stage, the ratio of items processed for each split is as close as possible to the specified --split-ratios.
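The alternation described above can be illustrated with a small scheduler. This is a minimal sketch of one way to keep the running counts close to the requested ratios, not the library's actual implementation; the split_schedule function and its "largest deficit first" rule are assumptions made for illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterator, Sequence


def split_schedule(names: Sequence[str], ratios: Sequence[int]) -> Iterator[str]:
    """Yield a split name for each incoming item (hypothetical helper)."""
    if not names or len(names) != len(ratios):
        raise ValueError("need exactly one ratio per split name")
    if any(ratio <= 0 for ratio in ratios):
        raise ValueError("ratios must be positive integers")

    total = sum(ratios)
    counts = [0] * len(names)
    seen = 0
    while True:
        seen += 1
        # Deficit of a split: its ideal share after `seen` items minus what
        # it has actually received, scaled by `total` to stay in integers.
        chosen = max(
            range(len(names)),
            key=lambda i: ratios[i] * seen - counts[i] * total,
        )
        counts[chosen] += 1
        yield names[chosen]


# Equivalent in spirit to: --split-names train test --split-ratios 3 1
first_eight = list(islice(split_schedule(["train", "test"], [3, 1]), 8))
tally = Counter(first_eight)
```

With these inputs the first four items go to train, train, test, train, and after eight items the tally is six for train and two for test, matching the 3:1 ratio.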
Sub-classes of SplitSink should implement the consume_element_for_split/finish_split methods, similar to an ordinary sink component. During execution of either of these methods, the following information can be utilised:
- self.is_splitting: whether the user specified any splits to perform.
- self.split_label: the label of the current split (specified by --split-names). This is None if splits weren't specified.
The following sections detail additional functionality that can be added to split-sinks.
SplitState#
Specialised form of process state which manages separate state for each split. Accessing this property will return a separate instance depending on which split the component is currently acting for.
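The idea of per-split state can be pictured as a lookup keyed by the current split label. The sketch below is a toy stand-in written in plain Python, not the SplitState class itself; PerSplitState, its get method, and the initialiser argument are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class PerSplitState(Generic[T]):
    """Toy model: lazily creates one state value per split label."""

    def __init__(self, initialiser: Callable[[], T]):
        self._initialiser = initialiser
        self._values: Dict[Optional[str], T] = {}

    def get(self, split_label: Optional[str]) -> T:
        # A label of None models the case where no splits were specified.
        if split_label not in self._values:
            self._values[split_label] = self._initialiser()
        return self._values[split_label]


filenames = PerSplitState(list)
filenames.get("train").append("a.png")
filenames.get("test").append("b.png")
filenames.get("train").append("c.png")
```

Each label sees only its own list, so items recorded while acting for one split never leak into another.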
RequiresNoSplitFinalisation#
Specialised form of the RequiresNoFinalisation mixin, which declares that the split-sink doesn't need to perform any clean-up after all items have been processed. This automatically implements finish_split.
WithPersistentSplitFiles#
A mixin class for use with components which inherit from SplitSink and need to keep files open while writing them across the splits. The _init_split_files method should be implemented to open all files needed by the sink, and return them as some sort of collection. The _iterate_split_files method should iterate over the collection returned by _init_split_files. Components which implement this mixin can then access the _split_files property during execution of their consume_element_for_split/finish_split methods to get access to the file-collection for the current split.
Other#
General-purpose utilities for components.
Buffer#
This processor component buffers the entire stream, and passes it as a list to the next component in its finish method.
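The buffering behaviour can be modelled in a few lines. This is a simplified sketch in plain Python rather than the Buffer component itself; ToyBuffer, its add method, and the then callback are hypothetical names standing in for the component interface.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ToyBuffer(Generic[T]):
    """Toy model: holds every item back until the stream has finished."""

    def __init__(self, then: Callable[[List[T]], None]):
        self._then = then  # stands in for "the next component"
        self._items: List[T] = []

    def add(self, item: T) -> None:
        # Nothing is forwarded while the stream is still running.
        self._items.append(item)

    def finish(self) -> None:
        # The whole stream is forwarded once, as a single list.
        self._then(self._items)


received: List[List[str]] = []
buffer = ToyBuffer(received.append)
for item in ("a", "b", "c"):
    buffer.add(item)
before_finish = list(received)  # snapshot: still empty at this point
buffer.finish()
```

The downstream callback is invoked exactly once, with the complete list, which is useful for steps that need to see the whole dataset before acting.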
Enumerator#
This processor component enumerates each item passed to it, forwarding a tuple of the 0-based index of the item and the item itself.
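The output shape matches what Python's built-in enumerate produces. The generator below is only an illustration of that shape, assuming a plain iterable in place of the component stream; enumerate_stream is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def enumerate_stream(items: Iterable[T]) -> Iterator[Tuple[int, T]]:
    """Pair each item with its 0-based position in the stream."""
    for index, item in enumerate(items):
        yield index, item


pairs = list(enumerate_stream(["cat.jpg", "dog.jpg"]))
```

Here pairs is a list of (index, item) tuples, starting at index 0.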
WithRandomness#
This is a mixin class which can be used with any component, and adds a --seed option for randomisation. The help-text for the option can be specified. If the user sets a seed value, the random property will provide a Random instance initialised with the provided seed. If the option was not set, None will be returned by the property.
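The seeded-or-None behaviour can be sketched with the standard library's random.Random. The make_random function is a hypothetical helper standing in for the random property; it is not part of wai-annotations.

```python
import random
from typing import Optional


def make_random(seed: Optional[int]) -> Optional[random.Random]:
    """Return a seeded Random instance, or None if no seed was given."""
    return random.Random(seed) if seed is not None else None


# Two generators built from the same seed produce the same sequence.
first = make_random(42)
second = make_random(42)
first_draws = [first.random() for _ in range(3)]
second_draws = [second.random() for _ in range(3)]

unseeded = make_random(None)
```

Code that uses such a property has to handle the None case itself, for example by falling back to unshuffled order when no seed was given.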