Adding components to wai-annotations through plugins

wai-annotations uses a plugin system to allow other libraries to add new processing stages without modifying the base library source. This document details how to go about using this system.

Generic wrappers

If you just want to run some code that will never make it into a released plugin, consider using the wai.annotations.generic module. This module provides wrappers around sources/ISPs/sinks: you only have to write the source/ISP/sink class itself, not the boilerplate code that makes your plugin available within the wai.annotations framework.

Check out the example section. Also, the wai.annotations.audio module has a more concrete example for loading the Urban8k audio dataset with this wrapper approach.

Plugin Types

Three types of stage can be added to wai-annotations via plugin: input formats, processing stages and output formats. New domains can also be added, but these are specified indirectly via any of the previous components.

Domains

New domains are added indirectly to wai-annotations as dependencies of components, as a domain which has no components is essentially unreachable to a conversion chain. However, the specification of new domains is treated as a plugin-related issue, so it is detailed here.

Definition of a new domain is a multi-step process, as follows.

  • Define the type of the (unannotated) data for the domain, as a sub-class of Data. Objects of this class hold the name and binary data for an instance, along with any additional meta-data.
  • Define the type of the annotations for an instance in the domain. This can be any arbitrary type.
  • Define the type of instances in the domain as a sub-class of Instance. This represents the association between a data-item and its annotations. At its simplest, this is just a container for the data-item and the annotations, but additional meta-data about the relation can be added for specific domains, if required.
  • Define a specifier class for the domain, sub-classing DomainSpecifier. This advertises the domain to the wai-annotations framework.

Example

from typing import Type

# Import the base classes
from wai.annotations.core.domain import Data, Instance, DomainSpecifier

# Define the data-type which holds dataset item data
class MyDomainData(Data):
    @classmethod
    def from_file_data(cls, file_name: str, file_data: bytes) -> 'MyDomainData':
        # If we don't need any additional information, or it is calculated in the init method
        return cls(file_name, file_data)

# Define the type of annotations for the domain
class MyDomainAnnotations:
    ...

# Define an instance type, adding additional functionality, if required
class MyDomainInstance(Instance[MyDomainData, MyDomainAnnotations]):
    @classmethod
    def data_type(cls) -> Type[MyDomainData]:
        return MyDomainData

    @classmethod
    def annotations_type(cls) -> Type[MyDomainAnnotations]:
        return MyDomainAnnotations

    def additional_method(self):
        ...

# Define the domain specifier reporting the various classes for domain instances
class MyDomainSpecifier(DomainSpecifier):
    @classmethod
    def domain_name(cls) -> str:
        return "my domain"

    @classmethod
    def description(cls) -> str:
        return "A detailed description of the domain"

    @classmethod
    def data_type(cls) -> Type[MyDomainData]:
        return MyDomainData

    @classmethod
    def annotations_type(cls) -> Type[MyDomainAnnotations]:
        return MyDomainAnnotations

    @classmethod
    def instance_class(cls) -> Type[MyDomainInstance]:
        return MyDomainInstance

Stages

A stage is a partial pipeline of components. Input stages consist of a source component followed by zero or more processing components, processing stages consist only of processing components, and output stages consist of zero or more processing components followed by a sink component.
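
As a rough mental model only, the push-style hand-off between the components of a stage can be imitated with plain callbacks. The sketch below does not use wai-annotations at all: produce, make_upper_caser and the received/finished lists are hypothetical stand-ins, and only the then/done callback names are borrowed from the component examples in this document.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the callback types (not the library's classes)
Then = Callable[[str], None]
Done = Callable[[], None]


def produce(then: Then, done: Done) -> None:
    """Toy source: pushes each element downstream, then signals completion."""
    for element in ("a", "b", "c"):
        then(element)
    done()


def make_upper_caser(then: Then, done: Done) -> Tuple[Then, Done]:
    """Toy processor: returns the (process_element, finish) pair called by the upstream component."""
    def process_element(element: str) -> None:
        # Transform the element and forward it downstream
        then(element.upper())

    def finish() -> None:
        # Nothing to clean up, so just propagate the end-of-stream signal
        done()

    return process_element, finish


# Toy sink: collect whatever arrives and note when the stream ends
received: List[str] = []
finished: List[bool] = []

process_element, finish = make_upper_caser(received.append, lambda: finished.append(True))
produce(process_element, finish)

print(received)  # ['A', 'B', 'C']
print(finished)  # [True]
```

The point of the model is that each component only knows about the callbacks it was handed, which is what lets a stage be assembled from an arbitrary chain of processors.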

To create a new stage, the individual components must be created and then advertised through a stage specifier class. The base classes for components can be imported from wai-annotations' core package:

from wai.annotations.core.component import SourceComponent
from wai.annotations.core.component import ProcessorComponent
from wai.annotations.core.component import SinkComponent

Sub-class the base classes for the types of components you are trying to implement, and fill in the generic type-parameters and abstract methods. See the examples below for guidance.

Then sub-class one of the stage-specifier classes for the type of stage you are creating:

from wai.annotations.core.specifier import SourceStageSpecifier
from wai.annotations.core.specifier import ProcessorStageSpecifier
from wai.annotations.core.specifier import SinkStageSpecifier

The specifier classes require that you override the description method, which provides a description of the stage, and the components method, which lists the components which comprise the stage. Source/Sink stages also require a domain method, which returns the domain that the stage reads/writes. Processor stages instead have a domain_transfer_function method, which returns the output domain for a given input domain (this way, processing stages can be used in multiple domains).
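
To make the domain-transfer idea concrete, here is a minimal sketch of a transfer function for a processor stage that only works in a single domain. It uses stand-in classes so that it runs without wai-annotations installed: DomainSpecifierStandIn, ImageDomain, AudioDomain and ImageOnlyStageSpecifier are all hypothetical, and raising ValueError for an unsupported domain is an assumption, since this document does not say how the framework expects that case to be reported.

```python
from typing import Type


class DomainSpecifierStandIn:
    """Hypothetical stand-in for wai.annotations.core.domain.DomainSpecifier."""


class ImageDomain(DomainSpecifierStandIn):
    """Hypothetical domain."""


class AudioDomain(DomainSpecifierStandIn):
    """Hypothetical domain."""


class ImageOnlyStageSpecifier:
    """Hypothetical processor-stage specifier (a real one would sub-class ProcessorStageSpecifier)."""

    @classmethod
    def domain_transfer_function(
            cls,
            input_domain: Type[DomainSpecifierStandIn]
    ) -> Type[DomainSpecifierStandIn]:
        # This stage stays within the image domain...
        if input_domain is ImageDomain:
            return ImageDomain

        # ...and rejects everything else (assumption: an exception is the right signal)
        raise ValueError(f"Unsupported input domain: {input_domain.__name__}")


print(ImageOnlyStageSpecifier.domain_transfer_function(ImageDomain).__name__)  # ImageDomain
```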

In order for wai-annotations to recognise your plugin, the specifier class needs to be advertised as an entry point in your setup script under the wai.annotations.plugins group:

# setup.py
from setuptools import setup

setup(
    ...,
    entry_points={
        "wai.annotations.plugins": [
            # Input stage
            "from-my-input-format=com.example.specifiers:MySourceStageSpecifier",

            # Output stage
            "to-my-output-format=com.example.specifiers:MySinkStageSpecifier",

            # Processor stages
            "my-processor=com.example.specifiers:MyProcessorStageSpecifier"
        ]
    }
)
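
To check that the entry points were registered after installing your package, you can query the group with the standard library. This is a sketch of a sanity check, not a description of how wai-annotations itself loads plugins; list_plugin_entry_points is a hypothetical helper name.

```python
from importlib.metadata import entry_points
from typing import Dict


def list_plugin_entry_points(group: str = "wai.annotations.plugins") -> Dict[str, str]:
    """Map each entry-point name in the group to its 'module:attribute' target."""
    all_entry_points = entry_points()

    if hasattr(all_entry_points, "select"):
        # Python 3.10+ offers a selection API
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name
        selected = all_entry_points.get(group, [])

    return {entry_point.name: entry_point.value for entry_point in selected}


# Print every plugin registered in the current environment
for name, target in sorted(list_plugin_entry_points().items()):
    print(f"{name} -> {target}")
```

With a package using the setup script above installed, the listing should include from-my-input-format, to-my-output-format and my-processor.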

Example Input Stage

Input stages consist of a source component followed by zero or more processing components. Typically the source component will be wai.annotations.core.component.util.LocalFilenameSource, but this is not required. What is required is that the final component of the stage outputs instances in the domain in which the stage operates.

from typing import Type, Tuple

from wai.annotations.core.component import SourceComponent, ProcessorComponent
from wai.annotations.domain.image.object_detection import (
  ImageObjectDetectionInstance, ImageObjectDetectionDomainSpecifier
)
from wai.annotations.core.specifier import SourceStageSpecifier
from wai.annotations.core.stream import ThenFunction, DoneFunction

# Define the source component (generic type is the type this source produces)
class MySourceComponent(SourceComponent[str]):
  # Any command-line options here...

  def produce(
          self,
          then: ThenFunction[str],
          done: DoneFunction
  ):
    # Call then(str) multiple times...
    ...

    # Call done()
    done()


# Define a processor component to parse each string from the source component into a domain instance
# (generic types are input and output type respectively)
class MyProcessorComponent(ProcessorComponent[str, ImageObjectDetectionInstance]):
  # Any command-line options here...

  def process_element(
          self,
          element: str,
          then: ThenFunction[ImageObjectDetectionInstance],
          done: DoneFunction
  ):
    # Parse each string and forward
    then(self._parse(element))

  def finish(
          self,
          then: ThenFunction[ImageObjectDetectionInstance],
          done: DoneFunction
  ):
    # Perform any clean-up
    ...

    # Call done
    done()

  def _parse(self, element: str) -> ImageObjectDetectionInstance:
    # Parse the string into an instance
    ...

# Create the specifier class for the stage
class MySourceStageSpecifier(SourceStageSpecifier):
  @classmethod
  def description(cls) -> str:
    return "My source stage"

  @classmethod
  def components(cls) -> Tuple[Type[MySourceComponent], Type[MyProcessorComponent]]:
    return MySourceComponent, MyProcessorComponent

  @classmethod
  def domain(cls) -> Type[ImageObjectDetectionDomainSpecifier]:
    return ImageObjectDetectionDomainSpecifier

Example Output Stage

Output stages consist of zero or more processing components followed by a single sink component. The first component of the stage must take instances in the stage's domain as input.

from typing import Type, Tuple

from wai.annotations.core.component import ProcessorComponent, SinkComponent
from wai.annotations.domain.image.object_detection import (
    ImageObjectDetectionInstance, ImageObjectDetectionDomainSpecifier
)
from wai.annotations.core.specifier import SinkStageSpecifier
from wai.annotations.core.stream import ThenFunction, DoneFunction

# Define a processor component to format each domain instance into a string
# (generic types are input and output type respectively)
class MyProcessorComponent(ProcessorComponent[ImageObjectDetectionInstance, str]):
    # Any command-line options here...

    def process_element(
            self,
            element: ImageObjectDetectionInstance,
            then: ThenFunction[str],
            done: DoneFunction
    ):
        # Format each instance and forward
        then(self._format(element))

    def finish(
            self,
            then: ThenFunction[str],
            done: DoneFunction
    ):
        # Perform any clean-up
        ...

        # Call done
        done()

    def _format(self, element: ImageObjectDetectionInstance) -> str:
        # Format the instance into a str
        ...

# Define the sink component (generic type is the type this sink consumes)
class MySinkComponent(SinkComponent[str]):
    # Any command-line options here...

    def consume_element(self, element: str):
        # E.g. write the string to disk
        ...

    def finish(self):
        # Any tidy-up
        ...

# Create the specifier class for the stage
class MySinkStageSpecifier(SinkStageSpecifier):
    @classmethod
    def description(cls) -> str:
        return "My sink stage"

    @classmethod
    def components(cls) -> Tuple[Type[MyProcessorComponent], Type[MySinkComponent]]:
        return MyProcessorComponent, MySinkComponent

    @classmethod
    def domain(cls) -> Type[ImageObjectDetectionDomainSpecifier]:
        return ImageObjectDetectionDomainSpecifier

Example Processor Stage

Processor stages consist of one or more processing components. The domain-transfer function defined on the specifier must match the input/output types of the chain of components for the stage.

from typing import Type, Tuple

from wai.annotations.core.component import ProcessorComponent
from wai.annotations.core.domain import Instance, DomainSpecifier
from wai.annotations.core.specifier import ProcessorStageSpecifier
from wai.annotations.core.stream import ThenFunction, DoneFunction
from wai.annotations.core.stream.util import ProcessState

# Define a processor component to remove every second instance (for any domain)
# (generic types are input and output type respectively)
class MyProcessorComponent(ProcessorComponent[Instance, Instance]):
    # Any command-line options here...

    # Process-state to track whether we need to remove the next instance
    remove_next: bool = ProcessState(lambda self: False)

    def process_element(
            self,
            element: Instance,
            then: ThenFunction[Instance],
            done: DoneFunction
    ):
        if not self.remove_next:
            then(element)

        self.remove_next = not self.remove_next

    def finish(
            self,
            then: ThenFunction[Instance],
            done: DoneFunction
    ):
        # Perform any clean-up
        ...

        # Call done
        done()

# Create the specifier class for the stage
class MyProcessorStageSpecifier(ProcessorStageSpecifier):
    @classmethod
    def description(cls) -> str:
        return "My processor stage"

    @classmethod
    def components(cls) -> Tuple[Type[MyProcessorComponent]]:
        return MyProcessorComponent,

    @classmethod
    def domain_transfer_function(
            cls,
            input_domain: Type[DomainSpecifier]
    ) -> Type[DomainSpecifier]:
        # Because our processor stage works in any domain, and does not cross domains, just return the input domain
        return input_domain

Best Practice

Although each of the examples shown here defines its plugin specifier in the same file as the components it advertises, this is not the recommended approach. The specifier types should instead be defined in their own sub-package, and their methods should import the specified types locally (rather than globally at the top of the specifier's Python file). The specifier can then be imported into the plugin system without importing potentially heavy-weight libraries that the components depend on for their functionality. This way the system can provide reflection of the available plugins, but only load those plugins that are actually selected for use in a conversion.
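
The local-import pattern might look like the following minimal sketch. So that it runs without wai-annotations installed, StageSpecifierStandIn is a hypothetical stand-in for a real stage-specifier base class, and fractions.Fraction from the standard library stands in for a component class living in a separate, potentially heavy-weight module.

```python
from typing import Tuple, Type


class StageSpecifierStandIn:
    """Hypothetical stand-in for a wai-annotations stage-specifier base class."""


class MyLazyStageSpecifier(StageSpecifierStandIn):
    @classmethod
    def description(cls) -> str:
        # Needs no component imports, so listing the available plugins stays cheap
        return "My lazily-loaded stage"

    @classmethod
    def components(cls) -> Tuple[Type, ...]:
        # Local import: the component module (and whatever it depends on) is only
        # loaded when this method is called. `fractions.Fraction` is a stand-in
        # for something like `from com.example.components import MyComponent`.
        from fractions import Fraction as MyComponent

        return MyComponent,


print(MyLazyStageSpecifier.description())  # My lazily-loaded stage
print(MyLazyStageSpecifier.components())
```

Importing the module that holds MyLazyStageSpecifier only pulls in typing; the component import is deferred until components() is called.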

Adding Command-Line Options to Plugin Components

See here for a description of command-line option support in wai-annotations.

Splitting

Sink-components (and by extension sink-stages) may require that the incoming instances be split across a number of output locations. Special support for splitting is provided in the wai.annotations.core.component.util sub-package. See here for information on how to add splitting to your sink-stages.