Universal Ink Model
Data Model
The Ink Model has the following logical sections:
- Data repositories - a collection of data repositories, holding raw sensor input, input device/provider configurations, splines, sensor channel configurations, rendering configurations, etc. Each data repository keeps a specific data set isolated and is responsible for specific type(s) of data.
- Logical data trees - a collection of logical trees, representing structures of hierarchically organized strokes.
- Semantic triple store - an RDF/WODL compliant triple store, holding semantic information.
The diagram below illustrates the different logical parts of the ink model.
The UML diagram below illustrates the complete Ink Model in terms of logical models and class dependencies. The following sections of this specification describe the different logical parts of the Ink Model in detail.
The ink model is serialized as a list of separate logical parts, described in section 10.1.2 RIFF Container: Data Chunks.
Data Repositories
Stroke Repository
The stroke repository holds a collection of strokes.
SensorData Repository
The SensorData repository holds a collection of raw input data frames (SensorData instances).
For more information on the SensorData Repository, see Raw Input and Biometrics below.
InputContext Repository
A data repository designed to hold input contexts, ink input providers, input devices, environments, sensor contexts, sensor channels, etc. The InputContext repository is described in more detail in section 5. Raw Input and Biometrics.
Brush Repository
A collection of brush definitions.
Universal Ink Model Identifier (UimID)
This specification defines a unique identifier to internally identify specific elements of the universal ink model. This identifier is a 128-bit number and is referred to as UimID in this document.
A UimID has two distinct string representations.
Simple Hexadecimal String Representation (S-Form)
The representation of a UimID value is the hexadecimal string representation of the 128-bit number, assuming that it's encoded using Big Endian byte ordering.
For example: fa70390871c84d91b83c9b56549043ca
Hyphenated Hexadecimal String Representation (H-Form)
This representation of a UimID value has the following format:
<part1>-<part2>-<part3>-<part4>-<part5>
Assuming that the UimID 128-bit value is encoded using Big Endian byte ordering, it is split into 5 groups of bytes and each group is formatted as a hexadecimal number.
| | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 |
|---|---|---|---|---|---|
| Bytes | 0, 1, 2, 3 | 4, 5 | 6, 7 | 8, 9 | 10, 11, 12, 13, 14, 15 |
For example: fa703908-71c8-4d91-b83c-9b56549043ca
Note: The resulting string looks similar to a UUID4 string like "fa703908-71c8-4d91-b83c-9b56549043ca". However, it is not guaranteed that a UimID value can be converted to a UUID4, because the UimID does not define the same rules and restrictions as the UUID4 identifier.
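The two string forms can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the specification, and lowercase hexadecimal digits are an assumption taken from the examples above.

```python
def uimid_to_s_form(value: bytes) -> str:
    """Return the S-Form: the hex string of the 16 big-endian bytes."""
    if len(value) != 16:
        raise ValueError("a UimID is exactly 16 bytes (128 bits)")
    return value.hex()  # lowercase digits, as in the examples (assumption)


def uimid_to_h_form(value: bytes) -> str:
    """Return the H-Form: byte groups 0-3, 4-5, 6-7, 8-9 and 10-15."""
    s = uimid_to_s_form(value)
    # Two hex digits per byte, so the groups are 8-4-4-4-12 digits long.
    return "-".join((s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]))


def uimid_from_string(text: str) -> bytes:
    """Parse either the S-Form or the H-Form back into 16 bytes."""
    if "-" in text:
        parts = text.split("-")
        if [len(p) for p in parts] != [8, 4, 4, 4, 12]:
            raise ValueError("malformed H-Form string")
        text = "".join(parts)
    if len(text) != 32:
        raise ValueError("malformed S-Form string")
    return bytes.fromhex(text)
```

Because both forms encode the same 16 bytes, converting between them is only a matter of inserting or removing the hyphens.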
Raw Input and Biometrics
This document defines a mechanism to store and maintain raw input data for the purposes of applications with biometric capabilities. This goal is achieved using two data repositories for storing the raw input datasource configurations as well as the raw input data:
- InputContext repository
- SensorData repository
The SensorData Repository is a data repository, which holds a collection of SensorData instances.
The following definitions apply:
Term: SensorData
A data-frame-like structure, which represents a collection of raw input data sequences produced by one or more on-board device sensors, including data point re-sampling information as well as input source fingerprints and metadata.
Remarks:
- SensorData is also referred to as 'raw input data-frame' in this reference document.
- Once a SensorData instance is added to the SensorData repository it is considered immutable.
- The SensorData instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme as the protobuf message SensorData.
- The SensorData repository serializes its underlying collection of SensorData instances using the protobuf message InputData in the repeated message field InputData.sensorData.
The InputContext Repository is a data repository responsible for storing information about where the raw input data-frame originates from, by allowing unique identification of the exact input source. The repository stores information about the device itself, the environment and the on-board device sensors for each data point.
The repository holds the following data collections:
- inkInputProviders - a collection of InkInputProvider instances
- inputDevices - a collection of InputDevice instances
- environments - a collection of Environment instances
- sensorContexts - a collection of SensorContext instances
- inputContexts - a collection of InputContext instances
The InputContext repository serializes its underlying data collections using the protobuf message InputContextData.
The following definitions apply:
Ink Input Provider
Term: InkInputProvider
The term InkInputProvider stands for the generic input data source - it identifies how the data has been generated (using touch input, mouse, stylus, hardware controller, etc).
Remarks:
- An InkInputProvider instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message InkInputProvider.
- The InkInputProvider identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "InkInputProvider" and the following components:
- InkInputProvider.type
- InkInputProvider.properties
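The idea of deriving an identifier from a tag and a list of components can be sketched as follows. The MD5-hash-based Unique Identifier Generation Algorithm itself is not defined in this section, so the way the tag and components are joined and the way properties are flattened to a string are assumptions made purely for illustration, and the function names are hypothetical.

```python
import hashlib
from typing import Dict


def generate_id(tag: str, *components: str) -> bytes:
    """Hash a tag and its components into a 128-bit identifier."""
    # ASSUMPTION: tag and components are joined with newlines and encoded
    # as UTF-8; the real algorithm may combine them differently.
    message = "\n".join((tag, *components)).encode("utf-8")
    return hashlib.md5(message).digest()  # an MD5 digest is 16 bytes long


def ink_input_provider_id(provider_type: str, properties: Dict[str, str]) -> bytes:
    """Derive an identifier from InkInputProvider.type and .properties."""
    # ASSUMPTION: properties are flattened as sorted "key=value" pairs.
    flattened = ",".join(f"{k}={v}" for k, v in sorted(properties.items()))
    return generate_id("InkInputProvider", provider_type, flattened)
```

Under these assumptions, two providers with the same type and properties receive the same identifier, which is what makes a content-derived identifier unique within the scope of the InkModel.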
Input Device
Term: InputDevice
The term InputDevice stands for the hardware device(s), on which the sensor data has been produced (touch-enabled mobile device, touch-capable monitor, digitizer, etc).
Remarks:
- Once an InputDevice instance is added to the InputContext repository it is considered immutable.
- The InputDevice identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "InputDevice" and the following components:
- InputDevice.properties
- An InputDevice instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message InputDevice.
Environment
Term: Environment
The term Environment stands for the virtual environment, in which the sensor data has been produced (the operating system, etc).
Remarks:
- Once an Environment instance is added to the InputContext repository it is considered immutable.
- The Environment identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "Environment" and the following components:
- Environment.properties
- An Environment instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message Environment.
SensorContext
Term: SensorContext
The SensorContext defines a unique combination of sensor channel contexts, used for capturing the digital ink input. For that purpose, a SensorContext instance holds a list of SensorChannelsContext instances.
Remarks:
- Once a SensorContext instance is added to the InputContext repository it is considered immutable.
- The SensorContext identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "SensorContext" and the following components:
- list of the identifiers of the SensorChannelsContext instances contained within the current SensorContext
- A SensorContext instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message SensorContext.
SensorChannelsContext
Term: SensorChannelsContext
The term SensorChannelsContext is defined as a unique combination of:
- an InkInputProvider instance,
- an InputDevice instance and
- a list of sensor channel definitions (by holding a collection of SensorChannel instances)
- sampling rate hint and latency values
Remarks:
- Once a SensorChannelsContext instance is added to the InputContext repository it is considered immutable.
- The SensorChannelsContext identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "SensorChannelsContext" and the following components:
- list of the identifiers of the SensorChannel instances contained within the current SensorChannelsContext
- sampling rate hint
- latency
- identifier of the InkInputProvider instance
- identifier of the InputDevice instance
- A SensorChannelsContext instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message SensorChannelsContext.
SensorChannel
Term: SensorChannel
Represents a generic sensor channel definition, which has the following properties:
- type - URI uniquely identifying the type of the sensor channel (check the table below for predefined sensor channels),
- metric - the type of the data according to the SI metric system,
- resolution - a multiplication factor (power of 10) used to convert the stored data values to the specified SI metric,
- min, max - lower and upper bounds of the reported values range,
- precision - the precision of the sensor when reporting floating-point values (defined as an int value, used as a power of 10 during the serialization/de-serialization phase).
Remarks:
- Once a SensorChannel instance is added to the InputContext repository it is considered immutable.
- The SensorChannel identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "SensorChannel" and the following components:
- identifier of the InkInputProvider instance contained within the SensorChannelsContext, which holds the current SensorChannel
- identifier of the InputDevice instance contained within the SensorChannelsContext, which holds the current SensorChannel
- type
- metric - expressed as a string value according to the Protocol Buffers Serialization Scheme, protobuf enumeration message InkSensorMetricType
- resolution
- min
- max
- precision
- A SensorChannel instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message SensorChannel.
The table below describes a list of predefined sensor channel types, defined as URIs, which are used to distinguish between different sensor characteristics.
URI | Description |
---|---|
will://input/3.0/channel/X | The X coordinate of the raw input data. |
will://input/3.0/channel/Y | The Y coordinate of the raw input data. |
will://input/3.0/channel/Z | The Z coordinate of the raw input data. |
will://input/3.0/channel/Timestamp | The timestamp value of the raw input data. |
will://input/3.0/channel/Pressure | The pressure value of the raw input data. |
will://input/3.0/channel/RadiusX | The radius X value of the raw input data. |
will://input/3.0/channel/RadiusY | The radius Y value of the raw input data. |
will://input/3.0/channel/Azimuth | The azimuth value of the raw input data. |
will://input/3.0/channel/Altitude | The altitude value of the raw input data. |
will://input/3.0/channel/Rotation | The rotation value of the raw input data. |
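The role of the precision property of a SensorChannel can be illustrated with a small sketch. It assumes that a floating-point value is scaled by 10 raised to precision and stored as an integer, and scaled back on de-serialization; the exact encoding is defined by the serialization scheme, not by this example, and the function names are hypothetical.

```python
def encode_value(value: float, precision: int) -> int:
    """Scale a floating-point sensor value to an integer (assumed scheme)."""
    return round(value * 10 ** precision)


def decode_value(encoded: int, precision: int) -> float:
    """Scale a stored integer back to a floating-point sensor value."""
    return encoded / 10 ** precision
```

With this scheme, a larger precision keeps more decimal digits of each reported value at the cost of larger stored integers.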
Sensor Channel Data
Term: ChannelData
The ChannelData represents a data array with raw sensor input values. A ChannelData instance holds a reference to a sensor channel definition (SensorChannel instance).
Input Context
Term: InputContext
The InputContext is defined as a combination of an Environment instance and a SensorContext instance.
Remarks:
- Once an InputContext instance is added to the InputContext repository it is considered immutable.
- The InputContext identifier is unique in the scope of the InkModel and is auto-generated based on the MD5-hash-based Unique Identifier Generation Algorithm using the tag "InputContext" and the following components:
- identifier of the Environment instance contained within the InputContext
- identifier of the SensorContext instance contained within the InputContext
- An InputContext instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message InputContext.
Strokes
The ink model keeps the digital ink content inside a Stroke Repository, which is a data repository, holding a collection of Stroke instances. The following definitions apply:
Term: Stroke
A Stroke is defined as a combination of:
- a Catmull-Rom spline in the form of a sequence of data points (mandatory), optionally including per-point transformational and color data,
- rendering configuration about how the spline should be visualized (optional),
- reference to raw input data (SensorData instance), which the stroke originates from (optional)
Remarks:
- Once a Stroke instance has been added to the Stroke Repository it is considered immutable.
- A stroke with fewer than 4 data points is considered invalid.
- A Stroke instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message Stroke.
The Catmull-Rom spline is defined in the scope of the Stroke using the following properties:
- splineTs, splineTf - Spline start and end parameters
- spline - a sequence of spline data points
- color - a sequence of color values (per spline data point; if provided, the length of this sequence must be equal to the spline points count)
- perPointTransform - a sequence of transformation matrices (per spline data point; if provided, the length of this sequence must be equal to the spline points count)
The Stroke may keep information about its origin raw input data-frame. This relationship is encoded using the following properties:
- sensorData - a reference to a SensorData instance, provided by the SensorData repository
- sensorDataOffset - a point index within the raw input data-frame, mapped to the first point of the Catmull-Rom spline
- sensorDataMapping - an array of indices within the raw input data-frame, mapped to each point of the Catmull-Rom spline (if provided, the length of this array must be equal to the spline points count)
The Stroke may also keep additional information about how the spline should be visualized by defining a Style property, which is described in detail in section 7. Rendering.
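The constraints stated above (a minimum of 4 data points, and optional per-point sequences whose length matches the spline points count) can be checked with a sketch like the one below. The Stroke class here is a simplified, hypothetical stand-in for the real data model, with None meaning "not provided".

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)  # frozen mirrors "considered immutable" once stored
class Stroke:
    spline: Sequence                       # spline data points
    color: Optional[Sequence] = None       # per-point color values
    per_point_transform: Optional[Sequence] = None
    sensor_data_mapping: Optional[Sequence[int]] = None


def validate_stroke(stroke: Stroke) -> List[str]:
    """Return a list of violated constraints (empty if the stroke is valid)."""
    errors = []
    count = len(stroke.spline)
    if count < 4:
        errors.append("a stroke needs at least 4 data points")
    optional = {
        "color": stroke.color,
        "perPointTransform": stroke.per_point_transform,
        "sensorDataMapping": stroke.sensor_data_mapping,
    }
    for name, values in optional.items():
        if values is not None and len(values) != count:
            errors.append(f"{name} length must equal the spline points count")
    return errors
```

Returning all violations at once, rather than raising on the first one, is a design choice of this sketch and makes it easier to report every problem of an imported stroke.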
Rendering
A Stroke instance may hold information about how it should be interpolated and rendered by the WILL core engine. This additional configuration is defined using the stroke's Style property.
Term: Style
The Style is defined as a combination of a PathPointProperties configuration, a reference to a Brush, a random number generator seed value and a rendering method type. Setting the Style property allows overriding of specific path point properties, color components and/or matrix transformational components. A Style with a PathPointProperties configuration should normally be used to define constant path components.
Remarks:
- A Style instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message Style.
Examples:
- A Stroke with per-point color data and a Style with alpha value will result in a Stroke rendered with constant alpha.
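The override behaviour in this example can be sketched as follows, assuming colors are represented as (red, green, blue, alpha) tuples and that a style alpha of None means "not set". Both the representation and the function name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, float, float, float]  # (red, green, blue, alpha)


def effective_colors(per_point: Sequence[Color],
                     style_alpha: Optional[float] = None) -> List[Color]:
    """Apply a constant alpha from the Style on top of per-point colors."""
    if style_alpha is None:
        return list(per_point)  # nothing to override
    # The Style value replaces the alpha of every point; RGB stays per-point.
    return [(r, g, b, style_alpha) for r, g, b, _ in per_point]
```

The per-point red, green and blue components are preserved, while the alpha component becomes constant along the stroke.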
Term: PathPointProperties
A simple data model, which may hold size, color components and matrix transformational components.
Remarks:
- A PathPointProperties instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message PathPointProperties.
The information about how a Stroke shall be rendered is incorporated in the Style using the rendering method type (renderModeURI) and a reference to a Brush, holding rendering and rasterization configuration. The ink model keeps all available rendering configurations within the Brush Repository, a data repository that holds a collection of Brush instances. The following definitions apply:
Term: Brush
A rendering/rasterization configuration used during the stroke rendering phase. In terms of class hierarchy the Brush should be considered an abstract class, which is specialized into the following classes:
- VectorBrush - a configuration, which allows rendering of an interpolated Catmull-Rom spline as a vector spline by applying a specific polygon for each interpolated point depending on its size and merging the result afterward.
- RasterBrush - a configuration, which allows rendering of an interpolated Catmull-Rom spline as a raster image by applying a specific sprite for each interpolated point depending on its size.
Remarks:
- A VectorBrush instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message VectorBrush.
- A RasterBrush instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message RasterBrush.
- The Brush repository serializes its underlying collection of brush instances using the protobuf message Brushes in the repeated message fields Brushes.vectorBrushes and Brushes.rasterBrushes.
The table below describes a list of predefined rendering modes, defined as URIs.
URI | Description |
---|---|
will://rasterization/3.0/blend-mode/SourceOver | |
will://rasterization/3.0/blend-mode/DestinationOver | |
will://rasterization/3.0/blend-mode/DestinationIn | |
will://rasterization/3.0/blend-mode/DestinationOut | |
will://rasterization/3.0/blend-mode/Lighter | |
will://rasterization/3.0/blend-mode/Copy | |
will://rasterization/3.0/blend-mode/Min | |
will://rasterization/3.0/blend-mode/Max | |
Ink Trees
The digital ink content, contained within a universal ink model, is organized in logical trees of ink nodes - they represent hierarchically organized ink-centric structures, and are also referred to as ink trees.
For the sake of clarity an Ink Tree is defined as follows:
Term: InkTree
The InkTree is defined as an ordered logical tree of nodes, called ink nodes (InkNode). It is used to represent the hierarchical structures of strokes.
Remarks:
- The name of the ink tree is unique in the scope of the ink model.
- An ink tree instance is allowed to hold only one root node (see the definition of InkNode below for more details). If no root node is defined, the ink tree is considered valid, but empty.
- An ink tree instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message InkTree.
- Each ink tree is identified in the knowledge graph by its URI. See section A. Ink Model 3.1 URI Scheme for more details.
Term: InkNode
A logical node of an ink tree. In terms of a class hierarchy the InkNode should be considered an abstract class, which is specialized into the following classes:
- StrokeNode - a leaf node, holding a reference to a Stroke, provided by the Stroke Repository
- InkGroupNode (abstract) - a non-leaf node, used to group other ink nodes; specialized into:
- StrokeGroupNode - a non-leaf node, used to group ink nodes of type StrokeNode and/or StrokeGroupNode
Remarks:
- The InkNode identifier is unique in the scope of the InkModel.
- Each ink node is identified in the knowledge graph using its URI. See section A. Ink Model 3.1 URI Scheme for more details.
- An InkNode instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme using the protobuf message Node.
Term: Main Ink Tree
Each ink model contains a main ink tree, which, when traversed using a depth-first search algorithm, specifies how the ink model should be rendered - this logical tree is also called the primary ink tree. The main ink tree is defined as the logical tree, containing the visual representation of the ink model - this statement sets the requirement that all ink nodes of the main ink tree are renderable.
Remarks:
- A universal ink model without a main ink tree is not valid.
- Each stroke of the stroke repository should be referenced by an ink node of the main ink tree.
- The ink nodes of the main ink tree may not reference stroke fragments.
- The name of the main ink tree is "main".
- The main ink tree is identified in the knowledge graph by its URI. See section A. Ink Model 3.1 URI Scheme for more details.
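A depth-first traversal of the main ink tree, which determines the rendering order, can be sketched as below. The node classes are simplified, hypothetical stand-ins for StrokeNode and StrokeGroupNode, and strokes are referenced by plain string identifiers for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Set, Union


@dataclass
class StrokeNode:
    stroke_id: str  # reference to a Stroke in the Stroke Repository


@dataclass
class StrokeGroupNode:
    children: List[Union["StrokeGroupNode", StrokeNode]] = field(default_factory=list)


InkNode = Union[StrokeGroupNode, StrokeNode]


def render_order(node: Optional[InkNode]) -> Iterator[str]:
    """Yield stroke identifiers in depth-first, child-order sequence."""
    if node is None:  # a tree without a root node is valid, but empty
        return
    if isinstance(node, StrokeNode):
        yield node.stroke_id
    else:
        for child in node.children:
            yield from render_order(child)


def unreferenced_strokes(root: Optional[InkNode], repository: Iterable[str]) -> Set[str]:
    """Return strokes of the repository that the main ink tree does not reference."""
    return set(repository) - set(render_order(root))
```

A non-empty result from unreferenced_strokes indicates that the tree does not meet the remark that each stroke of the stroke repository should be referenced by the main ink tree.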
A universal ink model may also hold a collection of secondary ink trees, called ink views (Ink View). An Ink View is an ink tree, which in contrast to the main ink tree may not reference all strokes of the stroke repository. The ink views represent different aspects of the Ink Model, for instance - text segmentation view, named entity recognition view, etc.
Term: InkView
An InkView is defined as a named ink tree, which may not reference all strokes of the stroke repository.
Remarks:
- On main ink tree modification, all ink views are considered invalid and should be invalidated / rebuilt by the implementations.
- An ink view may hold ink nodes referencing stroke fragments of the same stroke, but only if they don't overlap; otherwise, the ink view is considered invalid.
- Each ink view is identified in the knowledge graph by its URI. See section A. Ink Model 3.1 URI Scheme for more details.
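The non-overlap rule for stroke fragments in an ink view can be checked with a sketch like the following. This section does not define how a stroke fragment is represented, so the (stroke identifier, start index, end index) tuple with a half-open point index range is an assumption made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical fragment representation: (stroke_id, start, end), covering
# the spline points with indices start <= i < end.
Fragment = Tuple[str, int, int]


def fragments_overlap(fragments: Iterable[Fragment]) -> bool:
    """Return True if two fragments of the same stroke share a point."""
    by_stroke: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for stroke_id, start, end in fragments:
        by_stroke[stroke_id].append((start, end))
    for ranges in by_stroke.values():
        ranges.sort()
        # After sorting, an overlap can only occur between neighbours.
        for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
            if next_start < prev_end:
                return True
    return False
```

Under this assumption, an ink view for which fragments_overlap returns True would be considered invalid.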
Ink Tree Example
The following ink tree names are reserved:
Name | Description |
---|---|
main | Identifies the main ink tree. |
sdm | Reserved |
hwr | Reserved by the Semantic Ink service. |
ner | Reserved by the Semantic Ink service. |
Knowledge Graph
The Ink Model Specification provides a standard mechanism to describe relationships between different parts of the ink model, and/or between parts of the ink model and external entities. The Ink Model keeps an instance of an RDF- or WODL-compliant triple store, called Knowledge Graph in the scope of this document. This triple store holds a list of semantic triples to encode relationships between subject, predicate and object as defined in the RDF specification.
Using the knowledge graph, nodes of the ink trees contained within the ink model can be annotated with additional metadata in order to describe different aspects of the ink model, for instance - text segmentation view, named entity recognition view, etc.
Term: Knowledge Graph
An RDF- or WODL-compliant triple store, represented by a collection of SemanticTriple instances.
Remarks:
- Nodes of the logical trees, contained within the InkModel, are identified by URIs in the triple store in accordance with the guidelines in section A. Ink Model 3.1 URI Scheme.
- A Knowledge Graph instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme as the protobuf message TripleStore.
Term: SemanticTriple
An RDF compliant triple, which consists of:
- subject
- predicate
- object
Remarks:
- A SemanticTriple instance is serialized according to section 10.2 Protocol Buffers Serialization Scheme as the protobuf message SemanticTriple.
Ink models' knowledge graphs have been WODL-compliant since version 3.1 of the Universal Ink Model Specification. In order to ensure knowledge graph consistency and to enable ontology analysis and unambiguous knowledge inference, the following WODL semantic triples are considered to be present in the knowledge graph of each universal ink model:
Subject | Predicate | Object |
---|---|---|
uim:model | @ | will:uim/1.1/InkModel |
uim:model | hasSchemaInclude | will:uim/1.1 |
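A knowledge graph as a collection of subject-predicate-object triples can be sketched as follows. The class and method names are hypothetical, and pre-populating the store with the two triples from the table above is one possible way to treat them as always present.

```python
from typing import List, NamedTuple, Optional


class SemanticTriple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The triples considered present in every model (taken from the table above).
REQUIRED_TRIPLES = (
    SemanticTriple("uim:model", "@", "will:uim/1.1/InkModel"),
    SemanticTriple("uim:model", "hasSchemaInclude", "will:uim/1.1"),
)


class KnowledgeGraph:
    def __init__(self) -> None:
        self.triples: List[SemanticTriple] = list(REQUIRED_TRIPLES)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append(SemanticTriple(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[SemanticTriple]:
        """Return triples matching every given term; None matches anything."""
        return [
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)
        ]
```

For example, graph.match(subject="uim:model") returns both required triples, and an application could add its own annotations for ink nodes using the URIs defined in section A. Ink Model 3.1 URI Scheme.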