Concepts from mathematical crystallography and group theory are used here to quantize the group of rigid-body motions, resulting in a “motion alphabet” with which robot motion primitives are expressed. From these primitives it is possible to develop a dictionary of physical actions. Equipped with an alphabet of the sort developed here, intelligent actions of robots in the world can be approximated with finite sequences of characters, thereby forming the foundation of a language in which robot motion is articulated. In particular, we use the discrete handedness-preserving symmetries of macromolecular crystals (known in mathematical crystallography as Sohncke space groups) to form a coarse discretization of the space of rigid-body motions. This discretization is made finer by subdividing using the concept of double-coset decomposition. More specifically, a very efficient, equivolumetric quantization of spatial motion can be defined using the group-theoretic concept of a double-coset decomposition of the form , where is a Sohncke space group and is a finite group of rotational symmetries such as those of the icosahedron. The resulting discrete alphabet is based on a very uniform sampling of and is a tool for describing the continuous trajectories of robots and humans. An efficient coarse-to-fine search algorithm is presented to round off any motion sampled from the continuous group of motions to the nearest element of our alphabet. It is shown that our alphabet and this efficient rounding algorithm can be used as a geometric data structure to accelerate the performance of other sampling schemes designed for desirable dispersion or discrepancy properties. Moreover, the general “signals to symbols” problem in artificial intelligence is cast in this framework for robots moving continuously in the world.
The aim of this paper is to develop a “motion alphabet” with which robot motion primitives are expressed. From these primitives one can develop a dictionary of physical actions. Two main themes from the theory of Lie groups are used to construct this alphabet (or quantization) and to efficiently solve the “signals to symbols” problem in this context: (1) the decomposition of a Lie group into cosets, double cosets, and corresponding fundamental domains; (2) the possibility of constructing such fundamental domains as Voronoi or Voronoi-like cells.
Equipped with an alphabet of the sort developed in this paper and the associated algorithms for efficiently rounding off continuous motions to nearby discrete representatives, intelligent actions of robots in the world can be approximated with finite sequences of characters, thereby forming the foundation of a language in which robot motion is articulated.
At the macroscopic scale, the world can be thought of as continuous. Coarse descriptions of this continuous world are used by intelligent systems (e.g., humans and computers) to classify objects, actions, and scenarios. The classical “signals to symbols” problem in artificial intelligence (AI) seeks to bin everything in the continuous world into countable classes characterized by strings of discrete symbols, such as the letters in an alphabet. In a sense, this is the inverse problem of what the genetic code does, since the finite alphabet encodes the morphology and metabolism of every living creature that moves in the continuous world and processes information with an analog brain.
Discrete alphabets (including the Roman alphabet, the radicals that form characters in Asian languages, sign language, etc.) form the basis for all human languages . All discrete characters can be reduced to a binary code, e.g., ASCII or Morse code in the case of western letters. The efficiency of an alphabet (or a code) depends on how much information can be conveyed with a given number of symbols, and how difficult it is to convey those symbols. For example, since the letter “e” is the most often used symbol in the English language, it is represented by a single “” in Morse code. Perhaps the simplest and most widely occurring motion of a robot is the “nil motion”, or identity element of the motion group, describing the “home” pose (position and orientation), which coincidentally can be denoted simply as “”. All other elements of our motion alphabet will be written as a product of the form where , a crystallographic space group, and , a finite group of rotational symmetries.
As an example of how discrete symbols classify the world, consider how each of the following sentences describes a class of situations in which there is continuous freedom which becomes more restricted as the number of discrete symbols increases:
The cat is on the table.
The black cat is sitting on the table.
The black cat is sitting on the table and looking at you.
In (1), the cat could be of any color and lying down, sitting, or standing in an infinite variety of postures. The color and posture of the cat get somewhat restricted as the sentences get longer, but we still do not know how the cat is sitting, what its tail is doing, how big it is, its weight, the color of its eyes, etc. Indeed, the sentence would need to be quite long to hone in on what is actually going on, hence the old saying “one picture is worth a thousand words.”
The basic problem that must be reconciled by intelligent robots is to approximate, or round off, continuous objects and actions within a discrete descriptive framework and then truncate the discrete description at some finite level given limitations on computational and sensing resources. This is true in both machine-learning (ML) frameworks such as deep learning and classical AI. A specific kind of rounding off (of Euclidean motions) is the main subject of this paper, which is about making precise the round-off statement in the context of motions of objects and intelligent agents in the world.
A real-world problem that can be addressed using this discrete description of motions is robot motion planning, which is one of the fundamental topics in the field of intelligent robotics. It has a wide range of applications such as autonomous vehicles [2–4], mechanical parts assembly [5, 6], space on-orbit manipulations , and protein folding . In general, motion planning seeks to answer the query: “how to plan a path that guides the robot from a given start pose to an end pose subject to some geometric or dynamical constraints”. And a popular way that has been applied for decades is to build a “roadmap” , which is basically a graph structure consisting of valid vertices and edges. A large number of efficient algorithms have been proposed such as visibility graph , Voronoi graph , cell decomposition , probabilistic roadmap , and their variants. For a given environment and constraints (obstacles, nonholonomic constraints, etc.), a roadmap provides information with qualified vertices and connectivity among nearby vertices. And once a query is submitted, graph searching algorithms give a valid optimal sequence of motions from the roadmap. Although a roadmap method is able to answer multiple searching queries, when the environment is changing, those vertices or edges information needs to be updated. To some extent, quantizing a continuous motion of the robot from a predefined motion alphabet is closely related to the roadmap concept. However, the advantage lies in the storage and representation of the motion sequences—once a rich library of alphabets is built a priori, representing different motions in different environment subject to different constraints is just a matter of combinations and ordering of the alphabet indices.
The major contributions of this paper are listed as follows:
(1) A “signals to symbols” framework for Euclidean motions based on double-coset decomposition is proposed and its properties are analyzed.
(2) Concrete formulations of the motion alphabet are constructed through crystallographic symmetry.
(3) Fast decoding algorithms via coarse-to-fine double-coset decomposition are proposed and numerically verified.
(4) Comparisons with existing methods for discretizing the Euclidean group and decoding random motions are performed.
(5) A fast hybrid search method that incorporates some existing sampling methods for rotation group with good dispersion or discrepancy properties has been proposed and verified.
The remainder of this paper is structured as follows: In Section 2, a review of the immense literature on machine intelligence as it applies to intelligent robot action in the world is provided. This is a rapidly changing field and impossible to capture in full detail, but some classical highlights are covered. Section 3 reviews some relevant aspects of abstract group theory. This is made concrete in Section 4 which focuses on the group of rigid-body motions. Section 5 reviews crystallographic symmetry, which is a source of discrete symbols from which a motion alphabet is constructed. In Section 6, we develop a motion alphabet by “dividing up” the group of rigid-body motions via fine double-coset decomposition based on a crystallographic Sohncke space group and the subgroup of rotational symmetries of the icosahedron. Section 7 presents other choices for motion alphabets and solves the decoding problem efficiently by introducing a coarse-to-fine search scheme. In Section 8, comparisons with existing methods for sampling rotations and Euclidean motions are performed, and it is shown how our alphabet can be used as a geometric data structure to enhance the speed of these other motion-approximation methods.
2. Literature Review
This section reviews two largely disjoint areas of the literature: (1) the interface between group theory and machine learning and AI; (2) sampling methods and measures of their quality.
2.1. Lie Groups in Machine Learning and AI
The recognition of human (and humanoid-robot) actions has been studied from many different perspectives including [14–19]. Probabilistic graphical models , generative models , and recently “SE3-nets”  have been used to describe motion uncertainty in the context of learning. Works on vision and reasoning use concepts of quotient operations to mod out irrelevant information [23–25].
Group-theoretic methods (and abstract algebra more generally) can be found sprinkled throughout the AI literature [26–32]. Although group theory has long history [33–35], it is still a useful and popular tool for solving problems related to motions [36–40].
Of particular importance in the current context is the relationship between artificial intelligence and machine learning. AI arose as a branch of cybernetics, focusing on artificial aspects of reasoning and cognition, thereby leading to a redefinition of cybernetics to focus on information and control. Machine learning (and particularly deep neural networks) led by Hinton, LeCun, Bengio, and others can be viewed as an alternative to classical AI [41–45]. Recently, geometric and algebraic methods are being explored in some forms of machine learning [46–48].
The goal of this paper is to develop an alphabet of basic motions from which discrete words that capture the essence of a continuous motion/action are constructed. Throughout the literature, discretization of motions has attracted significant interest, where the Euclidean group is one of the popular ones . In particular, the uniform sampling of rotations, either random or deterministic, has a wide range of applications [50–56]. This is also one of the applications that this work will address. The discretization of the continuous motions will generate part of a dictionary. This dictionary will, in the future, serve as the knowledge base for an expert system that will enable the robot to function at first use “right out of the box.”
2.2. Measurements of Sample Quality: Discrepancy, Dispersion, Consistency, and Uniformity
Sampling rotations in an efficient way play important roles in several fields including computer graphics [50, 57], protein crystallography , molecular physics , materials science , and robot motion planning [53, 55].
Several measures of the quality of a finite sampled set of rotations have been proposed in the literature. One is discrepancy as defined and used in [53, 61, 62]:where is a collection of measurable subsets of SO(3), is a finite set of sample points in SO(3), and is the number of elements in the finite set . This concept was updated recently to include products of motion groups .
Natural measures of distance between rotations, or between rigid-body motions, can be computed in a variety of different ways, as reviewed in detail in .
Given any such metric, , the dispersion of the points can be computed as where is a set of the sampled elements in SO(3).
In addition to discrepancy and dispersion, many other measures of sample quality can be defined. For example, a sampling can be called consistent if the distribution of the round-off error for any random rotation is concentrated. Therefore, we define the consistency of a set of samples on SO(3) aswhere and is the standard deviation of the set of distances between each sample point and its nearest neighbor. Here a low value of indicates high consistency. For example, if is zero, then every point in the set has nearest neighbors of the same distance. Moreover, if each point in the set has many neighbors that achieve the minimal value of distance, then the set has a high level of uniformity, which can be quantified as follows. For each compute the subset as Then the cardinality of this set measures the number of equally close nearest neighbors to , and uniformity weights this by the spread of this number:where a high value indicates high uniformity. The numerator reflects the number of nearest neighbors for the worst sample, and the denominator reflects the spread (with included since the difference between max and min can be zero).
For example, an integer lattice in 3D Euclidean space has a high level of uniformity with regard to the Euclidean metric, with each point having six nearest neighbors, each with the same minimized value of distance, and so and . A spherical close-packing can have an even higher value of .
3. Some Relevant Aspects of Group Theory
The alphabets constructed in this paper consist of carefully chosen elements of the group of rigid-body motions, drawn from fundamental domains of double-coset spaces. Since this terminology might not be familiar to some readers with an interest in the topic, the relevant concepts from group theory are reviewed here. The concept of a group itself is assumed to be known.
3.1. Definitions and Properties
Let a group and a subgroup be given. For an element , the corresponding left coset is defined as Analogously, the respective right coset is defined as
A subgroup is called a normal subgroup of if it is conjugated to itself; for all . If is both a normal and a proper subgroup of , then it is denoted as .
A group can be divided into cosets (left or right), all with the same number of elements, and the set of all these cosets is called a coset space. In general, the left-coset space and the right-coset space are different from each other, except when is normal in G (), in which case for all . In this special case is a factor (or quotient) group.
Given two subgroups , it is also possible to define double cosets and the corresponding double-coset space,
Given a left-coset decomposition with respect to a subgroup , it is possible to define a (nonunique) fundamental domain consisting of exactly one element per left coset. Since a group can be partitioned into disjoint cosets, it is and(and analogously in the right-coset case ). It is important to note that we are generally dealing with unions of disjoint sets in this paper. When , such fundamental domain is a group with respect to the original group operation modulo , and this group is isomorphic to .
Given two subgroups , the corresponding double-coset decomposition is and we further have thatwhere is a fundamental domain for the double-coset space consisting of exactly one element per double coset. When is a Lie group (i.e., when has the structure of a differentiable manifold, and when further the group operation and inversion are compatible with this smooth structure) and are discrete subgroups, then such fundamental domains and will have the same dimensionality as , but smaller volume.
When is a Lie group, one way to construct fundamental domains is as Voronoi-like cells: Since is a smooth manifold, a proper distance function (metric) exists, and we can definewhere is the identity element of , and when ,Above we have defined the interior of fundamental domains. The union of the corresponding shifts of these open sets will not in general completely exhaust G (cf. (16) and (18)). However, a set of measure zero will at most be missing, which is not relevant for our practical purposes.
A unimodular Lie group is one for which there exists an integration measure, , or equivalently a volume element , such that for every integrable function , In the case of a compact Lie group such as , .
In terms of Euler angles , it is well known that the volume element for SO(3) is  Consequently, sampling uniformly in Euler angles leads to clumping of samples around and , and under sampling near .
Some authors therefore sample nonuniformly, by making a change of coordinates as . Then , and . This results in equivolumetric portioning of in the coordinates with volume element
However, equipartitioning into units of equal volume is not the same as equipartitioning into units of equal shape. One attempt to partition based on shape arose nearly 50 years ago in the context of protein crystallography, where Lattman  realized that the metric tensor for SO(3) as expressed in Euler angles could be diagonalized by changing to . This diagonalization does not change the volume element, which remains .
Group theory has been a cornerstone in all areas of the physical sciences including crystallography, classical mechanics, quantum mechanics, chemistry, and particle physics. Moreover, in classical works on finite automata theory, attempts were made to incorporate the theory of finite groups [26–28]. Many roboticists and computer vision researchers know about the special Euclidean group , which is a Lie group that describes rigid-body motions. This will be discussed later, after first reviewing the group of pure rotations.
3.2. A Concrete Example: The Rotation Group
In the following, the abstract definitions are illustrated with a concrete example. The set of rotation matrices is called the special orthogonal group and is denoted as . That is,where is the identity matrix. Here the group operation is simply matrix multiplication. It is not difficult to show that is the group identity, given any two that the matrix product , and that is the inverse of .
Explicitly for , elements of the associated Lie algebra , which correspond to infinitesimal rotations, are skew-symmetric matrices and the matrix exponential (or exponential) map gives where is the dual vector corresponding to . The opposite operation gives . The parameters , , and can be thought of as Cartesian coordinates in the Lie algebra , and when these coordinates are restricted to the range , they can be used to parameterize all of through the exponential map. When the point is at the boundary. In such a case and describe the same rotation, and so one model for is that of a solid ball of radius in Euclidean space, with antipodal points identified as being equivalent, or “glued.”
The inverse map for each is the matrix logarithm. This degenerates when the rotation angle is . By restricting the discussion to the case when , the logarithm is uniquely defined on a subset of depleted by the set of measure zero defined by . This depletion will have no effect on our formulation. Indeed, we can define the metric when is not a rotation by , and otherwise .
It is interesting to note that the above distance function for is bi-invariant; i.e., Using this particular metric it is possible to construct Voronoi cells (in the classical sense) in for fundamental domains and , because then (19) and (20) becomeandrespectively.
Of particular interest to us are the cases where is one of the finite groups of rotational symmetries of the Platonic solids. This is shown in Figure 1 (see also ). In this figure the fundamental domains are depicted in exponential coordinates in (identified with a ball of radius , as explained above). Note that this is a conceptual plot, since actually the edges and faces of these Voronoi cells are slightly bent. The number of elements in is 12 for the group of tetrahedral, for the group of octahedral, and for the group of icosahedral rotational symmetries. By the left (or right) action of on the respective fundamental domain, it is possible to (almost completely) cover ; cf. (16).
The coset spaces resulting from quotienting the rotation group by discrete subgroups have been studied in the pure mathematics literature under the names “spherical space forms”  and Poincaré homology 3-spheres . The geometric and topological properties of these manifolds are related to how the opposing faces of our tiles can be glued together.
If is the group of rotational symmetry operations of the icosahedron, then and can be viewed as a dodecahedral cell centered at the origin of the Lie algebra (see Figure 1, right). If we choose the second subgroup to be a conjugated group of the tetrahedral, octahedral, or icosahedral symmetries (i.e., , where is the group of the rotational symmetries of the respective Platonic solid and is chosen such that ), then the Voronoi cell takes a shape as exemplarily shown in Figure 2. On the other hand, if we choose , then cannot be constructed as a Voronoi cell, but it can be chosen as an irregular tetrahedron (the ruby region in Figure 3), yielding a subdivision of the dodecahedral cell by conjugation of the tetrahedron with the elements in . Note that such conjugation has the effect of rotation in since . Similarly, if , then a -fold division of can be computed to represent , and can be reconstructed from these pieces, again by conjugation.
When choosing to be icosahedral and or , this means that we can divide into 3600 pieces of equal size. The 3600 respective barycentric or Voronoi centers of these cells can be taken as a discretization of , and any of these centers can be written in a unique way as where with . This means that any one of the 3600 points corresponds to a two-letter word .
A natural question to ask is then, for , how do we find the closest word to approximate it? This decoding or “signals to symbols” problem is addressed in Section 7.1.
4. Rigid-Body Motions as a Group Used for Framing Robots
Given any rigid object or multi-rigid-body actor (such as a human, robot, self-driving car, or smart house), the behavior and use of this item involve the time evolution of its conformation and its overall position and orientation, or “pose”. A common descriptive framework for both the internal (conformational) degrees of freedom and their relative pose (configuration) is to attach reference frames on each rigid component.
Let denote a rigid-body motion relative to a reference frame fixed in space, where is a rotation matrix and is a translation vector. The set of all such motions forms a six-dimensional Lie group, the special Euclidean group . This is a group because the composition operation satisfies the properties of closure and associativity, the identity exists and is simply (with zero translation), and the inverse of is . Note that is not commutative (Abelian). The group operation is the same as multiplying homogeneous transformation matrices; i.e.,
In the case of planar motions, we deal with the special Euclidean group , where is parameterized by a single angle , and has components , totalling three degrees of freedom. is the group of rotations in , defined analogously to (23). In general, is an example of an outer (or external) semidirect product which combines the two groups and into the new group The underlying set of this group is the Cartesian product , but the symbol ⋉ reflects the fact that the group operation is not simply , which is also a group (called the direct product), but does not reflect the way that rigid-body motions work.
The next section reviews the handedness-preserving (Sohncke) crystallographic space groups, which are discrete subgroups of and which form an important component of the motion alphabets developed in this paper.
5. Crystallographic Groups
A crystallographic group is a discrete (and also cocompact) subgroup of the Euclidean group , where is the orthogonal group consisting of all orthogonal real-valued matrices (defined as in (23) for , but also allowing there). In addition to rotations, the group also contains reflections and rotoreflections (improper rotations). If , a crystallographic group is commonly called a space group, for , it is referred to as a wallpaper group. The literature on mathematical crystallography is immense and spans many decades. Some notable introductions include [68–71]. The relationship between torsion-free crystallographic groups (i.e., Bieberbach groups; see below) and flat manifolds has been studied extensively [72–76].
Elements of a crystallographic group can be expressed as pairswhere (a discrete point group, i.e., a subgroup of ), (a lattice in ), and . In particular, is the translational part of a glide-reflection or screw-displacement lattice motion. In general will satisfy the cocycle identities where is the subgroup of pure (or primitive) translations in , which is always normal (). The “” removes components in the sum that are in , in analogy with, for example, in modulo- arithmetic.
If an element is of finite order (i.e., if there exists an such that ), it is called a torsion element. The group is called torsion-free (or a Bieberbach group) if it is free of torsion elements. This is equivalent to the property that no element other than the identity has a fixed point (i.e., a point with ). If in (33), then can be written as the semidirect product and the group is called symmorphic. Of the 230 possible types of space groups, 73 can be decomposed in this way. These are the symmorphic space groups. Bieberbach groups are not symmorphic.
In the theory of crystallographic groups, a well-known isomorphism is due to the above-mentioned fact that the translation subgroup of is normal: Moreover, it can be shown that is not only discrete, but must be finite. In the case the point group will belong to one of 32 discrete crystallographic point groups that constitute the so-called crystal classes.
In the context of space groups, we can distinguish between Bieberbach (i.e., torsion-free) groups as one extreme, and groups which contain only the identity and torsion elements (rotations, reflections, and improper rotations) as the other extreme, with all other space groups lying somewhat “in between.” For a symmorphic space group , a Bieberbach subgroup with minimum index in is the subgroup of primitive translations; for many nonsymmorphic space groups, on the other hand, there is a Bieberbach subgroup with index allowing for a decomposition of as a group product [77, p. 719]where is a proper subgroup of , and thus . If , then (36) is an inner (or internal) semidirect product (). In this case, the decomposition is unique for each . Another useful property is that each of the 65 Sohncke space groups (i.e., handedness-preserving space groups ) can be written as a productwhere and are, respectively, Bieberbach and symmorphic subgroups (see [77, Thm. 3]). If and or is normal, then (37) is an inner semidirect product, and for every there exist unique and such that . If both and are normal, then is decomposed into the inner direct product .
6. A Motion Alphabet Based on a Fine Double-Coset Decomposition
The essence of language is communicating information about the continuous world using a finite number of symbols, characters, or patterns. In the same spirit, in classical AI, a fundamental goal is to convert perceptual information from the continuous world to strings of symbols drawn from a finite alphabet, so that the machinery of finite-state automata reasoning can be applied. Every action of a robot in the physical world can be described as a sequence of paths in . Such paths are continuous mathematical trajectories which can be viewed as “signals”.
A fundamental problem in AI and sensory perception is the so-called “signals to symbols” problem [78–80], in which observations in the continuous world are converted to coarsified representations (i.e., symbols), on which AI systems can execute logical reasoning algorithms. A natural way to discretize the space of paths is to first discretize time, thereby reducing the infinite dimensions of the path space to a finite number of dimensions, and then to replace each sampled pose on the path with a rounded-off version in a discrete set (the alphabet of maneuvers). Because does not have dense subgroups, a clever discretization needs to be constructed.
In the three-dimensional case, one of the finest space groups is which has a total of 24 rotational elements corresponding to the rotational symmetry operations of a cube (cf. [69, p. 634f]). Therefore, if we want to quantize rigid-body motions with a resolution that is reasonable for real-world tasks, something finer than rounding off to the nearest element of is necessary.
Building on ideas introduced in the context of protein-packing models in X-ray crystallography [55, 65, 77, 81, 82], we can augment by an auxiliary discrete rotation group. For example, let denote the group of rotational symmetries of the icosahedron (as a discrete subgroup of ). has 60 elements, and there is no lattice of discrete translations that corresponds to it. In particular, and , the identity element. This means that the double-coset space is a compact Riemannian manifold. It is possible to define a compact fundamental domain using (20). Restricting the quotient map from to to the fundamental domain then gives a bijective (i.e., one-to-one) mapping from to . Moreover, the action of (from the left) and (from the right) gives a way to tile with disjoint shifts of , because (cf. (18))Intensive study of the fundamental domains and has been conducted (ibid.).
When the fundamental domain is constructed using (20), it has the identity element at its center, and so the tiling in (38) has the effect of sampling each center point by moving from to where and . In other words, the product can be used as a quantized version of . The number of rotational elements will be which is sufficiently fine to capture the essence of any frame along a trajectory during a robot task. The alphabet defined by is infinite, but by limiting the extent of translations to be contained in a bounded region, it becomes finite. This means that continuous trajectories can be translated into a finite string of alphabet characters (Figure 4). This opens up the possibility of mapping these quantized trajectories into words expressed in a natural language.
An important advantage of the quantization scheme (38) is that the shifted fundamental domains