Nested Grassmannians for Dimensionality Reduction with Applications

Chun-Hao Yang1, Baba C. Vemuri2
1: National Taiwan University, 2: University of Florida
IPMI 2021 special issue
Publication date: 2022/03/01

Abstract

In the recent past, nested structures on Riemannian manifolds have been studied in the context of dimensionality reduction as an alternative to the popular principal geodesic analysis (PGA) technique, for example, the principal nested spheres. In this paper, we propose a novel framework for constructing a nested sequence of homogeneous Riemannian manifolds. Common examples of homogeneous Riemannian manifolds include the spheres, the Stiefel manifolds, and the Grassmann manifolds. In particular, we focus on applying the proposed framework to the Grassmann manifolds, giving rise to the nested Grassmannians (NG). An important application in which Grassmann manifolds are encountered is planar shape analysis. Specifically, each planar (2D) shape can be represented as a point in the complex projective space, which is a complex Grassmann manifold. Some salient features of our framework are: (i) it explicitly exploits the geometry of the homogeneous Riemannian manifolds, and (ii) the nested lower-dimensional submanifolds need not be geodesic. With the proposed NG structure, we develop algorithms for the supervised and unsupervised dimensionality reduction problems, respectively. The proposed algorithms are compared with PGA via simulation studies and real data experiments and are shown to achieve a higher ratio of expressed variance than PGA.
The code is available at https://github.com/cvgmi/NestedGrassmann

Keywords

Grassmann Manifolds · Dimensionality Reduction · Shape Analysis · Homogeneous Riemannian Manifolds




1 Introduction

Riemannian manifolds are often used to model the sample space in which features derived from the raw data encountered in many medical imaging applications live. Common examples include diffusion tensors (DTs) in diffusion tensor imaging (DTI) (Basser et al., 1994) and the ensemble average propagator (EAP) (Callaghan, 1993). Both DTs and EAPs are used to capture the diffusional properties of water molecules in the central nervous system by non-invasively imaging the tissue via diffusion weighted magnetic resonance imaging. In DTI, diffusion at a voxel is captured by a DT, which is a $3\times 3$ symmetric positive-definite matrix, whereas the EAP is a probability distribution characterizing the local diffusion at a voxel, which can be parametrized as a point on the Hilbert sphere. Another example is the shape space used to represent shapes in shape analysis. There are many ways to represent a shape, the simplest of which uses landmarks. For the landmark-based representation, the shape space is called Kendall's shape space (Kendall, 1984). Kendall's shape space is in general a stratified space (Goresky and MacPherson, 1988; Feragen et al., 2014), but for the special case of planar shapes, the shape space is the complex projective space, which is a complex Grassmann manifold. The examples mentioned above are often high-dimensional: a DTI scan usually contains half a million DTs, and the shape of the corpus callosum (which is used in our experiments) is represented by several hundred boundary points in $\mathbb{R}^2$. Thus, in these cases, dimensionality reduction techniques, if applied appropriately, can benefit the subsequent statistical analysis.

For data on Riemannian manifolds, the most widely used dimensionality reduction method is principal geodesic analysis (PGA) (Fletcher et al., 2004), which generalizes principal component analysis (PCA) to manifold-valued data. There are many variants of PGA. Fletcher et al. (2004) proposed to find the geodesic submanifold of a certain dimension that maximizes the projected variance; computationally, this was achieved via a linear approximation, i.e., applying PCA on the tangent space at the intrinsic mean. This is sometimes referred to as tangent PCA. Note that this approximation requires the data to be clustered around the intrinsic mean; otherwise, the tangent space approximation to the manifold leads to inaccuracies. Later on, Sommer et al. (2010) proposed exact PGA (EPGA), which does not use any linear approximation. However, EPGA is computationally expensive, as it requires two non-linear optimization steps per iteration (projection to the geodesic submanifold, and finding the new geodesic direction that minimizes the loss of information). Chakraborty et al. (2016) partially solved this problem for manifolds with constant sectional curvature (spheres and hyperbolic spaces) by deriving closed-form formulae for the projection. Other variants of PGA include, but are not limited to, sparse exact PGA (Banerjee et al., 2017), geodesic PCA (Huckemann et al., 2010), and probabilistic PGA (Zhang and Fletcher, 2013). All these methods focus on projecting data to a geodesic submanifold, just as PCA projects data to a vector subspace. Instead, one can also project data to a submanifold that minimizes the reconstruction error without further restrictions, e.g., being geodesic. This is the generalization of the principal curve (Hastie and Stuetzle, 1989) to Riemannian manifolds presented in Hauberg (2016).

Another feature of PCA is that it produces a sequence of nested vector subspaces. From this observation, Jung et al. (2012) proposed principal nested spheres (PNS) by embedding an $(n-1)$-sphere into an $n$-sphere, where the embedding is not necessarily isometric. Hence PNS is more general than PGA in that the nested spheres are not required to be geodesic. Similarly, for the manifold $P_n$ of $n\times n$ SPD matrices, Harandi et al. (2018) proposed a geometry-aware dimensionality reduction by projecting data in $P_n$ to $P_m$ for some $m\ll n$. They also applied the nested $P_n$ to the supervised dimensionality reduction problem. Damon and Marron (2014) considered a nested sequence of relations which determine a nested sequence of submanifolds that are not necessarily geodesic. They showed various examples, including Euclidean space and the $n$-sphere, depicting how the nested relations generalize PCA and PNS. However, for an arbitrary Riemannian manifold, it is not clear how to construct such nested submanifolds. Another generalization of PGA was proposed by Pennec et al. (2018), called the exponential barycentric subspace (EBS). A $k$-dimensional EBS is defined as the locus of weighted exponential barycenters of $(k+1)$ affinely independent reference points. The EBSs are naturally nested by removing or adding reference points.

Unlike PGA, which can be applied to any Riemannian manifold, the construction of nested manifolds relies heavily on the geometry of the specific manifold, and there is no general principle for such a construction. All the examples described above (spheres and $P_n$) and several others, such as the Grassmann and Stiefel manifolds, are homogeneous Riemannian manifolds (Helgason, 1979), which intuitively means that all points on the manifold 'look' the same. In this work, we propose a general framework for constructing a nested sequence of homogeneous Riemannian manifolds and, via some simple algebraic computation, show that the nested spheres and the nested $P_n$ can indeed be derived within this framework. We then apply this framework to the Grassmann manifolds, yielding the nested Grassmann manifolds (NG). The Grassmann manifold $\text{Gr}(p,\mathbb{V})$ is the manifold of all $p$-dimensional subspaces of the vector space $\mathbb{V}$, where $1\leq p\leq\dim\mathbb{V}$; usually $\mathbb{V}=\mathbb{R}^n$ or $\mathbb{V}=\mathbb{C}^n$. An important example is Kendall's shape space of 2D shapes. The space of all shapes determined by $k$ landmarks in $\mathbb{R}^2$ is denoted by $\Sigma_2^k$, and Kendall (1984) showed that it is isomorphic to the complex projective space $\mathbb{C}P^{k-2}\cong\text{Gr}(1,\mathbb{C}^{k-1})$. In many applications, the number $k$ of landmarks is large, and so is the dimension of $\text{Gr}(1,\mathbb{C}^{k-1})$. The core of the proposed dimensionality reduction involves projecting data on $\text{Gr}(p,\mathbb{V})$ to $\text{Gr}(p,\widetilde{\mathbb{V}})$ with $\dim\widetilde{\mathbb{V}}\ll\dim\mathbb{V}$. The main contributions of this work are as follows: (i) we propose a general framework for constructing a nested sequence of homogeneous Riemannian manifolds, unifying the recently proposed nested spheres (Jung et al., 2012) and nested SPD manifolds (Harandi et al., 2018); (ii) we present novel dimensionality reduction techniques based on the concept of NG in both supervised and unsupervised settings; and (iii) we demonstrate, via several simulation studies and real data experiments, that the proposed NG can achieve a higher ratio of expressed variance than PGA.

The rest of the paper is organized as follows. In Section 2, we briefly review the definition of homogeneous Riemannian manifolds and present the recipe for constructing nested homogeneous Riemannian manifolds. In Section 3, we first review the geometry of the Grassmannian; then, applying the procedure developed in Section 2, we present the nested Grassmann manifold representation and discuss some of its properties in detail. We describe algorithms for our unsupervised and supervised dimensionality reduction techniques, called Principal Nested Grassmanns (PNG), in Section 4. In Section 5, we present several simulation studies and experimental results showing how the PNG technique performs compared to PGA under different settings. Finally, we draw conclusions in Section 6.

2 Nested Homogeneous Spaces

In this section, we introduce the structure of nested homogeneous Riemannian manifolds. A Riemannian manifold $(M,\tau)$ is homogeneous if the group of isometries $G=I(M)$ admitted by the manifold acts transitively on $M$ (Helgason, 1979), i.e., for $x,y\in M$, there exists $g\in G$ such that $g(x)=y$. In this case, $M$ can be identified with $G/H$, where $H$ is an isotropy subgroup of $G$ at some point $p\in M$, i.e., $H=\{g\in G: g(p)=p\}$. Examples of homogeneous Riemannian manifolds include, but are not limited to, the spheres $S^{n-1}=\text{SO}(n)/\text{SO}(n-1)$, the SPD manifolds $P_n=\text{GL}(n)/\text{O}(n)$, the Stiefel manifolds $\text{St}(m,n)=\text{SO}(n)/\text{SO}(n-m)$, and the Grassmann manifolds $\text{Gr}(p,n)=\text{SO}(n)/S(\text{O}(p)\times\text{O}(n-p))$.

In this paper, we focus on the case where $G$ is either a real or a complex matrix Lie group, i.e., $G$ is a subgroup of $\text{GL}(n,\mathbb{R})$ or $\text{GL}(n,\mathbb{C})$. The main idea behind the construction of nested homogeneous spaces is simple: augment the matrices in $G$ in an appropriate way. Given an embedding of the isometry group $G$, the embedding of the homogeneous space $G/H$ follows naturally from the quotient structure.

Let $G$ and $\tilde{G}$ be two connected Lie groups such that $\dim G<\dim\tilde{G}$, and let $\tilde{\iota}:G\to\tilde{G}$ be an embedding. For a closed connected subgroup $H$ of $G$, let $\tilde{H}=\tilde{\iota}(H)$. Since $\tilde{\iota}$ is an embedding, $\tilde{H}$ is also a closed subgroup of $\tilde{G}$. Now the canonical embedding of $G/H$ in $\tilde{G}/\tilde{H}$ is defined by $\iota(gH)=\tilde{\iota}(g)\tilde{H}$ for $g\in G$. It is easy to see that $\iota$ is well-defined: let $g_1,g_2\in G$ be such that $g_1=g_2h$ for some $h\in H$. Then

$$\iota(g_1H)=\tilde{\iota}(g_1)\tilde{H}=\tilde{\iota}(g_2h)\tilde{H}=\tilde{\iota}(g_2)\tilde{\iota}(h)\tilde{H}=\tilde{\iota}(g_2)\tilde{H}=\iota(g_2H).$$

Now, for the homogeneous Riemannian manifolds $(M=G/H,\tau_1)$ and $(\tilde{M}=\tilde{G}/\tilde{H},\tau_2)$, denote the left-$G$-invariant, right-$H$-invariant metric on $G$ (resp. the left-$\tilde{G}$-invariant, right-$\tilde{H}$-invariant metric on $\tilde{G}$) by $\bar{\tau}_1$ and $\bar{\tau}_2$, respectively (see Cheeger and Ebin (1975, Prop. 3.16(4))).

Proposition 1

If $\tilde{\iota}:G\to\tilde{G}$ is isometric, then so is $\iota:G/H\to\tilde{G}/\tilde{H}$.

Proof  Denote the Riemannian submersions by $\pi:G\to G/H$ and $\tilde{\pi}:\tilde{G}\to\tilde{G}/\tilde{H}$. Let $X$ and $Y$ be vector fields on $G/H$ and let $\bar{X}$ and $\bar{Y}$ be their horizontal lifts, respectively, i.e., $d\pi(\bar{X})=X$ and $d\pi(\bar{Y})=Y$. By the definition of Riemannian submersions, $d\pi$ is isometric on the horizontal spaces, i.e., $\bar{\tau}_1(\bar{X},\bar{Y})=\tau_1(d\pi(\bar{X}),d\pi(\bar{Y}))=\tau_1(X,Y)$. Since $\tilde{\iota}$ is isometric, we have $\bar{\tau}_1(\bar{X},\bar{Y})=\bar{\tau}_2(d\tilde{\iota}(\bar{X}),d\tilde{\iota}(\bar{Y}))$. By the definition of $\iota$, we also have $\iota\circ\pi=\tilde{\pi}\circ\tilde{\iota}$, which implies $d\iota\circ d\pi=d\tilde{\pi}\circ d\tilde{\iota}$. Hence,

$$\begin{aligned}
\tau_1(X,Y) &= \bar{\tau}_1(\bar{X},\bar{Y}) = \bar{\tau}_2(d\tilde{\iota}(\bar{X}),d\tilde{\iota}(\bar{Y}))\\
&= \tau_2((d\tilde{\pi}\circ d\tilde{\iota})(\bar{X}),(d\tilde{\pi}\circ d\tilde{\iota})(\bar{Y}))\\
&= \tau_2((d\iota\circ d\pi)(\bar{X}),(d\iota\circ d\pi)(\bar{Y}))\\
&= \tau_2(d\iota(X),d\iota(Y)),
\end{aligned}$$

where the third equality follows from the isometry of $d\tilde{\pi}$.

Proposition 1 simply says that if the isometry group is isometrically embedded, then the associated homogeneous Riemannian manifolds are also isometrically embedded. Conversely, if we have a Riemannian submersion $\tilde{f}:\tilde{G}\to G$, it can easily be shown that the induced map $f:\tilde{G}/\tilde{H}\to G/H$, where $H=\tilde{f}(\tilde{H})$, is also a Riemannian submersion. The construction above can be applied to a sequence of homogeneous spaces $\{M_m\}_{m=1}^{\infty}$, i.e., the embedding $\iota_m:M_m\to M_{m+1}$ can be induced from the embedding of the isometry groups $\tilde{\iota}_m:G_m\to G_{m+1}$, where $G_m=I(M_m)$, provided that $\dim G_i<\dim G_j$ for $i<j$. See Figure 1 for the structure of nested homogeneous spaces.

[Figure 1: the group-level embeddings $\tilde{\iota}:G_m\hookrightarrow G_{m+1}$ descend, through the Riemannian submersions $\pi:G_m\to G_m/H_m$ and $\pi:G_{m+1}\to G_{m+1}/H_{m+1}$, to the induced embeddings $\iota:M_m\hookrightarrow M_{m+1}$ of the homogeneous spaces $M_m\cong G_m/H_m$.]

Figure 1: Commutative diagram of the induced embedding for homogeneous spaces.

3 Nested Grassmann Manifolds

In this section, we apply the theory of nested homogeneous spaces from the previous section to the Grassmann manifolds. First, we briefly review the geometry of the Grassmann manifolds in Section 3.1. Using the theory of Section 2, we derive the nested Grassmann manifolds in Section 3.2, and the derivations for the nested spheres and nested SPD manifolds are carried out in Section 3.3.

3.1 The Riemannian Geometry of Grassmann Manifolds

To simplify the notation, we assume $\mathbb{V}=\mathbb{R}^n$ and write $\text{Gr}(p,n)\coloneqq\text{Gr}(p,\mathbb{R}^n)$. All the results presented in this section can be easily extended to the case of $\mathbb{V}=\mathbb{C}^n$ by replacing transposition with conjugate transposition and orthogonal groups with unitary groups. The Grassmann manifold $\text{Gr}(p,n)$ is the manifold of all $p$-dimensional subspaces of $\mathbb{R}^n$; for a subspace $\mathcal{X}\in\text{Gr}(p,n)$, we write $\mathcal{X}=\operatorname{span}(X)$, where the columns of $X$ form an orthonormal basis for $\mathcal{X}$. The space of all $n\times p$ matrices $X$ such that $X^TX=I_p$ is called the Stiefel manifold, denoted by $\text{St}(p,n)$. Special cases of Stiefel manifolds are the Lie group of all orthogonal matrices, $\text{O}(n)=\text{St}(n,n)$, and the sphere $S^{n-1}=\text{St}(1,n)$. The Stiefel manifold with the induced Euclidean metric (i.e., $\langle U,V\rangle_X=\operatorname{tr}(U^TV)$ for $U,V\in T_X\text{St}(p,n)$) is a homogeneous Riemannian manifold, $\text{St}(p,n)=\text{SO}(n)/\text{SO}(n-p)$. A canonical Riemannian metric on the Grassmann manifold can be inherited from the metric on $\text{St}(p,n)$, since the latter is invariant to left multiplication by elements of $\text{O}(n)$ (Absil et al., 2004; Edelman et al., 1998). The Grassmann manifold with this metric is also homogeneous: $\text{Gr}(p,n)=\text{SO}(n)/S(\text{O}(p)\times\text{O}(n-p))$.

With this canonical metric on the Grassmann manifold, the geodesics can be expressed in closed form. Let $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,n)$ with $X\in\text{St}(p,n)$, and let $H$ be an $n\times p$ matrix. Then the geodesic $\gamma(t)$ with $\gamma(0)=\mathcal{X}$ and $\gamma'(0)=H$ is given by $\gamma_{\mathcal{X},H}(t)=\operatorname{span}(XV\cos\Sigma t+U\sin\Sigma t)$, where $H=U\Sigma V^T$ is the compact singular value decomposition (Edelman et al., 1998, Theorem 2.3). The exponential map at $\mathcal{X}$ is the map from $T_{\mathcal{X}}\text{Gr}(p,n)$ to $\text{Gr}(p,n)$ defined by $\text{Exp}_{\mathcal{X}}H=\gamma_{\mathcal{X},H}(1)=\operatorname{span}(XV\cos\Sigma+U\sin\Sigma)$. If $X^TY$ is invertible, the geodesic distance between $\mathcal{X}=\operatorname{span}(X)$ and $\mathcal{Y}=\operatorname{span}(Y)$ is given by $d_g^2(\mathcal{X},\mathcal{Y})=\operatorname{tr}\Theta^2=\sum_{i=1}^p\theta_i^2$, where $(I-XX^T)Y(X^TY)^{-1}=U\Sigma V^T$ with $U\in\text{St}(p,n)$ and $V\in\text{O}(p)$, and $\Theta=\tan^{-1}\Sigma$. The diagonal entries $\theta_1,\ldots,\theta_p$ of $\Theta$ are known as the principal angles between $\mathcal{X}$ and $\mathcal{Y}$.
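To make the distance computation concrete, here is a minimal NumPy sketch (our illustration, not taken from the paper's released code; the name `grassmann_distance` and the random test subspaces are ours) implementing the arctangent formula above:

```python
import numpy as np

def grassmann_distance(X, Y):
    """Geodesic distance on Gr(p, n) between span(X) and span(Y).

    X, Y: (n, p) matrices with orthonormal columns. Implements the
    formula from the text: (I - X X^T) Y (X^T Y)^{-1} = U Sigma V^T,
    Theta = arctan(Sigma), d_g^2 = sum_i theta_i^2.
    """
    n, p = X.shape
    M = (np.eye(n) - X @ X.T) @ Y @ np.linalg.inv(X.T @ Y)
    sigma = np.linalg.svd(M, compute_uv=False)
    theta = np.arctan(sigma)  # principal angles between the subspaces
    return np.sqrt(np.sum(theta ** 2))

# Example: two random 2-dimensional subspaces of R^5.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((5, 2)))
print(grassmann_distance(X, Y))
```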

[Figure 2: a point $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,n)$ is projected by $\pi_{A,B}$ to $\operatorname{span}(A^TX)\in\text{Gr}(p,m)$ and embedded back by $\iota_{A,B}$ as $\hat{\mathcal{X}}=\operatorname{span}(AA^TX+B)$.]

Figure 2: Illustration of the embedding of $\text{Gr}(p,m)$ in $\text{Gr}(p,n)$, parametrized by $A\in\text{St}(m,n)$ and $B\in\mathbb{R}^{n\times p}$ such that $A^TB=0$.

3.2 Embedding of $\text{Gr}(p,m)$ in $\text{Gr}(p,n)$

Let $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,m)$ with $X\in\text{St}(p,m)$. The map $\iota:\text{Gr}(p,m)\to\text{Gr}(p,n)$, for $m<n$, defined by

$$\iota(\mathcal{X})=\operatorname{span}\left(\begin{bmatrix}X\\0_{(n-m)\times p}\end{bmatrix}\right)$$

is an embedding, and it is easy to check that this embedding is isometric (Ye and Lim, 2016, Eq. (8)). However, for the dimensionality reduction problem, the above embedding is insufficient, as it is not flexible enough to encompass other possible embeddings. To design flexible embeddings, we apply the framework proposed in Section 2. We consider $M_m=\text{Gr}(p,m)$, for which the isometry groups are $G_m=\text{SO}(m)$ and $H_m=S(\text{O}(p)\times\text{O}(m-p))$.

In this paper, we consider the embedding $\tilde{\iota}_m:\text{SO}(m)\to\text{SO}(m+1)$ given by

$$\tilde{\iota}_m(O)=\text{GS}\left(R\begin{bmatrix}O&a\\ b^T&c\end{bmatrix}\right)\tag{3}$$

where $O\in\text{SO}(m)$, $R\in\text{SO}(m+1)$, $a,b\in\mathbb{R}^m$, $c\in\mathbb{R}$ with $c\neq b^TO^{-1}a$, and $\text{GS}(\cdot)$ is the Gram-Schmidt process. Since the Riemannian submersion $\pi:\text{SO}(m)\to\text{Gr}(p,m)$ is defined by $\pi(O)=\operatorname{span}(O_p)$, where $O\in\text{SO}(m)$ and $O_p$ is the $m\times p$ matrix containing the first $p$ columns of $O$, the induced embedding $\iota_m:\text{Gr}(p,m)\to\text{Gr}(p,m+1)$ is given by

$$\iota_m(\mathcal{X})=\operatorname{span}\left(R\begin{bmatrix}X\\ b^T\end{bmatrix}\right)=\operatorname{span}(\tilde{R}X+vb^T),$$

where $b\in\mathbb{R}^p$, $R\in\text{SO}(m+1)$, $\tilde{R}$ contains the first $m$ columns of $R$ (so $\tilde{R}\in\text{St}(m,m+1)$), $v$ is the last column of $R$, and $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,m)$. It is easy to see that for $R=I$ and $b=0$, this gives the natural embedding described in Ye and Lim (2016) and at the beginning of this section.

Proposition 2

If $b=0$, then $\iota_m$ is an isometric embedding.

Proof  With Proposition 1, it suffices to show that $\tilde{\iota}_m$ is isometric when $b=0$. Note that since $\iota_m$ is independent of $a$ and $c$ in the definition of $\tilde{\iota}_m$, we can assume $a=0$ and $c=1$ without loss of generality. If $b=0$, $\tilde{\iota}_m$ simplifies to

$$\tilde{\iota}_m(O)=R\begin{bmatrix}O&0\\0&1\end{bmatrix}$$

where $R\in\text{SO}(m+1)$. The Riemannian distance on $\text{SO}(n)$ with the induced Euclidean metric is $d_{\text{SO}}(O_1,O_2)=\frac{1}{\sqrt{2}}\|\log O_1^TO_2\|_F$. Then, for $O_1,O_2\in\text{SO}(m)$,

$$d_{\text{SO}}(\tilde{\iota}_m(O_1),\tilde{\iota}_m(O_2))=\frac{1}{\sqrt{2}}\left\|\log\left(\begin{bmatrix}O_1^TO_2&0\\0&1\end{bmatrix}\right)\right\|_F=d_{\text{SO}}(O_1,O_2).$$

Therefore $\tilde{\iota}_m$ is an isometric embedding, and so is $\iota_m$ by Proposition 1.
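As a quick numerical sanity check of Proposition 2 (a sketch under the same caveats as the earlier one; `principal_angles` is our name, and it uses the equivalent arccos-of-singular-values characterization of the principal angles), one can verify that with $b=0$ the induced map $\operatorname{span}(X)\mapsto\operatorname{span}(\tilde{R}X)$ preserves the geodesic distance:

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y), from the SVD of
    X^T Y (equivalent to the arctan formula for orthonormal bases)."""
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.arccos(s)

# Proposition 2 with b = 0: span(X) -> span(R_tilde @ X), where R_tilde
# holds the first m columns of an orthogonal R, preserves the distance.
m, p = 4, 2
rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((m, p)))
Y, _ = np.linalg.qr(rng.standard_normal((m, p)))
R, _ = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))  # orthogonal
R_tilde = R[:, :m]
d0 = np.sqrt(np.sum(principal_angles(X, Y) ** 2))
d1 = np.sqrt(np.sum(principal_angles(R_tilde @ X, R_tilde @ Y) ** 2))
print(abs(d0 - d1))  # ~0 up to floating point error
```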

With the embedding $\iota_m$, we can construct the corresponding projection $\pi_m:\text{Gr}(p,m+1)\to\text{Gr}(p,m)$ using the following proposition.

Proposition 3

The projection $\pi_m:\text{Gr}(p,m+1)\to\text{Gr}(p,m)$ corresponding to $\iota_m(\mathcal{X})=\operatorname{span}(\tilde{R}X+vb^T)$ is given by $\pi_m(\mathcal{X})=\operatorname{span}(\tilde{R}^TX)$.

Proof  First, let $\mathcal{Y}=\operatorname{span}(Y)\in\text{Gr}(p,m)$ and $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,m+1)$ be such that $\mathcal{X}=\operatorname{span}(\tilde{R}Y+vb^T)$. Then $XL=\tilde{R}Y+vb^T$ for some $L\in\text{GL}(p)$. Therefore, $Y=\tilde{R}^T(XL-vb^T)=\tilde{R}^TXL$ and $\mathcal{Y}=\operatorname{span}(Y)=\operatorname{span}(\tilde{R}^TXL)=\operatorname{span}(\tilde{R}^TX)$. Hence, the projection is given by $\pi_m(\mathcal{X})=\operatorname{span}(\tilde{R}^TX)$. This completes the proof.

Note that for $\mathcal{X}=\operatorname{span}(X)\in\text{Gr}(p,m+1)$, $\iota_m(\pi_m(\mathcal{X}))=\operatorname{span}(\tilde{R}\tilde{R}^TX+vb^T)=\operatorname{span}((I-vv^T)X+vb^T)$, where $v\in\mathbb{R}^{m+1}$ and $\|v\|=1$. The nested relation can be extended inductively, and we refer to this construction as the nested Grassmann structure:

$$\text{Gr}(p,m)\overset{\iota_m}{\hookrightarrow}\text{Gr}(p,m+1)\overset{\iota_{m+1}}{\hookrightarrow}\cdots\overset{\iota_{n-2}}{\hookrightarrow}\text{Gr}(p,n-1)\overset{\iota_{n-1}}{\hookrightarrow}\text{Gr}(p,n).$$

Thus the embedding of $\text{Gr}(p,m)$ into $\text{Gr}(p,n)$ can be constructed inductively as $\iota\coloneqq\iota_{n-1}\circ\cdots\circ\iota_{m+1}\circ\iota_m$, and similarly for the corresponding projection. The explicit forms of the embedding and the projection are given in the following proposition.

Proposition 4

The embedding of $\text{Gr}(p,m)$ into $\text{Gr}(p,n)$ for $m<n$ is given by $\iota_{A,B}(\mathcal{X})=\operatorname{span}(AX+B)$, where $A\in\text{St}(m,n)$ and $B\in\mathbb{R}^{n\times p}$ are such that $A^TB=0$. The corresponding projection from $\text{Gr}(p,n)$ to $\text{Gr}(p,m)$ is given by $\pi_A(\mathcal{X})=\operatorname{span}(A^TX)$.

Proof  By definition, $\iota\coloneqq\iota_{n-1}\circ\cdots\circ\iota_{m+1}\circ\iota_m$, and thus the embedding $\iota:\text{Gr}(p,m)\to\text{Gr}(p,n)$ can be simplified as

$$\iota_{A,B}(\mathcal{X})=\operatorname{span}\left(\Bigg(\prod_{i=m}^{n-1}R_i\Bigg)X+\sum_{i=m}^{n-1}\Bigg(\prod_{j=i+1}^{n-1}R_j\Bigg)v_ib_i^T\right)=\operatorname{span}(AX+B)$$

where $R_i\in\text{St}(i,i+1)$, $v_i$ is such that $[R_i\;v_i]\in\text{O}(i+1)$, $b_i\in\mathbb{R}^p$, $A=R_{n-1}R_{n-2}\cdots R_m\in\text{St}(m,n)$, and $B=\sum_{i=m}^{n-1}\big(\prod_{j=i+1}^{n-1}R_j\big)v_ib_i^T$ is an $n\times p$ matrix. It is easy to see that $A^TB=0$. Similar to Proposition 3, the projection $\pi_A:\text{Gr}(p,n)\to\text{Gr}(p,m)$ is then given by $\pi_A(\mathcal{X})=\operatorname{span}(A^TX)$. This completes the proof.

From Proposition 2, if $B=0$ then $\iota_{A,0}$ is an isometric embedding. Hence, our nested Grassmann structure is more flexible than PGA, as it allows one to project the data onto a non-geodesic submanifold. An illustration is shown in Figure 2. The results discussed in this subsection can, in principle, be generalized to any homogeneous space: once the embedding of the groups of isometries (e.g., Eq. (3)) is chosen, the induced embedding and the corresponding projection can be derived akin to the case of the Grassmann manifolds.
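Since $\iota_{A,B}$ and $\pi_A$ of Proposition 4 are the maps used by the proposed dimensionality reduction, the following NumPy sketch spells them out (our own hedged illustration; the names `embed` and `project` are not from the paper's code). It also checks that $\pi_A\circ\iota_{A,B}$ is the identity on $\text{Gr}(p,m)$ when $A^TB=0$, since $\operatorname{span}(A^T(AX+B))=\operatorname{span}(X+A^TB)=\operatorname{span}(X)$:

```python
import numpy as np

def embed(X, A, B):
    """iota_{A,B}: Gr(p, m) -> Gr(p, n). Returns an orthonormal basis of
    span(A X + B); assumes A in St(m, n) and A^T B = 0."""
    Q, _ = np.linalg.qr(A @ X + B)
    return Q

def project(X, A):
    """pi_A: Gr(p, n) -> Gr(p, m). Returns an orthonormal basis of span(A^T X)."""
    Q, _ = np.linalg.qr(A.T @ X)
    return Q

m, n, p = 3, 6, 2
rng = np.random.default_rng(2)
A, _ = np.linalg.qr(rng.standard_normal((n, m)))         # A in St(m, n)
B = (np.eye(n) - A @ A.T) @ rng.standard_normal((n, p))  # forces A^T B = 0
X, _ = np.linalg.qr(rng.standard_normal((m, p)))
X_rec = project(embed(X, A, B), A)
# Same subspace iff the orthogonal projectors coincide:
print(np.linalg.norm(X_rec @ X_rec.T - X @ X.T))  # ~0
```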

3.3 Connections to Other Nested Structures

The nested homogeneous spaces proposed in this work (see Figure 1) provide a unified framework within which the nested spheres (Jung et al., 2012) and the nested SPD manifolds (Harandi et al., 2018) are special cases.

The $n$-Sphere Example: Since the sphere can be identified with the homogeneous space $S^{n-1}\cong\text{O}(n)/\text{O}(n-1)$, with the embedding (3) the induced embedding of $S^{n-1}$ into $S^n$ is

$$\iota(x)=\text{GS}\left(R\begin{bmatrix}x\\ b\end{bmatrix}\right)=\frac{1}{\sqrt{1+b^2}}R\begin{bmatrix}x\\ b\end{bmatrix}=R\begin{bmatrix}\sin(r)x\\ \cos(r)\end{bmatrix}$$

where $x \in S^{n-1}$, $b \in \mathbb{R}$, and $r = \cos^{-1}\!\big(b/\sqrt{1+b^2}\big)$. This is precisely the nested sphere proposed in Jung et al. (2012, Eq. (2)).
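As a quick numerical check of this identity, the following sketch (ours) confirms that the Gram–Schmidt normalization agrees with the $\sin/\cos$ form:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                                    # x on S^{n-1}
b = rng.standard_normal()                                 # b in R
R, _ = np.linalg.qr(rng.standard_normal((n + 1, n + 1)))  # R in O(n+1)

y = R @ np.append(x, b) / np.sqrt(1 + b**2)               # iota(x) on S^n

r = np.arccos(b / np.sqrt(1 + b**2))                      # sin(r) = 1/sqrt(1+b^2)
y_alt = R @ np.append(np.sin(r) * x, np.cos(r))
assert np.allclose(y, y_alt) and np.isclose(np.linalg.norm(y), 1.0)
```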

The SPD Manifold Example: For the manifold $P_m$ of $m \times m$ SPD matrices, $G_m = \text{GL}(m)$ and $H_m = \text{O}(m)$. Consider the embedding $\tilde{\iota}_m : \text{GL}(m) \to \text{GL}(m+1)$ given by

$$\tilde{A} = \tilde{\iota}_m(A) = R\begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix} R^T,$$

where $A \in \text{GL}(m)$ and $R \in \text{O}(m+1)$; the corresponding projection $\tilde{\pi}_m : \text{GL}(m+1) \to \text{GL}(m)$ is

$$\tilde{\pi}_m(\tilde{A}) = W^T \tilde{A} W,$$

where $W$ contains the first $m$ columns of $R = [W\; v] \in \text{O}(m+1)$ (i.e., $W \in \text{St}(m, m+1)$ and $W^T v = 0$). The submersion $\psi \circ f : \text{GL}(m) \to P_m$ is given by $\psi \circ f(A) = A^T A$. Hence the induced embedding $\iota_m : P_m \to P_{m+1}$ and projection $\pi_m : P_{m+1} \to P_m$ are

$$\iota_m(X) = W X W^T + v v^T \quad \text{and} \quad \pi_m(X) = W^T X W,$$

which is the projection map used in Harandi et al. (2018, Eq. (13)). However, Harandi et al. (2018) did not perform any embedding or construct a nested family of SPD manifolds. Recently, it came to our attention that Jaquier and Rozo (2020) derived a similar nested family of SPD manifolds based on the projection maps of Harandi et al. (2018) described above.
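As a sanity check of these two maps, the following numpy sketch (an illustration under our naming, not code from the paper) verifies that $\pi_m \circ \iota_m$ is the identity on $P_m$:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3

# R = [W v] in O(m+1), so W in St(m, m+1) and W^T v = 0
R, _ = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))
W, v = R[:, :m], R[:, m]

# a random SPD matrix X in P_m
G = rng.standard_normal((m, m))
X = G @ G.T + m * np.eye(m)

X_up = W @ X @ W.T + np.outer(v, v)  # iota_m(X): an SPD matrix in P_{m+1}
X_back = W.T @ X_up @ W              # pi_m(iota_m(X)) recovers X
assert np.allclose(X_back, X)
```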

4 Dimensionality Reduction with Nested Grassmanns

In this section, we discuss how to apply the nested Grassmann structure to the problem of dimensionality reduction. In Sections 4.1 and 4.2, we describe unsupervised and supervised dimensionality reduction based on the nested Grassmann manifolds. In Section 4.3, we discuss the choice of the distance metric required by the dimensionality reduction algorithms and present some technical details regarding the implementation. The analysis of principal nested Grassmanns (PNG) is introduced and discussed in Sections 4.4 and 4.5.

4.1 Unsupervised Dimensionality Reduction

We can now apply the nested Grassmann structure to the problem of unsupervised dimensionality reduction. Suppose we are given points $\mathcal{X}_1, \ldots, \mathcal{X}_N \in \text{Gr}(p, n)$ and seek lower-dimensional representations of them in $\text{Gr}(p, m)$ with $m \ll n$. The desired projection map $\pi_A$ is obtained by minimizing the reconstruction error, i.e.

$$L_u(A, B) = \frac{1}{N}\sum_{i=1}^N d^2(\mathcal{X}_i, \hat{\mathcal{X}}_i) = \frac{1}{N}\sum_{i=1}^N d^2\big(\operatorname{span}(X_i), \operatorname{span}(A A^T X_i + B)\big)$$

where $d(\cdot, \cdot)$ is a distance metric on $\text{Gr}(p, n)$. It is clear that $L_u$ has an $\text{O}(m)$-symmetry in its first argument, i.e. $L_u(AO, B) = L_u(A, B)$ for $O \in \text{O}(m)$. Hence, for this particular loss function, the optimization is performed over the space $\text{St}(m, n)/\text{O}(m) \cong \text{Gr}(m, n)$. We can then apply the Riemannian gradient descent algorithm (Edelman et al., 1998) to obtain $A$ and $B$ by optimizing $L_u(A, B)$ over $\operatorname{span}(A) \in \text{Gr}(m, n)$ and $B \in \mathbb{R}^{n \times p}$ such that $A^T B = 0$. Note that the restriction $A^T B = 0$ simply means that the columns of $B$ are in the null space of $A^T$, denoted $N(A^T)$. Hence, in practice, this restriction can be handled as follows: for an arbitrary $\tilde{B} \in \mathbb{R}^{n \times p}$, project $\tilde{B}$ onto $N(A^T)$, i.e. $B = P_{N(A^T)}\tilde{B}$, where $P_{N(A^T)} = I - A A^T$ is the projection from $\mathbb{R}^n$ onto $N(A^T)$. Thus, the loss function can be written as

$$L_u(A, B) = \frac{1}{N}\sum_{i=1}^N d^2\big(\operatorname{span}(X_i), \operatorname{span}(A A^T X_i + (I - A A^T) B)\big)$$

and it is optimized over $\text{Gr}(m, n) \times \mathbb{R}^{n \times p}$. Note that since we represent a subspace by an orthonormal basis, when $m > n/2$ we can use the isomorphism $\text{Gr}(m, n) \cong \text{Gr}(n-m, n)$ to further reduce the computational burden; this is particularly useful when $m = n-1$, as in Section 4.4. Under this isomorphism, the subspace corresponding to $\operatorname{span}(A) \in \text{Gr}(m, n)$ is $\operatorname{span}(A_\perp) \in \text{Gr}(n-m, n)$, where $A_\perp$ is an $n \times (n-m)$ matrix such that $[A\; A_\perp]$ is an orthogonal matrix. Hence the loss function $L_u$ can alternatively be expressed as

$$L_u(A, B) = \frac{1}{N}\sum_{i=1}^N d^2\big(\operatorname{span}(X_i), \operatorname{span}((I - A_\perp A_\perp^T) X_i + A_\perp A_\perp^T B)\big).$$
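For concreteness, here is a minimal numpy sketch of $L_u$ using the projection distance of Section 4.3 (helper names are ours); it also verifies the $\text{O}(m)$-symmetry $L_u(AO, B) = L_u(A, B)$ numerically:

```python
import numpy as np

def proj(X):
    # orthogonal projection matrix onto span(X)
    Q, _ = np.linalg.qr(X)
    return Q @ Q.T

def L_u(A, B, Xs):
    # reconstruction error with the projection distance; B is projected
    # onto N(A^T) internally, so A^T B = 0 need not hold on input
    n = A.shape[0]
    Bp = (np.eye(n) - A @ A.T) @ B
    return np.mean([np.sum((proj(X) - proj(A @ (A.T @ X) + Bp)) ** 2) / 2
                    for X in Xs])

rng = np.random.default_rng(3)
n, m, p, N = 10, 3, 1, 20
A = np.linalg.qr(rng.standard_normal((n, m)))[0]
B = rng.standard_normal((n, p))
Xs = [np.linalg.qr(rng.standard_normal((n, p)))[0] for _ in range(N)]
O = np.linalg.qr(rng.standard_normal((m, m)))[0]
assert np.isclose(L_u(A, B, Xs), L_u(A @ O, B, Xs))  # O(m)-symmetry
```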
Remark 1

The reduction of the space of all possible projections from $\text{St}(m, n)$ to $\text{Gr}(m, n)$ is a consequence of the choice of the loss function $L_u$. With a different loss function, one might have a different space of possible projections.

4.2 Supervised Dimensionality Reduction

If, in addition to $\mathcal{X}_1, \ldots, \mathcal{X}_N \in \text{Gr}(p, n)$, we are given the associated labels $y_1, \ldots, y_N \in \{1, \ldots, k\}$, then we would like to use this extra information to sharpen the result of the dimensionality reduction. Specifically, we expect that after reducing the dimension, points from the same class preserve their proximity while points from different classes are well separated. We use an affinity function $a : \text{Gr}(p, n) \times \text{Gr}(p, n) \to \mathbb{R}$ to encode the structure of the data, as suggested by Harandi et al. (2018, Sec. 3.1, Eqs. (14)–(16)). For completeness, we repeat the definition of the affinity function here. It is defined as $a(\mathcal{X}_i, \mathcal{X}_j) = g_w(\mathcal{X}_i, \mathcal{X}_j) - g_b(\mathcal{X}_i, \mathcal{X}_j)$, where

$$g_w(\mathcal{X}_i, \mathcal{X}_j) = \begin{cases} 1 & \text{if } \mathcal{X}_i \in N_w(\mathcal{X}_j) \text{ or } \mathcal{X}_j \in N_w(\mathcal{X}_i) \\ 0 & \text{otherwise} \end{cases}$$
$$g_b(\mathcal{X}_i, \mathcal{X}_j) = \begin{cases} 1 & \text{if } \mathcal{X}_i \in N_b(\mathcal{X}_j) \text{ or } \mathcal{X}_j \in N_b(\mathcal{X}_i) \\ 0 & \text{otherwise.} \end{cases}$$

The set $N_w(\mathcal{X}_i)$ consists of the $\nu_w$ nearest neighbors of $\mathcal{X}_i$ that have the same label as $y_i$, and the set $N_b(\mathcal{X}_i)$ consists of the $\nu_b$ nearest neighbors of $\mathcal{X}_i$ that have labels different from $y_i$. The nearest neighbors can be computed using the geodesic distance.

The desired projection map $\pi_A$ is obtained by minimizing the following loss function:

$$L_s(A) = \frac{1}{N^2}\sum_{i,j=1}^N a(\mathcal{X}_i, \mathcal{X}_j)\, d^2\big(\operatorname{span}(A^T X_i), \operatorname{span}(A^T X_j)\big)$$

where $d$ is a distance metric on $\text{Gr}(p, m)$. Note that if the distance metric $d$ has $\text{O}(m)$-symmetry, e.g. the geodesic distance, then so does $L_s$; in this case the optimization can be performed on $\text{St}(m, n)/\text{O}(m) \cong \text{Gr}(m, n)$, and otherwise on $\text{St}(m, n)$. We term this supervised dimensionality reduction the supervised nested Grassmann (sNG).
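A direct numpy implementation of the affinity function might look as follows (a sketch; the function name and the defaults $\nu_w = \nu_b = 5$ are our choices, not values prescribed here). It takes a precomputed $N \times N$ matrix of pairwise Grassmann distances (e.g., geodesic distances) and a length-$N$ integer label array:

```python
import numpy as np

def affinity(dists, labels, nu_w=5, nu_b=5):
    # a(X_i, X_j) = g_w(X_i, X_j) - g_b(X_i, X_j); labels: np.ndarray of ints
    N = len(labels)
    in_Nw = np.zeros((N, N), dtype=bool)  # in_Nw[i, j]: X_j in N_w(X_i)
    in_Nb = np.zeros((N, N), dtype=bool)  # in_Nb[i, j]: X_j in N_b(X_i)
    idx = np.arange(N)
    for i in range(N):
        same = np.flatnonzero((labels == labels[i]) & (idx != i))
        diff = np.flatnonzero(labels != labels[i])
        in_Nw[i, same[np.argsort(dists[i, same])[:nu_w]]] = True
        in_Nb[i, diff[np.argsort(dists[i, diff])[:nu_b]]] = True
    # symmetrize: the "or" condition in the definitions of g_w and g_b
    g_w = (in_Nw | in_Nw.T).astype(float)
    g_b = (in_Nb | in_Nb.T).astype(float)
    return g_w - g_b
```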

4.3 Choice of the Distance Function

The loss functions $L_u$ and $L_s$ depend on the choice of the distance $d : \text{Gr}(p, n) \times \text{Gr}(p, n) \to \mathbb{R}_{\geq 0}$. Besides the geodesic distance, there are many widely used distances on the Grassmann manifold; see, for example, Edelman et al. (1998, p. 337) and Ye and Lim (2016, Table 2). In this work, we use two different distance metrics: (1) the geodesic distance $d_g$, and (2) the projection distance, which is also called the chordal distance in Ye and Lim (2016) and the projection $F$-norm in Edelman et al. (1998). The geodesic distance was defined in Section 3.1; the projection distance is defined as follows. For $\mathcal{X}, \mathcal{Y} \in \text{Gr}(p, n)$, denote the projection matrices onto $\mathcal{X}$ and $\mathcal{Y}$ by $P_\mathcal{X}$ and $P_\mathcal{Y}$ respectively. Then the distance between $\mathcal{X}$ and $\mathcal{Y}$ is given by $d_p(\mathcal{X}, \mathcal{Y}) = \|P_\mathcal{X} - P_\mathcal{Y}\|_F/\sqrt{2} = \big(\sum_{i=1}^p \sin^2\theta_i\big)^{1/2}$, where $\theta_1, \ldots, \theta_p$ are the principal angles between $\mathcal{X}$ and $\mathcal{Y}$. If $\mathcal{X} = \operatorname{span}(X)$, then $P_\mathcal{X} = X(X^T X)^{-1} X^T$. It is also easy to see that the projection distance has $\text{O}(n)$-symmetry.
We choose the projection distance mainly for its computational efficiency, as it involves only matrix multiplication, which has a time complexity of $O(n^2)$, whereas the geodesic distance requires an SVD, which has a time complexity of $O(n^3)$.
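Both distances are straightforward to compute from the principal angles; the sketch below (our helper names) also checks the identity $d_p(\mathcal{X}, \mathcal{Y}) = \|P_\mathcal{X} - P_\mathcal{Y}\|_F/\sqrt{2}$ numerically:

```python
import numpy as np

def principal_angles(X, Y):
    # principal angles between span(X) and span(Y); X, Y orthonormal n x p
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.arccos(s)

def d_g(X, Y):
    return np.linalg.norm(principal_angles(X, Y))          # (sum theta_i^2)^{1/2}

def d_p(X, Y):
    return np.linalg.norm(np.sin(principal_angles(X, Y)))  # (sum sin^2 theta_i)^{1/2}

rng = np.random.default_rng(4)
X = np.linalg.qr(rng.standard_normal((10, 2)))[0]
Y = np.linalg.qr(rng.standard_normal((10, 2)))[0]
assert np.isclose(d_p(X, Y),
                  np.linalg.norm(X @ X.T - Y @ Y.T) / np.sqrt(2))
```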

4.4 Analysis of Principal Nested Grassmannians

To determine the dimension of the nested submanifold that fits the data well enough, we can choose $p < m_1 < \cdots < m_k < n$ and estimate the projections onto these nested Grassmann manifolds. The ratio of expressed variance for each projection is the ratio of the variance of the projected data to the variance of the original data. With these ratios, we can choose the desired dimension according to a pre-specified percentage of expressed variance, as one would do when choosing the number of principal components in PCA.

Alternatively, one can perform a full analysis of principal nested Grassmanns (PNG) as follows. Starting from $\text{Gr}(p, n)$, one can reduce the dimension down to $\text{Gr}(p, p+1)$. Using the diffeomorphism $\text{Gr}(p, n) \cong \text{Gr}(n-p, n)$, we have $\text{Gr}(p, p+1) \cong \text{Gr}(1, p+1)$, and hence we can continue reducing the dimension down to $\text{Gr}(1, 2)$. The resulting sequence is

$$\text{Gr}(p, n) \to \text{Gr}(p, n-1) \to \cdots \to \text{Gr}(p, p+1) \cong \text{Gr}(1, p+1) \to \text{Gr}(1, p) \to \cdots \to \text{Gr}(1, 2).$$

Furthermore, we can reduce the points on $\text{Gr}(1, 2)$, which is a 1-dimensional manifold, to a 0-dimensional manifold, i.e. a single point, by computing the Fréchet mean (FM). We call this FM the nested Grassmannian mean (NGM) of $\mathcal{X}_1, \ldots, \mathcal{X}_N \in \text{Gr}(p, n)$. The NGM is unique since $\text{Gr}(1, 2) \cong \mathbb{R}P^1$ can be identified with a half circle in $\mathbb{R}^2$, on which the FM is unique. Note that in general the NGM will not be the same as the FM of $\mathcal{X}_1, \ldots, \mathcal{X}_N$, since the embeddings/projections need not be isometric. The supervised PNG (sPNG) is obtained similarly by replacing each projection with its supervised counterpart.

4.5 Principal Scores

Whenever we apply a projection $\pi_m : \text{Gr}(p, m+1) \to \text{Gr}(p, m)$ to the data, we might lose some of the information contained in the data. More specifically, since we project data on a $p(m+1-p)$-dimensional manifold to a $p(m-p)$-dimensional manifold, we need to describe the $p(m+1-p) - p(m-p) = p$ dimensions of information lost during the projection. In PCA, this is done by computing the scores of each principal component, which are the transformed coordinates of each sample in the eigenspace of the covariance matrix. We can generalize the notion of principal scores to the nested Grassmanns as follows. For each $\mathcal{X} \in \text{Gr}(p, m+1)$, denote by $M_{\pi_m(\mathcal{X})}$ the fiber of $\pi_m(\mathcal{X})$, i.e. $M_{\pi_m(\mathcal{X})} = \pi_m^{-1}(\pi_m(\mathcal{X})) = \{\mathcal{Y} \in \text{Gr}(p, m+1) : \pi_m(\mathcal{Y}) = \pi_m(\mathcal{X})\}$, which is a $p$-dimensional submanifold of $\text{Gr}(p, m+1)$. An illustration of this fiber is given in Figure 3(a). Let $\widetilde{\mathcal{X}} = \iota_m(\pi_m(\mathcal{X}))$, and let the unit tangent vector $V \in T_{\widetilde{\mathcal{X}}} M_{\pi_m(\mathcal{X})}$ be the geodesic direction from $\widetilde{\mathcal{X}}$ to $\mathcal{X}$.
Given a suitable basis on $T_{\widetilde{\mathcal{X}}} M_{\pi_m(\mathcal{X})}$, $V$ can be realized as a $p$-dimensional vector, and this is the score vector of $\mathcal{X}$ associated with the projection $\pi_m$.

(a) The fiber $M_{\pi_m(\mathcal{X})}$ in $\text{Gr}(p, m+1)$.
(b) The horizontal space (in blue) and the vertical space (in green) induced by $\pi_m : \text{Gr}(p, m+1) \to \text{Gr}(p, m)$.
Figure 3: Illustrations of submanifolds induced by $\pi_m$.

By the definition of $M_{\pi_m(\mathcal{X})}$, we have the following decomposition of the tangent space of $\text{Gr}(p, m+1)$ at $\widetilde{\mathcal{X}}$ into the horizontal space and the vertical space induced by $\pi_m$:

$$T_{\widetilde{\mathcal{X}}}\text{Gr}(p, m+1) = T_{\widetilde{\mathcal{X}}} M_{\pi_m(\mathcal{X})} \oplus (d\iota_m)_{\pi_m(\mathcal{X})}\big(T_{\pi_m(\mathcal{X})}\text{Gr}(p, m)\big).$$

An illustration of this decomposition is given in Figure 3(b). A tangent vector of $M_{\pi_m(\mathcal{X})}$ at $\widetilde{\mathcal{X}}$ is of the form $\Delta = A_\perp b^T$, where $A_\perp$ is any $(m+1)$-dimensional vector such that $[A\; A_\perp]$ is orthogonal and $b \in \mathbb{R}^p$. It is easy to check that $\pi_m(\operatorname{span}(A A^T X + A_\perp b^T)) = \pi_m(\operatorname{span}(X)) = \operatorname{span}(A^T X)$. Hence a natural coordinate for the tangent vector $\Delta = A_\perp b^T$ is $b \in \mathbb{R}^p$, and the geodesic direction from $\widetilde{\mathcal{X}}$ to $\mathcal{X}$ is $V = X^T A_\perp$. It is easy to see that $\|V\|_F = 1$ since $X$ has orthonormal columns. To reflect the distance between $\widetilde{\mathcal{X}}$ and $\mathcal{X}$, i.e. the reconstruction error, we define $d(\widetilde{\mathcal{X}}, \mathcal{X})\, V$ as the score vector for $\mathcal{X}$ associated with $\pi_m$. In the case of $\text{Gr}(1, 2) \to \text{NGM}$, we use the signed distance to the NGM as the score. For complex nested Grassmanns, however, the principal score associated with each projection is a $p$-dimensional complex vector; for the sake of visualization, we transform this $p$-dimensional complex vector into a $2p$-dimensional real vector. The procedure for computing the PNG and the principal scores is summarized in Algorithm 1.
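The following fragment sketches the computation of a single score vector for the special case $b = 0$, so that $\iota_m(\pi_m(\mathcal{X}))$ is simply the re-orthonormalized projection of $X$ onto $\operatorname{span}(A)$ (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
m, p = 5, 2

X = np.linalg.qr(rng.standard_normal((m + 1, p)))[0]  # a point on Gr(p, m+1)
R = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))[0]
A, A_perp = R[:, :m], R[:, [m]]                       # [A A_perp] orthogonal

Xt = np.linalg.qr(A @ (A.T @ X))[0]                   # iota_m(pi_m(span(X))), b = 0

# geodesic distance via the principal angles between span(X) and span(Xt)
s = np.clip(np.linalg.svd(Xt.T @ X, compute_uv=False), -1.0, 1.0)
d = np.linalg.norm(np.arccos(s))

V = (X.T @ A_perp).ravel()  # direction along the fiber, a p-vector
score = d * V               # the score vector of X for the projection pi_m
```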

Remark 2

Note that this definition of the principal score is not intrinsic, as it depends on the choice of basis. Indeed, it is impossible to choose a $p$-dimensional vector for the projection $\pi_m$ in an intrinsic way, since the only property of a map that is independent of bases is its rank. A reasonable choice of basis is made by viewing the Grassmannian $\text{Gr}(p, m)$ as a quotient manifold of $\text{St}(p, m)$, which is a submanifold of $\mathbb{R}^{m \times p}$. This is how we define the principal scores for nested Grassmanns.

Input: $\mathcal{X}_1, \ldots, \mathcal{X}_N \in \text{Gr}(p, n)$
Output: an $N \times p(n-p)$ score matrix $S$.
Let $\mathcal{X} = (\mathcal{X}_1, \ldots, \mathcal{X}_N)$.
/* $\text{Gr}(p, n) \to \text{Gr}(p, n-1) \to \cdots \to \text{Gr}(p, p+1)$ */
for $m = n-1, \ldots, p+1$ do
    $\hat{v}, \hat{b} = \operatorname{arg\,min}_{v, b} \sum_{i=1}^N d^2\big(\mathcal{X}_i, \operatorname{span}((I - v v^T) X_i + v b^T)\big)$;  /* $v \in \mathbb{R}^{m+1}$, $\|v\| = 1$, $b \in \mathbb{R}^p$ */
    for $i = 1, \ldots, N$ do
        Let $\widehat{\mathcal{X}}_i = \operatorname{span}((I - \hat{v}\hat{v}^T) X_i + \hat{v}\hat{b}^T)$, where $\mathcal{X}_i = \operatorname{span}(X_i)$;
        $V_{i,m} = d_g(\mathcal{X}_i, \widehat{\mathcal{X}}_i)\, X_i^T \hat{v}$;  // the score vector for $\mathcal{X}_i$
        $\mathcal{X}_i \leftarrow \operatorname{span}(R^T X_i)$;  // $R \in \text{St}(m, m+1)$ such that $R^T \hat{v} = 0$
if $p > 1$ then
    for $i = 1, \ldots, N$ do
        $\mathcal{X}_i \leftarrow \mathcal{X}_i^\perp$;  // $\text{Gr}(p, p+1) \cong \text{Gr}(1, p+1)$
    /* $\text{Gr}(1, p+1) \to \text{Gr}(1, p) \to \cdots \to \text{Gr}(1, 2)$ */
    for $m = p, \ldots, 2$ do
        $\hat{v}, \hat{b} = \operatorname{arg\,min}_{v, b} \sum_{i=1}^N d^2\big(\mathcal{X}_i, \operatorname{span}((I - v v^T) X_i + v b^T)\big)$;  /* $v \in \mathbb{R}^{m+1}$, $\|v\| = 1$, $b \in \mathbb{R}$ */
        for $i = 1, \ldots, N$ do
            Let $\widehat{\mathcal{X}}_i = \operatorname{span}((I - \hat{v}\hat{v}^T) X_i + \hat{v}\hat{b}^T)$, where $\mathcal{X}_i = \operatorname{span}(X_i)$;
            $V_{i,m} = d_g(\mathcal{X}_i, \widehat{\mathcal{X}}_i)\, X_i^T \hat{v}$;  // the score for $\mathcal{X}_i$
            $\mathcal{X}_i \leftarrow \operatorname{span}(R^T X_i)$;  // $R \in \text{St}(m, m+1)$ such that $R^T \hat{v} = 0$
/* $\text{Gr}(1, 2) \to \text{NGM}$; note that $\widehat{\mathcal{X}}_i \in \text{Gr}(1, 2)$ */
$\text{NGM} \leftarrow \text{FM}(\widehat{\mathcal{X}}_1, \ldots, \widehat{\mathcal{X}}_N)$;
Let $U_i \in \mathbb{R}^2$ be the geodesic direction from the NGM to $\mathcal{X}_i$ such that $\|U_i\| = d_g(\text{NGM}, \mathcal{X}_i)$;
for $i = 1, \ldots, N$ do
    $V_{i,1} \in \mathbb{R}$ is such that $U_i = V_{i,1} \cdot U_1/\|U_1\|$;
Return the score matrix $S = [V_{i,m}]$. (Note that $V_{i,m} \in \mathbb{R}$ for $m = 1, \ldots, p$ and $V_{i,m} \in \mathbb{R}^p$ for $m = p+1, \ldots, n-1$.)
Algorithm 1: Principal Nested Grassmanns

5 Experiments

In this section, we demonstrate the performance of the proposed dimensionality reduction techniques, i.e. PNG and sPNG, via experiments on synthetic and real data. The implementation (our code is available at https://github.com/cvgmi/NestedGrassmann) is based on the Python library pymanopt (Townsend et al., 2016), and we use the steepest descent algorithm for the optimization (with the default parameters in pymanopt). The optimization was performed on a desktop with 3.6GHz Intel i7 processors and took about 30 seconds to converge.
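A sketch of how this optimization can be set up is shown below, assuming pymanopt ≥ 2.0 with its autograd backend (the decorator and optimizer names have changed across pymanopt versions, and `fit_ng` is our name, not part of the released code). It minimizes $L_u$ with the projection distance over $\text{Gr}(m, n) \times \mathbb{R}^{n \times p}$:

```python
import autograd.numpy as anp
import pymanopt
from pymanopt.manifolds import Euclidean, Grassmann, Product
from pymanopt.optimizers import SteepestDescent

def fit_ng(Xs, m):
    # Xs: list of n x p orthonormal bases X_i; returns (A, B) minimizing L_u
    n, p = Xs[0].shape
    manifold = Product([Grassmann(n, m), Euclidean(n, p)])

    @pymanopt.function.autograd(manifold)
    def cost(A, B):
        total = 0.0
        for X in Xs:
            Y = A @ (A.T @ X) + B - A @ (A.T @ B)  # A A^T X + (I - A A^T) B
            PY = Y @ anp.linalg.solve(Y.T @ Y, Y.T)
            total = total + anp.sum((X @ X.T - PY) ** 2) / 2.0  # d_p^2
        return total / len(Xs)

    problem = pymanopt.Problem(manifold, cost)
    result = SteepestDescent(verbosity=0).run(problem)
    return result.point
```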

5.1 Synthetic Data

In this subsection, we compare the performance of the projection and geodesic distances. The questions we answer are the following. (1) From Section 4.3, using the projection distance is computationally more efficient than using the geodesic distance; but how do the two compare under varying dimension $n$ and variance level $\sigma^2$? (2) Is our method of dimensionality reduction "better" than PGA, and under what conditions does it outperform PGA?

5.1.1 Projection and Geodesic Distance Comparisons

The procedure we used to generate random points on $\text{Gr}(p, n)$ for the synthetic data experiments is as follows. First, we generate $N$ points $X_1, \ldots, X_N$ from the uniform distribution on $\text{St}(p, m)$ (Chikuse, 2003, Ch. 2.5), generate $A$ from the uniform distribution on $\text{St}(m, n)$, and generate $B$ as an $n \times p$ matrix with i.i.d. entries from $N(0, 0.1)$. Then we compute $\widetilde{\mathcal{X}}_i = \operatorname{span}(A X_i + (I - A A^T) B) \in \text{Gr}(p, n)$. Finally, to include some perturbation, we compute $\mathcal{X}_i = \text{Exp}_{\widetilde{\mathcal{X}}_i}(\sigma U_i)$, where $U_i = \widetilde{U}_i/\|\widetilde{U}_i\|$ and $\widetilde{U}_i \in T_{\widetilde{\mathcal{X}}_i}\text{Gr}(p, n)$.
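This generative procedure translates directly into numpy; below is a sketch (ours), with the Grassmann exponential map written via the thin-SVD formula of Edelman et al. (1998):

```python
import numpy as np

def unif_stiefel(n, p, rng):
    # QR of a Gaussian matrix is uniform on St(p, n) (Chikuse, 2003)
    return np.linalg.qr(rng.standard_normal((n, p)))[0]

def gr_exp(X, H):
    # exponential map on Gr(p, n) at span(X); H horizontal, i.e. X^T H = 0
    u, s, vt = np.linalg.svd(H, full_matrices=False)
    return (X @ vt.T) @ np.diag(np.cos(s)) @ vt + u @ np.diag(np.sin(s)) @ vt

rng = np.random.default_rng(6)
N, n, m, p, sigma = 50, 10, 3, 1, 0.5
A = unif_stiefel(n, m, rng)
B = rng.normal(0.0, np.sqrt(0.1), size=(n, p))  # i.i.d. N(0, 0.1) entries

data = []
for _ in range(N):
    X = unif_stiefel(m, p, rng)
    Xt = np.linalg.qr(A @ X + (np.eye(n) - A @ A.T) @ B)[0]
    U = (np.eye(n) - Xt @ Xt.T) @ rng.standard_normal((n, p))  # tangent at Xt
    U /= np.linalg.norm(U)                                     # unit norm
    data.append(gr_exp(Xt, sigma * U))
```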

This experiment compares the performance of the NG representations, in terms of the explained variance, under different levels of data variance. In this experiment, we set $N = 50$, $n = 10$, $m = 3$, and $p = 1$, with $\sigma$ ranging from 0.5 to 1. The results, averaged over 100 repetitions, are shown in Figure 4. From these results, we can see that the explained variances for the projection distance and the geodesic distance are indistinguishable, but using the projection distance leads to much faster convergence. The reason is that when two points on the Grassmann manifold are close, the geodesic distance is well-approximated by the projection distance: when the algorithm converges, the original point $\mathcal{X}_i$ and the reconstructed point $\hat{\mathcal{X}}_i$ are close, and the geodesic distance can thus be well-approximated by the projection distance. Therefore, for all the experiments in the next section, we use the projection distance for the sake of efficiency.

Figure 4: Comparison of the NG representations based on the projection and geodesic distances using the expressed variance.

5.1.2 Comparison of PNG and PGA

Now we compare the proposed PNG to PGA. In order to have a fair comparison between PNG and PGA, we define the principal components of PNG as the principal components of the scores obtained as in Section 4.5. Similar to the previous experiment, we set $N = 50$, $n = 10$, $m = 5$, $p = 2$, and $\sigma = 0.01, 0.05, 0.1, 0.5$, and apply the same procedure to generate the synthetic data. The results, averaged over 100 repetitions, are shown in Figure 5.

Figure 5: Comparison of PNG and PGA under different levels of noise.

From Figure 5, we can see that our method outperforms PGA by virtue of the fact that it is able to capture a larger amount of the variance contained in the data. When the variance is small, the improvement of PNG over PGA is less significant, whereas our method is significantly better in the large-variance case (e.g., compare $\sigma = 0.5$ with $\sigma = 0.01$). Note that when the variance in the data is small, i.e. the data are tightly clustered around the FM, PGA captures the essence of the data well. However, the requirement in PGA that the geodesic submanifold pass through the anchor point, namely the FM, is not meaningful for data with large variance, as the following simple example shows. Consider a few data points spread out along the equator of a sphere. The FM in this case is likely to be the north pole of the sphere if we restrict ourselves to the upper hemisphere, and the geodesic submanifold computed by PGA will pass through this FM. However, what is more meaningful is a submanifold corresponding to the equator, which is what a nested spheres representation (Jung et al., 2012) yields in this case. In a similar vein, for data with large variance on a Grassmann manifold, our NG representation will yield a more meaningful representation than PGA.

5.2 Application to Planar Shape Analysis

We now apply our method to planar (2-dimensional) shape analysis. A planar shape $\sigma$ can be represented as an ordered set of $k > 2$ points in $\mathbb{R}^2$, called a $k$-ad or a configuration; here we assume that these $k$ points are not all identical. Denote the configuration by $X$, which is a $k \times 2$ matrix, and let $H$ be the $(k-1) \times k$ Helmert submatrix (Dryden and Mardia, 2016, Ch. 2.5). Then $Z = HX/\|HX\|_F$ is called the pre-shape of $X$, from which the information about translation and scaling has been removed. The space of all pre-shapes is called the pre-shape space, denoted $\mathcal{S}^k_2$; by definition, it is a $(2k-3)$-dimensional sphere. The shape is obtained by removing the rotation from the pre-shape, and thus the shape space is $\Sigma^k_2 = \mathcal{S}^k_2/\text{O}(2)$. It was shown by Kendall (1984) that $\Sigma^k_2$ is a smooth manifold and, when equipped with the quotient metric, is isometric to the complex projective space $\mathbb{C}P^{k-2}$ equipped with the Fubini–Study metric (up to a scale factor), which is a special case of the complex Grassmannians, i.e. $\mathbb{C}P^{k-2} \cong \text{Gr}(1, \mathbb{C}^{k-1})$. Hence, we can apply the proposed PNG to planar shapes. For planar shapes, we also compare with the recently proposed principal nested shape spaces (PNSS) (Dryden et al., 2019), an application of PNS to the pre-shape space. We now demonstrate how PNG performs compared to PGA and PNSS using some simple examples of planar shapes and the OASIS dataset.
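The pre-shape construction is easy to reproduce; the sketch below (ours) builds the Helmert submatrix, computes $Z = HX/\|HX\|_F$, and identifies the two columns of $Z$ with a complex vector, i.e. a representative of a point of $\mathbb{C}P^{k-2} \cong \text{Gr}(1, \mathbb{C}^{k-1})$:

```python
import numpy as np

def helmert_submatrix(k):
    # (k-1) x k Helmert submatrix (Dryden and Mardia, 2016): row j has
    # j entries equal to -1/sqrt(j(j+1)), then j/sqrt(j(j+1)), then zeros
    H = np.zeros((k - 1, k))
    for j in range(1, k):
        H[j - 1, :j] = -1.0 / np.sqrt(j * (j + 1))
        H[j - 1, j] = j / np.sqrt(j * (j + 1))
    return H

def preshape(X):
    # pre-shape Z = HX / ||HX||_F: removes translation (H) and scale
    Z = helmert_submatrix(X.shape[0]) @ X
    return Z / np.linalg.norm(Z)

X = np.random.default_rng(7).standard_normal((13, 2))  # a 13-point k-ad
Z = preshape(X)              # a point on the pre-shape sphere
z = Z[:, 0] + 1j * Z[:, 1]   # its representative in C^{k-1}, here k = 13
```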

Examples of Planar Shapes

We take three datasets, digit3, gorf, and gorm, from the R package shapes (Dryden, 2021). The digit3 dataset consists of 30 shapes of the digit 3, each represented by 13 points in $\mathbb{R}^2$; the gorf dataset consists of 30 shapes of female gorilla skulls and the gorm dataset of 29 shapes of male gorilla skulls, each represented by 8 points in $\mathbb{R}^2$. Example shapes from these three datasets are shown in Figure 6. The cumulative ratios of variance explained by the first 5 principal components of PNG, PGA, and PNSS are shown in Figure 7. (The principal components in PNG and PGA are complex, whereas those in PNSS are real; hence we use 5 principal components for PNG and PGA and 10 for PNSS, so that the reduced real dimension is 10 in all three cases.) It can be seen from Figure 7 that the proposed PNG achieves a higher explained variance than PGA and PNSS in most cases.
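For reference, the curves in Figure 7 are of the following form; this is a schematic sketch in which `scores` (an n × d array of principal scores) and `total_var` (the total variance of the data) are assumed to be produced by one of the three methods:

```python
import numpy as np

def cumulative_ratio(scores, total_var):
    """Cumulative ratio of expressed variance for the leading principal
    components. For complex scores (PNG/PGA), np.var uses |z - mean|^2,
    so each complex component accounts for two real dimensions."""
    return np.cumsum(np.var(scores, axis=0)) / total_var
```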

Figure 6: Example shapes from the three datasets: (a) digit3, (b) gorf, (c) gorm.
Figure 7: Cumulative explained variance by the first 5 principal components of PNG, PGA, and PNSS.
OASIS Corpus Callosum Data Experiment

The OASIS database (Marcus et al., 2007) is a publicly available database containing T1-weighted MR brain scans of subjects aged 18 to 96, including subjects clinically diagnosed with mild to moderate Alzheimer's disease. We further classify the subjects into three groups: young (aged between 10 and 40), middle-aged (aged between 40 and 70), and old (aged above 70). For demonstration, we randomly choose 4 brain scans within each decade, totalling 36 brain scans. From each scan, the Corpus Callosum (CC) region is segmented and 250 points are taken on the boundary of the CC region; see Figure 8 for samples of the segmented corpus callosi. In this case, the shape space is $\Sigma^{250}_2 \cong \mathbb{C}P^{248} \cong \mathrm{Gr}(1, \mathbb{C}^{249})$. Results of applying the three methods to these data are shown in Figure 9.

Figure 8: Example Corpus Callosum shapes from the three distinct age groups, each depicted using its boundary point set.
Figure 9: Cumulative explained variance captured by the first 10 principal components of PNG, PGA, and PNSS.

Since the data are divided into three groups (young, middle-aged, and old), we can apply the sPNG described in Section 4.2 to reduce the dimension. The purpose of this experiment is not to demonstrate state-of-the-art classification accuracy on this dataset; rather, our goal is to demonstrate that the proposed nested Grassmann representation in a supervised setting is much more discriminative than the competition, namely supervised PGA. Hence, we choose a simple classifier, the support vector machine (SVM) (Vapnik, 1995), to highlight the aforementioned discriminative power of the nested Grassmann representation over PGA.
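Concretely, the classification stage is of the following form; this is a hypothetical sketch in which the principal scores and group labels are stand-ins for the outputs of the dimensionality reduction methods, and stacking real and imaginary parts is our choice of real encoding for the complex scores:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholders: in practice, `scores` (complex for PNG/PGA) comes from a
# dimensionality reduction method and `labels` encodes the age groups.
scores = rng.standard_normal((36, 10)) + 1j * rng.standard_normal((36, 10))
labels = np.repeat([0, 1, 2], 12)

S = scores[:, :3]                          # first three principal scores
F = np.column_stack([S.real, S.imag])      # 6 real features per shape
print(cross_val_score(SVC(), F, labels, cv=5).mean())
```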

For comparison, PGA can be easily extended to supervised PGA (sPGA) by first diffeomorphically mapping all the data to the tangent space anchored at the FM and then performing supervised PCA (Bair et al., 2006; Barshan et al., 2011) in that tangent space. Generalizing PNSS to the supervised case, however, is nontrivial and beyond the scope of this paper, so we limit our comparison to the unsupervised PNSS. In this demonstration, we apply an SVM to the scores obtained from the different dimensionality reduction algorithms, keeping only the first three principal scores to show that even with a 3-dimensional representation of the original shapes we can still achieve good classification results. The results are shown in Table 1. They are in accordance with our expectation, since sPNG seeks a projection that minimizes the within-group variance while maximizing the between-group variance. Moreover, as we observed earlier, the constraint requiring the geodesic submanifold to pass through the FM is not well suited to this dataset, which has a large variance across the data. This explains why sPNG is far more accurate than sPGA.
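A minimal sketch of the second stage of the sPGA baseline is given below, assuming tangent-space coordinates `V` (n × D, obtained by log-mapping the data at the FM) are available; the delta kernel on the labels and the function itself are illustrative choices in the spirit of Barshan et al. (2011), not necessarily the exact implementation used in our experiments:

```python
import numpy as np

def supervised_pca(V, y, d):
    """Supervised PCA on tangent-space coordinates V (n x D) with labels y:
    maximize tr(U^T V^T H K H V U), where H is the centering matrix and K
    a kernel on the labels (a delta kernel below). Returns the supervised
    principal scores and the top-d directions."""
    n = V.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    K = (y[:, None] == y[None, :]).astype(float)    # delta kernel on labels
    Q = V.T @ H @ K @ H @ V                         # symmetric D x D matrix
    w, E = np.linalg.eigh(Q)
    U = E[:, np.argsort(w)[::-1][:d]]               # top-d eigenvectors
    return V @ U, U
```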

Method    Accuracy
sPNG      83.33%
PNG       75.00%
sPGA      66.67%
PGA       63.89%
PNSS      80.56%
Table 1: Classification accuracies of PNG, PGA, PNSS, and the supervised variants sPNG and sPGA.

6 Conclusion

In this work, we proposed a novel nested geometric structure for homogeneous spaces and used it to achieve dimensionality reduction for data residing in Grassmann manifolds. We also discussed how this nested geometric structure serves as a natural generalization of other nested geometric structures in the literature, namely those for the spheres and the manifold of SPD matrices. Specifically, we showed that a lower-dimensional Grassmann manifold can be embedded into a higher-dimensional one and, via this embedding, constructed a sequence of nested Grassmann manifolds. Compared to PGA, which is designed for general Riemannian manifolds, the proposed method captures a higher percentage of the data variance after reducing the dimensionality. This is primarily because our method, unlike PGA, requires the submanifold neither to be geodesic nor to pass through the Fréchet mean of the data; succinctly, the nested Grassmann structure allows us to fit the data with a larger class of submanifolds than PGA. We also proposed a supervised dimensionality reduction technique that discriminates between data classes while reducing the dimension. The efficacy of our method was demonstrated on the OASIS Corpus Callosum data for dimensionality reduction and classification, where it outperformed the widely used PGA and the recently proposed PNSS by a large margin.


Acknowledgments

This research was funded in part by NSF grant IIS-1724174 and NIH (NINDS and NIA) grant R01NS121099 to Vemuri, and by MOST grant 110-2118-M-002-005-MY3 to Yang.


Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.


Conflicts of Interest

We declare that we have no conflicts of interest.

References

  • Absil et al. (2004) P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica, 80(2):199–220, 2004.
  • Bair et al. (2006) Eric Bair, Trevor Hastie, Debashis Paul, and Robert Tibshirani. Prediction by supervised principal components. Journal of the American Statistical Association, 101(473):119–137, 2006.
  • Banerjee et al. (2017) Monami Banerjee, Rudrasis Chakraborty, and Baba C Vemuri. Sparse exact PGA on Riemannian manifolds. In Proceedings of the IEEE International Conference on Computer Vision, pages 5010–5018, 2017.
  • Barshan et al. (2011) Elnaz Barshan, Ali Ghodsi, Zohreh Azimifar, and Mansoor Zolghadri Jahromi. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 44(7):1357–1371, 2011.
  • Basser et al. (1994) Peter J Basser, James Mattiello, and Denis LeBihan. MR diffusion tensor spectroscopy and imaging. Biophysical Journal, 66(1):259–267, 1994.
  • Callaghan (1993) Paul T Callaghan. Principles of Nuclear Magnetic Resonance Microscopy. Oxford University Press on Demand, 1993.
  • Chakraborty et al. (2016) Rudrasis Chakraborty, Dohyung Seo, and Baba C Vemuri. An efficient exact-PGA algorithm for constant curvature manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3976–3984, 2016.
  • Cheeger and Ebin (1975) Jeff Cheeger and David Gregory Ebin. Comparison theorems in Riemannian geometry, volume 9. North-Holland, Amsterdam, 1975.
  • Chikuse (2003) Yasuko Chikuse. Statistics on Special Manifolds, volume 174. Springer Science & Business Media, 2003.
  • Damon and Marron (2014) James Damon and JS Marron. Backwards principal component analysis and principal nested relations. Journal of Mathematical Imaging and Vision, 50(1-2):107–114, 2014.
  • Dryden (2021) I. L. Dryden. shapes package. R Foundation for Statistical Computing, Vienna, Austria, 2021. URL http://www.R-project.org. Contributed package, Version 1.2.6.
  • Dryden and Mardia (2016) Ian L Dryden and Kanti V Mardia. Statistical shape analysis: with applications in R, volume 995. John Wiley & Sons, 2016.
  • Dryden et al. (2019) Ian L Dryden, Kwang-Rae Kim, Charles A Laughton, and Huiling Le. Principal nested shape space analysis of molecular dynamics data. Annals of Applied Statistics, 13(4):2213–2234, 2019.
  • Edelman et al. (1998) Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
  • Feragen et al. (2014) Aasa Feragen, Mads Nielsen, Eva Bjørn Vedel Jensen, Andrew du Plessis, and François Lauze. Geometry and statistics: Manifolds and stratified spaces. Journal of Mathematical Imaging and Vision, 50(1):1–4, 2014.
  • Fletcher et al. (2004) P Thomas Fletcher, Conglin Lu, Stephen M Pizer, and Sarang Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004.
  • Goresky and MacPherson (1988) Mark Goresky and Robert MacPherson. Stratified Morse Theory. Springer, 1988.
  • Harandi et al. (2018) Mehrtash Harandi, Mathieu Salzmann, and Richard Hartley. Dimensionality reduction on SPD manifolds: The emergence of geometry-aware methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(1):48–62, 2018.
  • Hastie and Stuetzle (1989) Trevor Hastie and Werner Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502–516, 1989.
  • Hauberg (2016) Søren Hauberg. Principal curves on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1915–1921, 2016.
  • Helgason (1979) Sigurdur Helgason. Differential geometry, Lie groups, and symmetric spaces. Academic press, 1979.
  • Huckemann et al. (2010) Stephan Huckemann, Thomas Hotz, and Axel Munk. Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica, pages 1–58, 2010.
  • Jaquier and Rozo (2020) Noémie Jaquier and Leonel Rozo. High-dimensional Bayesian optimization via nested Riemannian manifolds. Advances in Neural Information Processing Systems, 33:20939–20951, 2020.
  • Jung et al. (2012) Sungkyu Jung, Ian L Dryden, and JS Marron. Analysis of principal nested spheres. Biometrika, 99(3):551–568, 2012.
  • Kendall (1984) David G Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.
  • Marcus et al. (2007) Daniel S Marcus, Tracy H Wang, Jamie Parker, John G Csernansky, John C Morris, and Randy L Buckner. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience, 19(9):1498–1507, 2007.
  • Pennec et al. (2018) Xavier Pennec. Barycentric subspace analysis on manifolds. The Annals of Statistics, 46(6A):2711–2746, 2018.
  • Sommer et al. (2010) Stefan Sommer, François Lauze, Søren Hauberg, and Mads Nielsen. Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. In European Conference on Computer Vision, pages 43–56. Springer, 2010.
  • Townsend et al. (2016) James Townsend, Niklas Koep, and Sebastian Weichwald. Pymanopt: A python toolbox for optimization on manifolds using automatic differentiation. Journal of Machine Learning Research, 17(137):1–5, 2016. URL http://jmlr.org/papers/v17/16-177.html.
  • Vapnik (1995) Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 1995.
  • Ye and Lim (2016) Ke Ye and Lek-Heng Lim. Schubert varieties and distances between subspaces of different dimensions. SIAM Journal on Matrix Analysis and Applications, 37(3):1176–1197, 2016.
  • Zhang and Fletcher (2013) Miaomiao Zhang and Tom Fletcher. Probabilistic principal geodesic analysis. In Advances in Neural Information Processing Systems, pages 1178–1186, 2013.