Tensor Framework / TensorFaces

 

Multilinear tensor modeling methods are particularly well suited to mathematically representing cause and effect in multi-modal data, where an observation, such as an image, is the result of several constituent factors: the causal factors of data formation.

For example, natural images are the compositional consequence of multiple factors related to scene structure, illumination, and imaging. The appearance of a person in an image (i.e. its pixel values) is the result of the facial geometry of a person, camera location/parameters, lighting conditions, expression, etc. While we can directly observe and measure the gray (or color) values in an image, we are often more interested in the information associated with the causal factors that determine the pixel values in an image, such as the person's identity, the viewing direction, or expression, which may be inferred, but not directly measured. The causal factors are represented by the latent variables in a computational model.

Data tensor modeling was first employed in computer vision, computer graphics, and machine learning to represent cause and effect, to demonstrably disentangle the causal factors of observable data, and to recognize people from the way they move (Human Motion Signatures in 2001) and from their facial images (TensorFaces in 2002). It may, however, be used to recognize any objects or object attributes.

There are two classes of data tensor modeling techniques, which stem from (1) rank-K tensor decompositions (the CANDECOMP/PARAFAC, or CP, decomposition) and (2) rank-(R1,R2,...,RM) tensor decompositions (the Tucker decomposition). Variants of these decompositions employ various constraints, and kernel variants apply a kernel pre-processing step.
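
To make the two model classes concrete, the following sketch builds a small third-order tensor both ways with NumPy. The sizes, ranks, and variable names are illustrative assumptions, not values taken from the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3 = 4, 5, 6          # tensor dimensions (illustrative)

# (1) Rank-K (CP) model: a sum of K rank-1 terms, each the outer
# product of one column from A, B and C.
K = 3
A = rng.standard_normal((I1, K))
B = rng.standard_normal((I2, K))
C = rng.standard_normal((I3, K))
cp_tensor = np.einsum('ik,jk,lk->ijl', A, B, C)

# (2) Rank-(R1,R2,R3) (Tucker) model: a core tensor Z multiplied
# along each mode by that mode's own matrix.
R1, R2, R3 = 2, 3, 4
Z = rng.standard_normal((R1, R2, R3))
U1 = rng.standard_normal((I1, R1))
U2 = rng.standard_normal((I2, R2))
U3 = rng.standard_normal((I3, R3))
tucker_tensor = np.einsum('abc,ia,jb,lc->ijl', Z, U1, U2, U3)

def mode_rank(T, mode):
    """Rank of the mode-m unfolding (mode-m fibers as columns)."""
    unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    return int(np.linalg.matrix_rank(unfolded))

cp_ranks = [mode_rank(cp_tensor, m) for m in range(3)]
tucker_ranks = [mode_rank(tucker_tensor, m) for m in range(3)]
```

For generic random factors, every unfolding of the CP tensor has rank K, whereas the Tucker tensor can have a different rank in each mode. That per-mode freedom is what allows each causal factor to be given its own dimensionality.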
 

Recent theoretical evidence suggests that deep learning is the neural network equivalent of multilinear tensor decomposition, while a shallow network corresponds to CP tensor factorization (also known as linear tensor factorization).


TensorFaces is based on the insight that multilinear tensor methods can explicitly model and decompose a facial image in terms of the causal factors of data formation, where each causal factor is represented according to its second-order statistics by employing the Tucker tensor decomposition. We refer to this approach more generally as Multilinear PCA in order to better differentiate it from our Multilinear ICA approach.
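
A minimal sketch of a Tucker-style decomposition of an image ensemble follows, assuming the images are stacked into a people x views x illuminations x pixels array. The helper names (`unfold`, `mode_product`, `m_mode_svd`) and the random toy data are assumptions made for illustration; the published algorithm should be taken from the papers listed below.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: mode-m fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-m product T x_m M: multiply every mode-m fiber of T by M."""
    out = np.tensordot(M, T, axes=(1, mode))   # contracted axis lands in front
    return np.moveaxis(out, 0, mode)

def m_mode_svd(D):
    """Return a core tensor Z and orthonormal mode matrices U_m such that
    D = Z x_1 U_1 x_2 U_2 ... x_M U_M."""
    U = [np.linalg.svd(unfold(D, m), full_matrices=False)[0]
         for m in range(D.ndim)]
    Z = D
    for m, Um in enumerate(U):
        Z = mode_product(Z, Um.T, m)
    return Z, U

# Toy ensemble: 5 people x 3 views x 4 illuminations x 64 pixels.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3, 4, 64))
Z, (U_people, U_views, U_illums, U_pixels) = m_mode_svd(D)

# Rebuilding the data from the factors checks the decomposition.
D_hat = Z
for m, Um in enumerate((U_people, U_views, U_illums, U_pixels)):
    D_hat = mode_product(D_hat, Um, m)
reconstruction_error = np.linalg.norm(D - D_hat)
```

Each mode matrix comes from the SVD of one unfolding, so it summarizes the second-order statistics of a single causal factor. Row k of `U_people` then acts as the representation of person k, independent of view and illumination.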

Multilinear (tensor) ICA is a more sophisticated model of cause and effect based on the higher-order statistics associated with each causal factor. Similarly, one can employ our kernel variants (pg. 43) to model cause and effect. By comparison, matrix decompositions such as PCA or ICA capture the overall statistical information (variance, kurtosis) without differentiating among the causal factors.

Subspace multilinear learning demonstrably disentangles the causal factors of data formation through strategic dimensionality reduction. For example, in the case of facial images (or bidirectional texture functions), we suppress illumination effects such as shadows and highlights without blurring the edges associated with the person's identity that are important for recognition (or the edges associated with structural information that are important for texture synthesis). See the TensorTextures video below.
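
The idea of reducing one mode while leaving the others intact can be sketched as follows. This uses plain truncation of the illumination mode matrix, which is a simplification assumed here for illustration (the published method may optimize the reduced subspaces further); the data are random and all names are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: mode-m fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-m product T x_m M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(1)
D = rng.standard_normal((5, 3, 6, 64))   # people x views x illums x pixels
ILLUM = 2                                # index of the illumination mode

# Illumination mode matrix and singular values of that unfolding.
U_illum, s, _ = np.linalg.svd(unfold(D, ILLUM), full_matrices=False)

def reduce_illumination(D, U_illum, keep):
    """Project the illumination mode onto its `keep` leading components,
    leaving the people, view and pixel modes untouched."""
    U_k = U_illum[:, :keep]
    return mode_product(D, U_k @ U_k.T, ILLUM)

D_reduced = reduce_illumination(D, U_illum, keep=2)

illum_rank = int(np.linalg.matrix_rank(unfold(D_reduced, ILLUM)))
people_rank = int(np.linalg.matrix_rank(unfold(D_reduced, 0)))
discarded_energy = np.linalg.norm(D - D_reduced) ** 2
```

Only the illumination mode loses dimensions, and the energy removed equals the sum of the squared discarded singular values of that mode. A matrix PCA on the flattened images could not target a single factor in this way.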

The next important question: while TensorFaces is a handy moniker for an approach that learns and represents the interaction of various causal factors from a set of training images, with Multilinear (Tensor) ICA and kernel variants as more sophisticated approaches, none of these interaction models prescribes a solution for how one might determine the multiple causal factors of a single unlabeled test image.

Multilinear Projection (FG 2011, ICCV 2007, briefly summarized in the 2005 MICA paper) addresses the question of how one might determine from a single unlabeled test image all the unknown causal factors of data formation, i.e., how does one solve for multiple unknowns from a single image equation? In the course of addressing this question, several concepts from linear (matrix) algebra were generalized in order to develop the multilinear projection algorithm, such as the mode-m identity tensor (which is also an algebraic operator that reshapes a matrix into a tensor and back again into a matrix), the mode-m pseudo-inverse tensor, and the mode-m product. (Note: the mode-m pseudo-inverse tensor is not a tensor pseudo-inverse.) Multilinear projection simultaneously projects one or more unlabeled test images into multiple constituent mode spaces associated with image formation in order to infer the mode labels.
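
The following sketch illustrates the general idea on noise-free toy data. It is a simplification under stated assumptions, not the published algorithm: the pseudo-inverse step is implemented here as an ordinary matrix pseudo-inverse of the pixel-mode unfolding of the basis tensor, and the rank-1 step takes the leading singular vector of each unfolding of the response tensor. All function names are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: mode-m fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-m product T x_m M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

# Toy training ensemble: people x views x illuminations x pixels.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3, 4, 100))

# Mode matrices of the three causal-factor modes; row k holds the
# coefficient vector of the k-th person / view / illumination.
U = [np.linalg.svd(unfold(D, m), full_matrices=False)[0] for m in range(3)]

# Basis tensor B: an image is B multiplied by one coefficient vector
# along each causal-factor mode.
B = D
for m in range(3):
    B = mode_product(B, U[m].T, m)

def multilinear_projection(d, B):
    """Estimate one coefficient vector per causal-factor mode from image d."""
    factor_shape = B.shape[:-1]
    B_mat = B.reshape(-1, B.shape[-1])        # factor combinations x pixels
    # Pseudo-inverse step: ideally the response tensor is the outer
    # product of the person, view and illumination vectors.
    R = (d @ np.linalg.pinv(B_mat)).reshape(factor_shape)
    # Rank-1 step: leading left singular vector of each unfolding.
    return [np.linalg.svd(unfold(R, m))[0][:, 0] for m in range(R.ndim)]

def infer_labels(d, B, U):
    """Match each estimated vector to the closest row of its mode matrix."""
    estimates = multilinear_projection(d, B)
    # abs() because singular vectors are defined only up to sign.
    return tuple(int(np.argmax(np.abs(Um @ e))) for Um, e in zip(U, estimates))

labels = infer_labels(D[3, 1, 2], B, U)
```

A single image thus yields a person label, a view label, and an illumination label at once. With real, noisy images the response tensor is only approximately rank-1, which is where the choice of rank-1 or canonical decomposition matters.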

  • "Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors", M.A.O. Vasilescu, E. Kim, in The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'19): Tensor Methods for Emerging Data Science Challenges, August 04-08, 2019, Anchorage, AK. ACM, New York, NY, USA. Paper (pdf)
     

  • "Face Tracking with Multilinear (Tensor) Active Appearance Models", Weiguang Si, Kota Yamaguchi, M. A. O. Vasilescu, June, 2013.
    http://pdfs.semanticscholar.org/6c64/59d7cadaa210e3310f3167dc181824fb1bff.pdf
    Paper (pdf)
     

  • "Multilinear Projection for Face Recognition via Canonical Decomposition", M.A.O. Vasilescu, In Proc. Face and Gesture Conf. (FG'11), 476-483. Paper (pdf)
     

  • "Multilinear Projection for Face Recognition via Rank-1 Analysis", M.A.O. Vasilescu, CVPR, IEEE Computer Society and IEEE Biometrics Council Workshop on Biometrics, June 18, 2010.
     

  • "Multilinear Projection for Appearance-Based Recognition in the Tensor Framework", M.A.O. Vasilescu and D. Terzopoulos, Proc. Eleventh IEEE International Conf. on Computer Vision (ICCV'07), Rio de Janeiro, Brazil, October, 2007, 1-8. 
    Paper (1,027 KB - .pdf) 
     

  • "Multilinear Independent Components Analysis and Multilinear Projection Operator for Face Recognition", M.A.O. Vasilescu, D. Terzopoulos, in Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France, August 2005.
     

  • "Multilinear (Tensor) ICA and Dimensionality Reduction", M.A.O. Vasilescu, D. Terzopoulos, Proc. 7th International Conference on Independent Component Analysis and Signal Separation (ICA07), London, UK, September, 2007. In Lecture Notes in Computer Science, 4666, Springer-Verlag, New York, 2007, 818–826. 
     

  • "Multilinear Independent Components Analysis", M. A. O. Vasilescu and D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '05), San Diego, CA, June 2005, vol.1, 547-553. 
    Paper (1,027 KB - .pdf) 
     

  • "Multilinear Independent Component Analysis", M. A. O. Vasilescu and D. Terzopoulos, Learning 2004 Snowbird, UT, April, 2004.
     

  • "Multilinear Subspace Analysis for Image Ensembles," M. A. O. Vasilescu, D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '03), Vol.2, Madison, WI, June, 2003, 93-99.
    Paper (1,657KB - .pdf)  
     

  • "Multilinear Image Analysis for Facial Recognition," M. A. O. Vasilescu, D. Terzopoulos, Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 2, Quebec City, Canada, Aug, 2002, 511-514.
    Paper (439KB - .pdf) 
     

  • "Multilinear Analysis of Image Ensembles: TensorFaces," M. A. O. Vasilescu, D. Terzopoulos, Proc. 7th European Conference on Computer Vision (ECCV'02), Copenhagen, Denmark, May, 2002, in Computer Vision -- ECCV 2002, Lecture Notes in Computer Science, Vol. 2350, A. Heyden et al. (Eds.), Springer-Verlag, Berlin, 2002, 447-460. 
    Full Article in PDF (882KB) 
     

 
 
 

Human Motion Signatures and Style Transfer: 

 

Given motion-capture samples of Charlie Chaplin’s walk, is it possible to synthesize other motions (say, ascending or descending stairs) in his distinctive style? More generally, in analogy with handwritten signatures, do people have characteristic motion signatures that individualize their movements? If so, can these signatures be extracted from example motions? Can they be disentangled from other causal factors?

 

We have developed an algorithm that extracts motion signatures and uses them in the animation of graphical characters. The mathematical basis of our algorithm is a statistical numerical technique known as M-mode data tensor analysis. For example, given a corpus of walking, stair-ascending, and stair-descending motion data collected over a group of subjects, plus a sample walking motion for a new subject, our algorithm can synthesize never-before-seen ascending and descending motions in the distinctive style of this new individual.
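
The synthesis step can be sketched on synthetic data as follows. The corpus here is generated from an exactly multilinear toy model (a person signature applied to action-specific bases), which is an assumption made so that the result can be checked; real input would be motion-capture joint angles, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_actions, n_features, r = 6, 3, 40, 3   # toy sizes (assumed)
WALK, ASCEND, DESCEND = 0, 1, 2

# Synthetic corpus: each motion is a person signature (length r) applied
# to an action-specific basis.
true_signatures = rng.standard_normal((n_people, r))
action_bases = rng.standard_normal((r, n_actions, n_features))
motions = np.einsum('pr,raf->paf', true_signatures, action_bases)

train = motions[:-1]             # all three actions for the first five subjects
new_walk = motions[-1, WALK]     # the new subject: walking only

# People-mode matrix from the SVD of the people-mode unfolding,
# truncated to r components.
U_people = np.linalg.svd(train.reshape(train.shape[0], -1),
                         full_matrices=False)[0][:, :r]

# Action-specific bases with the people mode factored out: (r, actions, features).
B = np.einsum('pr,paf->raf', U_people, train)

# Signature of the new subject: least-squares fit to the walking basis.
signature, *_ = np.linalg.lstsq(B[:, WALK].T, new_walk, rcond=None)

# Apply the signature to the other action bases to synthesize new motions.
synth_ascend = signature @ B[:, ASCEND]
synth_descend = signature @ B[:, DESCEND]
```

Because the signature is estimated from walking alone and then reused with the stair bases, the new subject's style carries over to actions that were never recorded for that subject.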

  • "Human Motion Signatures: Analysis, Synthesis, Recognition," M. A. O. Vasilescu, Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 3, Quebec City, Canada, Aug, 2002, 456-460.
    Paper (439KB - .pdf) 
     

  • "An Algorithm for Extracting Human Motion Signatures", M. A. O. Vasilescu, Computer Vision and Pattern Recognition CVPR 2001 Technical Sketches, Lihue, HI, December, 2001. 
     

  • "Human Motion Signatures for Character Animations", M. A. O. Vasilescu, Sketch and Applications SIGGRAPH 2001 Los Angeles, CA, August, 2001. 
    Sketch (141KB - .pdf)
     

  • "Recognizing Action Events from Multiple Viewpoints," Tanveer Syeda-Mahmood, Alex Vasilescu, Saratendu Sethi, in IEEE Workshop on Detection and Recognition of Events in Video, International Conference on Computer Vision (ICCV 2001), Vancouver, Canada, July 8, 2001, 64-72.

 

 

Listening in 3D
 

The head-related transfer function (HRTF) characterizes how an individual's anatomy and the location of a sound source affect that individual's perception of sound. The size, shape, and density of the head, the shape of the ears and ear canals, and the distance between the ears all transform sound by amplifying some frequencies and attenuating others. Learning how sound is perceived is important in:

  • pinpointing the location of a sound, which is vital for safe navigation in traffic,

  • achieving a realistic acoustic environment in gaming and home cinema set-ups.
     

To measure an HRTF, one places a loudspeaker at various locations in space and a microphone at the ear.  To recreate an authentic sound experience, slightly differently synthesized sounds are sent to each ear in accordance with a person's HRTF. 

   This is not surround sound, which uses multiple speakers to provide 360-degree sound.
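
Sending a slightly different signal to each ear can be sketched as filtering one source with a separate impulse response per ear. The impulse responses below are toy placeholders (a pure delay and attenuation), not measured HRTF data, and all names and values are assumptions for illustration.

```python
import numpy as np

fs = 44_100                                   # sample rate in Hz
n_samples = fs // 20                          # 50 ms of audio
t = np.arange(n_samples) / fs
mono = np.sin(2 * np.pi * 440.0 * t)          # 440 Hz test tone

# Toy head-related impulse responses (the time-domain form of an HRTF)
# for a source on the listener's left. Placeholder values, NOT
# measurements: the right ear hears the sound later and quieter.
delay = int(0.0006 * fs)                      # roughly 0.6 ms interaural delay
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[delay] = 0.5

# Filter the same source with each ear's response, then stack as stereo.
left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)    # shape: (samples, 2)
```

A measured HRTF would replace the two placeholder responses with filters that also shape the spectrum, and those filters differ from person to person and from one source location to the next.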

  • "A Multilinear (Tensor) Framework for HRTF Analysis and Synthesis", G. Grindlay, M.A.O. Vasilescu, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, April, 2007
    Paper (439KB - .pdf) 

 

TensorTextures: Image-based Rendering 

 

One of the goals of computer graphics is photorealistic rendering, the synthesis of images of virtual scenes visually indistinguishable from those of natural scenes. Unlike traditional model-based rendering, whose photorealism is limited by model complexity, an emerging and highly active research area known as image-based rendering eschews complex geometric models in favor of representing scenes by ensembles of example images. These are used to render novel photoreal images of the scene from arbitrary viewpoints and illuminations, thus decoupling rendering from scene complexity. The challenge is to develop structured representations in high-dimensional image spaces that are rich enough to capture important information for synthesizing new images, including details such as self-occlusion, self-shadowing, interreflections, and subsurface scattering.

 

TensorTextures, a new image-based texture mapping technique, is a rich generative model that, from a sparse set of example images, learns the interaction between viewpoint, illumination, and geometry that determines detailed surface appearance. Mathematically, TensorTextures is a nonlinear model of texture image ensembles that exploits tensor algebra and the N-mode SVD to learn a representation of the bidirectional texture function (BTF) in which the multiple constituent factors, or modes (viewpoints and illuminations), are disentangled and represented explicitly.
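
Because the view and illumination modes are represented explicitly, a texture image can be rendered from one view vector and one illumination vector. The sketch below assumes a views x illuminations x texels array of random toy data and blends two neighbouring view vectors linearly to obtain an unseen viewpoint; the blending rule and all names are assumptions for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: mode-m fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-m product T x_m M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
D = rng.standard_normal((4, 5, 256))      # views x illuminations x texels

# Mode matrices: row k represents the k-th sampled view / illumination.
U_view = np.linalg.svd(unfold(D, 0), full_matrices=False)[0]
U_illum = np.linalg.svd(unfold(D, 1), full_matrices=False)[0]

# Texture basis with the view and illumination modes factored out.
T = mode_product(mode_product(D, U_view.T, 0), U_illum.T, 1)

def render(view_vec, illum_vec, T):
    """Combine the basis with one view vector and one illumination vector."""
    return np.einsum('v,l,vlx->x', view_vec, illum_vec, T)

# A sampled view/illumination pair reproduces its training image.
seen = render(U_view[1], U_illum[2], T)

# An unseen viewpoint: blend the vectors of two neighbouring views.
w = 0.3
new_view = (1 - w) * U_view[1] + w * U_view[2]
novel = render(new_view, U_illum[2], T)
```

The same `render` call accepts a blended illumination vector as well, so viewpoint and lighting can be varied independently of each other.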
 

  • "TensorTextures: Multilinear Image-Based Rendering", M. A. O. Vasilescu and D. Terzopoulos, Proc. ACM SIGGRAPH 2004 Conference Los Angeles, CA, August, 2004, in Computer Graphics Proceedings, Annual Conference Series, 2004, 336-342. 
    Paper (5,104 KB - .pdf) 

    Animations:

    • TensorTextures - AVI (54,225 KB)

    • TensorTextures Strategic Dimensionality Reduction - AVI (19,650 KB)

    • TensorTextures Trailer - AVI (17,605 KB)

     

  • "TensorTextures", M. A. O. Vasilescu and D. Terzopoulos, Sketches and Applications SIGGRAPH 2003 San Diego, CA, July, 2003. 
    Sketch (6MB - .pdf) 

 

Adaptive Meshes: Physically Based Modeling 

 

This work develops adaptive mesh models for the nonuniform sampling and reconstruction of visual data. Adaptive meshes are dynamic models assembled from nodal masses connected by adjustable springs. Acting as mobile sampling sites, the nodes observe interesting properties of the input data, such as intensities, depths, gradients, and curvatures. The springs automatically adjust their stiffnesses based on the locally sampled information in order to concentrate nodes near rapid variations in the input data. The representational power of an adaptive mesh is enhanced by its ability to optimally distribute the available degrees of freedom of the reconstructed model in accordance with the local complexity of the data.
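
The way stiffness adjustment concentrates nodes can be illustrated with a one-dimensional chain. This is a deliberately simplified static relaxation assumed for illustration, not the formulation of the papers below: the springs have zero rest length, the input is a synthetic smoothed step edge, and the stiffness rule `1 + alpha * |gradient|` is hypothetical.

```python
import math

def edge_signal_gradient(x, center=0.5, width=0.05):
    """Gradient magnitude of the test signal tanh((x - center) / width)."""
    return (1.0 / width) / math.cosh((x - center) / width) ** 2

def relax_adaptive_mesh(n_nodes=21, alpha=1.0, sweeps=2000, step=0.5):
    """Relax a 1D chain of nodes joined by zero-rest-length springs.

    Spring stiffness grows with the gradient magnitude sampled at the
    spring midpoint, so stiff springs contract and pull nodes toward
    rapid variations in the data. The two end nodes stay fixed.
    """
    x = [i / (n_nodes - 1) for i in range(n_nodes)]
    for _ in range(sweeps):
        for i in range(1, n_nodes - 1):
            k_left = 1.0 + alpha * edge_signal_gradient(0.5 * (x[i - 1] + x[i]))
            k_right = 1.0 + alpha * edge_signal_gradient(0.5 * (x[i] + x[i + 1]))
            # Force balance: k_left * (x_i - x_left) = k_right * (x_right - x_i).
            target = (k_left * x[i - 1] + k_right * x[i + 1]) / (k_left + k_right)
            x[i] += step * (target - x[i])    # damped move toward balance
    return x

nodes = relax_adaptive_mesh()
spacings = [b - a for a, b in zip(nodes, nodes[1:])]
narrowest = spacings.index(min(spacings))
narrowest_midpoint = 0.5 * (nodes[narrowest] + nodes[narrowest + 1])
```

Starting from uniform spacing, the nodes drift toward the edge at x = 0.5, so more samples are spent where the signal changes quickly and fewer where it is flat.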
 

We develop open adaptive mesh and closed adaptive shell surfaces based on triangular or rectangular elements, and we propose techniques for hierarchically subdividing polygonal elements in adaptive meshes and shells. We also devise a discontinuity detection and preservation algorithm suitable for the model. Finally, motivated by Kalman filtering theory (nonlinear, continuous dynamics with discrete observations), we generalize our model to the dynamic recursive estimation of nonrigidly moving surfaces.
 

  • "Adaptive meshes and shells: Irregular triangulation, discontinuities, and hierarchical subdivision," M. Vasilescu, D. Terzopoulos, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '92), Champaign, IL, June, 1992, pages 829 - 832.
    Paper (652KB - .pdf) 
     

  • "Sampling and Reconstruction with Adaptive Meshes," D. Terzopoulos, M. Vasilescu, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '91), Lahaina, HI, June, 1991, pages 70 - 75. 
    Paper (438KB - .pdf)