Digital 3D avatars are staples of the Metaverse and beyond. Avatar creation software should make data capture simple, compute quickly, and render a photorealistic 3D likeness of the user that can be animated and relit. Unfortunately, current methods fall short of these requirements. Recent techniques for generating 3D avatars from videos use either 3D morphable models (3DMM) or implicit neural representations. The former are efficient to rasterize and inherently generalize to unseen deformations, but they cannot easily model people with complex hairstyles or glasses, since their template meshes have a priori defined topologies and are limited to surface-like geometries. Implicit neural representations, which have recently been applied to head modeling as well, are far more flexible.
However, since shading a single pixel requires querying many locations along the camera ray, implicit representations are much less efficient to train and render than 3DMM-based approaches, even though they can capture hair strands and glasses. Moreover, it is difficult to deform implicit representations in a generalizable way, forcing existing methods into an inefficient root-finding loop that severely affects training and testing times. To address these issues, the researchers propose PointAvatar, a novel avatar representation that uses point clouds to describe canonical geometry and learns a continuous deformation field for animation. More specifically, they optimize an oriented point cloud to represent the geometry of a subject in canonical space.
Table 1: Since PointAvatar renders and deforms efficiently, it allows rendering entire images during training. It can also handle thin, deformable structures and recover accurate surface normals in surface-like regions such as skin.
In Table 1, they list the advantages of their point representation. Given the expression and pose parameters of a pretrained 3DMM, the learned deformation field maps the canonical points into deformed space using learned blendshapes and skinning weights for animation. Their point-based representation renders more efficiently than implicit representations via a standard differentiable rasterizer, and the points are easily deformed with well-established techniques such as skinning. Points are also far more adaptable and flexible than meshes: they can adapt their topology to model accessories like glasses and can represent complex volume-like structures such as fluffy hair. Another strength of the approach is its ability to disentangle lighting effects.
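The animation step described above can be sketched as a small linear-blend-skinning routine. The array names and shapes below are illustrative assumptions, not the paper's actual implementation; in PointAvatar the blendshapes and skinning weights are learned, and the pose parameters come from the pretrained 3DMM:

```python
import numpy as np

def deform_points(points, blendshapes, skin_weights, joint_rots, joint_trans, expr):
    """Map canonical points to deformed space (hypothetical sketch).

    points:       (N, 3) canonical point cloud
    blendshapes:  (N, 3, E) learned per-point expression blendshapes
    skin_weights: (N, J) learned skinning weights, each row summing to 1
    joint_rots:   (J, 3, 3), joint_trans: (J, 3) per-joint rigid transforms
    expr:         (E,) expression coefficients from the pretrained 3DMM
    """
    # add expression-dependent offsets, then linear-blend-skin the result
    shaped = points + np.einsum('nde,e->nd', blendshapes, expr)
    per_joint = np.einsum('jab,nb->nja', joint_rots, shaped) + joint_trans[None]
    return np.einsum('nj,nja->na', skin_weights, per_joint)
```

Because every operation here is differentiable, gradients from the rendering loss can flow back into the canonical points, blendshapes, and skinning weights during training.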
They separate the rendered color into intrinsic albedo and normal-dependent shading based on monocular video shot under general lighting (see Fig. 1). Due to the discrete point structure, computing normals correctly from point clouds is difficult and expensive, and quality can degrade quickly with noise and sparse or irregular sampling. Therefore, they propose two methods to (a) reliably and accurately obtain normals from the learned canonical points and (b) transform point normals under non-rigid deformation while preserving geometric detail. For the first, they exploit the low-frequency bias of MLPs and estimate the normals by fitting a smooth signed distance function (SDF) to the points; for the second, they exploit the continuity of the deformation mapping and transform the normals analytically using the Jacobian of the deformation. Both methods yield high-quality normal estimates, propagating the rich geometric cues in shading to improve the point geometry. With disentangled albedo and accurate normal directions, PointAvatar can be relit and rendered in new scenes.
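The two normal strategies can be illustrated with a minimal numpy sketch. Here a finite-difference gradient stands in for the gradient of the fitted SDF MLP, and the inverse-transpose rule is the standard way normals transform under a differentiable deformation; both functions are illustrative, not the paper's code:

```python
import numpy as np

def sdf_normal(sdf, x, eps=1e-4):
    # (a) canonical normal = normalized gradient of a smooth SDF fitted to
    # the points (central finite differences stand in for an MLP gradient)
    grad = np.array([(sdf(x + eps * e) - sdf(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
    return grad / np.linalg.norm(grad)

def deform_normal(n, J):
    # (b) normals transform by the inverse-transpose of the deformation
    # Jacobian J, followed by renormalization
    nd = np.linalg.solve(J.T, n)  # == inv(J).T @ n
    return nd / np.linalg.norm(nd)
```

For example, for a unit-sphere SDF `lambda x: np.linalg.norm(x) - 1.0`, `sdf_normal` at a point on the x-axis recovers the outward direction `(1, 0, 0)`; a rigid deformation (orthogonal Jacobian) leaves normals unchanged, while stretching tilts them as expected.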
The proposed representation combines the advantages of well-known mesh and implicit models and outperforms both in many challenging cases, as shown on videos shot with DSLR, smartphone, or laptop cameras, or downloaded from the internet. In conclusion, their contributions consist of the following:
1. They suggest a new representation for 3D animatable avatars based on an explicit canonical point cloud and continuous deformation, which demonstrates state-of-the-art photo-realism while being significantly more efficient than existing implicit 3D avatar methods
2. They disentangle the RGB color into a pose-independent albedo and a pose-dependent shading component, allowing relighting in new scenes
3. They demonstrate the benefit of their methods on a variety of subjects captured using various commodity cameras
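The albedo/shading split in contribution 2 can be sketched with a simple Lambertian model. This is an illustrative stand-in under a single-directional-light assumption, not the paper's exact shading formulation:

```python
import numpy as np

def relight(albedo, normals, light_dir):
    # pose-independent albedo times a normal-dependent shading term;
    # swapping light_dir re-renders the avatar under new lighting
    l = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ l, 0.0, None)   # (N,) diffuse term
    return albedo * shading[:, None]            # (N, 3) relit colors
```

Because the albedo is independent of pose and lighting, the same avatar can be composited into a new scene simply by evaluating the shading term with that scene's lighting.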
The source code will soon be made available on GitHub for analysis.
Check out the Paper and Project. All credit for this research goes to the researchers on this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Aneesh Tickoo is an intern consultant at MarktechPost. He is currently pursuing his undergraduate studies in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He enjoys connecting with people and collaborating on interesting projects.