Some references for mesh deformation

This looks at the problem from an implementation angle: which existing work can we draw on?

Main source:
https://github.com/timzhang642/3D-Machine-Learning

Recent research on deep learning for 3D data, with triangle-mesh deformation as the goal, has developed as follows:

CNNs are effective on pixel-grid data such as images, but a 3D mesh has its own special topological structure, so the model architectures used for images cannot be applied to it directly. The first idea, then, was to reduce the 3D structure to something 2D-like and recover a 3D result at the end. There are two main flavors:

One is to unwrap the 3D mesh into a flat 2D form (graphics has methods to do this) and then process the planar data directly;

The other is to voxelize the mesh, or turn it into a point cloud, and process the voxels or points. Voxel and point-cloud data are bulky, and the only units that really carry meaning are the ones representing the surface, so the follow-up papers mostly optimize the data structures so that more voxel units can be processed at once, pushing this approach to higher resolutions.

What everyone wants most, of course, is to process the triangle mesh directly. The methods here fall roughly into two camps:

One camp still trains the model on images, voxels, or point clouds, and uses the output to drive a transformation of the mesh; the most common tool for this is FFD. 【FFD papers:

Learning Free-Form Deformations for 3D Object Reconstruction

ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning

Image2mesh: A learning framework for single image 3d reconstruction】

The other camp reworks the training model itself, aiming for end-to-end mesh input and mesh output. These papers are concentrated after 2018, when people started pushing hard on processing triangle meshes directly. 【3DN; MeshCNN】

Setting the machine-learning premise aside, though, the problem we want to solve is really more like deformation transfer, a classic graphics problem, so a next step is to look at how that line of work handles it.

20190527 new summary

The problem here is mesh deformation, so whether or not we use machine learning, and however we use it, we always need some way of making the mesh deform. So far I can summarize five (options 1 and 4 are sketched in code right after the list):

  1. Directly adjust the individual mesh vertices to produce the deformation
  2. Use FFD and adjust the lattice control points
  3. Use as-rigid-as-possible deformation and adjust the handle points
  4. Use blendshapes and adjust the blending weights
  5. Use a skeleton and adjust its control parameters
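
Of the five, options 1 and 4 are the easiest to make concrete. Below is a minimal numpy sketch (array names and shapes are hypothetical, just to fix ideas): direct vertex adjustment is an additive offset field over the vertices, and a blendshape result is the neutral mesh plus a weighted sum of per-vertex displacement bases.

```python
import numpy as np

# Option 1: direct vertex manipulation. V holds the (N, 3) vertex positions
# of the mesh; offsets is an (N, 3) displacement field (hand-set or learned).
def deform_vertices(V, offsets):
    return V + offsets

# Option 4: blendshapes. The deformed mesh is the neutral mesh plus a weighted
# sum of per-vertex displacement bases, one (N, 3) basis per blendshape;
# weights holds the K blending parameters the animator (or a network) adjusts.
def blendshape(V_neutral, bases, weights):
    return V_neutral + np.tensordot(weights, bases, axes=1)

if __name__ == "__main__":
    V = np.random.rand(100, 3)                 # toy mesh: 100 vertices
    bases = 0.01 * np.random.randn(5, 100, 3)  # 5 blendshape bases
    w = np.array([0.2, 0.0, 0.7, 0.1, 0.0])    # blending weights
    print(blendshape(V, bases, w).shape)       # -> (100, 3)
```

Option 2 (FFD) is sketched further down, next to the "Learning free-form deformations" entry.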

  • MeshNet: Mesh Neural Network for 3D Shape Representation

【This is used for 3D shape analysis and retrieval, which is outside our scope.】

https://github.com/Yue-Group/MeshNet

This work will appear in AAAI 2019. We proposed a novel framework (MeshNet) for 3D shape representation, which could learn on mesh data directly and achieve satisfying performance compared with traditional methods based on mesh and representative methods based on other types of data. You can also check out paper for a deeper introduction.

Mesh is an important and powerful type of data for 3D shapes. Due to the complexity and irregularity of mesh data, there is little effort on using mesh data for 3D shape representation in recent years. We propose a mesh neural network, named MeshNet, to learn 3D shape representation directly from mesh data. Face-unit and feature splitting are introduced to solve the complexity and irregularity problem. We have applied MeshNet in the applications of 3D shape classification and retrieval. Experimental results and comparisons with the state-of-the-art methods demonstrate that MeshNet can achieve satisfying 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method on 3D shape representation.

In this repository, we release the code and data to train a Mesh Neural Network for classification and retrieval tasks on the ModelNet40 dataset.
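
For intuition about the "face unit" the abstract mentions: it can be read as a per-face input feature assembled from the triangle itself. A rough guess at such a feature extractor, with hypothetical names and not taken from their code:

```python
import numpy as np

def face_units(V, F):
    """Per-face input features in the spirit of MeshNet's face unit:
    face center, corner vectors relative to the center, and unit normal.
    V: (N, 3) vertex positions, F: (M, 3) triangle vertex indices."""
    tris = V[F]                          # (M, 3, 3) corner positions
    center = tris.mean(axis=1)           # (M, 3) face centers
    corners = tris - center[:, None, :]  # (M, 3, 3) spatial structure
    n = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    normal = n / np.linalg.norm(n, axis=1, keepdims=True)
    return np.concatenate([center, corners.reshape(len(F), -1), normal], axis=1)
```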


  • The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans

【A human body shape database.】

http://humanshape.mpi-inf.mpg.de/

MPII Human Shape is a family of expressive 3D human body shape models and tools for human shape space building, manipulation and evaluation. Human shape spaces are based on the widely used statistical body representation and learned from the CAESAR dataset, the largest commercially available scan database to date. As preprocessing several thousand scans for learning the models is a challenge in itself, we contribute by developing robust best practice solutions for scan alignment that quantitatively lead to the best learned models. We make the models as well as the tools publicly available. Extensive evaluation shows improved accuracy and generality of our new models, as well as superior performance for human body reconstruction from sparse input data.

  • Mesh-based Autoencoders for Localized Deformation Component Analysis (AAAI 2018)
  • Variational Autoencoders for Deforming 3D Mesh Models (CVPR 2018)

【Two papers by the same authors, implementing nonlinear deformation; no source code here, and the citation counts are still low.】

http://qytan.com/

https://github.com/aldehydecho

Spatially localized deformation components are very useful for shape analysis and synthesis in 3D geometry processing. Several methods have recently been developed, with an aim to extract intuitive and interpretable deformation components. However, these techniques suffer from fundamental limitations especially for meshes with noise or large-scale deformations, and may not always be able to identify important deformation components. In this paper we propose a novel mesh-based autoencoder architecture that is able to cope with meshes with irregular topology. We introduce sparse regularization in this framework, which along with convolutional operations, helps localize deformations. Our framework is capable of extracting localized deformation components from mesh data sets with large-scale deformations and is robust to noise. It also provides a nonlinear approach to reconstruction of meshes using the extracted basis, which is more effective than the current linear combination approach. Extensive experiments show that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluations.

3D geometric contents are becoming increasingly popular. In this paper, we study the problem of analyzing deforming 3D meshes using deep neural networks. Deforming 3D meshes are flexible to represent 3D animation sequences as well as collections of objects of the same category, allowing diverse shapes with large-scale non-linear deformations. We propose a novel framework which we call mesh variational autoencoders (mesh VAE), to explore the probabilistic latent space of 3D surfaces. The framework is easy to train, and requires very few training examples. We also propose an extended model which allows flexibly adjusting the significance of different latent variables by altering the prior distribution. Extensive experiments demonstrate that our general framework is able to learn a reasonable representation for a collection of deformable shapes, and produce competitive results for a variety of applications, including shape generation, shape interpolation, shape space embedding and shape exploration, outperforming state-of-the-art methods.

  • Exploring Generative 3D Shapes Using Autoencoder Networks

【Interactive generation; also not an article we need as a core reference.】

https://github.com/RaiderSoap/interactive_generative_3d_shapes/

We propose a new algorithm for converting unstructured triangle meshes into ones with a consistent topology for machine learning applications. We combine the orthogonal depth map computation and the shrink wrapping approach to efficiently and robustly parameterize the triangle geometry regardless of imperfections such as inverted faces, holes, and self-intersections. The converted mesh is consistently and compactly parameterized and thus is suitable for machine learning. We use an autoencoder network to extract the manifold of shapes in the same category to explore and synthesize a variety of shapes. Furthermore, we introduce a direct manipulation interface to navigate the synthesis. We demonstrate our approach with over one thousand car shapes represented in unstructured triangle meshes.

  • Learning free-form deformations for 3D object reconstruction

【Deep mesh deformation based on FFD; a path we can draw on.】

https://github.com/jackd/template_ffd

Representing 3D shape in deep learning frameworks in an accurate, efficient and compact manner still remains an open challenge. Most existing work addresses this issue by employing voxel-based representations. While these approaches benefit greatly from advances in computer vision by generalizing 2D convolutions to the 3D setting, they also have several considerable drawbacks. The computational complexity of voxel-encodings grows cubically with the resolution thus limiting such representations to low-resolution 3D reconstruction. In an attempt to solve this problem, point cloud representations have been proposed. Although point clouds are more efficient than voxel representations as they only cover surfaces rather than volumes, they do not encode detailed geometric information about relationships between points. In this paper we propose a method to learn free-form deformations (FFD) for the task of 3D reconstruction from a single image. By learning to deform points sampled from a high-quality mesh, our trained model can be used to produce arbitrarily dense point clouds or meshes with fine-grained geometry. We evaluate our proposed framework on both synthetic and real-world data and achieve state-of-the-art results on point-cloud and volumetric metrics. Additionally, we qualitatively demonstrate its applicability to label transferring for 3D semantic segmentation.
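
Since FFD keeps coming up as the deformation layer in these papers, here is a minimal numpy sketch of classic trilinear Bernstein FFD, i.e. the deformation itself rather than the network that predicts the control points (function names are mine; points are assumed to live in the unit cube):

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))

def ffd(points, ctrl):
    """Deform (M, 3) points with an (l+1, m+1, n+1, 3) control-point lattice.

    Each point's local coordinates (s, t, u) in [0, 1]^3 weight the control
    points with a tensor product of Bernstein polynomials; moving control
    points bends the space, and any mesh vertices embedded in it, smoothly.
    A network deforming a mesh via FFD only has to output the small set of
    control-point positions (or offsets), not every vertex.
    """
    l, m, n = (d - 1 for d in ctrl.shape[:3])
    out = np.zeros_like(points)
    for p, (s, t, u) in enumerate(points):
        for i in range(l + 1):
            for j in range(m + 1):
                for k in range(n + 1):
                    w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                    out[p] += w * ctrl[i, j, k]
    return out

# An undisplaced regular lattice reproduces the input points exactly.
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, 4)] * 3, indexing="ij"), axis=-1)
pts = np.random.rand(10, 3)
assert np.allclose(ffd(pts, grid), pts, atol=1e-6)
```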


  • Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images (ECCV 2018)

【The deformation method this paper builds on is direct mesh deformation, so it is worth a look; it is also the first work to generate 3D mesh models this way.】

https://github.com/nywang16/Pixel2Mesh

We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural network, previous methods usually represent a 3D shape in volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various of mesh related losses to capture properties of different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh model with better details, but also achieves higher 3D shape estimation accuracy compared to the state-of-the-art.
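
The core operation in such a network is a convolution over the mesh's vertex graph. A toy numpy version of one layer, with hypothetical weight names (the real model also concatenates perceptual features pooled from the image at each vertex, and regresses a 3D offset per vertex from the final layer to deform the ellipsoid):

```python
import numpy as np

def graph_conv(H, adj, W_self, W_neigh):
    """One graph-convolution layer over mesh vertices.

    H: (N, F) per-vertex features; adj: length-N list of neighbor index
    lists read off the mesh edges; W_self, W_neigh: (F, F_out) weights.
    Each vertex mixes its own features with the sum of its neighbors',
    so stacked layers spread information along the mesh surface."""
    neigh = np.zeros_like(H)
    for i, nbrs in enumerate(adj):
        if nbrs:
            neigh[i] = H[nbrs].sum(axis=0)
    return np.maximum(H @ W_self + neigh @ W_neigh, 0.0)  # ReLU
```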


  • ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning

【This one mainly handles lost or incomplete targets, but its approach is worth learning from; at its core it works with FFD.】

https://github.com/ranahanocka/ALIGNet

The process of aligning a pair of shapes is a fundamental operation in computer graphics. Traditional approaches rely heavily on matching corresponding points or features to guide the alignment, a paradigm that falters when significant shape portions are missing. These techniques generally do not incorporate prior knowledge about expected shape characteristics, which can help compensate for any misleading cues left by inaccuracies exhibited in the input shapes. We present an approach based on a deep neural network, leveraging shape datasets to learn a shape-aware prior for source-to-target alignment that is robust to shape incompleteness. In the absence of ground truth alignments for supervision, we train a network on the task of shape alignment using incomplete shapes generated from full shapes for self-supervision. Our network, called ALIGNet, is trained to warp complete source shapes to incomplete targets, as if the target shapes were complete, thus essentially rendering the alignment partial-shape agnostic. We aim for the network to develop specialized expertise over the common characteristics of the shapes in each dataset, thereby achieving a higher-level understanding of the expected shape space to which a local approach would be oblivious. We constrain ALIGNet through an anisotropic total variation identity regularization to promote piecewise smooth deformation fields, facilitating both partial-shape agnosticism and post-deformation applications. We demonstrate that ALIGNet learns to align geometrically distinct shapes, and is able to infer plausible mappings even when the target shape is significantly incomplete. We show that our network learns the common expected characteristics of shape collections, without over-fitting or memorization, enabling it to produce plausible deformations on unseen data during test time.

  • Learning Detail Transfer based on Geometric Features

【What we can borrow from this paper is its handling of texture: how to map local detail onto the corresponding locations of a new model. The problems are that there is no code and that it is a Hao Li paper, so whether we can pull it off is an open question.】

http://surfacedetails.cs.princeton.edu/

The visual richness of computer graphics applications is frequently limited by the difficulty of obtaining high-quality, detailed 3D models. This paper proposes a method for realistically transferring details (specifically, displacement maps) from existing high-quality 3D models to simple shapes that may be created with easy-to-learn modeling tools. Our key insight is to use metric learning to find a combination of geometric features that successfully predicts detail-map similarities on the source mesh, and use the learned feature combination to drive the detail transfer. The latter uses a variant of multi-resolution non-parametric texture synthesis, augmented by a high-frequency detail transfer step in texture space. We demonstrate that our technique can successfully transfer details among a variety of shapes including furniture and clothing.

  • Appearance Modeling via Proxy-to-Image Alignment

【User-assisted extraction of surface detail for image-based modeling; not really related to our work.】

http://vcc.tech/research/2018/AppMod

Endowing 3D objects with realistic surface appearance is a challenging and time-demanding task, since real world surfaces typically exhibit a plethora of spatially variant geometric and photometric detail. Not surprisingly, computer artists commonly use images of real world objects as an inspiration and a reference for their digital creations. However, despite two decades of research on image-based modeling, there are still no tools available for automatically extracting the detailed appearance (micro-geometry and texture) of a 3D surface from a single image. In this paper, we present a novel user-assisted approach for quickly and easily extracting a non-parametric appearance model from a single photograph of a reference object.

The extraction process requires a user-provided proxy, whose geometry roughly approximates that of the object in the image. Since the proxy is just a rough approximation, it is necessary to align and deform it so as to match the reference object. The main contribution of this paper is a novel technique to perform such an alignment, which enables accurate joint recovery of geometric detail and reflectance. The correlations between the recovered geometry at various scales and the spatially varying reflectance constitute a non-parametric appearance model. Once extracted, the appearance model may then be applied to various 3D shapes, whose large scale geometry may differ considerably from that of the original reference object. Thus, our approach makes it possible to construct an appearance library, allowing users to easily enrich detail-less 3D shapes with realistic geometric detail and surface texture.

  • MeshCNN: A Network with an Edge (TOG 2019)

【Proposes a CNN that operates directly on triangle meshes, mainly by reworking the pooling layer. One major application is shape-preserving mesh simplification, which is very similar to our face simplification, so it is worth studying; the code just is not online yet.】

https://arxiv.org/pdf/1809.05910v2.pdf

https://imvc.co.il/Portals/117/Rana%20Hanocka.pdf

Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This nonuniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby, generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of our task-driven pooling on various learning tasks applied to 3D meshes.
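
To make the edge convolution concrete: every interior edge has exactly four neighboring edges (the other two edges of each incident triangle), but their ordering is ambiguous, so the convolution is applied to order-invariant combinations of the neighbor features. A rough numpy sketch under that reading (feature and kernel shapes are made up):

```python
import numpy as np

def mesh_edge_conv(E, nbrs, kernel):
    """MeshCNN-style convolution over mesh edges.

    E: (num_edges, F) per-edge features; nbrs: (num_edges, 4) indices of
    the four edges bordering each edge's two incident triangles, with
    (a, b) from one triangle and (c, d) from the other. Since the pairing
    is only defined up to order, symmetric combinations are convolved
    instead of the raw neighbor features. kernel: (5 * F, F_out)."""
    a, b, c, d = (E[nbrs[:, i]] for i in range(4))
    feats = np.concatenate([E, np.abs(a - c), a + c, np.abs(b - d), b + d], axis=1)
    return feats @ kernel
```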


  • GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

【Among the first papers to consume mesh data directly inside the learned model, but the task here is reconstruction, i.e. producing a mesh from nothing, not the mesh deformation we need.】

https://github.com/EdwardSmith1884/GEOMetrics

  • Kato, H., Ushiku, Y., and Harada, T. Neural 3d mesh renderer. arXiv preprint arXiv:1711.07566, 2017.

【A paper from a Japanese group; the English phrasing is so odd that it is hard to follow!】

https://arxiv.org/pdf/1711.07566.pdf

https://github.com/hiroharu-kato/neural_renderer

For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer.

  • Groueix, T., Fisher, M., Kim, V., Russell, B., and Aubry, M. AtlasNet: A papier-mâché approach to learning 3D surface generation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018a.

【A new method that directly outputs mesh models; the idea is to fit a set of squares directly onto the model surface.】

http://imagine.enpc.fr/~groueixt/atlasnet/

https://github.com/ThibaultGROUEIX/AtlasNet

We introduce a method for learning to generate the surface of 3D shapes. Our approach represents a 3D shape as a collection of parametric surface elements and, in contrast to methods generating voxel grids or point clouds, naturally infers a surface representation of the shape. Beyond its novelty, our new shape generation framework, AtlasNet, comes with significant advantages, such as improved precision and generalization capabilities, and the possibility to generate a shape of arbitrary resolution without memory issues. We demonstrate these benefits and compare to strong baselines on the ShapeNet benchmark for two applications: (i) autoencoding shapes, and (ii) single-view reconstruction from a still image. We also provide results showing its potential for other applications, such as morphing, parametrization, super-resolution, matching, and co-segmentation.
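
The mechanism is easy to picture: each "papier-mâché" patch is an MLP that maps a 2D point sampled on the unit square, concatenated with the shape's latent code, to a 3D point; sampling more 2D points gives an arbitrarily dense surface, which is where the resolution claim comes from. A toy single-patch decoder with random, purely illustrative weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_decoder(uv, z, params):
    """Map a 2D sample uv on the unit square, plus latent code z, to a 3D
    point. params: list of (W, b) per layer; tanh hidden, linear output.
    AtlasNet uses several such patches, each with its own weights."""
    x = np.concatenate([uv, z])
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

sizes = [2 + 16, 64, 64, 3]  # toy: 16-dim latent, two hidden layers
params = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]
z = rng.standard_normal(16)
pts = np.array([patch_decoder(rng.random(2), z, params) for _ in range(1000)])
print(pts.shape)  # (1000, 3): one patch sampled as densely as we like
```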


  • Pontes, J. K., Kong, C., Sridharan, S., Lucey, S., Eriksson, A., and Fookes, C. Image2mesh: A learning framework for single image 3d reconstruction. arXiv preprint arXiv:1711.10669, 2017.

【This one also uses FFD to realize the mesh deformation.】

https://jhonykaesemodel.com/publication/image2mesh/

A challenge that remains open in 3D deep learning is how to efficiently represent 3D data to feed deep neural networks. Recent works have been relying on volumetric or point cloud representations, but such approaches suffer from a number of issues such as computational complexity, unordered data, and lack of finer geometry. An efficient way to represent a 3D shape is through a polygon mesh as it encodes both shape’s geometric and topological information. However, the mesh’s data structure is an irregular graph (i.e. collection of vertices connected by edges to form polygonal faces) and it is not straightforward to integrate it into learning frameworks since every mesh is likely to have a different structure. Here we address this drawback by efficiently converting an unstructured 3D mesh into a regular and compact shape parametrization that is ready for machine learning applications. We developed a simple and lightweight learning framework able to reconstruct high-quality 3D meshes from a single image by using a compact representation that encodes a mesh using free-form deformation and sparse linear combination in a small dictionary of 3D models. In contrast to prior work, we do not rely on classical silhouette and landmark registration techniques to perform the 3D reconstruction. We extensively evaluated our method on synthetic and real-world datasets and found that it can efficiently and compactly reconstruct 3D objects while preserving its important geometrical aspects.

  • Kanazawa, A., Tulsiani, S., Efros, A. A., and Malik, J. Learning category-specific mesh reconstruction from image collections. arXiv preprint arXiv:1803.07549, 2018.

【The implementation here is well worth referencing, since the problem it solves can apply to faces: it infers the model, lighting, and texture in a picture, and it too uses direct deformation. Some of the papers it cites are also worth a look.】

https://akanazawa.github.io/cmr/

https://eccv2018.org/openaccess/content_ECCV_2018/papers/Angjoo_Kanazawa_Learning_Category-Specific_Mesh_ECCV_2018_paper.pdf

We present a learning framework for recovering the 3D shape, camera, and texture of an object from a single image. The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation. Our approach allows leveraging an annotated image collection for training, where the deformable model and the 3D prediction mechanism are learned without relying on ground-truth 3D or multi-view supervision. Our representation enables us to go beyond existing 3D prediction approaches by incorporating texture inference as prediction of an image in a canonical appearance space. Additionally, we show that semantic keypoints can be easily associated with the predicted shapes. We present qualitative and quantitative results of our approach on CUB and PASCAL3D datasets and show that we can learn to predict diverse shapes and textures across objects using only annotated image collections. The project website can be found at https://akanazawa.github.io/cmr/.

  • Henderson, P. and Ferrari, V. Learning to generate and reconstruct 3d meshes with only 2d supervision. arXiv preprint arXiv:1807.09259, 2018.

【Also a training method that outputs models directly; note that the code here is the TensorFlow rendering framework, not the complete project code.】

https://arxiv.org/pdf/1807.09259.pdf

https://github.com/pmh47/dirt

We present a unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. These tasks have received considerable attention recently; however, existing approaches rely on 3D supervision, annotation of 2D images with keypoints or poses, and/or training with multiple views of each object instance. Our framework is very general: it can be trained in similar settings to these existing approaches, while also supporting weaker supervision scenarios. Importantly, it can be trained purely from 2D images, without ground-truth pose annotations, and with a single view per instance. We employ meshes as an output representation, instead of voxels used in most prior work. This allows us to exploit shading information during training, which previous 2D-supervised methods cannot. Thus, our method can learn to generate and reconstruct concave object classes. We evaluate our approach on synthetic data in various settings, showing that (i) it learns to disentangle shape from pose; (ii) using shading in the loss improves performance; (iii) our model is comparable or superior to state-of-the-art voxel-based approaches on quantitative metrics, while producing results that are visually more pleasing; (iv) it still performs well when given supervision weaker than in prior works.

  • Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., and Aubry, M. 3d-coded: 3d correspondences by deep deformation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 230–246, 2018b.

【This one mainly deals with mesh deformation, but the problems it handles still seem quite far from our face-processing problem.】

https://github.com/ThibaultGROUEIX/3D-CODED

We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks which jointly encode 3D shapes and correspondences. This is achieved by factoring the surface representation into (i) a template, that parameterizes the surface, and (ii) a learnt global feature vector that parameterizes the transformation of the template into the input surface. By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template. We show that these correspondences can be improved by an additional step which improves the shape feature by minimizing the Chamfer distance between the input and transformed template. We demonstrate that our simple approach improves on stateof-the-art results on the difficult FAUST-inter challenge, with an average correspondence error of 2.88cm. We show, on the TOSCA dataset, that our method is robust to many types of perturbations, and generalizes to non-human shapes. This robustness allows it to perform well on real unclean, meshes from the the SCAPE dataset.

  • Kar, A., Tulsiani, S., Carreira, J., and Malik, J. Category-specific object reconstruction from a single image. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1966–1974, 2015.

【The key here is the deformable model, but this paper is fairly old and has many limitations; still worth seeing how its method works.】

https://abhishekkar.info/categoryshapes.pdf

https://github.com/akar43/CategoryShapes

Object reconstruction from a single image – in the wild – is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.

  • SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation

【This solves the keypoint-correspondence problem, but the main papers it follows are not in our line of thinking, so its reference value is limited.】

http://openaccess.thecvf.com/content_cvpr_2017/papers/Yi_SyncSpecCNN_Synchronized_Spectral_CVPR_2017_paper.pdf

https://github.com/ericyi/SyncSpecCNN

https://cs.stanford.edu/~ericyi/

In this paper, we study the problem of semantic annotation on 3D models that are represented as shape graphs. A functional view is taken to represent localized information on graphs, so that annotations such as part segment or keypoint are nothing but 0-1 indicator vertex functions. Compared with images that are 2D grids, shape graphs are irregular and non-isomorphic data structures. To enable the prediction of vertex functions on them by convolutional neural networks, we resort to spectral CNN method that enables weight sharing by parametrizing kernels in the spectral domain spanned by graph Laplacian eigenbases. Under this setting, our network, named SyncSpecCNN, strives to overcome two key challenges: how to share coefficients and conduct multi-scale analysis in different parts of the graph for a single shape, and how to share information across related but different shapes that may be represented by very different graphs. Towards these goals, we introduce a spectral parametrization of dilated convolutional kernels and a spectral transformer network. Experimentally we tested SyncSpecCNN on various tasks, including 3D shape part segmentation and keypoint prediction. State-of-the-art performance has been achieved on all benchmark datasets.

  • Supervised Fitting of Geometric Primitives to 3D Point Clouds

【This uses a point cloud to fit geometric primitives to a model, but the problem it handles is fitting, with no deformation of the model, so it is a rather different problem. Still worth seeing what this paper does well; it comes with code.】

https://arxiv.org/pdf/1811.08988.pdf

https://github.com/csimstu2/SPFN

Fitting geometric primitives to 3D point cloud data bridges a gap between low-level digitized 3D data and highlevel structural information on the underlying 3D shapes. As such, it enables many downstream applications in 3D data processing. For a long time, RANSAC-based methods have been the gold standard for such primitive fitting problems, but they require careful per-input parameter tuning and thus do not scale well for large datasets with diverse shapes. In this work, we introduce Supervised Primitive Fitting Network (SPFN), an end-to-end neural network that can robustly detect a varying number of primitives at different scales without any user control. The network is supervised using ground truth primitive surfaces and primitive membership for the input points. Instead of directly predicting the primitives, our architecture first predicts per-point properties and then uses a differential model estimation module to compute the primitive type and parameters. We evaluate our approach on a novel benchmark of ANSI 3D mechanical component models and demonstrate a significant improvement over both the state-of-the-art RANSACbased methods and the direct neural prediction.

  • Laine, S., Karras, T., Aila, T., Herva, A., Saito, S., Yu, R., Li, H., Lehtinen, J.: Production-level facial performance capture using deep convolutional neural networks. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2017)

【NVIDIA's solution for 3D facial performance capture.】

https://research.nvidia.com/sites/default/files/publications/laine2017sca_paper_0.pdf

https://github.com/xianyuMeng/FacialCapture

We present a real-time deep learning framework for video-based facial performance capture—the dense 3D tracking of an actor’s face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5–10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.

  • SurfNet: Generating 3D shape surfaces using deep residual networks

【Proposes a deep-learning scheme that directly produces triangle-mesh surfaces; what it processes are points on the 3D model surface. Code is available.】

http://openaccess.thecvf.com/content_cvpr_2017/papers/Sinha_SurfNet_Generating_3D_CVPR_2017_paper.pdf

3D shape models are naturally parameterized using vertices and faces, i.e., composed of polygons forming a surface. However, current 3D learning paradigms for predictive and generative tasks using convolutional neural networks focus on a voxelized representation of the object. Lifting convolution operators from the traditional 2D to 3D results in high computational overhead with little additional benefit as most of the geometry information is contained on the surface boundary. Here we study the problem of directly generating the 3D shape surface of rigid and non-rigid shapes using deep convolutional neural networks. We develop a procedure to create consistent ‘geometry images’ representing the shape surface of a category of 3D objects. We then use this consistent representation for category-specific shape surface generation from a parametric representation or an image by developing novel extensions of deep residual networks for the task of geometry image generation. Our experiments indicate that our network learns a meaningful representation of shape surfaces allowing it to interpolate between shape orientations and poses, invent new shape surfaces and reconstruct 3D shape surfaces from previously unseen images. Our code is available at https://github.com/sinhayan/surfnet.

  • 3D object dense reconstruction from a single depth view

https://pdfs.semanticscholar.org/ad8b/98eadac9520dcf7909b71bb65586a4993784.pdf

https://github.com/Yang7879/3D-RecGAN-extended

In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid with a high resolution of 256³ by recovering the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Networks (GAN) framework, to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets and real-world Kinect datasets show that the proposed 3D-RecGAN++ significantly outperforms the state of the art in single view 3D object reconstruction, and is able to reconstruct unseen types of objects.

  • End-to-end Recovery of Human Shape and Pose

http://openaccess.thecvf.com/content_cvpr_2018/papers/Kanazawa_End-to-End_Recovery_of_CVPR_2018_paper.pdf

https://github.com/akanazawa/hmr

We describe Human Mesh Recovery (HMR), an end-toend framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations. However, the reprojection loss alone is highly under constrained. In this work we address this problem by introducing an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various images in-the-wild and out-perform previous optimization based methods that output 3D meshes and show competitive results on tasks such as 3D joint location estimation and part segmentation.

  • Cartoonish sketch-based face editing in videos using identity deformation transfer

【This one has exactly what we need: papers on deformation transfer. This is the part we should read carefully.】

https://arxiv.org/pdf/1703.08738

We address the problem of using hand-drawn sketches to create exaggerated deformations to faces in videos, such as enlarging the shape or modifying the position of eyes or mouth. This task is formulated as a 3D face model reconstruction and deformation problem. We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame. At the same time, user’s editing intention is recognized from input sketches as a set of facial modifications. Then a novel identity deformation algorithm is proposed to transfer these facial deformations from 2D space to the 3D facial identity directly while preserving the facial expressions. After an optional stage for further refining the 3D face model, these changes are propagated to the whole video with the modified identity. Both the user study and experimental results demonstrate that our sketching framework can help users effectively edit facial identities in videos, while high consistency and fidelity are ensured at the same time.

Deformation transfer [15, 16] firstly addressed the problem of transferring local deformations between two different meshes, where the deformation gradient of meshes is directly transferred by solving an optimization problem. Semantic deformation transfer [17] inferred a correspondence between the shape spaces of the two characters from given example mesh pairs by using standard linear algebra. Zhou et al. [18] further utilized these methods to automatically generate a 3D cartoon of a real 3D face. Thies et al. [2] developed a system that transfers expression changes from the source to the target actor based on [15] and achieves real-time performance. Xu et al. [19] designed a facial expression transfer and editing technique for high-fidelity facial performance data. Moreover, other flow-based approaches [3, 4] are also proposed to transfer facial expression to different face meshes. However, these traditional methods aim to transfer deformations, especially facial expressions, between 3D meshes. Differing from them, we propose a transfer pipeline which can be used to directly transfer local identity changes in 2D space to a 3D face model. Huang et al. [20] presented an approach to project changes of a mesh in 2D to 3D as the projection constraint. Compared with it, the main novelty of our algorithm is that we combine a sketchbased interface to enable users to perform the editing with handdrawn sketches from 2D to 3D. We first map sketch into a set of modifications corresponding to 3D space, and then transfer it to the target 3D mesh.
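
Since deformation transfer is the part we actually need, the first step of Sumner and Popović [15] is worth sketching: each triangle's change from the source rest pose to the deformed source is summarized as a 3x3 deformation gradient built from two edges plus a scaled normal (their "fourth vertex" construction); the transfer step then solves a sparse least-squares system so that the target's triangles reproduce these gradients while staying connected. A numpy sketch of the gradient computation only (the solve is omitted):

```python
import numpy as np

def frame(tri):
    """3x3 frame of a triangle: two edge vectors plus a scaled normal,
    following the fourth-vertex construction of Sumner & Popovic [15]."""
    v1, v2, v3 = tri
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    return np.stack([e1, e2, n / np.sqrt(np.linalg.norm(n))], axis=1)

def deformation_gradients(V_rest, V_def, faces):
    """Per-triangle affine maps Q with Q @ frame(rest) = frame(deformed).

    V_rest, V_def: (N, 3) source vertices before and after deformation;
    faces: (M, 3) triangle indices. Transferring means solving for target
    vertices whose triangle frames best reproduce these Q in a
    least-squares sense, with one vertex anchored to fix translation."""
    return [frame(V_def[f]) @ np.linalg.inv(frame(V_rest[f])) for f in faces]
```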


Supplement: deformation transfer. This really is what we need.

[15] Sumner, R.W., Popović, J. Deformation transfer for triangle meshes. ACM Transactions on Graphics (TOG) 2004;23(3):399–405.

[16] Botsch, M., Sumner, R., Pauly, M., Gross, M. Deformation transfer for detail-preserving surface editing. In: Vision, Modeling & Visualization. 2006, p. 357–364.

[18] Zhou, J., Tong, X., Liu, Z., Guo, B. 3D cartoon face generation by local deformation mapping. The Visual Computer 2016;32(6):717–727.

[19] Xu, F., Chai, J., Liu, Y., Tong, X. Controllable high-fidelity facial performance transfer. ACM Transactions on Graphics (TOG) 2014;33(4):42:1–42:11.

Supplement: papers that cite this article