An Implicit Parametric Morphable Dental Model


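What this paper ultimately achieves:

Semantic segmentation and reconstruction of dental models;

The reconstructed model is editable: individual teeth can be modified or replaced;

Interpolation between two reconstructed models (e.g., inserting intermediate steps between the initial and final orthodontic states, although these may lack key steps that a clinician needs).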

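Ideas I think this paper borrows to achieve these capabilities

(Unfortunately, I did not have time to read these closely.)

Addressing interpolation & semantic labeling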

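Modeling scene geometry is a fundamental task in computer vision and computer graphics. Earlier methods use explicit representations such as meshes, voxels or point clouds [Keller et al 2013], which have limitations: voxels are memory-intensive, meshes struggle with detailed structures, and point clouds are sparse and lose a large portion of the geometry. Implicit representations instead encode geometry indirectly, for instance as the decision boundary of a classifier that decides whether a point lies inside or outside the examined object. The most popular are DeepSDF, which uses a signed distance function to measure how far a point is from the object's surface, and Occupancy Networks, which predict the probability that a point lies inside the object. Both support interpolation between latent codes learned by an auto-decoder (DeepSDF) or an auto-encoder (Occupancy Networks). Follow-up works addressed limitations of implicit representations; a main one is the lack of correspondences, which limits editing and model learning. (I think this paper borrows from these works to solve semantic labeling.) Some works address this by learning a template, and this paper also appears to learn a template.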

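It is precisely through this template shape that our method can not only reconstruct a given dental scan but also label it semantically, identifying each tooth in it (Fig. 4). Building our dataset still required this labeling to be done manually, but our method now provides a way to automate the task.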

Original English text:

Modeling scene geometry is a fundamental task in both computer vision and computer graphics. Previous methods use explicit representations such as meshes [Thies et al 2016], voxels [Nießner et al 2013] or point clouds [Keller et al 2013]. While such explicit representations have been successful in many applications [Zollhöfer et al 2018a], they suffer from a number of limitations: Voxels are memory-intensive, meshes struggle to handle detailed structures and point clouds are sparse and lose a significant portion of the geometry. Thus, in the past few years several efforts were made to explore so-called implicit geometrical representation [Chen and Zhang 2019; Mescheder et al 2019; Park et al 2019]. Unlike the earlier approaches, implicit representations encode the geometry indirectly, for instance as the decision boundary of a classifier that decides whether a point lies inside or outside the examined object. Among the most popular implicit representations are DeepSDF [Park et al 2019] and Occupancy Networks [Mescheder et al 2019]. DeepSDF uses a signed distance function to measure how far a point is from the surface of the examined object, while Occupancy Networks predict the probability of a point lying inside the object. Both methods demonstrate interesting capabilities such as interpolating between latent codes learned either by an auto-decoder [Park et al 2019] or an auto-encoder [Mescheder et al 2019] architecture.

Follow-up works addressed limitations of implicit representations. One main limitation is the lack of correspondences, which limits editing and model learning capabilities. To this end, some works proposed to learn a template that is shared by all the training samples, which may be similar to the mean shape for the training samples. For instance, “Deep Implicit Templates” [Zheng et al 2021], or DIT for short, uses a network that learns the deformation to the template shape. “Deformed Implicit Field” [Deng et al 2021], or DIF, proposes a similar idea, but, inspired by [Sitzmann et al 2019], they use so-called Hyper-Nets, which predict the weights of their deformation networks.

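Improving 3D reconstruction

Later work such as Yin et al. [Yin et al 2020] proposed a method that combines different parts of the same object by learning the connections/joints between the parts. The joints are learned with the implicit representation of Chen et al. [Chen and Zhang 2019], in a way that agrees with the remaining components while being smooth and topologically valid. The solution is trained on segmented components extracted from ShapeNet. PQ-Net [Wu et al 2020] represents and generates 3D shapes through sequential part assembly. However, since it cannot segment the input data automatically, it requires segmentation annotations at test time for reconstruction. Its geometric components are rigidly assembled into one model, whereas the authors believe dental models require the components to be blended smoothly into one reconstruction. (So this paper presumably builds on these works to further improve 3D reconstruction.)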

Original English text:

For instance Yin et al [Yin et al 2020] proposed a method that combines different parts of the same object together. This is done by learning the connections/joints of the various parts. Joints are learned using the implicit representation of Chen et al [Chen and Zhang 2019]. They are learned in a way to agree with the remaining components while being smooth and topologically valid. The solution is trained with segmented components extracted from ShapeNet. PQ-Net [Wu et al 2020] represents and generates 3D shapes in a sequential part assembly manner. However, since this method is not able to automatically segment the input data, it requires segmentation annotations at test time for the reconstruction task. Their geometric components are rigidly assembled to compose a model, while we believe that dental models require smooth blending of components into one reconstruction.

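Model construction and interpolation

Recently there has been growing interest in using implicit representations to model parts of the human body: the head [Yenamandra et al. 2021; Zheng et al. 2022], hands [Corona et al. 2022] and body [Alldieck et al. 2021; Deng et al. 2020; Palafox et al. 2021]. Yenamandra et al. [Yenamandra et al. 2021] pioneered this direction with i3DMM, the first 3D morphable model of the human head including hair. It learns latent codes for identity, albedo, expression and hairstyle with an SDF-based architecture trained on 3D scans of subjects with different hairstyles and expressions. It learns a template shape plus a deformation to that shape, so unlike earlier explicit 3DMM face models [Egger et al. 2020] it needs only rigidly aligned scans rather than complicated non-rigid alignment, and it supports interpolation in the latent spaces of all components (identity, expression, hairstyle). ImFace [Zheng et al. 2022], concurrent with this work, improves on i3DMM's reconstruction accuracy with a localized SDF representation: the face is decomposed into 5 regions with separate expression and identity networks, whose weights are learned by hyper-networks in a meta-learning approach. The key difference from ImFace is that this paper assigns one dedicated latent code to each geometric component (each tooth and the gums) to allow per-tooth semantic control, whereas ImFace's latent codes cannot be partitioned into distinct regions of the geometry. The method also differs from NASA [Deng et al. 2020], which encodes geometry as pose-conditioned occupancy for a fixed number of joints. That is ill-suited to dental geometry, because single teeth may be missing and annotating per-tooth poses and skinning weights for the training scans would be very difficult.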

Original English text:

Recently there has been increasing interest in building models of the various parts of the human body using implicit representations. This includes models for the human head [Yenamandra et al. 2021; Zheng et al. 2022], hands [Corona et al. 2022] and body [Alldieck et al. 2021; Deng et al. 2020; Palafox et al. 2021]. The work of Yenamandra et al. [Yenamandra et al. 2021] was the first in this regard. They presented the first 3D morphable model of the human head, including hair. The model learns latent codes for identity, albedo, expression and hairstyle. The model, named i3DMM, is based on an SDF-based architecture learned from 3D scans of various subjects with different hairstyles and performing different expressions. The method learns a template shape and a deformation to this shape.

The template shape establishes correspondences and hence, unlike early 3DMM explicit face models [Egger et al 2020], it does not need complicated non-rigid alignment of the scans, but merely rigidly aligned ones. The method shows novel interpolation applications in the latent spaces of all components e.g. identity, expressions and hairstyle. ImFace [Zheng et al 2022] is a concurrent work to the one we present here. The aim is to improve the reconstruction accuracy of i3DMM [Yenamandra et al 2021] using a localized SDF representation. To this regard, the entire face is decomposed into 5 regions with separate networks for expression and identity learning. A meta-learning approach is used, where hyper-nets learn the weights of the expression and identity networks. Results show more accurate reconstructions over i3DMM [Yenamandra et al 2021]. An important difference between ImFace and our work is that since we aim at providing semantic control over each tooth individually, we assign one dedicated latent code to each geometric component, i.e. to each tooth and to the gums, whereas ImFace’s latent codes cannot be partitioned into distinct regions of the geometry. Also, our method is different from NASA [Deng et al 2020]: For a fixed number of joints, NASA encodes geometry as pose-conditioned occupancy. This is ill-suited for dental geometry, as single teeth might be missing, and annotating individual teeth poses and skinning weights for the dental scans in the training set would be very difficult.

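Contributions

The first implicit, parametric morphable model of human teeth that includes the gums. It models the geometry of human teeth and gums, segments and labels it into semantically meaningful components, and allows each component to be controlled individually (e.g., each tooth (14 in total) and the gums (1)).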

💡

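Bard: An implicit model in machine learning is a model that is defined by a fixed-point equation. This means that the output of the model is not directly calculated from the input, but rather is found by iteratively solving the equation until a stable solution is reached. One of the main advantages of implicit models is that they can be much more expressive than traditional models. This is because they can capture complex relationships between the input and output variables that would be difficult or impossible to represent with a traditional model.

Note: this answer describes implicit-layer models (such as deep equilibrium models), which is a different sense of "implicit". In this paper, as in DeepSDF and Occupancy Networks, "implicit" means the shape is represented by a learned function of 3D position, such as an SDF whose zero level set is the surface, rather than explicitly as a mesh, voxels or points.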

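Method

We propose a compositional SDF representation in which a separate model represents each tooth and the gums. This is a novel way of representing 3D models: simply put, each model in the composition has its own SDF, where a positive value means the point is inside that model and a negative value means it is outside (sign conventions differ between works; DeepSDF, for example, uses negative values inside).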

We learn one dedicated latent code for each of these 15 components. This lets us edit the results afterwards, e.g., replacing or deforming teeth!

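The model's 15 sub-modules correspond to the 15 components (14 teeth and the gums). Given a latent code and a point in 3D space, each sub-module predicts an SDF value (positive or negative) and an indicator (a probability). The indicator is the probability that the point belongs to that component; from these probabilities across all sub-modules, we compute a set of blending weights that linearly combine the SDF values into one final value.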
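A minimal NumPy sketch of the blending step, assuming the blending weights are simply the indicator probabilities normalized to sum to one at each point (the paper's exact weighting formula is not reproduced in these notes). The optional 0/1 `present` mask is a hypothetical way to use the missing-teeth vector mentioned under future work, and reading the argmax of the weights as a per-point semantic label is likewise an assumption.

```python
import numpy as np

def blend_sdf(sdf, prob, present=None, eps=1e-8):
    """Blend per-component SDFs into one value per query point.

    sdf, prob: arrays of shape (N, K) for N points and K components
    (K = 15 here: 14 teeth + gums). Returns (blended_sdf, weights, labels).
    """
    sdf = np.asarray(sdf, dtype=float)
    prob = np.asarray(prob, dtype=float)
    if present is not None:
        # Hypothetical: zero the indicator of components marked as missing (0).
        prob = prob * np.asarray(present, dtype=float)[None, :]
    # Assumption: weights = indicator probabilities normalized per point.
    weights = prob / (prob.sum(axis=1, keepdims=True) + eps)
    blended = (weights * sdf).sum(axis=1)   # linear combination of SDF values
    labels = weights.argmax(axis=1)         # most likely component per point
    return blended, weights, labels

# Toy example with K = 3 components and a single query point.
sdf = np.array([[0.2, -0.1, 0.5]])
prob = np.array([[0.1, 0.8, 0.1]])
blended, weights, labels = blend_sdf(sdf, prob)   # blended ≈ -0.01, label 1
```

Normalizing the probabilities keeps the result a convex combination of the component SDFs, so near a single tooth the blended value is dominated by that tooth's own SDF.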

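Inside each sub-module, the given 3D point is first warped by the Deform-Net. The weights of this network are generated by a Hyper-Net conditioned on the given latent code, and the Deform-Net maps each input point into a learned canonical reference space, in which the template shape is embedded by the Ref-Net.

The Deform-Net also predicts an SDF correction Δs_i, so the final SDF is s_i + Δs_i, where s_i is predicted by the Ref-Net.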
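The per-component pipeline can be sketched in NumPy as below. Everything concrete here is hypothetical: the Hyper-Net is a single linear layer, the Deform-Net has one hidden layer whose outputs are a 3D displacement plus Δs_i, and the learned Ref-Net is replaced by a fixed sphere template (positive inside, matching the sign convention in these notes). For simplicity all components share one Hyper-Net and template here; these notes do not say how the paper shares networks across components.

```python
import numpy as np

LATENT_DIM, HIDDEN = 8, 16
# Hypothetical Deform-Net parameter shapes: one hidden layer and 4 outputs
# (3 for the displacement into template space, 1 for the correction delta_s).
SHAPES = [(HIDDEN, 3), (HIDDEN,), (4, HIDDEN), (4,)]
N_PARAMS = sum(int(np.prod(s)) for s in SHAPES)

rng = np.random.default_rng(0)
# Hypothetical Hyper-Net: one linear layer from a latent code to all Deform-Net weights.
HYPER_W = 0.1 * rng.normal(size=(N_PARAMS, LATENT_DIM))

def hyper_net(code):
    """Predict the Deform-Net weights from one component's latent code."""
    flat, params, start = HYPER_W @ code, [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        params.append(flat[start:start + size].reshape(shape))
        start += size
    return params

def deform_net(points, params):
    """Warp (N, 3) points into template space; also return delta_s of shape (N,)."""
    w1, b1, w2, b2 = params
    out = np.tanh(points @ w1.T + b1) @ w2.T + b2
    return points + out[:, :3], out[:, 3]

def ref_net(points, radius=0.5):
    """Stand-in for the learned Ref-Net: a sphere template, positive inside."""
    return radius - np.linalg.norm(points, axis=1)

def component_sdf(points, code):
    """Final SDF of one component: s_i (template at warped point) + delta_s_i."""
    warped, delta_s = deform_net(points, hyper_net(code))
    return ref_net(warped) + delta_s

codes = rng.normal(size=(15, LATENT_DIM))           # one latent code per component
pts = rng.uniform(-1, 1, size=(4, 3))
sdfs = np.stack([component_sdf(pts, c) for c in codes], axis=1)   # shape (4, 15)
```

With a zero latent code the generated Deform-Net weights are all zero, so the warp is the identity and the SDF reduces to the template's, which is one way to see why the template acts as a shared reference shape.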

💡

Bard: Compositional SDF representation: A compositional SDF representation is a way of representing a 3D scene as a collection of individual objects, each of which is represented by a signed distance function (SDF). The SDF of an object is a function that maps a point in space to the signed distance between that point and the object's surface. A positive value indicates that the point is inside the object, while a negative value indicates that the point is outside the object.
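For contrast with the learned, weighted blending described for this paper, the classic way to compose SDFs is a hard union. The sketch below uses two spheres and the positive-inside convention from Bard's answer, under which the union is the pointwise maximum (with negative-inside SDFs, as in DeepSDF, it would be the minimum). This is a textbook operator, not the paper's method.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: positive inside, negative outside."""
    diff = np.asarray(points, float) - np.asarray(center, float)
    return radius - np.linalg.norm(diff, axis=1)

def union_sdf(*sdfs):
    """Hard union of objects: pointwise max of positive-inside SDFs.
    Exact distance outside all objects; only approximate inside overlaps."""
    return np.max(np.stack(sdfs), axis=0)

points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
obj_a = sphere_sdf(points, center=(0, 0, 0), radius=1.0)   # [ 1.0, -1.0, -4.0]
obj_b = sphere_sdf(points, center=(2, 0, 0), radius=0.5)   # [-1.5,  0.5, -2.5]
combined = union_sdf(obj_a, obj_b)                          # [ 1.0,  0.5, -2.5]
```

A hard max produces creases where objects meet, which is one reason a smooth, weighted blend is attractive for teeth that touch the gums.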

❓ Question: what exactly is a latent code?

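Data

The method is trained on a dataset of dental geometries (e.g., Fig. 1), manually annotated with semantic labels that segment the surface into individual tooth types and the gums.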

Fig1 Examples from our dataset of ground truth teeth geometries. Teeth identities have been annotated manually. We visualize them by different colours.

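Although these models were obtained in different ways (e.g., by digitizing patients' traditional dental impressions, or more directly with intraoral scanning), we mostly refer to them as "dental scans" to avoid confusion with the various meanings of the word "model". Each scan is available as a high-resolution mesh, which in particular allows us to obtain normal vectors for supervision.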
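Since each scan is a triangle mesh, normals for supervision can be read off its faces. A minimal sketch, assuming counter-clockwise triangle winding so that normals point outward; area-weighted vertex normals are one common choice, not necessarily what the authors used.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Area-weighted unit vertex normals of a triangle mesh.

    vertices: (V, 3) float array, faces: (F, 3) integer array (CCW winding).
    """
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces)
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    face_n = np.cross(v1 - v0, v2 - v0)     # length = 2 * triangle area
    normals = np.zeros_like(vertices)
    for k in range(3):
        np.add.at(normals, faces[:, k], face_n)   # accumulate onto each corner
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.maximum(lengths, 1e-12)

# A unit square in the xy-plane made of two CCW triangles: all normals are +z.
square_v = np.array([[0.0, 0, 0], [1.0, 0, 0], [1.0, 1, 0], [0.0, 1, 0]])
square_f = np.array([[0, 1, 2], [0, 2, 3]])
square_n = vertex_normals(square_v, square_f)
```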

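Dataset size

1,077 upper-jaw (maxillary) geometries, about half of which have misaligned teeth. We randomly split them into 1,027 for training and 50 for testing. By flipping left-right and swapping the labels accordingly, we double the training set to 2,054 geometries.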

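Alignment and normalization

Because the scans were acquired with a variety of devices, they are not aligned in a common coordinate system. We therefore pick one scan and normalize it to occupy the volume [−1, 1]³. We then align the other scans to this template using Generalized Procrustes Analysis (GPA).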

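Results

Reconstruction and semantic labeling

With the model weights kept fixed, the latent codes are found by solving an optimization problem (it was proposed in earlier work; I have not had time to read it yet).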
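To make the idea concrete, and to partly answer the latent-code question above: in an auto-decoder such as DeepSDF, a latent code is a per-shape vector, and at inference time it is optimized while the network weights stay frozen. The toy sketch below assumes a decoder whose output is linear in the code and fits the code by gradient descent on a squared error plus an L2 prior. It is not the paper's objective, which these notes do not reproduce.

```python
import numpy as np

def features(points):
    """Fixed feature map phi(x) = [1, x, y, z, x^2, y^2, z^2] (hypothetical)."""
    return np.concatenate([np.ones((len(points), 1)), points, points ** 2], axis=1)

def decoder_sdf(points, code, W, b):
    """Toy frozen decoder whose implicit-function value is linear in the code."""
    return features(points) @ (W @ code + b)

def fit_latent_code(points, sdf_obs, W, b, lam=1e-4, lr=0.1, steps=5000):
    """Auto-decoder inference: W and b stay fixed, only the latent code is optimized."""
    phi = features(points)
    code = np.zeros(W.shape[1])                    # start from the zero (mean) code
    for _ in range(steps):
        residual = phi @ (W @ code + b) - sdf_obs
        # Gradient of mean(residual^2) + lam * ||code||^2 with respect to the code.
        grad = (2.0 / len(points)) * (W.T @ (phi.T @ residual)) + 2.0 * lam * code
        code -= lr * grad
    return code

rng = np.random.default_rng(0)
W = 0.3 * rng.normal(size=(7, 4))                  # frozen "decoder" weights
b = np.array([0.5, 0, 0, 0, -1, -1, -1.0])         # bias term 0.5 - |x|^2, positive inside
true_code = rng.normal(size=4)
points = rng.uniform(-1, 1, size=(500, 3))
sdf_obs = decoder_sdf(points, true_code, W, b)     # stands in for samples from a scan
code = fit_latent_code(points, sdf_obs, W, b)
```

The L2 prior plays the role of keeping codes near the region seen during training; with a real neural decoder the same loop would use automatic differentiation instead of the hand-written gradient.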

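Model editing

This application is essentially an extension built on top of the semantic segmentation above.

The main advantage of our method is that it decomposes the dental geometry into many semantically meaningful components that can be controlled individually. This allows us to edit a specific reconstruction by replacing teeth with a wrong shape or an unattractive pose by more desirable counterparts (e.g., aesthetically more pleasing teeth).

(a) and (d) show two misaligned incisors (see the red dashed lines) from the bottom and side views, respectively. (b) and (e) show the result of replacing these two incisors with better-aligned counterparts while keeping all other teeth unchanged. (c) and (f) encode the difference before and after editing. Note that the original model has no canines; we deliberately chose this example to show that we can reconstruct a model in which teeth were originally missing.

The paper also mentions that this application could be used to show patients several possible treatment outcomes.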
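Because each component has its own latent code, this kind of editing amounts to swapping rows of a per-component code table before decoding. A minimal sketch, assuming the codes are stored as a (15, D) array and that the indices of the two incisors are known; the indices and latent size used here are hypothetical.

```python
import numpy as np

NUM_COMPONENTS, LATENT_DIM = 15, 4   # 14 teeth + gums; latent size is arbitrary here

def replace_components(codes, donor_codes, components):
    """Copy `codes` and overwrite the listed components with the donor's codes."""
    edited = np.array(codes, dtype=float)          # copy; the input stays unchanged
    idx = list(components)
    edited[idx] = np.asarray(donor_codes, dtype=float)[idx]
    return edited

patient = np.zeros((NUM_COMPONENTS, LATENT_DIM))   # codes fitted to the patient's scan
donor = np.ones((NUM_COMPONENTS, LATENT_DIM))      # codes of better-aligned teeth
INCISORS = [0, 1]                                  # hypothetical component indices
edited = replace_components(patient, donor, INCISORS)
```

Decoding `edited` with the frozen model would then give the modified dentition, with every other tooth and the gums left exactly as reconstructed.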

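Interpolating intermediate steps

Orthodontic treatment, such as correcting misaligned teeth, can be a long and continuous process, so visualizing it as a continuous animation may also be useful. In this paper, however, the authors only interpolate between the latent codes of two reconstructions (pre- and post-treatment); in real orthodontic use cases, the animation may need additional "keyframes" based on the orthodontist's expertise.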

In each row we interpolate between the reconstruction of a pre-treatment scan (first column) and the reconstruction of a post-treatment scan (last column). The arrows show the direction of interpolation. We can render plausible visualizations of orthodontic treatment plans in this way, which is best illustrated by our supplemental video results.
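The interpolation can be sketched as linear blending of the per-component latent codes. The paper interpolates between two code sets (pre- and post-treatment); the optional extra keyframes below are a hypothetical extension in the spirit of the note above, and each interpolated code set would still have to be decoded by the model to obtain geometry.

```python
import numpy as np

def interpolate_codes(keyframes, num_frames):
    """Piecewise-linear interpolation through latent-code keyframes.

    keyframes: array of shape (K, C, D) with K >= 2 keyframes, C components
    (e.g. 15) and D latent dimensions. Returns shape (num_frames, C, D).
    """
    keyframes = np.asarray(keyframes, dtype=float)
    key_t = np.linspace(0.0, 1.0, len(keyframes))  # keyframes evenly spaced in time
    frames = np.empty((num_frames,) + keyframes.shape[1:])
    for f, t in enumerate(np.linspace(0.0, 1.0, num_frames)):
        # Index of the segment [key_t[i], key_t[i + 1]] that contains t.
        i = min(np.searchsorted(key_t, t, side="right") - 1, len(keyframes) - 2)
        alpha = (t - key_t[i]) / (key_t[i + 1] - key_t[i])
        frames[f] = (1.0 - alpha) * keyframes[i] + alpha * keyframes[i + 1]
    return frames

pre = np.zeros((15, 4))    # hypothetical codes of the pre-treatment reconstruction
post = np.ones((15, 4))    # hypothetical codes of the post-treatment reconstruction
frames = interpolate_codes([pre, post], num_frames=5)   # frames[2] is halfway (0.5)
```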

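Future work

Reconstruct the lower jaw

Reconstruct the tongue

Include texture

Automatically detect missing teeth (currently indicated by a 0/1 vector)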
