1 |
Learning a Neural 3D Texture Space from 2D Exemplars
Abstract: We propose a generative model of 2D and 3D natural textures that offers diversity, visual fidelity, and high computational efficiency. This is enabled by a family of methods that extend ideas from classic stochastic procedural texturing (Perlin noise) to learned, deep non-linearities. The key idea is a hard-coded, tunable, and differentiable step that feeds multiple transformed random 2D or 3D fields into an MLP that can be sampled over infinite domains. Our model encodes all exemplars from a diverse set of textures without needing to be re-trained for each exemplar. Applications include texture interpolation and learning 3D textures from 2D exemplars.
Bio: Philipp Henzler is a third-year PhD student at University College London under the supervision of Tobias Ritschel and Niloy J. Mitra. His research focuses on explaining our world from 2D visual observations, more specifically on weakly supervised 3D texture synthesis and 3D reconstruction from 2D images.
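To make the "random fields into an MLP" step concrete, here is a minimal PyTorch sketch (not the authors' implementation): a few fixed random 3D noise grids are sampled at several scales of the query coordinate, wrapped so the domain is unbounded, and the stacked noise values are decoded to RGB by a small MLP. All sizes and the wrapping scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class NoiseFieldTexture(torch.nn.Module):
    # Hypothetical sketch: K fixed random 3D noise fields, sampled at several
    # scales of the query point and decoded to RGB by an MLP.
    def __init__(self, n_fields=8, res=16, hidden=128):
        super().__init__()
        self.register_buffer("fields", torch.randn(n_fields, 1, res, res, res))
        self.register_buffer("scales", torch.logspace(0, 2, n_fields))  # octave-like scales
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(n_fields, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 3), torch.nn.Sigmoid())

    def forward(self, xyz):                                   # xyz: (N, 3), any range
        feats = []
        for k in range(self.fields.shape[0]):
            p = (xyz * self.scales[k]) % 2.0 - 1.0            # wrap into [-1, 1): infinite domain
            grid = p.view(1, -1, 1, 1, 3)                     # (1, N, 1, 1, 3) for grid_sample
            f = F.grid_sample(self.fields[k:k+1], grid, align_corners=True)
            feats.append(f.view(-1, 1))                       # (N, 1) noise value per point
        return self.mlp(torch.cat(feats, dim=1))              # (N, 3) RGB
```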
|
2 |
BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
Abstract: We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the computer graphics pipeline, we design BlockGAN to learn to first generate 3D features of background and foreground objects, then combine them into 3D features for the whole scene, and finally render them into realistic images. This allows BlockGAN to reason over occlusion and interaction between objects' appearance, such as shadow and lighting, and provides control over each object's 3D pose and identity, while maintaining image realism. BlockGAN is trained end-to-end, using only unlabelled single images, without the need for 3D geometry, pose labels, object masks, or multiple views of the same scene. Our experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity).
Bio: Thu Nguyen-Phuoc is a PhD student in Visual Computing at the Center for Digital Entertainment, University of Bath. An architecture student that went rogue, Thu now works at the intersection of computer vision and computer graphics. In particular, she is interested in neural rendering and inverse rendering.
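The compose-then-render idea can be sketched as follows (a hypothetical PyTorch stand-in, not BlockGAN's architecture): per-object 3D feature volumes are merged into one scene volume, collapsed along the depth axis as a crude camera projection, and decoded into an RGB image.

```python
import torch
import torch.nn as nn

class SceneComposer(nn.Module):
    # Illustrative sketch of "compose 3D features, then render": names and layer
    # sizes are assumptions, not the paper's generator.
    def __init__(self, c3d=64, c2d=256, depth=16):
        super().__init__()
        self.project = nn.Conv2d(c3d * depth, c2d, kernel_size=1)   # crude camera projection
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(c2d, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, fg, bg):            # each: (B, C, D, H, W) object feature volumes
        scene = torch.maximum(fg, bg)     # simple element-wise merge of objects
        b, c, d, h, w = scene.shape
        flat = scene.reshape(b, c * d, h, w)       # collapse depth into channels
        return self.decode(self.project(flat))     # render to an RGB image
```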
|
3 |
BSP-Net: Generating Compact Meshes via Binary Space Partitioning
Abstract: Inspired by classical Binary Space Partitioning (BSP) data structures from computer graphics, we introduce a network architecture that learns to represent a 3D shape via convex decomposition. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built on a set of planes. The convexes inferred by BSP-Net can be easily extracted to form a polygon mesh, without any need for iso-surfacing. The generated meshes are compact (i.e., low-poly) and well suited to represent sharp geometry; they are guaranteed to be watertight and can be easily parameterized. We also show that the reconstruction quality of BSP-Net is competitive with state-of-the-art methods while using far fewer primitives.
Bio: Zhiqin Chen is a first-year Ph.D. student at Simon Fraser University, under the supervision of Prof. Hao (Richard) Zhang. His research interest is computer graphics, with a specialty in geometric modeling and machine learning.
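The underlying BSP test is easy to state in code: a point is inside a convex if it violates none of that convex's half-spaces, and inside the shape if it is inside any convex. Below is a small NumPy sketch of that test; the plane and membership matrices are assumed inputs (in BSP-Net they are produced by the network).

```python
import numpy as np

def inside_shape(points, planes, membership):
    """points: (N, 3); planes: (P, 4) rows (a, b, c, d) of half-spaces a*x + b*y + c*z + d <= 0;
    membership: (P, C) binary matrix assigning planes to convexes."""
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)   # (N, 4)
    signed = homog @ planes.T                                             # (N, P) signed plane values
    violation = np.maximum(signed, 0.0) @ membership                      # (N, C) summed violations
    inside_convex = violation <= 1e-6          # inside a convex: no assigned plane violated
    return inside_convex.any(axis=1)           # inside the shape: inside any convex
```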
|
4 |
Convolutional Occupancy Networks
Abstract: Recently, implicit neural representations have gained popularity for learning-based 3D reconstruction. While demonstrating promising results, most implicit approaches are limited to comparably simple geometry of single objects and do not scale to more complicated or large-scale scenes. The key limiting factor of implicit methods is their simple fully-connected network architecture, which does not allow for integrating local information in the observations or incorporating inductive biases such as translational equivariance. In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space. We investigate the effectiveness of the proposed representation by reconstructing complex geometry from noisy point clouds and low-resolution voxel representations. We empirically find that our method enables fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
Bio: Songyou is a PhD student at ETH Zurich and the Max Planck Institute for Intelligent Systems, advised by Prof. Marc Pollefeys and Prof. Andreas Geiger. His research interest lies in computer vision and machine learning, especially neural scene representations.
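A hedged PyTorch sketch of the encoder/decoder split described above (layer choices are illustrative, not the paper's): a 3D convolutional encoder produces a feature volume, local features are interpolated at each query point, and a small MLP predicts occupancy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvOccSketch(nn.Module):
    # Illustrative sketch: convolutional encoder + locally conditioned implicit decoder.
    def __init__(self, feat=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(feat + 3, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, vox, pts):               # vox: (B, 1, D, H, W); pts: (B, N, 3) in [-1, 1]
        fvol = self.encoder(vox)               # (B, C, D, H, W) feature volume
        grid = pts.view(pts.shape[0], -1, 1, 1, 3)
        local = F.grid_sample(fvol, grid, align_corners=True)   # (B, C, N, 1, 1)
        local = local.view(fvol.shape[0], fvol.shape[1], -1).permute(0, 2, 1)  # (B, N, C)
        return self.decoder(torch.cat([local, pts], dim=-1))    # (B, N, 1) occupancy logits
```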
|
5 |
Deep Geometric Prior for Surface Reconstruction
Abstract: In this talk, I cover the results of three recent papers which leverage the smooth inductive biases of ReLU networks to reconstruct surfaces from point clouds. "Deep Geometric Prior for Surface Reconstruction" constructs a manifold atlas consisting of parametric patches encoded as a fully connected ReLU network; this work empirically demonstrates that the inductive bias of ReLU networks leads to good solutions on surface reconstruction, outperforming other state-of-the-art methods. "Gradient Dynamics of Shallow Univariate Networks" formally analyzes this inductive bias in the case of curves, and shows that under some initializations, shallow ReLU networks behave as kernel machines, minimize curvature, and are equivalent to cubic splines. Inspired by these theoretical results, "Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks" uses the kernels arising from shallow ReLU networks to represent an implicit function. This simple linear model outperforms state-of-the-art traditional methods and neural-network-based methods, and suggests that the success of neural implicit methods may be attributed to the kernel behavior of neural networks.
Bio: Francis Williams is a fourth-year PhD student at New York University advised by Denis Zorin and Joan Bruna. Francis's work lies at the intersection of machine learning, computer vision, and computer graphics. Recently, Francis has been focusing on understanding and leveraging the inductive biases of neural networks to solve geometric problems in a principled manner. In addition to research, Francis actively maintains several open source libraries including Point Cloud Utils, NumpyEigen and FML.
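As a concrete instance of the first paper's per-patch fitting, here is a minimal PyTorch sketch (sizes, optimizer, and loss are illustrative assumptions): a fully connected ReLU network mapping (u, v) to (x, y, z) is fit to a local point-cloud patch with a Chamfer loss, relying only on the network's smoothness bias as the prior.

```python
import torch

def fit_patch(points, steps=2000, n_samples=1024):
    # points: (M, 3) local point-cloud patch to fit with one parametric ReLU patch.
    net = torch.nn.Sequential(
        torch.nn.Linear(2, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 3))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        uv = torch.rand(n_samples, 2)            # sample the unit-square parameter domain
        pred = net(uv)                           # predicted surface samples (n_samples, 3)
        d = torch.cdist(pred, points)            # pairwise distances to the target patch
        loss = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()   # Chamfer distance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```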
|
6 |
Multiview Neural Surface Reconstruction with Implicit Lighting and Material
Abstract: In this work we address the challenging problem of multiview 3D surface reconstruction. We introduce a neural network architecture that simultaneously learns the unknown geometry, camera parameters, and a neural renderer that approximates the light reflected from the surface towards the camera. The geometry is represented as a zero level-set of a neural network, while the neural renderer, derived from the rendering equation, is capable of (implicitly) modeling a wide set of lighting conditions and materials. We trained our network on real-world 2D images of objects with different material properties, lighting conditions, and noisy camera initializations from the DTU MVS dataset. We found our model to produce state-of-the-art 3D surface reconstructions with high fidelity, resolution and detail.
Bio: I am an MSc student in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science under the supervision of Prof. Yaron Lipman. My main fields of interest are computer graphics, computer vision and machine learning. I have been working on developing 3D deep learning methods, mostly focusing on learning with weak supervision.
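The two-network structure can be sketched as follows (a simplified PyTorch illustration, not the paper's method): `sdf` and `renderer` are assumed callables, naive sphere tracing stands in for the paper's differentiable ray-surface intersection, and normals are taken as the SDF gradient.

```python
import torch

def sphere_trace(sdf, origins, dirs, n_steps=64):
    # origins, dirs: (N, 3); sdf: callable (N, 3) -> (N, 1) signed distances.
    t = torch.zeros(origins.shape[0], 1)
    for _ in range(n_steps):
        t = t + sdf(origins + t * dirs)            # step each ray by the current SDF value
    return origins + t * dirs                      # approximate surface points

def render(sdf, renderer, origins, dirs):
    # Detaching keeps the sketch simple; the paper uses a differentiable intersection.
    x = sphere_trace(sdf, origins, dirs).detach().requires_grad_(True)
    d = sdf(x)
    normals = torch.autograd.grad(d.sum(), x, create_graph=True)[0]   # normals = SDF gradient
    return renderer(torch.cat([x, normals, dirs], dim=-1))            # view-dependent RGB
```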
|
7 |
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location and viewing direction) and whose output is the volume density and view-dependent emitted radiance at that location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis.
Bio: Ben is a final-year PhD student working with Ren Ng in the EECS department at UC Berkeley. He works on problems in computer vision and graphics, and previously interned in Marc Levoy's group in Google Research as well as with Rodrigo Ortiz-Cayon and Abhishek Kar at Fyusion. Ben did his undergrad at Stanford University and worked at Pixar Research in the summer of 2014.
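The volume-rendering core of this family of methods can be sketched in a few lines of PyTorch (positional encoding and hierarchical sampling omitted; `field` is an assumed callable returning density and color at sampled points and view directions).

```python
import torch

def render_rays(field, origins, dirs, near=2.0, far=6.0, n_samples=64):
    # origins, dirs: (R, 3) camera rays; field: (pts, dirs) -> (sigma, rgb).
    t = torch.linspace(near, far, n_samples)                            # (S,) sample depths
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]     # (R, S, 3)
    d = dirs[:, None, :].expand_as(pts)
    sigma, rgb = field(pts, d)                                          # (R, S, 1), (R, S, 3)
    delta = torch.full_like(sigma, (far - near) / n_samples)            # segment lengths
    alpha = 1.0 - torch.exp(-sigma * delta)                             # per-segment opacity
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                                             # contribution per sample
    return (weights * rgb).sum(dim=1)                                   # (R, 3) pixel colors
```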
|
8 |
Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images
Abstract: We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views. Previous work on learning shape reconstruction from multiple views uses discrete representations such as point clouds or voxels, while continuous surface generation approaches lack multi-view consistency. We address these issues by designing neural networks capable of generating high-quality parametric 3D surfaces which are also consistent between views. Furthermore, the generated 3D surfaces preserve accurate image pixel to 3D surface point correspondences, allowing us to lift texture information to reconstruct shapes with rich geometry and appearance. Our method is supervised and trained on a public dataset of shapes from common object categories. Quantitative results indicate that our method significantly outperforms previous work, while qualitative results demonstrate the high quality of our reconstructions.
Bio: Jiahui Lei is an incoming doctoral student at the University of Pennsylvania. He obtained his bachelor's degree with honours from Zhejiang University in 2020. Previously, he interned in the Geometric Computation Group at Stanford University.
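A hedged sketch of the parametric-surface idea (illustrative module names and sizes, not the Pix2Surf architecture): an image encoder produces a code, and an MLP maps a pixel's normalized (u, v) coordinate plus that code to a 3D point, so pixel-to-surface correspondences fall out of the parameterization.

```python
import torch
import torch.nn as nn

class ImageToSurface(nn.Module):
    # Illustrative sketch: image code + 2D parameter -> 3D surface point.
    def __init__(self, code=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, code))
        self.surface = nn.Sequential(
            nn.Linear(code + 2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3))

    def forward(self, image, uv):            # image: (B, 3, H, W); uv: (B, N, 2) in [0, 1]
        z = self.encoder(image)              # (B, code) per-image latent
        z = z[:, None, :].expand(-1, uv.shape[1], -1)
        return self.surface(torch.cat([uv, z], dim=-1))   # (B, N, 3) surface points
```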
|
9 |
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Abstract: Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the 3D structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to feature representations of local scene properties. By formulating image formation as differentiable ray marching, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. This enables novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Bio: Vincent has just finished his PhD at Stanford University and is now a Postdoc at MIT's CSAIL with Josh Tenenbaum, Bill Freeman, and Fredo Durand. His research interest lies in neural scene representations, the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene with information on geometry, material, lighting etc. from only a few observations, a task that is simple for humans, but currently impossible for AI.
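A minimal PyTorch sketch of the pipeline described above (the paper's LSTM ray marcher is replaced here by a plain MLP step predictor; all sizes are illustrative): a scene network maps a 3D point to a feature, a learned marcher advances each camera ray, and a pixel generator turns the final feature into a color.

```python
import torch
import torch.nn as nn

class SRNSketch(nn.Module):
    # Illustrative sketch: scene function + learned ray marching + pixel generator.
    def __init__(self, feat=256, n_steps=10):
        super().__init__()
        self.n_steps = n_steps
        self.phi = nn.Sequential(nn.Linear(3, feat), nn.ReLU(),
                                 nn.Linear(feat, feat), nn.ReLU())
        self.step = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Softplus())   # positive step length
        self.pixel = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, origins, dirs):          # (R, 3) ray origins and directions
        t = torch.zeros(origins.shape[0], 1)
        for _ in range(self.n_steps):
            f = self.phi(origins + t * dirs)   # scene feature at the current point
            t = t + self.step(f)               # learned marching step
        return self.pixel(self.phi(origins + t * dirs))   # color at the final point
```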
|
10 |
Texture Fields: Learning Texture Representations in Function Space
Abstract: Texture reconstruction of 3D objects has received little attention from the research community, and existing methods are either limited to comparably low resolution or constrained experimental setups. A major reason for these limitations is that common representations of texture are inefficient or hard to interface with modern deep learning techniques. We propose Texture Fields, a novel implicit texture representation that is based on regressing a continuous 3D function parameterized with a neural network. Our approach circumvents limiting factors like shape discretization and parameterization, as the proposed texture representation is independent of the shape representation of the 3D object. We show that Texture Fields are able to represent high-frequency texture and naturally blend with modern deep learning techniques. Experimentally, we find that Texture Fields compare favorably to state-of-the-art methods for conditional texture reconstruction of 3D objects and enable learning of probabilistic generative models for texturing unseen 3D models.
Bio: Michael Oechsle received his master's degree in physics at the University of Stuttgart. In 2017 he joined the group of Prof. Andreas Geiger at the Max Planck Institute for Intelligent Systems and ETAS GmbH as a PhD student. His research focuses on investigating novel 3D representations of geometry and appearance.
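The representation itself is compact enough to sketch directly (a hypothetical PyTorch version, with illustrative sizes): a texture field is just a network mapping a 3D surface point and a latent code, e.g. from an image encoder, to an RGB color, independent of how the shape is stored.

```python
import torch
import torch.nn as nn

class TextureField(nn.Module):
    # Illustrative sketch: t(point, code) -> RGB, independent of the shape representation.
    def __init__(self, code=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())    # RGB in [0, 1]

    def forward(self, points, z):                  # points: (B, N, 3); z: (B, code)
        z = z[:, None, :].expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, z], dim=-1))   # (B, N, 3) colors
```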
|
11 |
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation
Abstract: We introduce a method for learning to generate the surface of 3D shapes. Our approach represents a 3D shape as a collection of parametric surface elements and, in contrast to methods generating voxel grids or point clouds, naturally infers a surface representation of the shape. Beyond its novelty, our new shape generation framework, AtlasNet, comes with significant advantages, such as improved precision and generalization capabilities, and the possibility to generate a shape of arbitrary resolution without memory issues. We demonstrate these benefits and compare to strong baselines on the ShapeNet benchmark for two applications: (i) auto-encoding shapes, and (ii) single-view reconstruction from a still image. We also provide results showing its potential for other applications, such as morphing, parametrization, super-resolution, matching, and co-segmentation.
Bio: Thibault Groueix worked in the Imagine group of Ecole des Ponts ParisTech under the supervision of Mathieu Aubry. He worked in close collaboration with Adobe Research, co-supervised by Matthew Fisher, Bryan Russell and Vova Kim. He will join Naver Labs as a research scientist in September 2020. He introduced a novel method to parameterize 3D data (AtlasNet). Using this new parameterization for 3D shape synthesis, he developed analysis-by-synthesis methods to learn 3D shape correspondences (3D-CODED), 3D parts discovery (AtlasNetv2), and clustering (DTI-Clustering).
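A hedged PyTorch sketch of the patch decoder (sizes and patch count are illustrative): each small MLP deforms the unit square, conditioned on a shape code, and the union of the deformed patches forms the output surface.

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    # Illustrative sketch: a collection of MLPs, each mapping (u, v, shape code) to 3D.
    def __init__(self, n_patches=25, code=1024, hidden=128):
        super().__init__()
        self.patches = nn.ModuleList([
            nn.Sequential(nn.Linear(2 + code, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, 3), nn.Tanh())
            for _ in range(n_patches)])

    def forward(self, z, pts_per_patch=100):       # z: (B, code) shape latent
        b = z.shape[0]
        out = []
        for patch in self.patches:
            uv = torch.rand(b, pts_per_patch, 2)   # sample the unit-square domain
            zc = z[:, None, :].expand(-1, pts_per_patch, -1)
            out.append(patch(torch.cat([uv, zc], dim=-1)))
        return torch.cat(out, dim=1)               # (B, n_patches * pts_per_patch, 3)
```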
|
12 |
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
Abstract: As 3D point clouds become the representation of choice for multiple vision and graphics applications, the ability to synthesize or reconstruct high-resolution, high-fidelity point clouds becomes crucial. Despite the recent success of deep learning models in discriminative tasks of point clouds, generating point clouds remains challenging. This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. Specifically, we learn a two-level hierarchy of distributions where the first level is the distribution of shapes and the second level is the distribution of points given a shape. This formulation allows us to both sample shapes and sample an arbitrary number of points from a shape. Our generative model, named PointFlow, learns each level of the distribution with a continuous normalizing flow. The invertibility of normalizing flows enables the computation of the likelihood during training and allows us to train our model in the variational inference framework. Empirically, we demonstrate that PointFlow achieves state-of-the-art performance in point cloud generation. We additionally show that our model can faithfully reconstruct point clouds and learn useful representations in an unsupervised manner. The code is available at https://github.com/stevenygd/PointFlow.
Bio: I'm a Computer Science PhD student at Cornell University, advised by Serge Belongie and Bharath Hariharan. My research interests include computer vision for augmented reality and 3D generation.
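A sampling-only sketch of the two-level idea, assuming the `torchdiffeq` package for ODE integration (training via likelihood and variational inference is omitted): points drawn from a Gaussian are flowed through dynamics conditioned on a per-shape code.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint    # assumes the torchdiffeq package is installed

class PointODE(nn.Module):
    # Illustrative conditional dynamics dx/dt = f(x, z) for a continuous normalizing flow.
    def __init__(self, code=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + code, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 3))
        self.z = None                               # shape code, set before integrating

    def forward(self, t, x):                        # x: (B, N, 3)
        z = self.z[:, None, :].expand(-1, x.shape[1], -1)
        return self.net(torch.cat([x, z], dim=-1))

def sample_shape(flow, z, n_points=2048):
    flow.z = z                                      # condition the dynamics on the shape code
    x0 = torch.randn(z.shape[0], n_points, 3)       # point-level Gaussian prior
    t = torch.tensor([0.0, 1.0])
    return odeint(flow, x0, t)[-1]                  # points at the end of the flow
```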
|
13 |
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high-quality shape representation, interpolation and completion from partial and noisy 3D input data. DeepSDF, like its classical counterpart, represents a shape's surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape. Hence, our representation implicitly encodes a shape's boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shape's interior or not. While classical SDFs, in either analytical or discretized voxel form, typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work.
Bio: I'm a PhD student at the University of Washington CSE, where I work with Steve Seitz. I'm broadly interested in computer vision and graphics. My current research focus is on 3D reconstruction and realistic rendering using physics and neural representations. I am fortunate to work with awesome collaborators including Richard Newcombe and Qi Shan, labmates at GRAIL, and mentors and friends I met during my internships at Apple, Oculus, and Adobe. Prior to my PhD, I received my B.S. from Caltech, working in Pietro Perona's vision lab.
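A minimal auto-decoder-style sketch in the spirit of DeepSDF (illustrative sizes and loss; not the released code): one latent code per training shape is optimized jointly with an MLP that maps (code, point) to a signed distance, using a clamped L1 loss.

```python
import torch
import torch.nn as nn

class DeepSDFSketch(nn.Module):
    # Illustrative sketch: per-shape latent codes + shared SDF decoder.
    def __init__(self, n_shapes, code=256, hidden=512):
        super().__init__()
        self.codes = nn.Embedding(n_shapes, code)
        self.mlp = nn.Sequential(
            nn.Linear(code + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, shape_idx, xyz):              # shape_idx: (B,); xyz: (B, N, 3)
        z = self.codes(shape_idx)[:, None, :].expand(-1, xyz.shape[1], -1)
        return self.mlp(torch.cat([z, xyz], dim=-1))   # (B, N, 1) predicted SDF values

def clamped_l1(pred, target, delta=0.1):
    # Clamping focuses the loss on the region near the surface.
    return (pred.clamp(-delta, delta) - target.clamp(-delta, delta)).abs().mean()
```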
|
14 |
Deformation-Aware 3D Model Embedding and Retrieval
Abstract: 3D model retrieval is a fundamental operation for recovering a clean and complete 3D model from a noisy and partial 3D scan. However, given a finite collection of 3D shapes, even the closest model to a query may not be satisfactory. This motivates us to apply 3D deformation techniques to fit the retrieved model to the query, which still may not achieve a perfect fit because of restrictions that preserve important features of the original model. This gap between the deformed model and the query induces asymmetric relationships among the models, which cannot be handled by typical metric learning techniques. Thus, we propose a novel deep embedding approach that learns the asymmetric relationships by leveraging location-dependent egocentric distance fields. We also propose two strategies for training the embedding network. We demonstrate that both of these approaches outperform other baselines in our experiments with both synthetic and real data.
Bio: Mikaela is a first-year CS PhD student at Stanford University advised by Prof. Leonidas Guibas. Her research interests focus on 3D shape understanding, shape analysis and geometric processing.
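One way to picture an asymmetric, location-dependent distance is sketched below (a purely illustrative PyTorch construction, not the paper's egocentric distance fields): each database model gets an embedding center and its own small field that rescales distances around it, so the distance from a query to a model need not equal the reverse.

```python
import torch
import torch.nn as nn

class AsymmetricRetrieval(nn.Module):
    # Illustrative sketch of an asymmetric embedding distance for retrieval.
    def __init__(self, n_models, dim=128):
        super().__init__()
        self.centers = nn.Embedding(n_models, dim)
        self.field = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Softplus())

    def distance(self, query_emb, model_idx):       # query_emb: (B, dim); model_idx: (B,)
        c = self.centers(model_idx)
        offset = query_emb - c                      # query position as seen from the model
        return self.field(offset).squeeze(-1) * offset.norm(dim=-1)   # (B,) asymmetric distance
```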
Contact Info
E-mail: 3dreps@gmail.com