diff --git a/docs/tutorials/dataloaders_ShapeNetCore_R2N2.ipynb b/docs/tutorials/dataloaders_ShapeNetCore_R2N2.ipynb new file mode 100644 index 000000000..2e48c9b44 --- /dev/null +++ b/docs/tutorials/dataloaders_ShapeNetCore_R2N2.ipynb @@ -0,0 +1,536 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dataloaders for ShapeNetCore and R2N2\n", + "This tutorial shows how to:\n", + "- Load models from ShapeNetCore and R2N2 using PyTorch3D's data loaders.\n", + "- Pass the loaded datasets to `torch.utils.data.DataLoader`.\n", + "- Render ShapeNetCore models with PyTorch3D's renderer.\n", + "- Render R2N2 models with the same orientations as the original renderings in the dataset.\n", + "- Visualize R2N2 model voxels." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 0. Install and import modules" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If `torch`, `torchvision` and `pytorch3d` are not installed, run the following cell:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install torch torchvision\n", + "!pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import torch\n", + "\n", + "from pytorch3d.datasets import (\n", + " R2N2,\n", + " ShapeNetCore,\n", + " collate_batched_meshes,\n", + " render_cubified_voxels,\n", + ")\n", + "from pytorch3d.renderer import (\n", + " OpenGLPerspectiveCameras,\n", + " PointLights,\n", + " RasterizationSettings,\n", + " TexturesVertex,\n", + " look_at_view_transform,\n", + ")\n", + "\n", + "from pytorch3d.structures import Meshes\n", + "from torch.utils.data import DataLoader\n", + "\n", + "# add path for demo utils functions \n", + "import sys\n", + "import os\n", + "sys.path.append(os.path.abspath(''))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If using **Google Colab**, fetch the utils file for plotting image grids:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!wget https://raw.githubusercontent.com/facebookresearch/pytorch3d/master/docs/tutorials/utils/plot_image_grid.py\n", + "from plot_image_grid import image_grid" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "OR if running locally uncomment and run the following cell:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# from utils import image_grid" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Load the datasets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you haven't already downloaded the ShapeNetCore dataset, first do that following the instructions here: https://www.shapenet.org/. ShapeNetCore is a subset of the ShapeNet dataset. In PyTorch3D we support both version 1 (57 categories) and version 2 (55 categories).\n", + "\n", + "Then modify `SHAPENET_PATH` below to you local path to the ShapeNetCore dataset folder. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Setup\n", + "if torch.cuda.is_available():\n", + " device = torch.device(\"cuda:0\")\n", + " torch.cuda.set_device(device)\n", + "else:\n", + " device = torch.device(\"cpu\")\n", + " \n", + "SHAPENET_PATH = \"\"\n", + "shapenet_dataset = ShapeNetCore(SHAPENET_PATH)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The R2N2 dataset can be downloaded using the instructions here: http://3d-r2n2.stanford.edu/. Look at the links for `ShapeNetRendering` and `ShapeNetVox32`. The R2N2 dataset contains 13 categories that are a subset of the ShapeNetCore v.1\n", + "dataset. The R2N2 dataset also contains its own 24 renderings of each object and voxelized models.\n", + "\n", + "Then modify `R2N2_PATH` and `SPLITS_PATH` below to your local R2N2 dataset folder path and splits file path respectively. Here we will load the `train` split of R2N2 and ask the voxels of each model to be returned." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "R2N2_PATH = \"\"\n", + "SPLITS_PATH = \"None\"\n", + "r2n2_dataset = R2N2(\"train\", SHAPENET_PATH, R2N2_PATH, SPLITS_PATH, return_voxels=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can retrieve a model by indexing into the loaded dataset. For both ShapeNetCore and R2N2, we can examine the category this model belongs to (in the form of a synset id, equivalend to wnid described in ImageNet's API: http://image-net.org/download-API), its model id, and its vertices and faces." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "shapenet_model = shapenet_dataset[6]\n", + "print(\"This model belongs to the category \" + shapenet_model[\"synset_id\"] + \".\")\n", + "print(\"This model has model id \" + shapenet_model[\"model_id\"] + \".\")\n", + "model_verts, model_faces = shapenet_model[\"verts\"], shapenet_model[\"faces\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use its vertices and faces to form a `Meshes` object which is a PyTorch3D datastructure for working with batched meshes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_textures = TexturesVertex(verts_features=torch.ones_like(model_verts, device=device)[None])\n", + "shapenet_model_mesh = Meshes(\n", + " verts=[model_verts.to(device)], \n", + " faces=[model_faces.to(device)],\n", + " textures=model_textures\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With R2N2, we can further examine R2N2's original renderings. For instance, if we would like to see the second and third views of the eleventh objects in the R2N2 dataset, we can do the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r2n2_renderings = r2n2_dataset[10,[1,2]]\n", + "image_grid(r2n2_renderings.numpy(), rows=1, cols=2, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Use the datasets with `torch.utils.data.DataLoader`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Training deep learning models, usually requires passing in batches of inputs. The `torch.utils.data.DataLoader` from Pytorch helps us do this. PyTorch3D provides a function `collate_batched_meshes` to group the input meshes into a single `Meshes` object which represents the batch. The `Meshes` datastructure can then be used directly by other PyTorch3D ops which might be part of the deep learning model (e.g. `graph_conv`).\n", + "\n", + "For R2N2, if all the models in the batch have the same number of views, the views, rotation matrices, translation matrices, intrinsic matrices and voxels will also be stacked into batched tensors.\n", + "\n", + "**NOTE**: All models in the `val` split of R2N2 have 24 views, but there are 8 models that split their 24 views between `train` and `test` splits, in which case `collate_batched_meshes` will only be able to join the matrices, views and voxels as lists. However, this can be avoided by laoding only one view of each model by setting `return_all_views = False`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 12\n", + "r2n2_single_view = R2N2(\"train\", SHAPENET_PATH, R2N2_PATH, SPLITS_PATH, return_all_views=False, return_voxels=True)\n", + "r2n2_loader = DataLoader(r2n2_single_view, batch_size=batch_size, collate_fn=collate_batched_meshes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's visualize all the views (one for each model) in the batch:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "it = iter(r2n2_loader)\n", + "r2n2_batch = next(it)\n", + "batch_renderings = r2n2_batch[\"images\"] # (N, V, H, W, 3), and in this case V is 1.\n", + "image_grid(batch_renderings.squeeze().numpy(), rows=3, cols=4, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Render ShapeNetCore models with PyTorch3D's differntiable renderer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both `ShapeNetCore` and `R2N2` dataloaders have customized `render` functions that support rendering models by specifying their model ids, categories or indices using PyTorch3D's differentiable renderer implementation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Rendering settings.\n", + "R, T = look_at_view_transform(1.0, 1.0, 90)\n", + "cameras = OpenGLPerspectiveCameras(R=R, T=T, device=device)\n", + "raster_settings = RasterizationSettings(image_size=512)\n", + "lights = PointLights(location=torch.tensor([0.0, 1.0, -2.0], device=device)[None],device=device)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we will try to render three models by their model ids:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "images_by_model_ids = shapenet_dataset.render(\n", + " model_ids=[\n", + " \"13394ca47c89f91525a3aaf903a41c90\",\n", + " \"14755c2ee8e693aba508f621166382b0\",\n", + " \"156c4207af6d2c8f1fdc97905708b8ea\",\n", + " ],\n", + " device=device,\n", + " cameras=cameras,\n", + " raster_settings=raster_settings,\n", + " lights=lights,\n", + ")\n", + "image_grid(images_by_model_ids.cpu().numpy(), rows=1, cols=3, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Suppose we would like to render the first three models in the dataset, we can render models by their indices:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "images_by_idxs = shapenet_dataset.render(\n", + " idxs=list(range(3)),\n", + " device=device,\n", + " cameras=cameras,\n", + " raster_settings=raster_settings,\n", + " lights=lights,\n", + ")\n", + "image_grid(images_by_idxs.cpu().numpy(), rows=1, cols=3, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, if we are not interested in any particular models but would like see random models from some specific categories, we can do that by specifying `categories` and `sample_nums`. For example, if we would like to render 2 models from the category \"faucet\" and 3 models from the category \"chair\", we can do the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "images_by_categories = shapenet_dataset.render(\n", + " categories=[\"faucet\", \"chair\"],\n", + " sample_nums=[2, 3],\n", + " device=device,\n", + " cameras=cameras,\n", + " raster_settings=raster_settings,\n", + " lights=lights,\n", + ")\n", + "image_grid(images_by_categories.cpu().numpy(), rows=1, cols=5, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we are not interested in any particular categories and just would like to render some random models from the whole dataset, we can set the number of models to be rendered in `sample_nums` and not specify any `categories`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "random_model_images = shapenet_dataset.render(\n", + " sample_nums=[3],\n", + " device=device,\n", + " cameras=cameras,\n", + " raster_settings=raster_settings,\n", + " lights=lights,\n", + ")\n", + "image_grid(random_model_images.cpu().numpy(), rows=1, cols=5, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Render R2N2 models with the same orientations as the original renderings in the dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can render R2N2 models the same way as we rendered ShapeNetCore models above. In addition, we can also render R2N2 models with the same orientations as the original renderings in the dataset. For this we will use R2N2's customized `render` function and a different type of PyTorch3D camera called `BlenderCamera`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we will render the seventh model with the same orientations as its second and third views. First we will retrieve R2N2's original renderings to compare with the result." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "original_rendering = r2n2_dataset[6,[1,2]][\"images\"]\n", + "image_grid(original_rendering.numpy(), rows=1, cols=2, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will visualize PyTorch3d's renderings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r2n2_oriented_images = r2n2_dataset.render(\n", + " idxs=[6],\n", + " view_idxs=[1,2],\n", + " device=device,\n", + " raster_settings=raster_settings,\n", + " lights=lights,\n", + ")\n", + "image_grid(r2n2_oriented_images.cpu().numpy(), rows=1, cols=2, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Visualize R2N2 models' voxels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "R2N2 dataloader also returns models' voxels. We can visualize them by utilizing R2N2's `render_vox_to_mesh` function. This will cubify the voxels to a Meshes object, which will then be rendered." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we will visualize the tenth model in the dataset with the same orientation of its second and third views. First we will retrieve R2N2's original renderings to compare with the result." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r2n2_model = r2n2_dataset[9,[1,2]]\n", + "original_rendering = r2n2_model[\"images\"]\n", + "image_grid(original_rendering.numpy(), rows=1, cols=2, rgb=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will pass the voxels to `render_vox_to_mesh`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vox_render = render_cubified_voxels(r2n2_model[\"voxels\"], device=device)\n", + "image_grid(vox_render.cpu().numpy(), rows=1, cols=2, rgb=True)" + ] + } + ], + "metadata": { + "anp_metadata": { + "path": "fbsource/fbcode/vision/fair/pytorch3d/docs/tutorials/Dataloaders_ShapeNetCore_R2N2.ipynb" + }, + "bento_stylesheets": { + "bento/extensions/flow/main.css": true, + "bento/extensions/kernel_selector/main.css": true, + "bento/extensions/kernel_ui/main.css": true, + "bento/extensions/new_kernel/main.css": true, + "bento/extensions/system_usage/main.css": true, + "bento/extensions/theme/main.css": true + }, + "disseminate_notebook_info": { + "backup_notebook_id": "669429066983805" + }, + "kernelspec": { + "display_name": "intro_to_cv", + "language": "python", + "name": "bento_kernel_intro_to_cv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5+" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}