When SAM Meets NeRF: This AI Model Can Segment Anything in 3D

May 22, 08:44



Recent advances in generative AI have drawn most of the attention, but other areas are seeing significant breakthroughs too. Computer vision, for example, has also been advancing rapidly. Meta's release of the Segment Anything Model (SAM) was a major success and changed the game in 2D image segmentation.

In image segmentation, the goal is to detect every object in a scene and, in effect, “paint” the pixels that belong to it. Usually, this is done by training a model on a dataset of the objects we want to segment; the trained model can then segment those same kinds of objects in new images. The main limitation is that the model is bound by the objects it saw during training and cannot segment unseen objects.

SAM changes this. It is presented as the first model that can segment virtually anything. This is achieved by training SAM on large-scale data, which gives it the ability to perform zero-shot segmentation across various styles of image data. It is designed to segment objects of interest in images regardless of their shape, size, or appearance, and it has demonstrated remarkable performance on 2D images.

Of course, people did not stop there and started working on ways to extend SAM’s capabilities beyond 2D. A key question remained unanswered: can SAM’s segmentation ability be extended to 3D, thereby bridging the gap between 2D and 3D perception caused by the scarcity of 3D data? The answer appears to be yes, and it is time to meet SA3D.

SA3D combines advances in Neural Radiance Fields (NeRF) with SAM to tackle 3D segmentation. NeRF has emerged as one of the most popular 3D representations in recent years. It builds connections between sparse 2D images and real 3D points through differentiable volume rendering, and it has seen numerous improvements that make it a powerful tool for tackling the challenges of 3D perception.
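To make "differentiable volume rendering" more concrete, here is a minimal sketch of the standard NeRF-style compositing rule for a single ray, written in plain Python. The function name `render_ray` and the sample values are illustrative assumptions, not code from SA3D or any NeRF library. Each sample along the ray gets a weight equal to the light that survives up to that point times the opacity of the sample, and the pixel color is the weighted sum of the sample colors.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray (hypothetical helper for illustration).

    densities: per-sample volume density sigma_i (>= 0)
    colors:    per-sample RGB tuples
    deltas:    distance covered by each sample along the ray
    Returns (pixel_rgb, weights), where weights[i] is the contribution of sample i.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha               # rendering weight of the sample
        weights.append(w)
        for k in range(3):
            pixel[k] += w * rgb[k]
        transmittance *= 1.0 - alpha            # light left for later samples
    return pixel, weights

# Toy ray: empty space, then a dense red surface, then blue behind it.
pixel, weights = render_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

In this toy ray the red surface receives almost all of the weight, so the pixel comes out nearly pure red. The same per-sample weights are what tie a 2D pixel back to the 3D points that produced it.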

There have been some attempts to extend NeRF-based techniques to 3D segmentation. These approaches train an additional feature field aligned with a pre-trained 2D visual backbone. While effective, they suffer from limitations such as a high memory footprint, artifacts in the radiance field that carry over into the feature field, and inefficiency, since an additional feature field must be trained for every scene.

This is where SA3D comes into play. Unlike previous methods, SA3D does not require training an additional feature field. Instead, it leverages the power of SAM and NeRF to segment desired objects from all views automatically.

SA3D works by taking user-specified prompts from a single rendered view to initiate the segmentation process. The segmentation maps generated by SAM are then projected onto 3D mask grids using density-guided inverse rendering, providing initial 3D segmentation results. To refine the segmentation, incomplete 2D masks from other views are rendered and used as cross-view self-prompts. These masks are fed into SAM to generate refined masks, which are then projected onto the 3D mask grids. This iterative process allows for the generation of complete 3D segmentation results.

Overview of how SA3D works. Source:
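The loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: rays are given as lists of hypothetical `(voxel_index, weight)` pairs, the 2D mask is a list of 0/1 values standing in for SAM's output, and the update rule (raise voxel scores along rays inside the mask, lower them slightly along rays outside it, both scaled by the rendering weight) is one simple way to realize density-guided projection. The names `project_mask_to_grid`, `render_mask`, `pick_self_prompt`, and the `neg_weight` value are all made up for this example.

```python
def project_mask_to_grid(mask_grid, rays, mask_2d, neg_weight=0.15, lr=1.0):
    """Push a 2D mask into a voxel score grid, weighted by rendering weights.

    mask_grid: list of per-voxel mask scores (modified in place)
    rays:      one list of (voxel_index, weight) samples per pixel
    mask_2d:   0/1 per pixel; stands in for a SAM segmentation map
    neg_weight is an arbitrary illustrative penalty for background pixels.
    """
    for samples, inside in zip(rays, mask_2d):
        for voxel, w in samples:
            if inside:
                mask_grid[voxel] += lr * w
            else:
                mask_grid[voxel] -= lr * neg_weight * w
    return mask_grid

def render_mask(mask_grid, rays):
    """Render the 3D mask scores into a 2D mask for another view."""
    return [sum(w * mask_grid[v] for v, w in samples) for samples in rays]

def pick_self_prompt(rendered_mask):
    """Pick the most confident pixel as a point prompt, or None if none is positive."""
    best = max(range(len(rendered_mask)), key=lambda i: rendered_mask[i])
    return best if rendered_mask[best] > 0 else None

# View A: pixel 0 is inside the prompted object, pixel 1 is background.
grid = [0.0, 0.0, 0.0, 0.0]
view_a = [[(0, 0.9), (1, 0.1)], [(2, 0.8)]]
project_mask_to_grid(grid, view_a, mask_2d=[1, 0])

# View B sees the same voxels from a different angle; its rendered mask is
# incomplete, but the strongest pixel can serve as the next prompt.
view_b = [[(1, 0.7)], [(0, 0.5), (3, 0.5)], [(2, 0.9)]]
rendered_b = render_mask(grid, view_b)
prompt_pixel = pick_self_prompt(rendered_b)
```

In a full pipeline, the selected pixel would be handed to SAM as a point prompt for view B, and the refined mask would be projected back into the grid with the same update, repeating over further views.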

SA3D offers several advantages over previous approaches. It can adapt to any pre-trained NeRF model without changes or re-training. The entire segmentation process is efficient, taking approximately two minutes without any engineering optimization, which makes SA3D a practical option for real-world applications. Moreover, experimental results show that SA3D can generate fine-grained segmentation results for various types of 3D objects, opening up new possibilities for applications such as robotics, augmented reality, and virtual reality.


Ekrem Çetinkaya

Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.
