Computer Vision Newsletter #29

Westworld Vibes 🤖 Segment All the Things ✂️ Computer Vision Q&A❓️

Hello, Truth-Seekers! 👋 

Lately, it feels like we're moving rapidly towards building Westworld, just like in that classic movie and the more recent HBO series. And these past few weeks have only reinforced this feeling. 🧐 Let’s dig in!

Computer Vision Q&A ❓️ 

Last week, I invited you all to send in your computer vision questions for the ML team at Superb AI. Thank you for all your fantastic questions! As promised, I'm sharing the answers with you in today's issue.

This week’s questions:

  1. Why aren't variable-size vision networks (e.g., pure convolution + spatial pyramid pooling) more popular? Can we use vision transformers for datasets with variable-size images?

  2. What Deep Learning framework would you use to build reliable systems? Is there a framework that's more robust or easier to put into production?
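
For anyone who hasn't met spatial pyramid pooling (the idea behind question 1), here is a minimal sketch of what it does, in plain Python. It is an illustration under simplifying assumptions, not the implementation from any particular paper or library: the function name `spatial_pyramid_pool`, the `levels` parameter, and the single-channel feature map are all made up for this example. The point is that feature maps of different sizes come out as vectors of the same length, which is what lets a convolutional network feed variable-size images into a fixed-size classifier head.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (a list of rows) into a fixed-length vector.

    For each level n, the map is split into an n x n grid and the maximum of
    each cell is kept, so the output length is sum(n * n for n in levels)
    no matter how large the input is. Hypothetical helper for illustration.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so the bins cover
            # the whole map and are never empty.
            r0 = (i * h) // n
            r1 = -((-(i + 1) * h) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -((-(j + 1) * w) // n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# Two "images" of different sizes produce vectors of the same length.
small = [[1, 2, 3],
         [4, 5, 6]]
large = [[r * 7 + c for c in range(7)] for r in range(5)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

A real network would apply this per channel on a tensor, but the fixed-length output is the part that matters for the question.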

Author Picks 🫡 

The First Foundational Model for Image Segmentation

Meta introduced SAM (Segment Anything Model), a groundbreaking AI model built around a new task: promptable segmentation. SAM has learned a general notion of what objects are and segments them in response to prompts such as key points or bounding boxes (the paper also explores text prompts). And the best part is that both the model and the 1-billion-mask dataset (SA-1B) are openly released! Blog | Demo | GitHub | Dataset

Westworld Vibes

Stanford AI researchers introduced an incredible new paper called "Generative Agents," and it has already made waves in the AI community. The researchers placed 25 autonomous agents in a simulated town, gave them unique motivations, and observed whether they would display human-like behavior. In the paper's evaluation, the agents were rated as more believable than humans role-playing the same characters! Paper

LifeOS: Smart Glasses to Effortlessly Navigate Your Life

Stanford students are breaking new ground with LifeOS, an AI-powered operating system for AR smart glasses that uses computer vision to act as a personal AI assistant. The system leverages GPT-4, Apple's speech framework for transcription, and Brilliant Labs' facial recognition technology to identify faces and offer relevant information about individuals during conversations. Check out the demo video.

Learning 🤓 

Superior Image Generation Results with ControlNet → a deep dive into how ControlNet works, how it is trained, and what kinds of image-generation capabilities we can expect from it.

Top 10 Object Detection Models in 2023 → a list of the top 10 deep-learning models for object detection in 2023, with pros and cons.

Selecting CNN Architectures for Computer Vision Applications → the most popular CNN architectures and when to use them.

Deep Learning Fundamentals → a free deep learning course built on a modern open-source stack.

Confusion Matrix for Object Detection in one sketch ⬇️ 

“Confusion Matrix for Object Detection”
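
If you prefer code to sketches, here is one common way to build such a matrix, written as a small self-contained Python example. Treat it as a sketch under assumptions rather than the method from the graphic: the function names, the `(x1, y1, x2, y2)` box format, the greedy highest-score-first matching, and the default thresholds are all choices made for this example. Rows are ground-truth classes, columns are predicted classes, and the extra last row and column stand for "background" (false positives and missed objects).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_confusion_matrix(gts, preds, num_classes,
                               iou_thr=0.5, score_thr=0.25):
    """gts: list of (class_id, box); preds: list of (class_id, score, box).

    Returns a (num_classes + 1) x (num_classes + 1) matrix where
    rows = ground truth, columns = prediction, last index = background.
    """
    bg = num_classes
    matrix = [[0] * (num_classes + 1) for _ in range(num_classes + 1)]
    # Drop low-confidence predictions, then match the most confident first.
    kept = sorted((p for p in preds if p[1] >= score_thr), key=lambda p: -p[1])
    matched = set()
    for p_cls, _, p_box in kept:
        best_iou, best_idx = 0.0, None
        for i, (_, g_box) in enumerate(gts):
            if i in matched:
                continue
            overlap = iou(p_box, g_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            matrix[gts[best_idx][0]][p_cls] += 1  # right or wrong class
        else:
            matrix[bg][p_cls] += 1  # false positive: nothing was there
    for i, (g_cls, _) in enumerate(gts):
        if i not in matched:
            matrix[g_cls][bg] += 1  # missed object
    return matrix


# Toy example with two classes: 0 = cat, 1 = dog.
gts = [(0, (0, 0, 10, 10)), (1, (20, 20, 30, 30)), (0, (50, 50, 60, 60))]
preds = [(0, 0.9, (1, 1, 10, 10)),      # cat found as cat
         (0, 0.8, (20, 20, 30, 31)),    # dog found, but labeled cat
         (1, 0.7, (80, 80, 90, 90))]    # dog predicted on empty background
for row in detection_confusion_matrix(gts, preds, num_classes=2):
    print(row)
```

The third cat in the toy data has no matching prediction, so it lands in the background column as a miss.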

Tools & Datasets 🛠️ 

Exploratory Data Analysis for Computer Vision → Kangas 2.0 is a tool for exploring, analyzing, and visualizing large-scale multimedia data.

LumaAI → allows you to capture any 3D object and reproduce it with unmatched photorealistic qualities. Intricate details, reflections & lighting are their specialties.

Wonder Studio → a tool that detects the actor’s performance based on single-camera footage. Then, it takes that performance and transfers it to the CG character of your choice.

Research Spotlight 🔬 

Vision Transformers with Mixed-Resolution Tokenization → a new image tokenization scheme for Vision Transformers that replaces the standard uniform grid with a mixed-resolution sequence of tokens, resulting in substantial accuracy gains on image classification when controlling for the computational budget.

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections → a generative model for synthesizing unbounded 3D scenes from in-the-wild 2D image collections using an efficient yet expressive 3D scene representation, a generative scene parameterization, and an effective renderer that can leverage knowledge from 2D images.

More News & Links 🗞️ 

If you like Ground Truth, share it with a computer vision friend! If you hate it, share it with an enemy. 😉

Have a great week!

Over and out,

Dasha
