Computer Vision Newsletter #32

Hinton Quits; Visual Prompting; Self-Supervised Learning Cookbook; Segment Everything Everywhere All at Once


Last Week in Computer Vision #32 WordCloud

As vision foundation models continue to make waves, Meta AI's marvel - SAM - finds itself facing a new competitor: SEEM (Segment Everything Everywhere All at Once). Self-supervised learning and Visual Prompting are gaining traction as the hottest trends in computer vision. Meanwhile, AI luminary Dr. Hinton has just announced his departure from Google and joins the ranks of AI critics. The plot thickens! 😅 

AUTHOR PICKS 👩‍💻 


Generated with Midjourney by the author

1. Cookbook of Self-Supervised Learning → Self-supervised learning now fuels advanced models in language, audio, and computer vision (SEER, DINOv2). Yet training SSL models resembles gourmet cooking: a complex art with a high barrier to entry. Meta AI released "Cookbook of Self-Supervised Learning," a hands-on guide for AI researchers and practitioners to explore SSL recipes, grasp its intricacies, and tap into SSL's undiscovered potential. It is a must-read!

2. Visual Prompting With Andrew Ng → Last week, Andrew Ng hosted a live stream discussing the emerging concept of Visual Prompting and the first results from Landing AI. Visual Prompting applies text-prompting techniques from NLP to computer vision tasks, enabling users to create a "visual prompt" and quickly generate a working model in just seconds. Users can then edit the prompt and iterate until satisfied with the model's performance.
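The prompt-then-iterate idea can be illustrated with a toy sketch (a minimal NumPy illustration of the concept, not Landing AI's actual API): the user "prompts" by clicking a few pixels and labeling them, and the rest of the image is classified by its nearest prompted color.

```python
import numpy as np

def visual_prompt_segment(image, prompt_points, prompt_labels):
    """Toy visual prompting: propagate a few user-labeled pixels
    to the whole image by nearest color in RGB space."""
    h, w, _ = image.shape
    # Colors at the user-prompted (y, x) locations
    prompt_colors = np.array([image[y, x] for y, x in prompt_points], dtype=float)
    pixels = image.reshape(-1, 3).astype(float)
    # Distance from every pixel to every prompted color
    dists = np.linalg.norm(pixels[:, None, :] - prompt_colors[None, :, :], axis=2)
    # Each pixel takes the label of its closest prompt
    labels = np.array(prompt_labels)[dists.argmin(axis=1)]
    return labels.reshape(h, w)

# A 4x4 image: left half red, right half blue
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = [255, 0, 0]
img[:, 2:] = [0, 0, 255]

# One click on each region: label 1 = "object", 0 = "background"
mask = visual_prompt_segment(img, [(0, 0), (0, 3)], [1, 0])
```

Editing the prompt here just means adding or relabeling clicked points and re-running, which mirrors the fast iteration loop described in the live stream.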

3. ‘The Godfather of A.I.’ Leaves Google and Warns of Danger Ahead → Dr. Hinton has quit his job at Google, where he worked for more than a decade and became one of the most respected voices in the field, so he can speak freely about the risks of A.I. A part of him, he said, now regrets his life’s work. 😳 

LEARNING & INSIGHTS 🤓 


Credit: a16z

1. Navigating the High Cost of AI Compute → The generative AI boom is primarily driven by compute power, as more resources directly result in better products. This post by a16z tries to break down the cost factors for an AI company in order to help understand the current landscape.

2. A Gentle Introduction to YOLOv8 → explains the basics of YOLOv8, how to set up your machine for it, and how to create a custom object tracker with it.

3. Distributed Training: Frameworks and Tools → overview of distributed training and the best frameworks and tools for it.

4. Deep Learning with PyTorch (Mini-Course) → mini-course to help you discover applied deep learning in Python with the easy-to-use and powerful PyTorch library.

5. Organizational structure for effective MLOps → how to structure organizations for effective MLOps practice implementation.

6. Leveraging Embeddings and Clustering Techniques in Computer Vision → this tutorial explores the use of embeddings in computer vision by examining image clusters, assessing dataset quality, and identifying image duplicates.
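The duplicate-finding idea from that tutorial can be sketched in a few lines (a minimal illustration on toy vectors, not the tutorial's code): embed each image, then flag pairs whose cosine similarity exceeds a threshold.

```python
import numpy as np

def find_near_duplicates(embeddings, threshold=0.98):
    """Flag index pairs of images whose embedding cosine
    similarity exceeds `threshold` (likely duplicates)."""
    # L2-normalize rows so the dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                pairs.append((i, j))
    return pairs

# Toy 2-D embeddings: images 0 and 1 are near-identical, image 2 differs
emb = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
print(find_near_duplicates(emb))  # → [(0, 1)]
```

In practice the embeddings would come from a pretrained vision model rather than being hand-written, and the same similarity matrix feeds directly into clustering for the dataset-quality checks the tutorial covers.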

RESEARCH SPOTLIGHT 🔬 


Segment Everything Everywhere All at Once

1. SEEM: Segment Everything Everywhere All at Once → Meet SEEM, a promptable, interactive model for segmenting everything everywhere all at once in an image. SEEM can perform any segmentation task: when no prompt is given, it handles classic tasks such as semantic, instance, and panoptic segmentation in open-set scenarios. It also supports various types of prompts and any combination of them.

2. Synthetic Data from Diffusion Models Improves ImageNet Classification → paper demonstrates that large-scale text-to-image diffusion models can be fine-tuned for class-conditional generation, augmenting the ImageNet training set and significantly improving classification accuracy over strong ResNet and Vision Transformer baselines.

3. AutoTaskFormer: Searching Vision Transformers for Multi-task Learning → paper proposes AutoTaskFormer, a one-shot neural architecture search framework that automates the design of multi-task vision transformers. It identifies which weights to share across tasks, provides a wide range of pre-trained models for deployment under various resource constraints, and outperforms state-of-the-art handcrafted vision transformers at multi-task learning on various datasets.

DEV TOOLS & DATASETS 🛠️ 


1. DeepFloyd IF → a novel state-of-the-art open-source text-to-image model that achieves a high degree of photorealism and can render text pretty decently.

2. Track-Anything → a flexible and interactive tool for video object tracking and segmentation. Built upon Meta AI’s SAM, it lets users specify anything to track and segment via clicks only.

3. Computer Vision tutorials → a collection of Jupyter Notebooks showcasing how to use popular computer vision models.

4. awesome-production-machine-learning → a curated list of awesome open-source libraries that will help you deploy, monitor, version, scale and secure your production machine learning.

5. segment-geospatial → Python package that simplifies leveraging SAM for geospatial data analysis, letting users get results with minimal coding effort.

Newsy bits ⚡️

To wrap things up, there's a highly satisfying video of a Cruise robotaxi seamlessly navigating its way through the streets of San Francisco. 😍 

Drop me a line if you have any feedback or questions.

Sending you good vibes,

Dasha
