Ground Truth

Computer Vision Newsletter #41

👁️ State of Computer Vision 2023; 🚀 Accelerating Model Training; CNNs Visual Explainer and more

Dasha Gurova
July 11, 2023

Hello Truth-Seekers!

The mysterious absence of the Computer Vision Newsletter last week was due to the 4th of July celebrations. 🇺🇸 I was out embracing my freedom to frolic and sunburn at the beach. If you're in the US, I hope your holiday was a blast too!

Now, refreshed and as eager as a neural network facing a fresh dataset, I've gathered top-notch tutorials, resources, and GitHub repositories to boost your computer vision projects. So, let's dive in!

Author Pick[s]

AI is a lot of work [The Verge]

1. AI Is a Lot of Work → Data annotation is essential to AI development, yet engineers frequently treat it as a passing, inconvenient prerequisite to the more glamorous task of model building. However, this process is far from straightforward or finite, and notably, it isn't well-paid. In an eye-opening article, a journalist from The Verge delves into the intricacies of the data annotation industry and the associated labor dynamics.

2. State of Computer Vision 2023 → Sebastian Raschka's exploration of the current trends in computer vision research and development is a compelling read. I touched on a similar topic in my previous newsletter, and although our viewpoints align, Sebastian delves much deeper into specific areas. He dissects trends like vision transformers, diffusion models, GANs, and NeRFs, along with object detection and segmentation.

3. The first Machine Unlearning Challenge → Google has launched a challenge seeking a novel machine unlearning algorithm that can eliminate the impact of specific training examples from an already trained model. The objective is to devise unlearning methods that can erase particular instances while preserving the model's valuable properties. This contest will take place on Kaggle, with automated scoring based on both the quality of the forgetting process and the retained utility of the model.

Tutorials & Learning

CNN Visual Explainer

1. Interactive, Visual Explainer of Convolutional Neural Networks → An engaging visual guide to convolutional neural networks that illuminates the core principles and offers an interactive view into their operations.

2. PaddlePaddle: Exploring Object Detection, Segmentation, and Keypoints → This blog discusses the factors contributing to PaddlePaddle's impressive efficiency and speed and delves into its applicability in real-world scenarios.

3. How to Leverage Embeddings for Data Curation in Computer Vision → A recording of a recent Superb AI webinar on data curation for computer vision. It takes a deep dive into the use of embeddings to curate high-quality datasets.

4. Accelerating PyTorch Model Training → This article provides insight into scaling PyTorch model training, using a simple vision transformer for image classification as an example. It emphasizes the use of mixed precision methods and multi-GPU training paradigms.

5. How to Use FastSAM → SAM’s heavy reliance on the Vision Transformer architecture limits its practical applications, particularly in real-time scenarios. FastSAM is an open-source image segmentation model that reportedly runs 50 times faster than SAM.

6. Automate Machine Learning Deployment with GitHub Actions → How to create a CD pipeline to automate machine learning workflows using GitHub Actions.

7. Kaggle Learn → Kaggle has introduced a variety of free courses aimed at providing practical skills in machine learning. With the bonus of receiving a certificate upon completion, these comprehensive courses are a great resource for anyone looking to learn.
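
A quick side note on the mixed precision methods mentioned in item 4: one standard ingredient of 16-bit training is loss scaling, which keeps very small gradients from rounding to zero in half precision. The snippet below is a minimal, standard-library-only sketch of that idea and is not taken from the linked article. The helper names and the scale factor of 65536 are assumptions chosen for illustration; in a real PyTorch project you would rely on the framework's own mixed precision tooling rather than anything like this.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code packs a float as a 16-bit half-precision number.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_fp16_gradient(grad: float, loss_scale: float) -> float:
    """Scale a gradient up, store it in fp16, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale


# Hypothetical values, picked only to make the effect visible.
tiny_grad = 1e-8
LOSS_SCALE = 65536.0

# Stored directly, the gradient is below what fp16 can represent.
naive = to_fp16(tiny_grad)

# Scaled first, it survives the trip through fp16 with a small rounding error.
recovered = scaled_fp16_gradient(tiny_grad, LOSS_SCALE)

print("stored directly in fp16:", naive)
print("stored with loss scaling:", recovered)
```

The design choice here is to scale before the low-precision step and unscale afterwards in full precision, so the weight update sees a gradient of the right magnitude.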

Developer Resources

1. OpenMMLab's collection → A list of high-quality projects for a variety of tasks, including pose estimation and segmentation.

2. Vision Transformer & Attention → This repo contains a comprehensive list of Vision Transformer and attention papers, along with code and related websites.

3. Fast Segment Anything (FastSAM) → Code for a CNN-based Segment Anything Model trained on only 2% of the SA-1B dataset published by the SAM authors. FastSAM reportedly achieves performance comparable to SAM at 50× higher run-time speed.

4. DataGradients → An open-source Python-based library specifically designed for computer vision dataset analysis. It automatically extracts features from your datasets and combines them all into a single user-friendly report.

5. LEDITS: Real Image Editing → Combining text-guided diffusion models with DDPM inversion allows for sophisticated image editing capabilities without the need for additional training or external guidance.

Research Spotlight

1. Segment Anything Meets Point Tracking → The paper introduces SAM-PT, an extension of the Segment Anything Model (SAM) for zero-shot tracking and segmentation in dynamic videos. It uses robust point selection and propagation techniques for mask generation and achieves strong performance across popular video object segmentation benchmarks.

2. DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing → An extension of the DragGAN framework for interactive point-based image editing. DragDiffusion leverages large-scale pre-trained diffusion models to improve the applicability of point-based editing in real-world scenarios, achieving precise spatial control and high-quality editing results efficiently.

3. DreamDiffusion: Generating High-Quality Images from Brain EEG Signals → A novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals. The method leverages pre-trained text-to-image models, employs temporal masked signal modeling to pre-train the EEG encoder, and utilizes the CLIP image encoder for extra supervision, overcoming challenges associated with using EEG signals for image generation.

4. RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation → A foundation agent for robotic manipulation, inspired by advancements in vision and language models, that can leverage diverse robotic experiences to quickly learn new skills and adapt to different robot embodiments.

Previous Issue’s 3 Most Clicked Links

MLOps Landscape in 2023: Top Tools and Platforms.
Best Computer Vision projects with source code and datasets.
Beyond NeRFs: Tips and tricks for successfully using NeRFs in the wild.

Drop me a line if you have any feedback or questions.

Sending you good vibes,

Dasha 🫶
