Last Week in Computer Vision #24

How good are ViTs at finding objects? 🧐 Is TensorFlow dying? 💡 Performance Benchmark of YOLO v5, v7 and v8

I came across this Reddit post a few days ago asking: “What is ‘the ChatGPT’ for computer vision?” The poster explained: “After witnessing the incredible developments in generative AI of the last year (DALL-E, Stable Diffusion, GPT-3, ChatGPT, ...), it feels like the field of computer vision is ‘lagging behind’ and has yet to go through a similar level of breakthroughs. What do you think about the state of CV?”

This post received backlash from many in the community, myself included. “Lagging behind?” Did none of the deep learning techniques developed and popularized in the computer vision field contribute to image generation AI? The poster did admit to being a noob, but it still felt like a trend-riding post. I guess they didn't know better than to post this kind of question on r/computervision. Btw, the post was deleted one day later.

While ChatGPT is an impressive achievement and a turning point in AI adoption, the hype surrounding it can be overwhelming at times (at least to me personally). It has become even harder to dig out CV content, as people are pivoting really fast.

As a computer vision practitioner or learner, do you feel like you are missing out? Do you feel ChatGPT fatigue, or are you thinking of pivoting? Hit Reply and share your thoughts. I will keep your secret safe, I promise. 😇

If you like Ground Truth, share it with a computer vision friend! If you hate it, share it with an enemy. 😉

Author Picks

Image Credits: Gradient Ascent

🀌 Vision Transformers: From Idea to Applications Part III β†’ Object detection is a challenging problem. It deals with "what" an object is and "where" it is in an image. How good are vision transformers at finding objects?

🀌 Gen-1: The Next Step Forward for Generative AI β†’ Realistically and consistently synthesize new videos by applying the composition and style of an image or text prompt to the structure of your source video.

🀌 Life vs. ImageNet: Lessons from bringing computer vision to the real world β†’ developers got together to take a look at real-world ML from multiple angles, from how to best start ML projects to the challenges of scaling ML products and teams.

🀌 From Pixels to Paintings: The Rise of Midjourney AI Art β†’ what Midjourney is, how it works, and how you can access and use it to explore your creative side.

Learnings & Insights

🧐 Why TensorFlow for Python is dying a slow death → the PyTorch vs TensorFlow debate rages on. Both camps have troves of supporters, and both have good arguments for why their favorite deep learning framework might be the best.

🤓 GPU Cloud Server Comparison → cloud GPU vendor pricing assembled into tables, sortable and filterable to your liking.

🤓 Performance Benchmark of YOLO v5, v7 and v8 → YOLO v5, YOLO v7, and YOLO v8 object detection models were run head-to-head on a Jetson AGX Orin and an RTX 4070 Ti to find the one with the best speed-to-accuracy balance. (A quick timing sketch follows this list.)

🤓 Distributed Training: Errors to Avoid → distributed training in ML is complex and error-prone, with many hidden pitfalls that can derail the model training process. This article covers ten of the most common errors in distributed model training and suggests a solution for each. (See the DDP pitfall sketch after this list.)
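
For the YOLO benchmark item above: if you want ballpark numbers on your own hardware rather than the article's Jetson AGX Orin / RTX 4070 Ti setups, a simple latency loop already tells you a lot. This sketch assumes the ultralytics package and the stock yolov8n.pt checkpoint; it is not the article's benchmarking harness:

```python
# Rough latency measurement for a YOLOv8 checkpoint (ballpark only; no TensorRT and
# no careful pre/post-processing accounting like a real benchmark would do).
import time

import numpy as np
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolov8n.pt")  # nano checkpoint; swap in s/m/l/x variants to compare
dummy = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up so lazy initialization and CUDA kernel compilation don't skew the timing.
for _ in range(10):
    model.predict(dummy, imgsz=640, verbose=False)

n = 100
start = time.perf_counter()
for _ in range(n):
    model.predict(dummy, imgsz=640, verbose=False)
elapsed = time.perf_counter() - start

print(f"Average latency: {1000 * elapsed / n:.1f} ms ({n / elapsed:.1f} FPS)")
```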
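
I can't say which ten errors the distributed-training article singles out, but one perennial pitfall is forgetting DistributedSampler.set_epoch(), which silently gives every epoch the same shuffle order. Here is a minimal PyTorch DDP sketch of the correct pattern, with a toy model and dataset standing in for your own:

```python
# Minimal single-node DDP training loop; the toy model and data are placeholders.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)           # pin each process to its own GPU

dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

model = DDP(torch.nn.Linear(16, 2).cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):
    sampler.set_epoch(epoch)  # the easy-to-forget line: reshuffles data each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.cuda()), y.cuda())
        loss.backward()
        optimizer.step()

dist.destroy_process_group()
```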

Developer Tools & Datasets

⚙️ Kaggle Models → Kaggle now has a model zoo: developers can discover and use pre-trained models through deep integrations with the rest of Kaggle's platform.

⚙️ Computer Vision starter Kaggle Notebooks → a collection of Kaggle notebook examples for various computer vision tasks.

⚙️ The Dollar Street Dataset → a collection of 38,479 images of everyday household items from homes around the world that visually captures the socioeconomic diversity of traditionally underrepresented populations.

⚙️ awesome-open-data-centric-ai → open source tooling for data-centric AI on unstructured data.

Research Spotlight

🔬 Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models → Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, the researchers propose an encoder-based domain-tuning approach.

🔬 Deep Learning on a Data Diet → recent research from Stanford and Meta AI developed a new, simpler approach to unsupervised learning called DIET. DIET has three main benefits: minimal code refactoring, architecture independence, and no additional hyperparameters. Experimental results demonstrate that DIET can rival current SOTA methods on the CIFAR100 and TinyImageNet benchmarks.

🔬 Dropout Reduces Underfitting → in this paper, Meta AI researchers propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss; late dropout helps improve the generalization performance of overfitting models. (A toy scheduling sketch follows this list.)
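
The early/late dropout idea is easy to prototype in an existing training loop. Below is a toy PyTorch sketch of the scheduling trick from the dropout item above; the cutoff epoch and drop probability are illustrative values of mine, not the paper's settings:

```python
# Toy "early dropout" schedule: dropout is active only for the first few epochs, then
# switched off so an underfitting model can keep pushing training loss down.
# For "late dropout", invert the condition (off early, on late).
import torch.nn as nn


def set_dropout(model: nn.Module, p: float) -> None:
    """Set the drop probability of every nn.Dropout module in the model."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p


model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(256, 10)
)
early_dropout_epochs = 5  # illustrative cutoff

for epoch in range(50):
    set_dropout(model, 0.1 if epoch < early_dropout_epochs else 0.0)
    # ... the usual forward/backward/optimizer-step loop for one epoch goes here ...
```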

Upcoming Events

Building Data-Centric Workflows for Computer Vision → If you are looking for a tool to accelerate the labeling and management of your computer vision datasets, consider attending this webinar on March 15th, 2023. Superb AI will showcase some of its AI-powered features, including Active Learning, Auto Labeling, Mislabel Detection, and Dataset Curation.

Have a great week!

Over and out,

Dasha
