Last Week in Computer Vision #21

Averting AI Doom; Deploying CV models; YOLOv8 on custom dataset...

The past few weeks were a busy blur for me. But I'm happy to report that the Ground Truth computer vision newsletter officially migrated over to Beehiiv. This should provide a better experience for you all, and I can't wait to see how it goes!

Many thanks to all my subscribers who have been with me since the beginning - I'm so lucky to have you! And a very warm welcome to all the new subscribers. I'm so glad to have you on board! 🤗

Some highlights from this issue: Averting AI Doom by Not Panicking About Imaginary AI Doom; Need for AI leaders; Deploying Computer Vision Models: Tools & Best Practices; YOLOv8 on custom dataset tutorial; Next-Generation Convolutional Neural Networks; Understanding Deep Learning book; Computer Vision with Hugging Face Transformers and much more.

Author Picks

💡 Let's Speed Up AI → Daniel Jeffries argues that to make AI better, we need to put it out into the real world and test it, rather than trying to fix its problems in isolation. People are creative and will find ways to exploit and misuse any system. That shouldn't stop us from creating, improving, and using new technologies. There will be problems. There always are. But we'll do what we always do: we'll solve them.

🤔 Is there a difference between AI leaders and traditional software leaders? → Hussein Mehanna, Head of AI at Cruise, argues that there is. One major issue he sees is the shortage of AI executives and leaders skilled at delivering large-scale enterprise AI products and technologies. As a result, AI programs are often driven either by traditional software executives or by academics who move into industry. Sometimes these people deliver positive results, but experienced AI leaders differ in important ways from both their conventional software counterparts and leaders coming from academic AI.

🔬 Practicing AI research → Doing research is a skill that can be learned through practice, much like sports or music. But what separates good researchers from bad ones? Their proficiency in four areas: (1) idea conception and selection, (2) experiment design and execution, (3) writing the paper, and (4) maximizing impact.

Always be Learning

🎓 Train YOLOv8 on Custom Dataset – Tutorial → Ultralytics recently released the YOLOv8 family of object detection models. These models outperform previous YOLO versions in both speed and accuracy on the COCO dataset. But what about their performance on custom datasets?
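
If you want to get a feel for it before diving into the tutorial, the Ultralytics Python API makes fine-tuning a few lines of code. A minimal sketch; the data YAML, epoch count, and image size below are placeholders, not the tutorial's exact settings:

```python
# pip install ultralytics
from ultralytics import YOLO

# Start from a pretrained YOLOv8 nano checkpoint.
model = YOLO("yolov8n.pt")

# Fine-tune on a custom dataset described by a YOLO-format data YAML
# (paths, class names, train/val splits); "custom_data.yaml" is a placeholder.
model.train(data="custom_data.yaml", epochs=100, imgsz=640, batch=16)

# Evaluate on the validation split and run inference on a sample image.
metrics = model.val()
results = model.predict("sample.jpg", conf=0.25)
```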

🎓 Self-Supervised Learning in Computer Vision → Self-supervised learning (SSL) is a learning paradigm that allows models to learn from unlabeled data. This article discusses how SSL works and how to apply it to computer vision in the medical-diagnosis domain.
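
To make the idea concrete, here is a minimal sketch of one popular SSL pretext task, a SimCLR-style contrastive loss. It's a generic illustration, not the medical-imaging pipeline from the article, and the batch size and embedding dimension are arbitrary:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss: two augmented views of the same image (z1[i], z2[i])
    are pulled together, every other pair in the batch is pushed apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # (2N, D) stacked views
    sim = z @ z.T / temperature                 # cosine similarities
    sim.fill_diagonal_(float("-inf"))           # ignore self-similarity
    n = z1.shape[0]
    # The positive for sample i is its other view, located n positions away.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: embed two augmented views of a batch with the same backbone and
# minimize the loss; labels never enter the objective.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```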

🎓 Understanding Deep Learning → A draft of the book, to be published by MIT Press. The draft is nearly complete, so it may not stay free for much longer.

🎓 Deploying Computer Vision Models: Tools & Best Practices → Computer vision models have become insanely sophisticated, with a wide variety of use cases enhancing business effectiveness, automating critical decision systems, and so on. But a promising model can turn into a costly liability if it fails to perform as expected in production. That's why how we develop and deploy computer vision models matters so much!
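
One deployment pattern in this spirit is exporting the trained model to a portable format and sanity-checking it before serving. A rough sketch using ONNX; the model, file name, and tolerances are placeholders of my own, not recommendations from the article:

```python
# pip install torch torchvision onnx onnxruntime
import torch
import torchvision
import onnxruntime as ort

# A pretrained classifier as a stand-in for "your" CV model.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

# Export to ONNX so the model can be served by a runtime (ONNX Runtime,
# Triton, etc.) independently of the training code.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "resnet18.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}, "logits": {0: "batch"}},
)

# Sanity check: the exported graph should match the eager model on the same input.
session = ort.InferenceSession("resnet18.onnx")
onnx_out = session.run(None, {"image": dummy.numpy()})[0]
with torch.no_grad():
    eager_out = model(dummy)
torch.testing.assert_close(eager_out, torch.from_numpy(onnx_out), atol=1e-4, rtol=1e-3)
```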

Research Spotlight

In a recent paper, “ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders,” researchers propose a modification of the popular ConvNeXt architecture: a co-design that brings masked-autoencoder pre-training into the ConvNeXt framework to achieve results on par with transformer-based models. It is a step towards making mask-based self-supervised learning methods effective for ConvNeXt models.
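
For intuition on the masked-autoencoder part, here is a generic masked-image-modeling sketch: hide random patches and train the network to reconstruct only the hidden pixels. This illustrates the general idea, not ConvNeXt V2's exact recipe, and the patch size and mask ratio are made up:

```python
import torch

def random_patch_mask(images, patch=32, mask_ratio=0.6):
    """Hide a random subset of non-overlapping patches.
    True in the returned mask means the pixel stays visible."""
    n, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    keep = torch.rand(n, gh, gw) > mask_ratio
    mask = keep.repeat_interleave(patch, dim=1).repeat_interleave(patch, dim=2)
    return images * mask.unsqueeze(1), mask

images = torch.randn(4, 3, 224, 224)
masked_images, mask = random_patch_mask(images)

# reconstruction = model(masked_images)  # e.g. a ConvNeXt encoder + light decoder
reconstruction = masked_images           # placeholder so the snippet runs end to end

# The loss is computed only on the pixels that were hidden from the model.
hidden = (~mask).unsqueeze(1).float()
loss = (((reconstruction - images) ** 2) * hidden).sum() / hidden.sum().clamp(min=1.0)
```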

Anomaly detection (AD), the task of distinguishing anomalies from normal data, plays a vital role in many real-world applications, such as detecting faulty products from vision sensors in manufacturing. In most scenarios, users have a limited labeling budget, and sometimes there aren't even any labeled samples during training. To address this challenge, Google recently published two papers on anomaly detection. Using data-centric approaches, the researchers show state-of-the-art results in both unsupervised and semi-supervised settings.
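
As a point of reference, a very common unsupervised baseline scores each test sample by its distance to the nearest "normal" training embeddings. The sketch below is just that baseline, not the method from Google's papers, and the feature dimensions are arbitrary:

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=5):
    """Score each test embedding by its mean distance to the k nearest
    'normal' training embeddings; higher score = more anomalous."""
    dists = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

# Features would normally come from a pretrained vision backbone.
normal_feats = np.random.randn(200, 64)
queries = np.concatenate([np.random.randn(5, 64),          # in-distribution
                          np.random.randn(5, 64) + 4.0])   # shifted / anomalous
scores = knn_anomaly_scores(normal_feats, queries)
```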

Reversible Vision Transformers are a memory-efficient architecture design for visual recognition that can reduce GPU memory requirements by up to 15.5x while maintaining model complexity, parameters and accuracy. Throughput can increase up to 2.3x compared to non-reversible models, and full code and trained models are available online.
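
The memory savings come from reversibility: a block's inputs can be recomputed exactly from its outputs, so intermediate activations don't need to be cached for the backward pass. A generic RevNet-style sketch of that idea (an illustration, not the paper's code):

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1).
    Because inverse() reconstructs (x1, x2) from (y1, y2), activations can be
    recomputed on the fly instead of stored."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

# Round-trip check: the block recovers its inputs from its outputs.
block = ReversibleBlock(nn.Linear(16, 16), nn.Linear(16, 16))
x1, x2 = torch.randn(2, 16), torch.randn(2, 16)
with torch.no_grad():
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-6), torch.allclose(r2, x2, atol=1e-6))
```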

CV Developer’s Corner

🧰  applied-ml → Figuring out how to implement your ML project? This repository features curated papers, articles, and blogs on data science & machine learning in production.

🧰  google-vizier → OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed as a distributed system that ensures reliability.
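
For flavor, the client flow looks roughly like the example in the project's README: define a search space and a metric, ask the service for suggestions, and report measurements back. Treat this as a sketch and check the repository for the current API, since names may have changed; the objective function below is a stand-in of my own:

```python
from vizier.service import clients, pyvizier as vz

# Stand-in objective; in practice this would train and validate a CV model.
def train_and_eval(learning_rate: float, batch_size: float) -> float:
    return 1.0 - abs(learning_rate - 0.01) - abs(batch_size - 64) / 1000.0

# Describe the search space and the metric to maximize.
study_config = vz.StudyConfig(algorithm='GAUSSIAN_PROCESS_BANDIT')
study_config.search_space.root.add_float_param('learning_rate', 1e-5, 1e-1)
study_config.search_space.root.add_int_param('batch_size', 8, 128)
study_config.metric_information.append(
    vz.MetricInformation(name='val_accuracy', goal=vz.ObjectiveMetricGoal.MAXIMIZE))

# Ask the service for suggestions and report results back.
study = clients.Study.from_study_config(study_config, owner='me', study_id='cv_tuning')
for _ in range(20):
    for suggestion in study.suggest(count=1):
        params = suggestion.parameters
        score = train_and_eval(params['learning_rate'], params['batch_size'])
        suggestion.complete(vz.Measurement({'val_accuracy': score}))
```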

🧰  An Intro to Computer Vision with Hugging Face Transformers → Recording of Julien Simon’s talk during a recent Computer Vision Meetup. He gave a great overview of what it means to work with computer vision models, especially those hosted on the Hugging Face Hub.
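
If you want to try the Hub models from the talk, the transformers pipeline API is the quickest route. A minimal sketch; the ViT checkpoint and the test image are just common examples, not ones from the talk:

```python
# pip install transformers pillow torch
from transformers import pipeline

# Image-classification pipeline backed by a ViT checkpoint from the Hub.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Accepts local paths, URLs, or PIL images.
predictions = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```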

Upcoming Events

Life vs. ImageNet: What I wish I had known before deploying computer vision to the real world. → I mean the name of the webinar says it all. This will be a cozy get-together with ML developers who will share their journeys, struggles, and insights from deploying computer vision applications to the real world.

Miscellaneous

[Embedded tweet]

What do you think? Hit me up on Twitter.

Hope your week's off to a good start!

Share Ground Truth

If you enjoyed this newsletter, consider sharing it with your computer vision friends! 🥰 
