Last Week in Computer Vision #23

Data-Centric AI is making its way into the classroom

Hello there! Ground Truth issue 23 is here!

If you have any feedback, suggestions, or thoughts about it, please don't hesitate to hit reply and let me know what your ideas are. I'm all ears!

If you like Ground Truth, share it with a computer vision friend! If you hate it, share it with an enemy. 😉 

Quick Highlights: Data-Centric AI course is now taught at MIT and is publicly available | Which GPU(s) to Get for Deep Learning | Mastering DALL·E 2 | Vision Transformers: From Idea to Applications | Computer Vision Tools | pix2pix3D: 3D-aware Conditional Image Synthesis | ATCON: Attention Consistency for Vision Models and more.


If you type “Data-Centric AI” into Google, you will be presented with numerous blog posts, usually titled “Model-Centric vs. Data-Centric”. There is so much hype that it made me wary, and I started to be skeptical about any blog on the topic. This is not to say that I consider the Data-Centric AI (DCAI) approach useless. On the contrary, I think it makes more sense to iteratively check and improve dataset quality rather than treat the dataset as a static entity. In real-world ML applications, improving datasets is not just good practice; it is often the best way to improve model performance and make the model safer. The problem is that this has mostly been done in an ad hoc manner, guided by intuition and experience.

Data-Centric AI is about standardizing the techniques and tools used to improve ML datasets iteratively and effectively. This, above all, will help ML engineers preserve their sanity and allow them to ship robust and safe computer vision models to production.
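To make this concrete, here is a minimal, illustrative sketch of one common DCAI technique: flagging likely label errors by low model self-confidence, in the spirit of confident learning. The function name, threshold, and data below are hypothetical simplifications; a real workflow would use a dedicated library rather than this hand-rolled version.

```python
def find_label_issues(labels, pred_probs, threshold=0.5):
    """Return indices of examples whose predicted probability for their
    given label falls below `threshold` -- candidates for re-labeling."""
    issues = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        self_confidence = probs[label]  # model's confidence in the given label
        if self_confidence < threshold:
            issues.append(i)
    return issues

# Toy example: four images, two classes.
labels = [0, 1, 0, 1]
pred_probs = [
    [0.9, 0.1],  # model agrees with label 0
    [0.2, 0.8],  # model agrees with label 1
    [0.1, 0.9],  # labeled 0, but the model strongly predicts 1 -- suspicious
    [0.6, 0.4],  # labeled 1, but the model leans 0 -- also suspicious
]
print(find_label_issues(labels, pred_probs))  # [2, 3]
```

The flagged examples would then be reviewed and relabeled (or dropped), the model retrained, and the loop repeated — the "iterative" part of the approach.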

🤓 The first-ever “Introduction to Data-Centric AI” course is now taught at MIT and all the course materials are freely available to the public! So if you were looking for a high-quality and practical resource on DCAI, check it out!

Contrary to academia, the dataset is not fixed in real-world applications! Real-world data tends to be highly messy and plagued with issues, such that improving the dataset is a prerequisite for producing an accurate model. Seasoned data scientists know it is more worthwhile to invest in exploring and fixing the data than tinkering with models, but this process can be cumbersome for large datasets.

Curtis Northcutt, co-lecturer of DCAI course

Always Be Learning

🤓 Vision Transformers: From Idea to Applications → the ViT architecture explained, an overview of its performance, and notable extensions that can help improve ViT.

🤓 Which GPU(s) to Get for Deep Learning → this article answers a few key questions: what features are important if you want to buy a new GPU? GPU RAM, cores, tensor cores, caches? How to make a cost-efficient choice?

🤓 Mastering DALL·E 2: A Breakthrough in AI Art Generation → a deep dive into the intricacies of DALL·E 2, its capabilities, and its potential applications in various industries. The article explores the underlying technology and how to use it to generate stunning images.

Featured Events


📆 Build Data-Centric Workflows for Computer Vision Applications → how to improve your Computer Vision applications by adopting and implementing Data-Centric workflows. Book your spot here.

📆 Build and Monitor Computer Vision Models with TensorFlow/Keras + WhyLabs → How to build and perform ML monitoring with computer vision classification models in production. Book your spot here.

Helpful Tools

⚙️ Animated AI → animations and instructional videos about neural networks.

⚙️ LandingLens → Landing AI released their computer vision platform, which lets anyone label images, train a model, and deploy it to production in minutes.

⚙️ DeepSpeed → an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for deep learning training and inference.

⚙️ Albumentations → a Python library for image augmentation. Albumentations supports all common computer vision tasks, provides a simple unified API for all data types, and contains more than 70 different augmentations to generate new training samples from existing data.
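For readers new to augmentation, the core idea is simple enough to sketch in plain Python. This is a conceptual illustration only, not the Albumentations API: a random horizontal flip over an image stored as a nested list of pixel values.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

def random_horizontal_flip(image, p=0.5, rng=random):
    """Apply the flip with probability p, as augmentation libraries do,
    so each training epoch sees a slightly different dataset."""
    return horizontal_flip(image) if rng.random() < p else image

image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(horizontal_flip(image))  # [[3, 2, 1], [6, 5, 4]]
```

Libraries like Albumentations compose many such transforms, each applied with its own probability, which is where the "70+ augmentations" value comes from.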

Research Spotlight

🔬 ATCON: Attention Consistency for Vision Models → Stanford researchers proposed an unsupervised fine-tuning method that enforces consistency between attention maps computed using different methods, resulting in improved representations learned by the model and increased classification performance on unseen data. Paper | GitHub | Video.
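The core idea — penalizing disagreement between attention maps produced by different attribution methods for the same input — can be sketched as a simple consistency term. This is a hypothetical simplification, not the paper's actual objective:

```python
def attention_consistency_loss(map_a, map_b):
    """Mean squared difference between two attention maps of the same shape.
    Identical maps incur zero loss; diverging maps are penalized."""
    flat_a = [v for row in map_a for v in row]
    flat_b = [v for row in map_b for v in row]
    assert len(flat_a) == len(flat_b), "maps must have the same shape"
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)

# Two toy 2x2 attention maps, e.g. from two attribution methods.
map_a = [[0.1, 0.9], [0.0, 0.0]]
map_b = [[0.1, 0.5], [0.0, 0.4]]
print(attention_consistency_loss(map_a, map_a))  # 0.0
print(attention_consistency_loss(map_a, map_b))  # ≈ 0.08
```

Fine-tuning against such a term pushes the model toward representations whose explanations agree, regardless of which attribution method is used.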

🔬 3D-aware Conditional Image Synthesis (pix2pix3D) → a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, the model synthesizes a photo from different viewpoints. The model integrates 3D representations with conditional generative modeling, enabling controllable, high-resolution 3D-aware rendering conditioned on user inputs. Paper | GitHub.

🔬 TRICD: Testing Robust Image Understanding Through Contextual Phrase Detection → Current SOTA visual question answering (VQA) systems display impressive performance on popular benchmarks but often fail when evaluated on challenging images. Researchers at NYU propose a novel task titled "Contextual Phrase Detection", which evaluates models' fine-grained vision and language understanding capabilities, along with a human-labeled dataset (TRICD). In addition to assessing whether the objects are visible in the scene, models are required to produce bounding boxes to localize them. Paper | GitHub.

Meme Therapy

Have a great week! 🤗 

Over and Out,

Dasha
