Ground Truth
Last Week in Computer Vision #22
Field Guide for Working with Machine Learning Datasets; PyTorch Vision Transformers
Welcome to all the new subscribers who joined last week! Thrilled to have you with us.
You can always hit Reply to share your suggestions or feedback with me. 🤗
Let’s get down to it, shall we?
A few highlights from this issue: Field Guide for Working with Machine Learning Datasets; PyTorch Vision Transformers; Building MLOps Pipeline for Computer Vision; Algorithms for efficient deep learning; A Dive into Vision-Language Models; Token Merging - Your ViT but faster; Zero-shot Image-to-Image Translation; PyTorch resources and tutorials; What we look for in a resume for an AI startup; AI at Scale Summit; and much more.
Author Picks & Industry News
🤖 Google Research, 2022 & beyond: Algorithms for efficient deep learning → as deep learning models are increasingly deployed in production, their efficiency and cost have gone from a minor consideration to a primary constraint.
🚘️ Tesla will recall 362,000 vehicles → equipped with its “Full Self-Driving Beta” tech, after the NHTSA said the tech increases crash risk; in this case, recall = an over-the-air software update.
🎨 Roblox is working on generative AI tools → the first two tests of the tools will roll out in the coming weeks: a tool to make “generative AI materials from a text prompt” and a tool for generative AI to complete code.
🚘️ Waymo expands service in San Francisco → Waymo is getting closer to serving the entire city with this latest expansion, which includes the addition of coverage in the Mission, Dogpatch and Potrero Hill.
👁️🗨️ How Computer Vision is Changing Agriculture in 2023 → main computer vision tasks being put to use, current and future challenges, and companies at the forefront.
Always Be Learning
🤓 A Critical Field Guide for Working with Machine Learning Datasets → Machine learning datasets are powerful but unwieldy. This guide offers questions, suggestions, strategies, and resources to help people work with existing machine learning datasets at every phase of their lifecycle.
🤓 A Dive into Vision-Language Models → pre-training strategies, datasets, use cases, emerging areas of research and models available to play with in the Hugging Face ecosystem.
🤓 Building MLOps Pipeline for Computer Vision: Image Classification Task [Tutorial] → guide on building an MLOps pipeline for a computer vision task using Vision Transformer.
🤓 How To Best Manage Raw Data for Computer Vision → breaking down the first step of preparing data for custom model training: acquiring raw data, managing it securely, and following preprocessing best practices, all before starting the labeling or annotation work cycle.
🤓 The Future of Image Recognition is Here: PyTorch Vision Transformers → vision transformer architecture explained in detail + implementation of the ViT architecture in PyTorch.
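The core trick behind ViT is turning an image into a sequence of tokens: split it into non-overlapping patches, flatten each patch, and linearly project it into the model dimension. A minimal NumPy sketch of that step (shapes, names, and dimensions here are illustrative assumptions, not taken from the linked tutorial):

```python
import numpy as np

def patchify(image, patch_size):
    """Split an H x W x C image into non-overlapping flattened patches."""
    H, W, C = image.shape
    p = patch_size
    # reshape into a grid of patches, then flatten each patch to a vector
    patches = image.reshape(H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * C)
    return patches  # shape: (num_patches, patch_dim)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))           # toy "image"
tokens = patchify(img, patch_size=8)    # 16 patches, each of dim 8*8*3 = 192
W_embed = rng.random((8 * 8 * 3, 64))   # hypothetical linear projection to d_model = 64
embedded = tokens @ W_embed             # (16, 64) token embeddings
print(embedded.shape)                   # (16, 64)
```

From here, a real ViT prepends a learnable class token, adds position embeddings, and feeds the sequence to a standard Transformer encoder; the article walks through the full PyTorch implementation.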
🔬 Zero-shot Image-to-Image Translation → pix2pix-zero, a diffusion-based image-to-image approach that allows users to specify the edit direction on-the-fly (e.g., cat to dog). This method can directly use pre-trained text-to-image diffusion models, such as Stable Diffusion, for editing real and synthetic images while preserving the input image's structure.
🔬 Socio-Technical Anti-Patterns in Building ML-Enabled Software → qualitative empirical study of the socio-technical challenges of productionizing ML models, centered around and within teams. The study involved a manual analysis of 66 hours of talks recorded by the MLOps community to extract insights from leaders at the forefront of production ML.
🔬 Token Merging: Your ViT but faster → Meta AI shared new research to reduce the latency of existing Vision Transformer (ViT) models without the need for additional training. The proposed approach, called Token Merging (ToMe), combines similar tokens to reduce computation without losing information. GitHub repo
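ToMe itself uses a lightweight bipartite soft matching inside each Transformer block; as a heavily simplified sketch of the underlying idea only (not the paper's algorithm), here is a toy NumPy function that merges the single most cosine-similar pair of tokens by averaging them, shrinking the sequence by one while keeping the combined information:

```python
import numpy as np

def merge_most_similar(tokens):
    """Toy sketch: average the two most cosine-similar tokens into one.
    Real ToMe uses bipartite matching to merge many pairs per layer."""
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)           # ignore self-similarity
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    merged = (tokens[i] + tokens[j]) / 2     # combine rather than discard
    keep = [k for k in range(len(tokens)) if k not in (i, j)]
    return np.vstack([tokens[keep], merged[None, :]])

rng = np.random.default_rng(1)
toks = rng.random((10, 4))                   # 10 tokens of dim 4
out = merge_most_similar(toks)
print(out.shape)                             # (9, 4) -- one fewer token
```

Fewer tokens means less attention computation in every subsequent layer, which is where the latency savings come from.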
🔬 Extracting Training Data from Diffusion Models → Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, researchers show that diffusion models memorize individual images from their training data and emit them at generation time.
CV Developer Tools
⚙️ the-incredible-pytorch → a curated list of tutorials, projects, libraries, videos, papers, books and anything related to the incredible PyTorch.
⚙️ Google Colab Copilot → an AI assistant that writes Python code for you; GitHub Copilot-style completion implemented on Google Colab.
⚙️ DigiFace-1M → To avoid privacy problems associated with real face datasets, Microsoft introduced a large-scale synthetic dataset for face recognition, obtained by photo-realistic rendering of diverse and high-quality digital faces using a computer graphics pipeline.
AI at Scale Summit → Deep dive into scaling AI, including automatic data labeling and management, massive distributed training on multiple GPUs/TPUs, running tons of experiments simultaneously, deploying thousands of models, and running inference to millions or billions of people.
👾 FutureTools → Library with AI-powered tools categorized by use case.
👩💻 Chip Huyen: What we look for in a resume for an AI startup → an overview of the resume evaluation process from the perspective of an AI startup hiring manager, and advice on how to create a resume that stands out.
🤭 Meme Therapy
Have a great week and share Ground Truth with your computer vision friends!