Computer Vision Newsletter #35

GANs Resurge, Robots Emerge & The Future of Data

AUTHOR PICKS

Generative Diffusion Models: Explain to me like I am 35 → We live in fascinating times for Computer Vision, a field no longer confined to benchmarks and citations but shaped just as much by popularity on Reddit and YouTube. The broad use of these tools signals a shift: the appeal is not just their open-source nature, but their practicality, usability, and accessibility to all. This blog aims to decode the complexities of Computer Vision, prioritizing intuition over complex math, for a wider audience eager for insights beyond those offered by mainstream media. (For a taste of that intuition-first angle, a toy sketch of the noising process behind diffusion models follows at the end of this section.)

5 Founders on the Future of Data → AI's unrelenting progress is well known, yet the pivotal role of data and infrastructure in these leaps tends to go unnoticed. These elements, whether diversifying data sources for better models, building robust infrastructure for AI workloads, or leveraging powerful hardware for new applications, are the bedrock of AI advancement. Meanwhile, amid the AI frenzy, the enduring importance of conventional data analysis in the enterprise realm, and the innovation still happening there, is often forgotten. Founders and industry leaders share their insights on the future of data and their experiences building solutions for a wide array of use cases.
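
Not from the post itself, but to give a flavour of the intuition-first angle it takes: the forward half of a diffusion model is just "blend the image with a bit more Gaussian noise at every step." Here is a toy NumPy sketch of that noising step with a standard DDPM-style linear schedule; the schedule values, shapes, and names are illustrative choices of mine, not anything taken from the linked article.

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0): blend the clean image x0 with Gaussian noise.

    x0    : clean image as a float array (any shape)
    t     : integer timestep, 0 <= t < len(betas)
    betas : per-step noise variances (the "schedule")
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]            # cumulative "signal kept" factor up to step t
    noise = np.random.randn(*x0.shape)           # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise                             # a denoising network is later trained to predict this noise

# Toy usage: a linear schedule over 1000 steps applied to a random "image"
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.random.rand(3, 64, 64)
xt, eps = forward_diffusion(x0, t=500, betas=betas)
```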

LEARNING & INSIGHTS

1. 5 Tips for Creating Lightweight Vision Transformers → a few tips, tested experimentally on several datasets, on how to increase your model's accuracy without spending compute where you don't have to.

2. How to Train YOLO-NAS on a Custom Dataset → YOLO-NAS is a new state-of-the-art object detection model developed by Deci. This guide discusses what YOLO-NAS is and how to train a YOLO-NAS model on a custom dataset.

3. MLOps - A Comprehensive Guide → this repository lays out the essentials of MLOps and the reasons it's critical in today's tech landscape. It's loaded with valuable resources including recommended courses, books, papers, notable tools and active communities in the field of MLOps.

4. How to Build an End-To-End ML Pipeline → building efficient end-to-end machine learning pipelines is a key skill for modern ML engineers. To ace this, focus on rigorous testing, performance monitoring, task automation, and effective scheduling; that blend makes for a robust, reliable pipeline.

5. Techniques for Speeding Up Model Training → the latest unit of the Deep Learning Fundamentals course. This new addition unpacks various cutting-edge techniques designed to rev up your deep-learning training sessions.

6. GANstruction: Building your own Generative Adversarial Network → Sairam Sundaresan offers a deep dive into the inner workings of GANs, concentrating on the code, the mathematical equations, and the captivating evolution of this technology (a minimal sketch of the adversarial training step appears at the end of this section).

6.1 I've got to take a moment to point you toward Sairam's Gradient Ascent substack. With all the noise and hustle of the AI boom, finding truly valuable content can be a real chore. That's where Sairam's work stands out - his knack for taking intricate computer vision and machine learning topics and turning them into engaging reads is something I couldn't help but appreciate. He's a seasoned research scientist with over a dozen years of industry experience under his belt, and believe me, it shows. I wholeheartedly recommend giving his substack a subscribe.
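
As a companion to item 6 (my own illustration, not code from Sairam's post): the heart of any GAN is one alternating update, train the discriminator to separate real from fake, then train the generator to fool it. A minimal PyTorch sketch is below; G and D are assumed to be any generator/discriminator modules you have defined, with D returning raw logits.

```python
import torch
import torch.nn.functional as F

def gan_training_step(G, D, opt_g, opt_d, real, latent_dim=100):
    """One alternating GAN update on a batch of real images (D outputs logits)."""
    batch = real.size(0)
    z = torch.randn(batch, latent_dim, device=real.device)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    fake = G(z).detach()                          # detach so the generator gets no gradient here
    d_real, d_fake = D(real), D(fake)
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step (non-saturating loss): push D(G(z)) toward 1.
    d_fake = D(G(z))
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    return loss_d.item(), loss_g.item()
```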

RESEARCH SPOTLIGHT 

[image: DragGAN demo]

1. DragGAN: Interactive Point-Based Manipulation on the Generative Image Manifold → GANs have been keeping a low profile recently, but with DragGAN we might be witnessing their spectacular comeback. DragGAN offers unprecedented control, letting you manipulate the pose, shape, and expression of models and objects in an image with pinpoint precision. But remember, with great power comes great responsibility! 😉 The website is still a work in progress, so while we wait to test it out ourselves, you can dive into the nitty-gritty details in the research paper.

2. Materialistic: Selecting Similar Materials in Images → The paper presents a method called "Materialistic" for selecting regions in an image that exhibit the same material as a user-provided area. The approach is robust to various image properties and does not rely on semantic segmentation, instead using similarity-based grouping with unsupervised DINO features and a Cross-Similarity module. This method could assist with robotic scene understanding, image editing, or online recommendation systems.

3. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention → The authors introduce a new series of high-speed vision transformers designed to manage the computational burden of existing models. They use a sandwich-like structure which enhances memory efficiency and channel communication. This is achieved by positioning a memory-bound multi-head self-attention (MHSA) layer between efficient feed-forward network (FFN) layers. They also incorporate a cascaded group attention module to minimize computational redundancy and boost attention diversity.
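
The "sandwich" layout in item 3 is simple enough to sketch. Below is my own stripped-down PyTorch rendition, meant only to show the FFN → attention → FFN arrangement: it uses plain multi-head attention, skips the cascaded group attention entirely, and picks arbitrary dimensions, so treat it as an illustration rather than a faithful EfficientViT block.

```python
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """Sketch of a sandwich-style transformer block: cheap FFNs around a single attention layer."""
    def __init__(self, dim=192, ffn_ratio=2, num_heads=3):
        super().__init__()
        hidden = dim * ffn_ratio
        self.ffn_in = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, hidden),
                                    nn.GELU(), nn.Linear(hidden, dim))
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # the memory-bound part, used once
        self.ffn_out = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, hidden),
                                     nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):                         # x: (batch, tokens, dim)
        x = x + self.ffn_in(x)                    # FFN before attention
        h = self.norm_attn(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]   # single self-attention layer
        x = x + self.ffn_out(x)                   # FFN after attention
        return x

tokens = torch.randn(2, 196, 192)                 # e.g. 14x14 patches with 192-dim embeddings
print(SandwichBlock()(tokens).shape)              # torch.Size([2, 196, 192])
```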

DEVELOPER'S CORNER

1. StableStudio → Stability AI announced StableStudio, the open-source release of DreamStudio, its premier text-to-image consumer application.

2. Datalab: A Linter for ML Datasets → an audit for your dataset and labels that automatically detects common real-world issues like label errors, outliers, (near) duplicates, low-quality/ambiguous examples, or non-IID sampling (a rough usage sketch follows after this list).

3. AI-powered coding, free of charge with Colab → Colab will soon add AI coding features like code completions, natural language to code generation and even a code-assisting chatbot.
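
Following up on Datalab above: usage comes down to a couple of calls on a Datalab object. The snippet below is an approximate sketch on toy data; constructor arguments and method signatures may differ between cleanlab versions, so double-check against the official documentation.

```python
# Approximate sketch of cleanlab's Datalab on toy data -- verify signatures against the docs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab import Datalab

# Two Gaussian blobs with a handful of deliberately flipped labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
y[:5] = 1                                                      # inject label errors

# Out-of-sample predicted probabilities (cross-validation keeps them honest).
pred_probs = cross_val_predict(LogisticRegression(), X, y, cv=5, method="predict_proba")

lab = Datalab(data={"feature": X.tolist(), "label": y.tolist()}, label_name="label")
lab.find_issues(pred_probs=pred_probs, features=X)             # run the built-in issue checks
lab.report()                                                   # label errors, outliers, (near) duplicates, ...
```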

NEWSY BITS

1. Meta is building its own AI chip → Meta shared recent progress on an ambitious plan to build the next generation of its infrastructure backbone, created specifically for AI. This includes a first-generation custom silicon chip for running AI models, a new AI-optimized data center design, and the second phase of its 16,000-GPU supercomputer for AI research.

2. Phoenix, a general-purpose humanoid robot → Sanctuary AI has unveiled Phoenix, billed as the world's first general-purpose humanoid robot. The AI system that powers Phoenix is designed to give it human-like intelligence and enable it to take on a wide range of work.

[image: Phoenix humanoid robot]

3. OpenAI-backed robot startup deploys AI-enabled robots in the real world → According to the CEO of 1X, a company backed by OpenAI, a robot capable of performing nursing and bartending tasks with its human-like arms is already operational in the United States and select parts of Europe.

4. Jizai Arms – AI Robotic Arms That Turn You Into Spider-Man → Imagine being able to control six robotic arms attached to your back, giving you the power to multitask like never before. Japanese robotics company Exiii Inc. has created Jizai Arms, a backpack that can turn you into Spider-Man.

[image: Jizai Arms]

Who can relate? 😅 

[image: meme]

Drop me a line if you have any feedback or questions.

Sending you good vibes,

Dasha 🫶 
