Ground Truth
Computer Vision Newsletter #30


Recycling robots & autonomously walking robots 🤖 Animated doodles from Meta AI 🎨 Stable Diffusion XL

Dasha Gurova
April 18, 2023

Hello, Truth-Seekers! 👋

Buckle up! In this issue, we've got recycling robots and autonomously walking robots, the revamped Stable Diffusion model, animated doodles from Meta AI, open-sourced Consistency Models from OpenAI, a YOLOv8 tutorial, and a Deep Multi-task & Meta-Learning course. Also, Elon Musk, who advocated for a pause in AI research, unveils his work on "TruthGPT."

Would you sign? ⤵️ 😆

propose a 6 day pause in ai research so I can take a week off
— AK (@_akhaliq)
4:32 PM • Apr 5, 2023

AUTHOR PICKS 👩‍💻

[Computer Vision & AI industry news, trends, insights]

Robot recycling in Google Offices

Sorting waste and recyclables with a fleet of robots

Recycling can be challenging because of the numerous rules involved, but Google created robots that can effectively sort waste in real offices, reducing contamination rates by 50%! Just look at how these bots run around, getting recycling right. 🥹 Also, here is the Google blog post.

Elon Musk claims to be working on ‘TruthGPT’

Elon Musk, who recently advocated for a pause in AI research, unveiled plans for "TruthGPT," an AI alternative to ChatGPT that's focused on truth-seeking. He believes an AI aimed at understanding the universe might be less harmful to humans. Though his new AI company, X.AI, was established in March, it's still unclear how far along TruthGPT's development is.

Computer Vision & Medical Imaging News

This is a monthly publication that covers the latest advancements and news in the computer vision field with a focus on the medical industry. It offers a mix of technical articles, interviews with industry experts, and updates on innovative projects and technologies. Check out the April 2023 issue of the magazine.

LEARNING 🤓

[Tutorials, deep-dive guides, courses, books, etc.]

A Recipe for Training Large Models → Boris Dayma (creator of dalle-mini) wrote this awesome guide to help practitioners train large models (>1B parameters), avoid instabilities, and rescue experiments that start to fail without restarting from zero.

Damaged Car Parts Detection using YOLOv8 → an in-depth walk-through of the whole process of implementing the task: from data collection to deployment with FastAPI and a custom JS frontend, as well as other options like Streamlit.

Stanford CS330: Deep Multi-Task and Meta-Learning → This course dives into multi-task learning, exploring how to use the connections between Computer Vision, NLP, and speech recognition tasks for more efficient and effective learning.

ML Model Packaging [The Ultimate Guide] → key concepts, challenges, and best practices for ML model packaging, including the different types of packaging formats, techniques, and frameworks.

RESEARCH SPOTLIGHT 🔬

[Research papers that caught my 👀]

A robot that walks with complete automation → Researchers proposed a neural volumetric memory (NVM) architecture that accounts for the 3D world's SE(3) equivariance. NVM, a volumetric format, aggregates feature volumes from multiple camera views and adjusts them to the robot's ego-centric frame.

Convert Doodles into Animation → Meta AI open-sourced its Animated Drawings project, releasing the animation code along with a dataset of nearly 180,000 annotated amateur drawings for AI researchers and creators to explore. Meta emphasizes that it's the "first annotated dataset" with such a diverse array of artwork. Also, check out the project’s home page.

Automatic Gradient Descent: Deep Learning without Hyperparameters → The paper proposes a new optimization framework, Automatic Gradient Descent, which explicitly leverages the architecture of deep neural networks to train them without hyperparameters. It also provides a PyTorch implementation that handles both fully-connected and convolutional networks at ImageNet scale.

TOOLS & DATASETS 🛠️

[GitHub repos, tools, datasets]


Grounded-Segment-Anything → an intriguing project that merges Segment-Anything with three powerful zero-shot models (Grounding DINO, Stable Diffusion, BLIP) to build a pipeline for solving complex problems.

OpenAI Consistency Models for Generative AI Art → Following the release of their paper on 'Consistency Models' last month, OpenAI open-sourced the tech last week. This technique might become a significant advancement in AI art generation, potentially setting DALL-E apart from others in the field.

Stable Diffusion XL → Stability AI unveiled its new model, 'SDXL' (in beta). It is stepping up to challenge Midjourney in AI-generated images, with improved graphics and features like image prompting, reconstruction, and extension. SDXL is open-source, and you can play with it on DreamStudio or ClipDrop.

MISCELLANEOUS 💬

Image Matching Challenge 2023 [from Google Research] → Reconstruct 3D scenes from 2D images.
[Webinar] Building Data-Centric Workflows for Computer Vision Applications with Superb AI.
Forbes AI 50 → annual list recognizing the most promising companies building businesses out of artificial intelligence.
35 Ways Real People Are Using A.I. Right Now.

If you like Ground Truth, share it with a computer vision friend! If you hate it, share it with an enemy. 😉

Have a great week!

Over and out,

Dasha
