Computer Vision Newsletter #40

📌 CVPR 2023: Highlights and Takeaways & 🛠️ 2023 MLOps Landscape

CVPR 2023: Emerging Trends in Computer Vision Research

[Image: mind-blown illustration]

This is how I feel after CVPR 2023.
Image creds: generated with Midjourney by Christie C.

The CVPR conference is a standout event in the field of computer vision - it's the Olympics of the field, if you will. This year, CVPR ventured beyond the US for the first time, landing in Vancouver, Canada. Now that the dust has settled and the awards have been handed out, let's indulge in a bit of retrospection.

Attempting to squeeze the enormity of the conference into a few paragraphs is an uphill task, but gleaning the trends from this year's CVPR is a vital endeavor in comprehending the current state and future direction of the computer vision field. So let’s dive in!

  1. Vision Transformers: Despite their recent introduction to the computer vision field in 2020, Transformers have muscled their way into the spotlight, challenging CNNs' long-standing reign. This year, we witnessed a flurry of innovative techniques orbiting vision transformers, as researchers took deep dives into bias analysis, pruning, pretraining, distilling, reverse distilling, and more.

  2. Image Generation: With the rise of GANs and diffusion models, the domain of image generation has seen a surge of activity. An array of imaginative works focused on refining image generation and giving users finer control over what these models produce. A lot of attention went to generating faces, unsurprisingly, given our inherent curiosity about faces.

  3. Foundation Models: The quest for foundation models in computer vision is revving up. This was quite evident at CVPR, with intense discussions circling around general-purpose pre-trained computer vision models like DINOv2 and SAM. Looking ahead to next year, we can anticipate a strong push in this area. The stage is set for the debut of some truly exciting foundation models for CV.

  4. NeRF: This method has been steadily gaining momentum, turning sets of 2D images into immersive 3D scenes. This year's conference witnessed a veritable explosion of NeRF-centric papers, tackling challenges such as scaling up, boosting efficiency, managing dynamic scenes, and even working with fewer images. With the spotlight now firmly on it, NeRF is certainly a technique to keep your eyes on.

  5. Multimodal Models: Interest in multimodal models has been steadily heating up. The potential of feeding both image and text tokens into a single transformer model is an enticing possibility, one that numerous research teams are keen to explore and decode. This integration of modalities could lead to some fascinating advancements in our understanding of machine learning processes.
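To make the "image and text tokens in a single transformer" idea from item 5 a bit more concrete, here is a minimal sketch of how the two modalities might end up in one input sequence. It is purely illustrative and not taken from any specific paper: the function names, the tiny sizes, and the random weights standing in for learned embeddings are all assumptions, and it stops before the transformer itself.

```python
# Illustrative sketch only: names, sizes, and random "weights" are hypothetical.
import random


def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append([image[top + r][left + c]
                          for r in range(patch) for c in range(patch)])
    return tiles


def project(vector, weights):
    """Linear projection: each row of `weights` produces one output value."""
    return [sum(v * w for v, w in zip(vector, row)) for row in weights]


def build_sequence(image, token_ids, patch=2, dim=4, vocab=10, seed=0):
    """Embed image patches and text tokens to the same width, then concatenate."""
    rng = random.Random(seed)
    # Stand-ins for learned parameters (assumption: random values for the demo).
    patch_weights = [[rng.uniform(-1, 1) for _ in range(patch * patch)]
                     for _ in range(dim)]
    text_table = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(vocab)]

    image_tokens = [project(tile, patch_weights) for tile in patchify(image, patch)]
    text_tokens = [text_table[i] for i in token_ids]

    # One sequence, two modalities; the tag is kept only for inspection here.
    return ([("image", vec) for vec in image_tokens]
            + [("text", vec) for vec in text_tokens])


# A 4x4 single-channel "image" and three made-up text token ids.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
sequence = build_sequence(image, token_ids=[3, 1, 7])
print(len(sequence))  # 4 image patches + 3 text tokens = 7
```

The point of the sketch is simply that once both modalities are projected to the same embedding width, a single model can attend over them as one sequence. Real systems would also add positional and modality embeddings, which are left out here.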

Learning Resources

1. MLOps Landscape in 2023: Top Tools and Platforms → Caught this comprehensive guide to MLOps tooling and it’s a treasure trove! There's just one tiny, microscopic detail missing – Superb AI tools under the 'Dataset Labeling and Annotation' section. Yes, I'm looking at you, Stephen. 👀 But all jokes aside, it is an up-to-date and thoroughly detailed guide, check it out!

2. Drone Programming with Computer Vision: A Beginner’s Guide → Drones certainly stir the pot of controversy, but we can’t ignore the compelling trajectory they're setting in computer vision applications. The vital point to remember is harnessing this power for beneficial purposes. So let's tread thoughtfully, and remember, with great power comes great responsibility. 😉 

3. Beyond NeRFs: Tips and tricks for successfully using NeRFs in the wild → A deep dive into NeRFs to better understand why they perform poorly in the real world and how this problem can be solved. In particular, a few recent proposals, called NeRF-W and def-NeRF, modify NeRFs to better handle images that are captured in uncontrolled, noisy environments.

4. What is StyleGAN-T? A Deep Dive → What StyleGAN-T is, how the model came to be, how it works, where it can be applied, and how you can get started with it.

5. Computer Vision Projects With Source Code And Dataset → "Building a simple computer vision model is as easy as pie - just get some quality data and a robust training data platform," they say. Ha! If only it were that breezy. This article shares some beginner-friendly computer vision project ideas to kick-start your journey.

Developer Resources

⚙️ Drag Your GAN → DragGAN burst onto the scene with its drag-and-drop photo editing magic. The much-anticipated code release is finally here.

⚙️ CVPR 2023 Papers with code → a comprehensive GitHub repo listing the CVPR 2023 papers that have released their code.

⚙️ NNViz → a Python package to visualize neural networks in a comprehensible way.

Research Spotlight [CVPR 2023 Winners]

🏆️ Best Paper: Visual Programming: Compositional Visual Reasoning without Training → The paper presents a neuro-symbolic approach that uses large language models to generate Python-like modular programs for complex visual tasks, eliminating the need for task-specific training. Check out the Author's Q&A.

🏆️ Best Paper: Planning-oriented Autonomous Driving → The paper proposes a comprehensive framework called Unified Autonomous Driving (UniAD) that integrates perception, prediction, and planning tasks into one network, prioritizing them to contribute to the planning process in autonomous driving systems. Check out the Author's Q&A.

🏆️ Best Paper Honorable Mention: DynIBaR: Neural Dynamic Image-Based Rendering → The paper introduces DynIBaR, a neural dynamic image-based rendering system that synthesizes novel views from monocular videos of complex dynamic scenes.

🏆️ Best Student Paper: 3D Registration with Maximal Cliques → The paper introduces a 3D point cloud registration method called MAC (Maximal Cliques), which utilizes maximal cliques in a compatibility graph to generate accurate pose hypotheses and improve registration accuracy.

🏆️ Best Student Paper Honorable Mention: DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation → The paper introduces a novel approach for personalizing text-to-image diffusion models, allowing them to synthesize novel photorealistic images of a specific subject in various scenes, poses, views, and lighting conditions by fine-tuning the model with a unique identifier.
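
For the curious, the "maximal cliques in a compatibility graph" idea behind the Best Student Paper can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it assumes two correspondences are compatible when they preserve pairwise distance (as a rigid motion would), uses a plain Bron-Kerbosch search, and skips the pose-estimation step entirely. The function names, the tolerance, and the sample points are all made up for the demo.

```python
# Toy sketch only: not the MAC authors' code; names and data are hypothetical.
from itertools import combinations
from math import dist


def compatibility_graph(src, dst, tol=0.05):
    """Link correspondences i and j when they preserve pairwise distance.

    A rigid motion keeps distances unchanged, so correct matches should agree.
    """
    adj = {i: set() for i in range(len(src))}
    for i, j in combinations(range(len(src)), 2):
        if abs(dist(src[i], src[j]) - dist(dst[i], dst[j])) <= tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Four correct matches (dst = src shifted by (1, 2, 3)) plus one bad match.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
dst = [(1, 2, 3), (2, 2, 3), (1, 3, 3), (1, 2, 4), (9, 9, 9)]

cliques = maximal_cliques(compatibility_graph(src, dst))
best = max(cliques, key=len)
print(best)  # [0, 1, 2, 3]
```

In this toy setup the bad match ends up isolated, while the four consistent matches form one clique. Each such clique would then serve as a candidate set for estimating a pose.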

Drop me a line if you have any feedback or questions.

Sending you good vibes,

Dasha 🫶 
