Computer Vision Newsletter #39

📹️ Video Understanding: Past, Present & Future and Battle of Multimodal Models 🥷

Hello, Truth-Seekers!

I need your help to decide something… 🤓

I've been toying with the idea of diversifying Ground Truth content beyond the weekly Computer Vision Newsletter for quite a while, but somehow always find myself swamped. I think the time is ripe, but I'd like your opinion first. This way, I can prioritize based on what you find most valuable. Here are some content formats that I find intriguing and would enjoy exploring:

  • Computer Vision Tips: This would be a series of focused, actionable advice or tutorials on various aspects of computer vision development.

  • GitHub Repository: I've shared numerous computer vision learning resources and tools in the past year. I am considering creating a dedicated GitHub repo where you can access these resources and contribute as well.

  • Fireside Chats: Organising conversations with leaders and practitioners in the field of computer vision, where we delve into the nitty-gritty of CV.

Author Pick[s]  

Video understanding technology has come a long way in the past decade, moving from low-level perception tasks like object detection to high-level understanding tasks such as search, question answering, and captioning. This comprehensive article reviews existing video understanding research, what potential remains untapped, and where the field is headed.

How good are current multimodal models, actually? Jacob Marks puts them to the test, focusing on three questions:

  • Which text-to-image model outperforms the rest?

  • Which image-to-text model takes the lead?

  • And, most crucially, which direction of translation—image-to-text or text-to-image—is more vital?

Tutorials & Learning

1. [Course] Advanced Computer Vision → An excellent course that covers recent developments in computer vision, including topics like image generation and vision-language learning.

2. [Tutorial] Create High-Quality Computer Vision Applications with Superb AI Suite and NVIDIA TAO Toolkit → This potent combination significantly shortens computer vision application development time without sacrificing quality.

3. [MLOps] Continuous delivery and automation pipelines in machine learning → This timeless gem from Google offers an in-depth look at methods for implementing and automating Continuous Integration (CI), Continuous Delivery (CD), and Continuous Training (CT) for machine learning systems.

4. [Tutorial] Detecting Cancer Growth Using AI and Computer Vision → A framework leveraging state-of-the-art CNNs and computer vision technologies to aid the detection of metastases in lymph nodes.

5. [Analysis] IoU Loss Functions for Faster & More Accurate Object Detection → Object detection generally needs two loss functions: one for object classification and one for bounding box regression. This article dives into the bounding box regression losses.
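
For readers new to IoU-based regression losses, here is a minimal sketch of the basic idea in plain Python. It is not taken from the linked article: the function names, the (x1, y1, x2, y2) box format, and the choice of GIoU as the example variant are assumptions made for illustration.

```python
def _intersection_and_union(box_a, box_b):
    """Return (intersection area, union area) for two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height are clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter, union


def iou_loss(box_a, box_b):
    """Plain IoU loss: 1 - IoU. It is 0 for identical boxes, 1 for disjoint ones."""
    inter, union = _intersection_and_union(box_a, box_b)
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou


def giou_loss(box_a, box_b):
    """Generalized IoU loss: also penalizes empty space in the enclosing box,
    so disjoint boxes that are farther apart get a larger loss."""
    inter, union = _intersection_and_union(box_a, box_b)
    iou = inter / union if union > 0 else 0.0
    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose_w = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    enclose_h = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    enclose = enclose_w * enclose_h
    giou = iou - (enclose - union) / enclose if enclose > 0 else iou
    return 1.0 - giou


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print(iou_loss(predicted, target), giou_loss(predicted, target))
```

The plain IoU loss is the same (1.0) for any pair of non-overlapping boxes, so it gives no signal about how far apart they are. Variants such as GIoU add a distance-aware term to address this.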

Developer Resources

1. A Machine Learning Engineer’s Guide To The AI Act → the ‘day-to-day’ of machine learning practitioners is about to go through a radical shift to ensure AI use cases are properly documented, reviewed, and monitored.

2. [CVPR 2023] H-DETR → The official implementation of the paper "DETRs with Hybrid Matching".

3. Seal: Segment Any Point Cloud → A new system that uses advanced visual recognition models to better interpret various types of vehicle sensor data, making it more efficient and adaptable.

4. FLAIR-one → A large dataset (>20 billion pixels) of aerial imagery, topographic information, and land cover annotations (buildings, water, forest, agriculture...).

Research Spotlight

1. Tracking Everything Everywhere All at Once → The paper introduces a novel test-time optimization method called OmniMotion, which enables accurate and globally consistent estimation of dense and long-range motion in video sequences, overcoming challenges such as occlusions and maintaining global motion trajectory consistency.

2. A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks → A survey of how transformers are applied across domains such as natural language processing, computer vision, and audio and speech processing.

3. CVPR 2023 resources to check:

Previous Issue’s 3 Most Clicked Links

Drop me a line if you have any feedback or questions.

Sending you good vibes,

Dasha 🫶 
