Computer Vision Newsletter #34
How to get a Computer Vision job; ImageBind, multimodal learning with six different modalities; Stable Animation SDK
AUTHOR PICKS
1. [Ground Truth Q&A] Essential Skills and Strategies for Landing a Computer Vision Job → Navigating the field of computer vision can indeed be daunting, given its considerable breadth and depth. Nonetheless, experienced professionals frequently share a few common tips. After consulting with the ML team at Superb AI, I've compiled a response to help guide you on your journey.
2. Text-to-Image Diffusion Models Part I → Discover the inner workings of Diffusion models and how they generate stunning images in this article, which is as enjoyable as it is educational.
3. The importance of Open Source AI and the challenges of liberating data → Modern AI thrives on hardware, knowledge, and especially, vast datasets. But legal and technical hurdles around data are a real bottleneck for the Open Source community. Ironically, existing copyright restrictions don't deter big entities from monopolizing data, yet they stifle the Open Source ethos.
LEARNING & INSIGHTS
1. [Free Book] Computer Vision: Algorithms and Applications → The 2nd edition of this book by Richard Szeliski is nothing short of a gold mine, distilling his 40 years of Computer Vision research into practical, applicable techniques.
2. The Future of Embeddings for Computer Vision Data Curation → this blog post explores the current state of embeddings in data curation for computer vision, the challenges and potential pitfalls that lie ahead, and some predictions on how embeddings will continue to redefine the way data is curated.
3. YOLO-NAS: New YOLO Object Detection Model Beats YOLOv6 & YOLOv8 → an exploration of YOLO-NAS, the latest installment in the YOLO family: key architectural insights, training details, performance, and how to use YOLO-NAS for inference.
4. Building a Real-Time Object Detection and Tracking App with YOLOv8 and Streamlit → a 3-part tutorial that walks you through building a real-time object detection and tracking application with YOLOv8 and Streamlit. It's a straightforward and efficient approach that you can easily tweak and merge into your projects; a minimal sketch of such an app appears right after this list.
5. Segment-Anything-Model-Tutorial → a detailed Kaggle Notebook that walks you through how to use the Segment Anything Model (SAM) for semantic segmentation tasks; a short prompting sketch also follows below.
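To make item 4 concrete, here is a minimal sketch, not the tutorial's exact code, of a Streamlit app that runs YOLOv8 detection and tracking on an uploaded video. It assumes ultralytics, streamlit, and opencv-python are installed; yolov8n.pt is simply the smallest pretrained COCO checkpoint.

```python
# Minimal sketch: YOLOv8 detection + tracking on an uploaded video, streamed
# frame by frame through Streamlit. Save as app.py and run `streamlit run app.py`.
import tempfile

import cv2
import streamlit as st
from ultralytics import YOLO

st.title("YOLOv8 object detection and tracking demo")

model = YOLO("yolov8n.pt")  # smallest pretrained COCO checkpoint
uploaded = st.file_uploader("Upload a video", type=["mp4", "avi", "mov"])

if uploaded is not None:
    # Write the upload to a temp file so OpenCV can read it.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(uploaded.read())
        video_path = tmp.name

    frame_slot = st.empty()  # placeholder that gets refreshed with each frame
    cap = cv2.VideoCapture(video_path)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = model.track(frame, persist=True, verbose=False)  # detect + track IDs
        frame_slot.image(results[0].plot(), channels="BGR")  # draw boxes and IDs
    cap.release()
```

Swapping model.track(...) for a plain model(frame) call drops the tracking and keeps detection only.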
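And for item 5, a minimal sketch of prompted segmentation with the official segment-anything package; the image path, the click coordinates, and the checkpoint location are placeholders, and the ViT-B weights have to be downloaded separately from the SAM repository.

```python
# Minimal sketch: prompt SAM with a single foreground point and keep the best mask.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM checkpoint (downloaded separately from the SAM repo).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # compute the image embedding once, reuse for many prompts

# A single (x, y) point prompt; label 1 marks it as foreground.
point = np.array([[500, 375]])
label = np.array([1])
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    multimask_output=True,  # return three candidate masks at different granularities
)
best_mask = masks[np.argmax(scores)]  # boolean HxW array for the highest-scoring mask
```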
RESEARCH SPOTLIGHT
1. ImageBind: Holistic AI learning across six modalities → Meta AI is back in the spotlight with its latest open-source research contribution, ImageBind. The approach learns a unified embedding across six distinct modalities (images, text, audio, depth, thermal, and IMU data), all linked by their natural association with images. ImageBind enables cross-modal retrieval, composing modalities with arithmetic, and cross-modal detection and generation, and it achieves state-of-the-art results on emergent zero-shot and few-shot recognition tasks. For now, ImageBind is purely a research project, but it holds enormous promise for multimedia search, virtual reality, and robotics; a short usage sketch follows this list.
2. DataComp: In search of the next generation of multimodal datasets → Datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, this paper introduces DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets.
3. A data augmentation perspective on diffusion models and retrieval → The paper examines how effectively diffusion models generate images for data augmentation in downstream tasks such as classification, and compares different ways of leveraging these generative models. It finds that personalizing the diffusion model to the target data and using nearest-neighbor retrieval from the diffusion model's training data both lead to improved performance.
4. Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era → a comprehensive survey of the fast-developing field of text-to-3D, exploring the interaction between generative AI and 3D modeling technologies, discussing foundational techniques, and examining various applications of text-to-3D.
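For a feel of what the unified embedding space in item 1 buys you, here is a short sketch following the usage example in Meta's facebookresearch/ImageBind repository; import paths can differ slightly between versions, and the image and audio file paths are placeholders.

```python
# Sketch: embed text, an image, and an audio clip with ImageBind and compare them
# directly, since all modalities land in one shared embedding space.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

texts = ["a dog", "a car", "rain falling"]
image_paths = ["dog.jpg"]      # placeholder paths
audio_paths = ["rain.wav"]

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(texts, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    emb = model(inputs)  # dict of embeddings, one tensor per modality

# Cross-modal similarity is just a dot product in the shared space.
print(torch.softmax(emb[ModalityType.VISION] @ emb[ModalityType.TEXT].T, dim=-1))
print(torch.softmax(emb[ModalityType.AUDIO] @ emb[ModalityType.TEXT].T, dim=-1))
```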
DEVELOPER'S CORNER
1. Stable Animation SDK → Stability AI released a text-to-animation tool for developers. Users can create animations in several ways: from text prompts alone, from a source image, or from a source video.
2. EVA AI-Relational Database System → is designed to support database applications that operate on both structured and unstructured data using deep learning models. It comes with a wide range of models for analyzing unstructured data, including models for object detection, question answering, OCR, text sentiment classification, face detection, etc. A rough query sketch follows this list.
3. CV Evaluations → is a framework for evaluating the results of computer vision models.
4. DetGPT: Detect What You Need via Reasoning → a novel object detector that is able to perform reasoning under complex natural language instructions provided by a user.
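To illustrate the EVA item above, here is a rough sketch of its SQL-over-unstructured-data style. The client calls (evadb.connect, cursor.query(...).df()) follow the pattern shown in the project README, but treat them, along with the Yolo function name in the query, as assumptions to check against the docs for your version.

```python
# Rough sketch: register a video in EVA and run an object detector over it with SQL.
import evadb  # assumed client package name; verify against the EVA docs

cursor = evadb.connect().cursor()

# Load a video into a table-like relation (path is a placeholder).
cursor.query("LOAD VIDEO 'traffic.mp4' INTO TrafficVideo").df()

# Run a built-in vision model over the first frames with plain SQL;
# 'Yolo' stands in for whichever detector your EVA version ships with.
detections = cursor.query(
    "SELECT id, Yolo(data) FROM TrafficVideo WHERE id < 100"
).df()
print(detections.head())
```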
NEWSY BITS
1. AMP Robotics, a Colorado-based startup, just bagged $99M. They're creating robotic systems powered by computer vision to sort recyclable material. Amid the buzz of Generative AI, it's heartening to see CV tech for a noble cause snagging major funds.
2. OpenAI CEO Sam Altman urged lawmakers to regulate artificial intelligence during a Senate panel hearing Tuesday, describing the technology’s current boom as a potential “printing press moment” but one that required safeguards.
3. The EU dropped a 144-page amendment to its AI Act that aims to tighten restrictions on American AI companies → the big update would ban anyone from making AI models accessible in Europe unless the models first pass through a "licensing" process.
4. As AI-generated fakes proliferate, Google plans to fight back → As deep fakes and other manipulated content become more sophisticated, the tech giant is developing new tools to identify and flag this content.
Drop me a line if you have any feedback or questions.
Sending you good vibes,
Dasha 🫶 Computer Vision Newsletter