Computer Vision Newsletter #28

Shut down all the large GPU clusters 😳

Hello there! 👋 

Over the past week, conversations about AI's potential risks have taken center stage. Numerous experts have raised concerns about humanity's relentless pursuit of Artificial General Intelligence (AGI) without addressing the AI boxing problem. In essence, this problem revolves around the containment and alignment of superintelligent AI systems. Since our grasp of current models' inner workings is limited, we can't predict how they'll behave in situations beyond their training data. This uncertainty could lead to potentially disastrous consequences on a worldwide scale. 

Super quick highlights of this issue:

 🤌 Dangers of AI and the End of Human Civilization; Why Midjourney stops Free Trials

 🤓 Vision Transformers: Can a transformer paint a pretty picture?; MLOps Pipeline for Visual Search; A Deep Dive Into Instance Segmentation

 ⚙️ A Universe of Annotated 3D Objects; Prismer on Hugging Face; ArxivGPT; OpenFlamingo

 🔬 Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior; Robots that learn from videos of human activities and simulated interactions

 🤯 Image generation: 5 years ago vs Now 

Author Picks

Computer Vision & AI insights, news, interesting articles

🤌 Dangers of AI and the End of Human Civilization → In one corner, we have Lex Fridman, the ever-hopeful optimist, and in the other, Eliezer Yudkowsky, the pessimist, who called to “Shut down all the large GPU clusters and be willing to destroy a rogue datacenter by airstrike”. This is a fascinating conversation between two contrasting takes on AGI's future and our ability to sustain alignment.

🤌 Opinion: The AI pope coat is the shape of hyperreality to come → In a world of AI generation, we need to teach people how these systems operate, how to check for doctored images, the value of sources, and basic methods of finding the closest approximation to the truth there is.

⚡️ Farewell to Midjourney Free Trials → Many jumped to the conclusion that Midjourney halted free trials because of the deepfake incidents. However, Version 5, which generates photorealistic images, was never accessible to free users. CEO David Holz says it was due to “extraordinary demand and free trial abuse.”

Learnings & Insights

Computer Vision, Generative AI & MLOps tutorials, guides, courses, case studies, etc.

🧐 Vision Transformers: Can a transformer paint a pretty picture? → Diffusion models have taken text-to-image generation to the next level in quality and flexibility, but they still take some time to produce results. This article explores a transformer model that delivers incredibly high-quality text-to-image generation and is fast at the same time.

🤓 How Computer Vision Is Changing Manufacturing in 2023 → A deep dive into computer vision in manufacturing and industrial automation: applications and use-cases, current and future challenges, and companies at the forefront.

🤓 End-To-End MLOps Pipeline for Visual Search at Brainly → a case study that provides insights into Brainly's ML applications, MLOps culture, team organization, and the technologies they use to deliver AI services to their users.

🤓 A Deep Dive Into Instance Segmentation: Best Practices and Frameworks → exploring instance segmentation and its function as a pixel-level training method for computer vision applications through image annotation.

🎧️ Redefining AI → an educational podcast that focuses on key narratives driving technical innovation and transformation. The series explores candid conversations, unspoken secrets and industry truths.

Developer’s Corner

Deep Learning tools, libraries, repositories, datasets, competitions, etc.

⚙️ DataPerf: the Leaderboard for Data → platform and community to develop competitions and leaderboards for data and data-centric AI algorithms. DataPerf features five challenges across four different application domains. These challenges aim to benchmark and improve the performance of data-centric algorithms and models.

⚙️ Prismer on Hugging Face → The official demo of Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of diverse, pre-trained domain experts.

⚙️ Objaverse: A Universe of Annotated 3D Objects → a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations.

⚙️ OpenFlamingo → an open-source reproduction of DeepMind's Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs).

⚙️ ArxivGPT → A Chrome plug-in that summarizes arXiv papers, provides key insights and allows asking follow-up questions.

Research Spotlight

Research papers in Computer Vision, Multimodal, and Robotics.

🔬 Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior → Make-It-3D utilizes a well-trained 2D diffusion model for 3D-aware supervision and employs a two-stage optimization process. The first stage optimizes a neural radiance field with constraints from the reference image and diffusion prior, while the second stage refines the model into textured point clouds, enhancing realism using the diffusion prior and high-quality textures.

🔬 Robots that learn from videos of human activities and simulated interactions → Meta AI developed a method for robots to learn from human interactions by training an artificial visual cortex with egocentric videos. On top of that, they’ve built a way to pre-train the robot to perform long-horizon rearrangement tasks in simulation, then transfer the policy zero-shot to a real Spot robot for real-world challenges.

🔬 DreamBooth3D: Subject-Driven Text-to-3D Generation → an approach that personalizes text-to-3D generative models using only 3-6 casually captured images of a subject. This approach merges the advancements of text-to-image personalization (DreamBooth) and text-to-3D generation (DreamFusion). Researchers employ a 3-stage optimization strategy that jointly leverages the 3D consistency of neural radiance fields along with the personalization capabilities of text-to-image models.


Fun or just cool things

🤯 Image generation: 5 years ago vs now

If you like Ground Truth, share it with a computer vision friend! If you hate it, share it with an enemy. 😉

Have a great week!

Over and out,


