
The past few years have witnessed an unprecedented explosion in Artificial Intelligence, driven first by Large Language Models (LLMs) like GPT-3/4, then by Vision-Language Models (VLMs) such as GPT-4V, LLaVA, and Gemini. These breakthroughs have allowed AI to understand and generate human-like text and to interpret visual information with remarkable accuracy.




