The model's operation draws a parallel to the human brain, which does not engage all of its parts simultaneously; by activating only what a given task requires, the model conserves computational resources and returns results faster. Its flexible movement across modalities, drawing on whichever one best suits the task at hand for navigating and understanding, is genuinely impressive.
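Google has described Gemini 1.5 as built on a mixture-of-experts (MoE) architecture, which is the standard way to get this kind of selective activation: a small "router" picks a few expert sub-networks per input, and the rest stay idle. The sketch below illustrates the general technique only; the expert count, dimensions, and random weights are illustrative assumptions, not details of Google's model.

```python
# Minimal sketch of sparse mixture-of-experts routing, the general
# technique behind "not engaging all parts at once". All sizes and
# weights here are illustrative assumptions, not Gemini internals.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total expert sub-networks in the layer (assumed)
TOP_K = 2         # experts actually activated per token (assumed)
d_model = 16      # token vector size (assumed)

# Stand-ins for trained weights: a router plus one tiny expert each.
router_w = rng.normal(size=(d_model, NUM_EXPERTS))
expert_w = rng.normal(size=(NUM_EXPERTS, d_model, d_model))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a token to its top-k experts and mix their outputs.

    Only TOP_K of NUM_EXPERTS experts run, so most of the layer's
    parameters sit idle for any given token -- the compute saving
    the article describes.
    """
    logits = x @ router_w                # router score per expert
    top = np.argsort(logits)[-TOP_K:]    # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the chosen experts
    # Weighted sum of only the selected experts' outputs.
    return sum(w * (x @ expert_w[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,): same output shape, a fraction of the compute
```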
An AI that can work across different modalities would mirror how humans function. Humans are inherently multimodal, switching effortlessly among speech, writing, and images or graphics to communicate thoughts and concepts.
However, it is important not to read too much into these developments. There is an old saying: "Never trust an AI demo." The caution is warranted because it is unclear what the demonstration videos left out, or whether the examples shown were cherry-picked from a much larger set of attempted tasks. There is also the concern that the model might fail to reproduce some of the demonstrations if the input wording were altered even slightly. AI models, generally speaking, can be quite brittle.
For now, the rollout of Gemini 1.5 Pro is restricted to developers and enterprise customers, with no confirmed timeline for a public release. Google's initial Gemini launch drew backlash for not disclosing that its demo video had been sped up.