Building with Vision: Gemini 2.5, Claude 3.7 Thinking, DeepSeek V3, GPT-4.5
Added 2025-04-03 15:15:23 +0000 UTC
This is a little contest to see how well each of these state-of-the-art models can use its vision commands to build, then see and evaluate what it has built. They don't do very well.
Comments
It only uses vision periodically, when it calls the !lookAtPlayer command, etc. It's not constantly seeing an image. And yeah, positioning is extremely hard. This video basically shows that vision is not very helpful. Will release the full vid with commentary on it.
Max Robinson
2025-04-03 22:56:40 +0000 UTC
I only started playing a bit with vision, but watching the AI view on port 3000, it didn’t seem to care too much about its actual vision. The views were pretty bad. Maybe it was still relying on memory too? It’s like the bots need some prompting to understand HOW to position themselves to get good views of different things.
Chris
2025-04-03 18:12:15 +0000 UTC
Will do more of that in the future.
Max Robinson
2025-04-03 15:32:04 +0000 UTC
As much as I love cool builds, it would be so cool to see you experiment with their social interactions, too!!
SunderingAlex
2025-04-03 15:22:38 +0000 UTC