Alex Finn · May 1, 2026 · 1h 49m

LIVE: OpenClaw vs Hermes Agent: The ultimate showdown

Summary

{ "tldr": [ "Hermes on Opus took the lead after 3 tests — By the end of the stream, Alex’s live scorecard had Hermes+Opus at 88.8, OpenClaw+Opus at 81.4, OpenClaw+ChatGPT at 70.7, and Hermes+ChatGPT far behind at 41.0.", "OpenClaw on ChatGPT was the speed demon, but often shipped ugly UI — It won the first dashboard test on speed and delivered working live stock data fast, but Alex repeatedly called the interface “total slop” and scored its UI just 2.6/10.", "Hermes on ChatGPT was wildly inconsistent — It crashed Alex’s computer on the stock dashboard, bombed the game test with something “completely unplayable,” then suddenly produced the most faithful Apple homepage clone of the day, likely by aggressively copying reference visuals.", "The biggest quality jump came from Opus on creative tasks — In the “gravity is broken” game challenge, Hermes+Opus made the only game Alex called “legitimately fun,” scoring 9.2 for functionality and 9.5 for communication.", "The test format mattered almost as much as the agents — A prompt contradiction (“use apple.com” but later saying “Stripe”) visibly confused every agent, turning the website recreation round into an accidental test of recovery, interpretation, and robustness.", "Alex’s live takeaway was that the OpenClaw vs. Hermes tribalism is overblown — Even while scoring winners and losers, he said the difference is “the most overrated thing on planet Earth,” with model choice—especially GPT vs. Opus—showing up more clearly than agent wrapper wars." ], "breakdown": "### The Agent Olympics kickoff\n\nAlex opens like it’s a title fight: OpenClaw vs. Hermes, each running on ChatGPT and Opus, five tests total, no edits, no sponsorships