Can LLMs do accounting?
Despite promising results on synthetic benchmarks (e.g. Vending-Bench, SpreadsheetBench, DSBench), frontier models consistently underperform once they are deployed in complex, real-world situations.
Of all the current debates around AI, one critique has stayed with me: that it’s “flattening the bar.” Tools like ChatGPT, the argument goes, make everyone’s writing sound the same — generic, overly polished, stripped of nuance. The concern is real, and I share it.
Now we’re in the agentic era, and that billing octopus grew some new tentacles just for AI agent billing. Or is it a different octopus? I’m not sure.
“Schopenhauer’s philosophy is the mirror of his own nature… What he saw was not the world, but himself writ large.” Nietzsche
The Model Context Protocol (MCP) is an open standard. It acts like a universal connector that lets AI applications communicate with external data sources and tools. Instead of building a custom integration for each data source or tool, MCP provides a standardized way for AI models to access the information they need to give better, more relevant responses.
For nearly three years, Arc from The Browser Company has been my daily driver. To be sure, there was a little bit of a learning curve. Tabs disappeared after a day unless you pinned them. Then they became almost like bookmarks. Tabs were on the left side of the window, not at the top. Spaces let me organize my tabs based on use cases like personal, work, or finances. I…
When we hype up the technology, we mostly help the people who put money into it. This post isn’t about those people or that money; maybe they could use the help… My point is, they are irrelevant when we want to understand the merits of AI. They muddy the waters and overshadow the important questions.
OpenAI unveils ‘ChatGPT agent’ that gives ChatGPT its own computer to autonomously use your email and web apps, download and create files for you
Companies developing video AI models and tools often talk about working with Hollywood studios to make certain workflows possible. On Thursday, Netflix said that it has started using AI in movies and shows it produces.
During a recent discovery project I led on car finance, I interviewed a participant who, based on the screener, appeared to meet the eligibility criteria.