
AI is not intelligent.
At least not yet. The large language models (colloquially, and incorrectly, referred to as artificial intelligence) that we currently have are not Max Headroom, Skynet, Sonny, Data/Lore, the Terminator, or C-3PO. They aren’t even WOPR/Joshua (and if you don’t know all the listed AIs, you and I can’t be friends). They are not thinking machines. They are fancy Google searches. If we think of them that way, we’ll likely get better use out of them. If you make your students/children understand that, they will likely rely on them for more appropriate tasks.
I’ve been playing with AI a fair bit over the last year, and the results are inconsistent at best. So let me share some of my experience to save you time and give you lessons to share with others.
Let me start this off by saying I don’t want to become a “prompt engineer,” aka a programmer. I’m uninterested in learning a programming language just to use a tool they tell me is conversational. So I communicate with the LLMs as I would with a person; I don’t often use the prompts the nerds tell us will make the machine do the magic. I don’t want to memorize code to make this work, and I don’t. If it worked as advertised, I wouldn’t have to.
Anyway, onward.
Yesterday, I saw an article that called the SAT a “game-changer.” After I finished keekeeing, I thought about doing a hateread on Xwitter or Bluesky to warn the public about propaganda disguised as journalism, but decided I’d be a modern man, so I asked AI to provide an analysis instead. First, I asked ChatGPT whether the piece was better categorized as an article or a press release, and ChatGPT said it was an article. I realized the wording of my question mattered, so I asked it to provide a fisking (the technical word for hateread) of the article. That got better results, which you can see on Bluesky. I put the same fisking question to both ChatGPT and DeepSeek, and I think DeepSeek produced the better results.
Overall, the results were solid. The bots gave decently accurate and critical analysis of the “article,” recognizing that it was largely a parroting of College Board’s talking points, though the writing was bland and neutral. LLM writing always reads as if it’s written by that kid who works hard not to offend the most offensive person in the room. But we can call this a pro: LLMs can give a decent essay on a topic, summarized from whatever sources they’ve been trained on (aka based on whoever’s work has been stolen and loaded into the machine). Con: they do not think. They don’t produce new thoughts or takes.
Here is where it gets interesting. I asked a few of the algorithms to give me a list of other authors to read for better analysis of the SAT; I also asked which strong current voices could provide better analysis of testing. Those results varied wildly. Copilot gave me Nick Lemann (good rec!), several people who don’t write about testing but were quoted by Lemann, a few ETS employees who wrote one article, and then two dead eugenicists. DeepSeek gave me a list with Jeff Selingo first (he writes about college admissions, but not really testing), Jon Boeckenstedt (spot on), and then me in the “honorable mention” section; it wasn’t a bad list. ChatGPT listed Eric Hoover (again, more college admissions than testing, but not bad), Valerie Strauss (good one!), and me fifth!
So there you have it: the large language models are inconsistent but tantalizing. They’re like Taco Bell. I want it to be good, and I keep going back no matter how many times I end up in the bathroom regretting it. Maybe one day it will be really good.
You can see everything I’ve tweeted about AI on Xwitter (you’ll probably need an account to read beyond just one tweet) if you click here. Lots of fun tidbits, like when AI told me an HBCU founded in 1866 had its first black graduate in 1974.

This one is fun: the time the AI detectors told me that things I wrote before I even knew of AI were 90% AI.
