A really interesting long-form interview with @simonwillison.net. If you follow him closely, most of it is probably not new, but I found some interesting nuggets.
Simon is writing most of his code from his phone these days, using Anthropic's hosted platform. He mentioned that a lot of security risks go away when you don't put secrets on the platform and you let them take on the risk of running AI-written code with an AI-chosen supply chain.
He talked about the Pelican Riding a Bike benchmark for quite a while. He was surprised at how good a proxy it is for how capable a model is at just about everything. He also said that whenever he runs the benchmark, he runs half a dozen others that he's never talked about, so that if anyone trained a model specifically for his benchmark, he could catch them. But it seems they have caught on, and if they were training for it, they would probably already be doing the same for all of his others anyway.
TDD is incredibly boring for humans; it strips so much creativity and joy from the process. But who cares if agents are bored? They do better when doing TDD.