Other | 2026/3/27

SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks

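The research paper "SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks" proposes a new benchmark for measuring how the performance of coding agents declines as they carry out long-horizon tasks.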