Learnings from Automating Legacy Code Cleanup
Here are some of my learnings from using large language models to help automate cleanup on a large legacy code project:
- Prompts must be specific to the issue type being addressed and carefully crafted. This is still a 1-to-n operation: the same issue type might appear hundreds or thousands of times in the code, so one well-crafted prompt can be reused for every occurrence.
- Avoid any extraneous noise in the prompt, instructions, and context. What context to include (full code, parent code, dependencies) depends on the issue type.
- Use established solutions for solved problems, e.g. ctags for code tree traversal and diff/patch for integrating changes.
- Model selection is important. Bigger isn’t necessarily better. Larger models can be more arbitrary and cost significantly more to run (whether you pay for it or the planet does).
- Multi-agent systems aren’t magic. Applied thoughtfully, they might act as a guard against outlier variances, but each interaction needs to be carefully crafted, and they won’t make quality out of nothing.
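
To make the first three points concrete, here is a minimal Python sketch, not the tooling used on the project. It assumes a hypothetical issue type (`unused-variable`), a hypothetical prompt template, and a hypothetical file path; the model call itself is left out. It shows one narrow template per issue type that takes only the context it needs, and a proposed change turned into a unified diff with the standard library's `difflib`, so that the existing `patch` tool can integrate it.

```python
import difflib

# Hypothetical issue types and templates; names and wording are illustrative only.
PROMPT_TEMPLATES = {
    "unused-variable": (
        "Remove the unused variable `{symbol}` from the function below. "
        "Change nothing else. Return only the full revised function.\n\n{code}"
    ),
}


def build_prompt(issue_type: str, symbol: str, code: str) -> str:
    """Fill the template for one issue type with only the context it needs."""
    return PROMPT_TEMPLATES[issue_type].format(symbol=symbol, code=code)


def to_unified_diff(original: str, revised: str, path: str) -> str:
    """Express a proposed change as a unified diff for a patch tool to apply."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            revised.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    # Stand-in for a legacy function and the revision a model might return.
    original = "int f(void) {\n    int unused = 0;\n    return 1;\n}\n"
    revised = "int f(void) {\n    return 1;\n}\n"

    print(build_prompt("unused-variable", "unused", original))
    print(to_unified_diff(original, revised, "src/f.c"))
```

Because the template is keyed by issue type, the same prompt can be reused across every occurrence of that issue, and the diff output could then be reviewed and applied with something like `patch -p1` rather than custom merge logic.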