Enhance classification with a text annotation framework for improved systemization in prompt-based language model evaluation