Pragmatic Failure in Agentic LLM Systems: An Analysis Through Gricean Maxims and Speech Act Theory
DOI:
https://doi.org/10.69760/gsrh.0260302009
Keywords:
pragmatic failure, conversational implicature, speech act theory, agentic AI, large language models, Gricean maxims, human-AI interaction, agent evaluation
Abstract
As autonomous AI agents are deployed across consequential domains, including healthcare, legal assistance, financial planning, and software engineering, their communicative competence has emerged as a critical yet underexamined dimension of reliability. The present article investigates pragmatic failure in agentic large language model (LLM) systems: situations in which agents violate Gricean conversational maxims or misinterpret the illocutionary force of user utterances during multi-step task execution. Drawing on Grice’s (1975) Cooperative Principle and Searle’s (1969, 1975) Speech Act Theory, we develop a two-level annotation scheme for characterising pragmatic failure in agentic interaction logs and a typology of the failure modes that the scheme is designed to capture. We situate this framework within current debates on LLM pragmatic competence and demonstrate its application through illustrative analysis of interaction patterns documented in the existing literature. We further outline a planned empirical study applying the scheme to publicly available agentic benchmark data. The framework produces three practical outputs: a typology of pragmatic failure modes in agents, a reusable annotation scheme with operational coding criteria, and design recommendations for pragmatically aware agent evaluation.
References
Austin, J. L. (1962). How to do things with words. Harvard University Press.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press.
Fang, H., Zhu, X., & Gurevych, I. (2024). InferAct: Inferring safe actions for LLM-based agents through preemptive evaluation and human feedback. arXiv:2407.11843.
Flowerdew, J., & Costley, T. (Eds.). (2024). The rise of large language models: Challenges for critical discourse studies. Discourse, Context & Media, 62. https://doi.org/10.1080/17405904.2024.2373733
Frontiers in Education. (2025). How inclusive can large language models be? The curious case of pragmatics. Frontiers in Education, 10. https://doi.org/10.3389/feduc.2025.1619662
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 41–58). Academic Press.
Grice, H. P. (1989). Studies in the way of words. Harvard University Press.
Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicature. MIT Press.
Levy, N., et al. (2025). Gricean maxims in LLM development. NeurIPS 2025 Workshop: Evaluating the Evolving LLM Lifecycle, San Diego. https://neurips.cc/virtual/2025/loc/san-diego/133782
Li, X., et al. (2024). [Cited in: Mahowald, K., et al. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011]
Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang, C., Shen, S., Zhang, T., Su, Y., Sun, H., Huang, M., Dong, Y., & Tang, J. (2023). AgentBench: Evaluating LLMs as agents. arXiv:2308.03688.
Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011
Opitz, J., Wein, S., & Schneider, N. (2025). Natural language processing relies on linguistics. Computational Linguistics, 1–24.
Orsini, F., & Brunato, D. (2025). INDIR-IT: A benchmark for evaluating indirect speech acts in Italian LLMs. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025).
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Panfili, L., Duman, S., Nave, A., Ridgeway, K. P., Eversole, N., & Sarikaya, R. (2021). Human-AI interactions through a Gricean lens. arXiv:2106.09140.
Qi, Y., Zhang, R., Shi, Z., Zhu, Z., Xue, S., Zhang, X., Long, C., Yin, P., Dou, L., & Lin, Y. (2025). AgentIF: Benchmarking instruction following of large language models in agentic scenarios. arXiv:2505.16944.
Roig, J. V. (2025). How do LLMs fail in agentic scenarios? A qualitative analysis of success and failure scenarios of various LLMs in agentic simulations. arXiv:2512.07497.
Ruis, L., Khan, A., Biderman, S., Hooker, S., Rocktäschel, T., & Grefenstette, E. (2023). The Goldilocks of pragmatic understanding: Fine-tuning strategy matters for implicature resolution by LLMs. Advances in Neural Information Processing Systems, 36, 20827–20905.
Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
Searle, J. R. (1975). Indirect speech acts. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 59–82). Academic Press.
Sieker, J., Shi, S., Koller, A., & Schlangen, D. (2023). Towards an analysis of discourse and interactional pragmatic reasoning capabilities of large language models. arXiv:2408.03074.
TELUS Digital. (2025, October 9). Agentic AI: Evolution & evaluation for real-world readiness. https://www.telusdigital.com/insights/data-and-ai/article/agentic-ai-evaluation
Wang, F., Li, X., Gur, I., Kil, T., Xu, L., Hejna, D., Zhu, H., Jain, D., Hu, T., Zheng, C., Bisk, Y., Xu, D., Shi, F., Yu, T., Chen, L., Xu, R., Wu, Z., & Ahmad, W. U. (2024). TheAgentCompany: Benchmarking LLM agents on consequential real-world tasks. arXiv:2412.14161.
Yu, D., Li, L., Su, H., & Fuoli, M. (2024). Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apologies. International Journal of Corpus Linguistics. https://doi.org/10.1075/ijcl.23087.yu
Zhang, Y., Shi, Y., Yu, X., Su, L., He, J., Zhang, Q., & Wen, J. R. (2025). Are your agents upward deceivers? arXiv:2512.04864.
Zheng, Y., Wang, J., Bao, J., Zhang, Y., & Wen, J. R. (2021). The GRICE dataset: Evaluating conversational implicature in language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL–IJCNLP 2021).
Zhou, S., Xu, F. F., Zhu, H., Zhou, X., Lo, R., Sridhar, P., Cheng, X., Bisk, Y., Fried, D., Alon, U., & Neubig, G. (2024). WebArena: A realistic web environment for building autonomous agents. Proceedings of ICLR 2024.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
This journal publishes all articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Authors retain copyright of their work. Anyone may freely share, copy, distribute, adapt, and build upon the published material for any purpose, including commercial use, provided that appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated. License URL: https://creativecommons.org/licenses/by/4.0/