Pragmatic Failure in Agentic LLM Systems: An Analysis Through Gricean Maxims and Speech Act Theory

Zarifa Sadigzada

doi:10.69760/gsrh.0260302009

Authors

Zarifa Sadigzada, Nakhchivan State University, https://orcid.org/0009-0007-1179-1214

Keywords:

pragmatic failure, conversational implicature, speech act theory, agentic AI, large language models, Gricean maxims, human-AI interaction, agent evaluation

Abstract

As autonomous AI agents are deployed across consequential domains (healthcare, legal assistance, financial planning, and software engineering), their communicative competence has emerged as a critical yet underexamined dimension of reliability. The present article investigates pragmatic failure in agentic large language model (LLM) systems: situations in which agents violate Gricean conversational maxims or misinterpret the illocutionary force of user utterances during multi-step task execution. Drawing on Grice’s (1975) Cooperative Principle and Searle’s (1969, 1975) Speech Act Theory, we develop a two-level annotation scheme for characterising pragmatic failure in agentic interaction logs, together with a typology of the failure modes that the scheme is designed to capture. We situate this framework within current debates on LLM pragmatic competence and demonstrate its application through illustrative analysis of interaction patterns documented in the existing literature. We further outline a planned empirical study applying the scheme to publicly available agentic benchmark data. The framework produces three practical outputs: a typology of pragmatic failure modes in agents, a reusable annotation scheme with operational coding criteria, and design recommendations for pragmatically aware agent evaluation.

Author Biography

Zarifa Sadigzada, Nakhchivan State University

Department of English and Translation, Nakhchivan State University

zarifasadig@gmail.com

ORCID: https://orcid.org/0009-0007-1179-1214

References

Austin, J. L. (1962). How to do things with words. Harvard University Press.

Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press.

Fang, H., Zhu, X., & Gurevych, I. (2024). InferAct: Inferring safe actions for LLM-based agents through preemptive evaluation and human feedback. arXiv:2407.11843.

Flowerdew, J., & Costley, T. (Eds.). (2024). The rise of large language models: Challenges for critical discourse studies. Discourse, Context & Media, 62. https://doi.org/10.1080/17405904.2024.2373733

Frontiers in Education. (2025). How inclusive can large language models be? The curious case of pragmatics. Frontiers in Education, 10. https://doi.org/10.3389/feduc.2025.1619662

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 41–58). Academic Press.

Grice, H. P. (1989). Studies in the way of words. Harvard University Press.

Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicature. MIT Press.

Levy, N., et al. (2025). Gricean maxims in LLM development. NeurIPS 2025 Workshop: Evaluating the Evolving LLM Lifecycle, San Diego. https://neurips.cc/virtual/2025/loc/san-diego/133782

Li, X., et al. (2024). [Cited in: Mahowald, K., et al. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011]

Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang, C., Shen, S., Zhang, T., Su, Y., Sun, H., Huang, M., Dong, Y., & Tang, J. (2023). AgentBench: Evaluating LLMs as agents. arXiv:2308.03688.

Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011

Opitz, J., Wein, S., & Schneider, N. (2025). Natural language processing relies on linguistics. Computational Linguistics, 1–24.

Orsini, F., & Brunato, D. (2025). INDIR-IT: A benchmark for evaluating indirect speech acts in Italian LLMs. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025).

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.

Panfili, L., Duman, S., Nave, A., Ridgeway, K. P., Eversole, N., & Sarikaya, R. (2021). Human-AI interactions through a Gricean lens. arXiv:2106.09140.

Qi, Y., Zhang, R., Shi, Z., Zhu, Z., Xue, S., Zhang, X., Long, C., Yin, P., Dou, L., & Lin, Y. (2025). AgentIF: Benchmarking instruction following of large language models in agentic scenarios. arXiv:2505.16944.

Roig, J. V. (2025). How do LLMs fail in agentic scenarios? A qualitative analysis of success and failure scenarios of various LLMs in agentic simulations. arXiv:2512.07497.

Ruis, L., Khan, A., Biderman, S., Hooker, S., Rocktäschel, T., & Grefenstette, E. (2023). The Goldilocks of pragmatic understanding: Fine-tuning strategy matters for implicature resolution by LLMs. Advances in Neural Information Processing Systems, 36, 20827–20905.

Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.

Searle, J. R. (1975). Indirect speech acts. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 59–82). Academic Press.

Sieker, J., Shi, S., Koller, A., & Schlangen, D. (2023). Towards an analysis of discourse and interactional pragmatic reasoning capabilities of large language models. arXiv:2408.03074.

TELUS Digital. (2025, October 9). Agentic AI: Evolution & evaluation for real-world readiness. https://www.telusdigital.com/insights/data-and-ai/article/agentic-ai-evaluation

Wang, F., Li, X., Gur, I., Kil, T., Xu, L., Hejna, D., Zhu, H., Jain, D., Hu, T., Zheng, C., Bisk, Y., Xu, D., Shi, F., Yu, T., Chen, L., Xu, R., Wu, Z., & Ahmad, W. U. (2024). TheAgentCompany: Benchmarking LLM agents on consequential real-world tasks. arXiv:2412.14161.

Yu, D., Li, L., Su, H., & Fuoli, M. (2024). Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apologies. International Journal of Corpus Linguistics. https://doi.org/10.1075/ijcl.23087.yu

Zhang, Y., Shi, Y., Yu, X., Su, L., He, J., Zhang, Q., & Wen, J. R. (2025). Are your agents upward deceivers? arXiv:2512.04864.

Zheng, Y., Wang, J., Bao, J., Zhang, Y., & Wen, J. R. (2021). The GRICE dataset: Evaluating conversational implicature in language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL–IJCNLP 2021).

Zhou, S., Xu, F. F., Zhu, H., Zhou, X., Lo, R., Sridhar, P., Cheng, X., Bisk, Y., Fried, D., Alon, U., & Neubig, G. (2024). WebArena: A realistic web environment for building autonomous agents. Proceedings of ICLR 2024.
