
From Instruction to Inquiry: A Literature Review of Prompt Decomposition's Philosophical, Linguistic, and Technical Evolution

IAIP Research

IAIP Polyphonic Research Context
Date: April 6, 2026


Abstract

The evolution of prompt engineering from static imperative commands to dynamic, inquiry-based conversational systems constitutes one of the most significant yet undertheorised developments in contemporary artificial intelligence. This literature review examines this trajectory—what we term the "instruction-to-inquiry shift"—across three intersecting domains: technical prompt engineering and automated prompt optimisation, computational linguistics applied to human-LLM interaction, and the philosophy of AI. Drawing on González Arocha's (2025) critical phenomenology of prompting as the focal theoretical lens, we argue that this shift is not merely a technical refinement but a philosophically significant reconfiguration of the human-AI relationship with epistemological, ontological, and ethical dimensions. The technical literature documents a clear chronological progression from manual templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023), decomposed prompting (Khot et al., 2023), and programmatic compilation (Khattab et al., 2023) toward conversational decomposition systems (2025–2026). Computational linguistics reveals this progression as a shift in illocutionary force from directives to questions, with measurable consequences analysable through speech act theory (Gordon, 2024), Gricean pragmatics (Krause & Vossen, 2024), and formal question semantics (Hamblin, 1973; Groenendijk & Stokhof, 1984). The philosophical literature—particularly González Arocha's phenomenological analysis, Djeffal's (2025) reflexive prompt engineering, and Indigenous relational epistemology (Wilson, 2008)—reveals the shift as moving from extractive to relational modes of knowing. We identify critical convergences and gaps across these three domains and propose seven to ten original research questions that emerge from their intersection, suitable for a graduate-level interdisciplinary research programme. 
The review demonstrates that the instruction-to-inquiry shift demands theoretical frameworks capable of integrating technical system design, linguistic analysis of communicative practice, philosophical reflection on human-AI relations, and relational epistemologies that challenge Western extractive assumptions about knowledge production.


1. Introduction: The Problem Space

When a user types "Summarize this article" into a large language model interface, they perform an act that appears straightforwardly technical: issuing a command to a computational system. When the same user instead types "What are the key arguments this article makes, and where might they be vulnerable?", something different occurs—not merely in the expected output, but in the communicative, epistemological, and phenomenological character of the interaction itself. The first prompt commands execution; the second invites inquiry. This distinction, which might seem trivial from an engineering perspective, turns out to be one of the most consequential developments in contemporary AI practice.

This literature review examines the evolution of prompt decomposition—the process by which complex tasks are broken into manageable sub-tasks for large language models—from instruction-based to inquiry-based paradigms. We argue that this evolution, traced across technical, linguistic, and philosophical literatures, reveals a transformation not merely in how humans communicate with AI systems, but in what kind of epistemic, ethical, and relational practices those communications constitute.

The problem space is inherently interdisciplinary. Technical researchers have documented the progression from static prompt templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023), and decomposed prompting (Khot et al., 2023) toward conversational agentic systems that decompose tasks through dialogue (Li et al., 2023; Wu et al., 2023). Computational linguists have begun analysing prompts as discourse units with rhetorical structure (Mann & Thompson, 1988; Zeldes et al., 2025), speech act properties (Gordon, 2024), and pragmatic dimensions (Krause & Vossen, 2024). Philosophers have recognised that the mode of prompting reconfigures the human-technology relation (Ihde, 1990; González Arocha, 2025) and raises fundamental questions about epistemic agency (Floridi, 2025), genuine dialogue (Buber, 1923; Bakhtin, 1929/1963), and the ethics of AI interaction design (Djeffal, 2025).

González Arocha's (2025) "Critical Phenomenology of Prompting in Artificial Intelligence," published in Sophia, serves as the focal lens for this review. González Arocha's work is distinctive in its explicit theorisation of the prompt as a discursive practice—not a neutral technical input but a "mediating space" where human intentionality, language, and sociopolitical structures converge. His critical phenomenological analysis demonstrates that technical parameters of prompting (including the choice between command and question) carry philosophical and ethical weight, making prompt design "an inherently philosophical act." This framework enables us to read the technical and linguistic literatures not merely as descriptions of engineering progress, but as evidence of a deeper transformation in how humans relate to AI systems and, through them, to knowledge itself.

The review is structured to build this argument cumulatively. Section 2.1 establishes the technical foundations, tracing the chronological evolution of prompt decomposition methods. Section 2.2 analyses the linguistic dimensions of this evolution, focusing on what we call the "interrogative turn." Section 2.3 develops the philosophical analysis with González Arocha as the focal work. Section 2.4 introduces Indigenous relational epistemology as a framework that recontextualises the entire narrative. Section 2.5 identifies convergences and gaps across all three domains. Section 3 proposes original research questions that emerge from these intersections.


2. Literature Review

2.1 Technical Foundations: The Evolution of Prompt Decomposition

The technical literature documents a remarkably rapid evolution through at least five paradigmatic phases, each progressively moving toward more dynamic, adaptive, and—crucially—conversational modes of interaction between humans, prompts, and models.

Phase 1: Static Templates and Few-Shot Learning (Pre-2022). The earliest prompt engineering relied on hand-crafted templates and few-shot examples (Brown et al., 2020). AutoPrompt (Shin et al., 2020) introduced gradient-guided discrete token search, demonstrating that prompts could be computationally optimised, but the resulting prompts were fixed artefacts deployed without adaptation. Prefix tuning (Li & Liang, 2021) and prompt tuning (Lester et al., 2021) introduced continuous prompt embeddings, moving beyond discrete tokens but sacrificing interpretability. Throughout this phase, prompts functioned as static parameters—set once, used repeatedly.
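The fixed-artefact character of this phase can be made concrete with a minimal sketch; the template and examples below are purely illustrative, not drawn from any cited system. A hand-crafted few-shot prompt is assembled once and reused verbatim for every input.

```python
# Phase-1 style prompting (sketch): a hand-crafted template with few-shot
# examples, assembled once and reused verbatim for every input.
FEW_SHOT_EXAMPLES = [
    ("The film was a delight.", "positive"),
    ("I want my money back.", "negative"),
]

def build_prompt(text: str) -> str:
    """Fill the static template; nothing adapts to the model or the dialogue."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in FEW_SHOT_EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"

prompt = build_prompt("Great acting, weak plot.")
```

The point of the sketch is the absence of any feedback path: the template is a frozen parameter, exactly the "static artefact" the later phases progressively dissolve.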

Phase 2: Structured Reasoning Chains (2022–2023). Chain-of-thought (CoT) prompting (Wei et al., 2022) represented a pivotal development: the demonstration that prompt structure, not merely content, could fundamentally alter model capability. By including intermediate reasoning steps, CoT enabled models to solve complex problems previously beyond their reach, achieving state-of-the-art results on GSM8K with PaLM 540B. Linguistically, CoT transforms implicit inferential processes into explicit discourse—a point to which we return in Section 2.2. Least-to-Most prompting (Zhou et al., 2023a) extended this by decomposing problems from simplest to most complex subproblems, achieving 99% accuracy on SCAN compositional generalisation versus 16% for standard CoT. Plan-and-Solve prompting (Wang et al., 2023a) introduced explicit planning phases before execution. These methods established that decomposition—breaking complex tasks into structured sub-tasks—dramatically improves LLM performance.
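The control flow these decomposition methods share can be sketched as follows, in the spirit of Least-to-Most prompting's simplest-first schedule. The `ask_model` function is a stub standing in for a real LLM call, and the sub-problems are illustrative.

```python
# Least-to-Most-style decomposition (sketch): solve sub-problems from simplest
# to most complex, appending each answer to the context for the next step.
def ask_model(prompt: str) -> str:
    # Stub: a real system would call an LLM here; we echo the last line asked.
    return f"<answer to: {prompt.splitlines()[-1]}>"

def least_to_most(question: str, subproblems: list[str]) -> str:
    context = ""
    for sub in subproblems:                  # ordered simplest -> hardest
        answer = ask_model(context + sub)
        context += f"{sub}\n{answer}\n"      # earlier answers feed later steps
    return ask_model(context + question)     # final composition step

final = least_to_most(
    "How many letters remain?",
    ["How many letters in 'cat'?", "How many letters in 'catalogue'?"],
)
```

The accumulating `context` is what distinguishes this from independent sub-prompts: each sub-answer becomes background for the next, mirroring the paper's sequential decomposition.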

Phase 3: Automated Prompt Optimisation (2023–2024). The year 2023 witnessed an explosion of automated prompt engineering (APE) methods. The foundational APE paper (Zhou et al., 2023b) demonstrated that LLMs could generate and select prompts matching or exceeding human performance, establishing the generate-evaluate-select pipeline. OPRO (Yang et al., 2024) used meta-prompts with trajectory histories, achieving 8–50% improvements over human prompts. PromptBreeder (Fernando et al., 2024) introduced self-referential evolution, where both task-prompts and mutation-prompts evolved simultaneously—the system learning to improve its own improvement process. EvoPrompt (Guo et al., 2024) applied genetic algorithms and differential evolution. PromptAgent (Wang et al., 2024a) employed Monte Carlo Tree Search with error-reflection loops, arguably the closest existing approach to conversational prompt optimisation: the agent "discusses" failures with itself and adjusts accordingly.
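The generate-evaluate-select pipeline these methods share can be sketched as a simple loop. The generator and scorer below are deterministic stubs (in a real APE system both would involve LLM calls and dev-set evaluation), so the sketch shows control flow only.

```python
# Generate-evaluate-select loop (sketch of the APE pipeline).
def generate_candidates(seed_prompt: str, n: int) -> list[str]:
    # Stub generator: a real system asks an LLM to paraphrase or mutate.
    return [f"{seed_prompt} (variant {i})" for i in range(n)]

def score(prompt: str) -> float:
    # Stub evaluator: a real system measures accuracy on a dev set.
    return len(prompt) % 7 / 7   # deterministic placeholder

def optimise(seed: str, rounds: int = 3, n: int = 4) -> str:
    best = seed
    for _ in range(rounds):
        candidates = generate_candidates(best, n) + [best]  # keep incumbent
        best = max(candidates, key=score)    # select; survivor seeds next round
    return best

best_prompt = optimise("Summarise the article")
```

Because the incumbent is always retained, the selected score is monotonically non-decreasing across rounds; the variations between APE, OPRO, PromptBreeder, and EvoPrompt are, at this level of abstraction, different choices of generator and selection strategy.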

DSPy (Khattab et al., 2023, 2024) represents a paradigm shift within this phase: the transition from "prompting" to "programming." By treating prompt engineering as compilation of declarative Python programs, DSPy achieves 25–65% accuracy improvements over manual prompting and decouples prompt quality from specific model versions. The BetterTogether variant (Khattab et al., 2024b) combines prompt optimisation with weight fine-tuning for up to 60% additional gains. However, DSPy's optimisation occurs at compile-time, not run-time—a critical limitation for the conversational paradigm.

Phase 4: Reasoning Topologies and Dynamic Context (2024–2025). The progression from Chain-of-Thought to Tree of Thoughts (ToT; Yao et al., 2023) to Graph of Thoughts (GoT; Besta et al., 2024) represents a parallel evolution from linear to branching to graph-structured reasoning. GoT achieves 62% higher quality than ToT at 31% lower cost by supporting aggregation, refinement, cycles, and feedback loops—structural features that more closely resemble conversation than linear execution. TextGrad (Yuksekgonul et al., 2024/2025), published in Nature, extended automatic differentiation to text, enabling backpropagation-like optimisation through natural-language feedback—an inherently dialogic mechanism where the system and its evaluator engage in iterative critique and revision. GEPA (Databricks/UC Berkeley, 2025) demonstrated that evolutionary prompt optimisation with LLM-driven reflection can make open-source models outperform proprietary frontier models at 90× lower cost.
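The move from linear chains to branching topologies can be illustrated with a minimal beam search over partial reasoning paths, in the spirit of ToT. The `expand` and `value` functions are stubs standing in for LLM-generated candidate thoughts and LLM-based path scoring.

```python
# Tree-of-Thoughts-style search (sketch): keep several partial reasoning
# paths, score them, and expand only the most promising (a simple beam search).
def expand(path: list[str]) -> list[list[str]]:
    # Stub: a real system asks the LLM for candidate next "thoughts".
    return [path + [f"step{len(path)}-{b}"] for b in ("a", "b")]

def value(path: list[str]) -> float:
    # Stub: a real system asks the LLM (or a verifier) to rate the path.
    return -len([s for s in path if s.endswith("b")])  # prefer 'a' branches

def tot_search(depth: int, beam: int = 2) -> list[str]:
    frontier = [[]]                                     # start from empty path
    for _ in range(depth):
        children = [c for p in frontier for c in expand(p)]
        frontier = sorted(children, key=value, reverse=True)[:beam]
    return frontier[0]

best_path = tot_search(depth=3)
```

A chain corresponds to `beam=1` with a single candidate per expansion; widening the beam and allowing cross-links between paths is, structurally, the CoT-to-ToT-to-GoT progression the paragraph describes.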

Concurrently, Anthropic's (2025) "context engineering" paradigm argued that the optimisation target should be the entire context window—not just prompt text but dynamically selected conversation history, retrieved documents, tool outputs, and system state. This represents the field's explicit recognition that prompts exist within dynamic conversational contexts, not as isolated instructions.

Phase 5: Conversational Decomposition (2025–2026). The most recent phase witnesses the emergence of systems that decompose tasks through dialogue. Decomposed Prompting (Khot et al., 2023) established the theoretical foundation by demonstrating that modular decomposition—where complex tasks are broken into sub-tasks handled by specialised prompts—consistently outperforms monolithic strategies. Building on this foundation, multi-agent architectures introduced decomposition through inter-agent conversation: CAMEL (Li et al., 2023) pioneered "inception prompting" where agents prompt each other in role-play dialogues; AutoGen (Wu et al., 2023) adopted a conversation-centric architecture where decomposition emerges from asynchronous message-passing; ChatDev (Qian et al., 2024) introduced "communicative dehallucination" through clarification dialogues.

By 2025, inquiry-based systems emerged: ACT (Google, 2025) trains agents to ask clarifying questions during task dialogue; FATA (2025) generates comprehensive clarification checklists before answering; the Tri-Agent Evaluation Framework (KDD 2025) measures decomposition quality through dialogue quality. These systems represent a fundamental shift in the locus of agency: from human-designed decomposition executed by machines, to machine-initiated decomposition refined through dialogue—from decomposition-as-instruction to decomposition-as-inquiry.
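The clarification-first pattern of these inquiry-based systems can be sketched as a loop that asks before it decomposes. The vagueness heuristic and all function names below are purely illustrative, not any published system's API.

```python
# Inquiry-style interaction (sketch): before acting, the agent asks a
# clarifying question about each vague term and substitutes the user's answer.
AMBIGUOUS_TERMS = {"it", "this", "that", "soon", "better"}   # toy heuristic

def clarify(task: str, answer_fn) -> str:
    """Ask about each vague term, substituting the answer in place."""
    words = task.split()
    for i, w in enumerate(words):
        if w.lower() in AMBIGUOUS_TERMS:
            words[i] = answer_fn(f"What do you mean by '{w}'?")
    return " ".join(words)

clarified = clarify(
    "Make it better",
    lambda q: "the abstract" if "'it'" in q else "more concise",
)
```

The structural point is the direction of the first speech act: the system's opening move is a question to the human, not an execution of the command, which is precisely the shift in the locus of agency described above.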

A critical gap persists throughout this evolution: no published APE method yet treats the optimisation loop as itself a conversation. All current methods are monologic—the optimiser generates prompts for a passive receiver. The conversational decomposition paradigm requires the optimisation process to be dialogic: a negotiation between the prompt engine and the model (or user) about what the prompt should be.

2.2 Computational Linguistics: Prompts as Discourse

The computational linguistics literature reveals that the technical evolution documented above corresponds to a linguistically significant transformation—what we term the "interrogative turn" in prompt engineering. This turn is analysable through multiple established linguistic frameworks, each illuminating a different dimension of its significance.

Rhetorical Structure Theory and Prompt Architecture. Mann and Thompson's (1988) Rhetorical Structure Theory (RST) provides a framework for analysing prompts as hierarchically organised discourse. The core instruction functions as the nucleus; contextual elements—role specifications, constraints, examples—serve as satellites connected by coherence relations (Elaboration, Background, Condition). Zeldes et al.'s (2025) Enhanced RST (eRST) extends this to graph-based representations supporting non-projective and concurrent discourse relations, directly applicable to multi-component and multi-turn prompts where relations cross turn boundaries. Decomposing a complex prompt can be understood, in RST terms, as flattening a deep rhetorical tree into a sequence of simpler nucleus-satellite pairs—each expressible as a single coherent instruction or question.
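The RST reading of decomposition can be made concrete with a toy nucleus-satellite tree; the relation labels and flattening strategy below are a simplified sketch, not an implementation of RST or eRST parsing.

```python
# A prompt as an RST-style tree (sketch): a nucleus instruction with
# satellites attached by coherence relations; "flattening" yields a sequence
# of simple units, one per sub-prompt.
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                                       # nucleus content
    satellites: list[tuple[str, "Node"]] = field(default_factory=list)

def flatten(node: Node) -> list[str]:
    """Depth-first: emit the nucleus, then each satellite as its own unit."""
    units = [node.text]
    for relation, sat in node.satellites:
        units.append(f"[{relation}] {sat.text}")
        units.extend(flatten(sat)[1:])              # recurse into nested satellites
    return units

prompt_tree = Node(
    "Summarise the article",
    [("Background", Node("The audience is non-specialist")),
     ("Condition", Node("Keep it under 200 words"))],
)
units = flatten(prompt_tree)
```

Each emitted unit is a candidate stand-alone sub-prompt, which is the sense in which decomposition "flattens a deep rhetorical tree into a sequence of simpler nucleus-satellite pairs."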

Speech Act Theory and the Shift in Illocutionary Force. Gordon's (2024) "Speech Acts and Large Language Models" provides the most systematic application of Austin-Searle speech act theory to LLM interaction. Gordon introduces the concept of "conversational zombies"—entities that produce utterances with perlocutionary effects (persuading, informing) while lacking the intentionality required for genuine illocutionary force. This analysis reveals that the imperative-to-interrogative shift in prompting is a shift in illocutionary force from directives (commands whose point is getting the addressee to do something) to questions (requests for information that open a set of acceptable responses). The preparatory conditions change fundamentally: directives presuppose the hearer can perform the action and the speaker has authority; questions presuppose a knowledge asymmetry and the existence of an answer. Gubelmann (2024) deepens this from a Kantian-pragmatist perspective, arguing that LLMs cannot perform genuine speech acts due to lacking autonomous agency, while Markl (2025) demonstrates that speech act theory productively taxonomises representational harms in LLM output as perlocutionary effects without corresponding illocutionary intention.

For prompt decomposition specifically, the interrogative turn has structural consequences. Imperative decomposition yields a sequence of sub-commands (do A, then B, then C). Interrogative decomposition yields a tree of sub-questions whose answers compose hierarchically—a structure directly modelled by formal question semantics.

Formal Question Semantics and the Architecture of Inquiry. Hamblin's (1973) treatment of questions as denoting sets of possible answers, and Groenendijk and Stokhof's (1984) partition semantics, provide the formal machinery for understanding what interrogative prompts mean in ways imperative semantics cannot capture. When a prompt shifts from "Summarize this text" to "What are the key points of this text?", the semantic object changes from a command with a single expected execution to a question with a structured set of acceptable answers. The interrogative mode thus provides a richer semantic framework for structured reasoning—and directly mirrors the tree-search structures that systems like ToT (Yao et al., 2023) and GoT (Besta et al., 2024) implement computationally.
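The contrast between the two semantic objects can be modelled in miniature. The toy "worlds" below are illustrative: `hamblin` returns the set of possible answers a question opens up, while `partition` returns the Groenendijk-Stokhof cells of worlds that agree on the answer.

```python
# Toy model of formal question semantics (sketch). Each world assigns a
# value to each issue; a question partitions the worlds by answer.
worlds = [
    {"key_points": "A,B", "tone": "critical"},
    {"key_points": "A,B", "tone": "neutral"},
    {"key_points": "A,C", "tone": "critical"},
]

def hamblin(issue: str) -> set[str]:
    """Hamblin denotation: the set of possible answers."""
    return {w[issue] for w in worlds}

def partition(issue: str) -> list[list[dict]]:
    """Groenendijk-Stokhof denotation: cells of worlds that agree on the answer."""
    cells: dict[str, list[dict]] = {}
    for w in worlds:
        cells.setdefault(w[issue], []).append(w)
    return list(cells.values())

answers = hamblin("key_points")     # the structured set of acceptable answers
cells = partition("key_points")     # the worlds grouped by which answer holds
```

An imperative has no analogue of either object: it denotes a single expected execution, which is why the interrogative mode supplies the richer semantic framework the paragraph describes.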

The self-ask method (Press et al., 2023) makes this connection explicit: the model generates and answers its own follow-up sub-questions, outperforming imperative linear reasoning. Press et al. identify the "compositionality gap"—LLMs can answer sub-questions correctly but fail to compose them—demonstrating that interrogative self-decomposition is not merely a stylistic preference but a structural necessity.
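The self-ask control flow can be sketched as follows; sub-question generation and answering are scripted stubs here (a real system elicits both from the model), with an explicit final composition step addressing the compositionality gap.

```python
# Self-ask-style decomposition (sketch): generate sub-questions, answer each
# with prior answers in context, then compose explicitly.
def sub_questions(question: str) -> list[str]:
    # Stub: a real system asks the LLM "Are follow-up questions needed?"
    return ["What does the article claim?", "What evidence supports it?"]

def answer(q: str, facts: list[str]) -> str:
    # Stub answerer: a real system would call the LLM with `facts` as context.
    return f"ans({q})"

def self_ask(question: str) -> str:
    facts: list[str] = []
    for q in sub_questions(question):
        facts.append(f"{q} -> {answer(q, facts)}")   # intermediate Q/A pairs
    # Explicit composition step: the stage LLMs skip when left to one hop.
    return f"compose({'; '.join(facts)})"

result = self_ask("Is the article's argument well supported?")
```

Making the composition step an explicit move, rather than trusting the model to perform it implicitly, is the structural remedy Press et al. propose for the compositionality gap.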

Gricean Pragmatics and Cooperative Prompting. Krause and Vossen's (2024) comprehensive survey maps Gricean maxims to NLP tasks, establishing the definitive current bridge between pragmatic theory and NLP practice. Their work reveals systematic patterns of maxim violation in LLM interaction: Quantity violations (over- or under-informing), Quality violations (hallucination as functional falsehood), Relation violations (irrelevant tangents), and Manner violations (ambiguous or disorganised output). Prompt engineering strategies—specifying output length, requesting citations, demanding structured formatting—are, in Gricean terms, explicit reinforcements of cooperative maxims.

The interrogative turn adds a pragmatic dimension: questions trigger different implicature patterns than commands. Questions carry the implicature that the questioner does not know the answer, licensing the respondent to provide information the questioner lacks. This framing positions human-LLM interaction as knowledge-sharing dialogue rather than master-servant execution—a pragmatic shift with consequences that extend beyond mere output quality.

Information-Theoretic Foundations. Zhang and Cao's (2025) analysis at ACL 2025 demonstrates that prompts function as information selectors, determining which slice of the model's internal representation gets verbalised at each reasoning step. The prompt search space grows combinatorially, and task-specific prompts dramatically outperform generic ones by efficiently routing information extraction. This provides formal grounding for why linguistic choices in prompt design matter: each word shapes an information extraction pathway. Combined with Ivison et al.'s (2024) mechanistic analysis showing that instruction tuning shifts model attention to instruction verbs and semantic structure, we see that the linguistic form of prompts has measurable computational consequences—a finding that bridges the technical and linguistic literatures.

Empirical Prompt Linguistics. Leidner and Plachouras (2023) established empirically that neither naturalness nor lower perplexity reliably predicts prompt effectiveness—the relationship between linguistic form and output quality is task- and model-dependent. Ma et al.'s (2024) large-scale analysis of 10,538 real-world prompts identified eight structural components and tracked their evolution, finding that "Capability" and "Demonstration" components outperform simple "Role" specifications. Hu et al. (2025) demonstrate through benchmark evaluation that LLMs excel at semantic tasks but systematically underperform on pragmatic phenomena including conversational implicature and presupposition accommodation—a gap with direct implications for the effectiveness of interrogative prompting strategies.

A critical gap in the linguistic literature concerns cross-linguistic analysis: virtually all prompt linguistics has been conducted on English-language prompts and English-centric models. How prompting strategies interact with typologically diverse languages—with different question formation strategies, honorific systems, and discourse organisation principles—remains virtually unstudied.

2.3 Philosophy of AI: The Meaning of the Shift

The philosophical literature reveals that the instruction-to-inquiry shift is not merely a technical or linguistic phenomenon but a transformation with epistemological, ontological, ethical, and phenomenological dimensions. González Arocha's (2025) critical phenomenology of prompting provides the integrating framework.

González Arocha's Critical Phenomenology of Prompting. Published in Sophia in 2025, González Arocha's "Critical Phenomenology of Prompting in Artificial Intelligence" constitutes the most direct philosophical treatment of prompting as a philosophical practice. González Arocha argues that prompts are not neutral technical instructions but discursive practices that embed assumptions, worldviews, and power relations. The prompt functions as a "mediating space" where human intentionality, language, and sociopolitical structures converge. Critically, González Arocha demonstrates that technical parameters—including the modal choice between command and question—carry philosophical and ethical weight.

González Arocha's analysis operates at multiple levels. First, at the phenomenological level, he shows that the mode of prompting (imperative versus interrogative) reconfigures the human-AI relationship: commands produce a tool-relation; questions produce something approaching an alterity-relation. Second, at the critical level, he reveals that prompt design is a site of power—who designs prompts, whose assumptions are encoded, and whose epistemologies are privileged are not merely technical questions but questions of justice. Third, at the epistemological level, he argues that the kind of knowledge produced through AI interaction depends fundamentally on the communicative structure of the prompt: extractive commands yield extracted information; genuine questions yield something closer to collaborative understanding.

What makes González Arocha's contribution distinctive is its refusal to separate the technical from the philosophical. Unlike approaches that treat prompts as engineering artefacts to which philosophical reflection is subsequently applied, González Arocha treats prompting as itself a philosophical practice—an act that already carries ontological, epistemological, and ethical commitments before any philosophical framework is applied. This aligns with but extends Djeffal's (2025) reflexive prompt engineering framework, which also insists on the ethical inseparability of prompt design and practice.

Wittgensteinian Language Games. The later Wittgenstein's (1953) concept of language games provides perhaps the most direct philosophical framework for understanding the shift. Prompts are moves within language games—rule-governed, context-dependent communicative practices embedded in "forms of life." The shift from imperative to interrogative prompting is a shift between language games with fundamentally different rules: in the command game, success equals accurate execution and the AI is an instrument; in the inquiry game, success equals productive dialogue and the AI is a respondent. Recent work applying Wittgenstein to AI (Jolma, 2024; STRV, 2024) confirms that LLMs participate in these games statistically but lack the shared form of life that grounds genuine meaning. The philosophical significance is that the game-shift is constitutive: it creates a fundamentally different kind of interaction, not merely a different output.

The Socratic Tradition. The operationalisation of Socratic methods in LLM interaction (Chang et al., 2023; Princeton NLP SocraticAI, 2024) reveals a productive tension. Socratic questioning presupposes a co-inquirer capable of genuine aporia—perplexity, the recognition of one's own ignorance that drives the search for knowledge. LLMs can simulate the role of co-inquirer but cannot experience the aporia that makes Socratic questioning transformative. Yet the structure matters: when humans adopt a Socratic stance toward AI—asking probing questions, identifying contradictions, pursuing implications—they position themselves as active epistemic agents rather than passive consumers. The Socratic tradition thus provides the deepest philosophical justification for the inquiry paradigm even as it reveals its limits.

Searle, Dennett, and the Question of Understanding. Searle's (1980) Chinese Room argument has been reinvigorated by the LLM era (Ferrario & Loi, 2026). LLMs remain, by Searle's criteria, on the syntactic side of the divide—sophisticated symbol manipulation without semantic comprehension. Dennett's (1987) intentional stance offers a pragmatic counterpoint: treating LLMs "as if" they have beliefs is legitimate when their behaviour is best predicted by doing so. The instruction-to-inquiry shift sits precisely at this philosophical junction. When we ask a question of an LLM, we implicitly adopt the intentional stance—treating the machine as an entity capable of responsive dialogue—while the Searlean critique reminds us this attribution is observer-relative, not intrinsic. Task decomposition itself, as Dreyfus (1972, 1992, 2007) would argue, accommodates the machine's lack of holistic understanding by breaking meaning into syntactically manageable chunks.

Floridi's Epistemic Agency. Floridi's (2025) distinction between "agency without intelligence" and genuine epistemic agency provides the most rigorous epistemological framework. AI systems participate in knowledge-production processes but lack epistemic responsibility—they are agents in the infosphere without being knowers. The inquiry paradigm strengthens human epistemic agency: by asking questions rather than issuing commands, the human retains interpretive authority over the AI's outputs. This aligns with Russo, Schliesser, and Wagemans' (2023) argument for an integrated "epistemology-cum-ethics" of AI, where the process of knowledge production—not just its outputs—carries ethical weight.

Postphenomenology and the Alterity Relation. Ihde's (1990) taxonomy of human-technology relations—embodiment, hermeneutic, alterity, and background—provides the conceptual vocabulary for analysing how different prompting modes produce different phenomenological relations to AI. Under imperative prompting, AI occupies a hermeneutic or embodiment relation (tool-like). Under interrogative prompting, AI shifts toward an alterity relation—appearing as a quasi-other. TU Delft postphenomenological analysis (2024) confirms that ChatGPT disrupts standard categories by functioning simultaneously as hermeneutic agent and alterity. González Arocha (2025) builds on this, arguing that the mode of prompting determines the character of the "mediating space" and therefore the character of the knowledge, meaning, and experience it produces.

Dialogical Philosophy and Bakhtinian Critique. Buber's (1923) I-Thou/I-It framework reveals that current AI interactions remain fundamentally I-It (Hasse, 2017; Sholzman, 2024). Conversational prompting may create conditions that approach I-Thou dynamics—not because the machine becomes a genuine Thou, but because the human's orientation shifts toward openness and mutuality. Levinas's ethics of alterity raises the question of whether this "as-if" dialogue carries moral weight. Bakhtin's (1929/1963) concept of polyphony introduces a critical counter-argument: genuine polyphony requires irreducible, autonomous consciousnesses in dialogue. LLM "multi-voice" outputs are what we might call "algorithmic monologism"—the appearance of multiple voices produced by a single optimising mechanism (SciELO, 2025). This critique applies equally to multi-agent systems: CAMEL's role-playing agents (Li et al., 2023) produce the form of dialogue without the substance of genuine otherness.

Djeffal's Reflexive Prompt Engineering. Djeffal's (2025) framework, presented at FAccT 2025, bridges philosophical theory and practical implementation with particular effectiveness. His five-component framework—prompt design, system selection, system configuration, performance evaluation, and prompt management—is grounded in the principle of "responsibility by design." Djeffal demonstrates that prompt engineering must incorporate ongoing ethical reflection ("reflexivity"), making it a continuous ethical practice rather than a one-time technical task. The framework operationalises González Arocha's philosophical insights: if prompts are mediating spaces with ethical weight, then designing prompts requires the kind of ongoing critical reflection that Djeffal's framework provides. Together, González Arocha and Djeffal establish a philosophical-practical continuum: the former demonstrates why prompt design is a philosophical practice; the latter shows how to conduct it responsibly.

2.4 Indigenous Epistemology: The Relational Alternative

The instruction-to-inquiry shift, when viewed through Indigenous relational epistemology, reveals itself as part of a deeper transformation that Western philosophy has been slow to recognise: the movement from extractive to relational modes of knowing.

Wilson's Relational Epistemology. Wilson's (2008) Research is Ceremony articulates an Indigenous epistemology in which knowledge is fundamentally relational—produced, validated, and shared within networks of accountability that include human, more-than-human, and spiritual relations. Research is not the extraction of pre-existing information but a ceremony of honouring relationships. Four principles illuminate the prompt decomposition context:

First, knowledge is relational: it does not exist independently of the relationships in which it is produced. In the inquiry paradigm, the "knowledge" produced through AI interaction is not the AI's output alone but the entire process of questioning, responding, interpreting, and questioning again—the relationship itself. Second, relational accountability: knowledge production carries obligations to all relations affected. The inquiry paradigm preserves this accountability more effectively than commands, which tend to reduce accountability to output accuracy. Third, context is constitutive: knowledge cannot be separated from its context without distortion. Conversational interaction maintains context through dialogue; isolated commands strip it away. Fourth, research as ceremony: the inquiry paradigm—with its attentiveness, openness to surprise, and requirement for interpretive engagement—more closely resembles ceremonial practice than the transactional character of command-based interaction.

The CARE Principles and AI Interaction Design. The CARE Principles for Indigenous Data Governance (Carroll et al., 2020)—Collective Benefit, Authority to Control, Responsibility, Ethics—translate relational philosophy into actionable frameworks. Applied to prompt design: Collective Benefit demands that interaction design serve communities, not just individual users; Authority to Control requires that communities retain agency over how AI engages their knowledge; Responsibility demands ongoing accountability from prompt designers; Ethics requires honouring relational frameworks rather than imposing extractive ones. The interrogative paradigm better supports each of these principles than the imperative paradigm, as inquiry inherently allows for iterative community input, course-correction, and contextual adaptation.

Decolonial AI and Epistemic Justice. Recent work on AI and epistemic justice (Springer, 2026) argues that AI built exclusively on Western scientific models reproduces colonial epistemic injustices by marginalising Indigenous knowledge systems. The instruction paradigm—with its assumptions of extractable, decontextualised, individually possessed knowledge—embodies precisely the epistemological framework that decolonial thought critiques. The inquiry paradigm, while not automatically decolonial, creates structural conditions more compatible with relational epistemologies: it positions AI interaction as a dialogue within a context of relationships, rather than as an extraction from a repository.

Crucially, Indigenous epistemology reframes the entire instruction-to-inquiry narrative. From a Western philosophical perspective, the shift appears as a discovery—a progressive realisation that inquiry is epistemically richer than command. From an Indigenous perspective, this "discovery" is better understood as a return: Western technological practice is belatedly recognising what relational epistemologies have always known—that knowledge emerges through relation, not extraction. This reframing is not merely a matter of cultural credit; it challenges the assumption that the inquiry paradigm represents progress and instead positions it as recovery of relational ways of knowing that extractive paradigms suppressed.

2.5 Convergences and Gaps

The three domains reviewed—technical prompt engineering, computational linguistics, and philosophy of AI—converge on a shared observation but diverge in what they see as its significance and how they propose to address it.

Convergence 1: The Structure of Interaction Matters. All three domains recognise that how a human communicates with an AI system is not merely a means to an output but constitutive of the interaction's character. Technically, CoT showed that prompt structure shapes model capability (Wei et al., 2022); linguistically, the shift in illocutionary force alters the communicative contract (Gordon, 2024); philosophically, the mode of prompting determines the human-technology relation (González Arocha, 2025; Ihde, 1990). Zhang and Cao's (2025) information-theoretic analysis provides a bridge: prompts are information selectors whose linguistic form determines computational pathways, linking linguistic structure to technical performance to epistemic character.

Convergence 2: Decomposition Recapitulates Dialogue. The technical progression from linear chains to trees to graphs to conversational multi-agent systems mirrors a progression from monologue to dialogue. Linguistically, this corresponds to the shift from imperative sequences to interrogative trees analysable through question semantics (Hamblin, 1973; Groenendijk & Stokhof, 1984). Philosophically, it corresponds to the move from monologism to dialogism—though Bakhtin's critique (1929/1963) warns that structural dialogue does not guarantee genuine polyphony.

Convergence 3: Reflexivity is Essential. Djeffal's (2025) reflexive prompt engineering, González Arocha's (2025) critical phenomenology, and the technical literature's growing emphasis on self-referential optimisation (PromptBreeder, Fernando et al., 2024; Meta Prompting, Zhang et al., 2023b) all recognise that prompting systems must be capable of reflecting on their own practices. The linguistic parallel is meta-pragmatic awareness—the ability to reflect on the communicative rules one is following.

Gap 1: No Integrated Framework. Despite these convergences, no published work integrates technical, linguistic, and philosophical analyses of prompt decomposition into a unified framework. González Arocha (2025) comes closest by treating prompts as simultaneously technical, discursive, and philosophical objects, but does not engage the technical APE or computational linguistics literatures. Djeffal (2025) bridges philosophy and practice but does not address decomposition specifically. The field lacks a framework that can simultaneously account for the information-theoretic properties of prompts (Zhang & Cao, 2025), their speech act structure (Gordon, 2024), and their phenomenological significance (González Arocha, 2025).

Gap 2: Conversational Prompt Optimisation. The technical literature identifies this as a critical gap (Survey 01): no APE method treats optimisation as an ongoing conversation. All current methods produce static prompt artefacts, even when the optimisation process is iterative. The linguistic and philosophical literatures suggest this gap exists because the field lacks the theoretical vocabulary for conversational optimisation—a vocabulary that speech act theory, question semantics, and dialogical philosophy could provide.

Gap 3: Decomposition Quality Metrics. Existing benchmarks (SWE-bench, GAIA, WebArena, AgentBench) evaluate task completion but not decomposition quality per se. A system that produces an elegant, minimal decomposition scores identically to one using wasteful redundancy—provided both succeed. Linguistic analysis suggests that decomposition quality might be measured through discourse coherence (Hobbs, 1979; Asher & Lascarides, 2003) or rhetorical structure well-formedness (Mann & Thompson, 1988). Philosophical analysis suggests that quality might include reflexive adequacy (Djeffal, 2025) and relational appropriateness (Wilson, 2008).
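To make the gap concrete, a minimal sketch of what a decomposition-quality metric might look like follows. This is a hypothetical illustration, not a published metric: `quality_score` simply discounts task success by decomposition size and verbatim redundancy, so that an elegant decomposition outscores a wasteful one even when both succeed.

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    """One node in a decomposition tree."""
    prompt: str
    children: list["SubTask"] = field(default_factory=list)

def count_nodes(task: SubTask) -> int:
    return 1 + sum(count_nodes(c) for c in task.children)

def redundancy(task: SubTask) -> float:
    """Fraction of sub-task prompts that duplicate another verbatim:
    a crude proxy for wasteful decomposition."""
    prompts: list[str] = []
    def collect(t: SubTask) -> None:
        prompts.append(t.prompt.strip().lower())
        for c in t.children:
            collect(c)
    collect(task)
    return 1 - len(set(prompts)) / len(prompts)

def quality_score(task: SubTask, succeeded: bool) -> float:
    """Task success discounted by size and redundancy, so that two
    successful decompositions are no longer scored identically."""
    if not succeeded:
        return 0.0
    return 1.0 / (count_nodes(task) * (1 + redundancy(task)))
```

A serious metric would replace the redundancy proxy with the discourse-coherence and well-formedness measures cited above; the point of the sketch is only that benchmarks could score decomposition structure, not just outcomes.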

Gap 4: Cross-Cultural and Non-Western Perspectives. Both the linguistic and philosophical literatures are overwhelmingly Western in their theoretical frameworks. Indigenous epistemology (Wilson, 2008) and the CARE Principles (Carroll et al., 2020) point toward alternative frameworks, but sustained application to prompt design, decomposition engines, and conversational AI architectures has not been undertaken. The cross-linguistic gap in computational linguistics compounds this: prompt strategies designed for English may fail or produce culturally inappropriate results in other languages.

Gap 5: The Semantics of Meta-Prompting. Instructions like "think step by step," "you are an expert," or "be concise" are meta-level pragmatic operators that modify how subsequent instructions are interpreted. These have no clear analogue in standard linguistic theory—they are neither standard speech acts nor standard discourse operators. Their semantics and compositional behaviour await formalisation, despite being central to practical prompt engineering.

What Each Domain Sees That Others Miss. Technical research sees performance: which methods produce better outputs. It misses the communicative, ethical, and epistemological dimensions of those methods. Computational linguistics sees structure: the discourse, pragmatic, and semantic properties of prompts. It misses the phenomenological character of the interaction and the power relations embedded in communicative choices. Philosophy sees meaning: the epistemological, ontological, and ethical implications. It often lacks engagement with the empirical details of technical systems and the formal precision of linguistic analysis. Indigenous epistemology sees relation: the ways knowledge-production practices shape and are shaped by networks of accountability. It challenges all three Western-grounded domains to recognise the relational character of what they study.


3. Research Questions

The following research questions emerge from the convergences and gaps identified in this literature review. They are organised by priority, with each question specifying the disciplines it bridges, its significance, feasible methods, and assessed novelty.

3.1 Primary Research Questions (Highest Impact, Highest Novelty)

RQ1: How does the illocutionary force of prompts (imperative vs. interrogative vs. mixed) measurably affect the epistemological character of LLM outputs—not merely their accuracy but their explanatory depth, epistemic hedging, and capacity to provoke further inquiry?

Disciplines bridged: Computational linguistics (speech act theory, pragmatics) × Philosophy of AI (epistemology, González Arocha's critical phenomenology) × Technical AI (prompt engineering, evaluation metrics).

Why it matters: González Arocha (2025) argues that prompt mode reconfigures the human-AI relationship, and Zhang and Cao (2025) show that linguistic choices shape information extraction. But no empirical work connects speech act properties of prompts to the epistemological quality (not merely factual accuracy) of outputs. This would ground the philosophical claims in measurable linguistic evidence.

Feasible methods: Controlled experiment varying prompt illocutionary force across tasks, with outputs evaluated by domain experts on dimensions including epistemic hedging, explanatory structure, identification of limitations, and generative capacity (ability to provoke further questions). Corpus-linguistic analysis of output properties as a function of prompt speech act type.

Novelty: High. Existing work evaluates accuracy and fluency; no published study evaluates epistemological quality as a function of prompt illocutionary force.

RQ2: Can Decomposed Prompting (Khot et al., 2023) be formally reinterpreted as compositional question semantics, where sub-task decomposition corresponds to sub-question generation and task composition corresponds to answer-set composition—and does this reinterpretation yield measurably superior decomposition strategies?

Disciplines bridged: Computational linguistics (formal question semantics: Hamblin, Groenendijk & Stokhof) × Technical AI (decomposed prompting, DSPy) × Philosophy of language (compositionality).

Why it matters: The structural parallel between prompt decomposition and question decomposition has been noted (Survey 02) but never formalised or empirically tested. If decomposition strategies guided by formal question semantics outperform ad hoc decomposition, this would establish a rigorous theoretical foundation for decomposition engine design—and demonstrate that linguistic theory has direct engineering value.

Feasible methods: Formal mapping between DecomP's modular architecture and partition semantics. Implementation of question-semantics-guided decomposition in a DSPy-like framework. Comparative evaluation against standard decomposition on compositional reasoning benchmarks (SCAN, multi-hop QA, symbolic reasoning).

Novelty: Very high. No published work formalises the decomposition-as-question-composition mapping or tests it empirically.
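The decomposition-as-question-composition mapping that RQ2 proposes can be illustrated with a toy sketch. This is a deliberate simplification: Hamblin (1973) denotations are sets of propositions, modelled here as plain sets of answer strings, and `decompose` and `compose` are hypothetical helpers, not DecomP's actual architecture.

```python
from itertools import product

# Toy Hamblin-style denotation: a question is the set of its possible answers.
Question = frozenset

def decompose(task_answers: set) -> list:
    """Project a composite answer space onto component sub-questions:
    coordinate i of each composite answer answers sub-question i."""
    n = len(next(iter(task_answers)))
    return [Question(ans[i] for ans in task_answers) for i in range(n)]

def compose(sub_questions: list) -> set:
    """Answer-set composition: the composite answer space is the
    Cartesian product of the sub-questions' answer sets."""
    return set(product(*sub_questions))
```

On this picture, a decomposition is well-behaved when composing the sub-questions' answer sets recovers at least the original task's answer space; testing whether semantics-guided decompositions of this kind outperform ad hoc ones is the empirical core of RQ2.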

RQ3: What would a "relational" prompt decomposition engine look like—one designed according to Indigenous relational epistemology (Wilson, 2008) and the CARE Principles (Carroll et al., 2020)—and how would it differ architecturally and behaviourally from existing extractive decomposition systems?

Disciplines bridged: Indigenous epistemology × Technical AI (system architecture) × Philosophy of AI (relational ethics, Coeckelbergh, 2012).

Why it matters: Indigenous epistemology offers a radical alternative to the extractive assumptions embedded in current AI interaction design, but this alternative has never been translated into concrete system architecture. This question moves beyond critique to construction—asking what relational AI interaction design would actually require.

Feasible methods: Participatory design with Indigenous knowledge holders and communities. Architectural specification grounded in CARE Principles. Comparative analysis with existing systems (DSPy, AutoGen, CAMEL) on dimensions of relational accountability, context preservation, collective benefit, and community authority. This research must be conducted with Indigenous communities, not merely about Indigenous epistemology.

Novelty: Very high. No published system architecture is explicitly grounded in Indigenous relational epistemology.

3.2 Secondary Research Questions (Important, Partially Addressed)

RQ4: To what extent does Bakhtin's critique of monologism apply to multi-agent prompt decomposition systems (CAMEL, AutoGen, ChatDev)—do multiple LLM agents produce genuine dialogical decomposition or merely "algorithmic monologism" with distributed voices?

Disciplines bridged: Philosophy of AI (Bakhtinian dialogism) × Technical AI (multi-agent systems) × Computational linguistics (discourse analysis).

Why it matters: Multi-agent systems are the technical frontier of conversational decomposition, but the philosophical question of whether they achieve genuine dialogue or merely simulate it has not been empirically investigated. Bakhtin's framework (1929/1963) predicts that agents sharing the same underlying model cannot produce genuine polyphony—a testable claim.

Feasible methods: Discourse analysis of multi-agent conversations comparing systems using the same vs. different underlying models. Coding for genuine disagreement, perspective diversity, and irreducible otherness. Information-theoretic analysis of whether multi-agent outputs exceed the diversity achievable by single-model sampling.

Novelty: Moderate-to-high. SciELO (2025) raises the Bakhtinian critique theoretically; no empirical study tests it against multi-agent system outputs.

RQ5: Can Djeffal's (2025) reflexive prompt engineering framework be extended to decomposition engines—creating "reflexive decomposition" systems that monitor and evaluate the ethical, epistemological, and relational quality of their own decomposition strategies in real-time?

Disciplines bridged: Applied ethics (Djeffal, 2025) × Technical AI (self-monitoring systems, TextGrad) × Philosophy (González Arocha's critical phenomenology, Floridi's epistemic agency).

Why it matters: Djeffal provides a philosophical framework for reflexive prompt practice, but it has not been operationalised for automated systems. TextGrad (Yuksekgonul et al., 2024/2025) provides a technical mechanism for feedback-based optimisation. Combining these could produce decomposition engines that are not merely effective but ethically self-aware—monitoring for bias, power asymmetry, and epistemological adequacy in their own decomposition choices.

Feasible methods: Extension of TextGrad's feedback mechanism to include ethical and epistemological evaluation criteria alongside task performance. Integration of Djeffal's five-component framework into the optimisation loop of a DSPy-like system. Evaluation through case studies in ethically sensitive domains (healthcare, legal, education).

Novelty: Moderate. Djeffal and TextGrad exist independently; their integration is novel.

RQ6: How do Gricean maxim violations in prompt design correlate with specific decomposition failures—and can pragmatic analysis predict which decomposition strategies will fail before execution?

Disciplines bridged: Computational linguistics (Gricean pragmatics, Krause & Vossen, 2024) × Technical AI (decomposition failure analysis, error-reflection methods).

Why it matters: The technical literature documents decomposition failures (error loops, hallucination cascades, inefficient spirals) but lacks a linguistic theory of why they occur. Gricean pragmatics predicts that violations of Quantity (over- or under-specification), Quality (unsupported assumptions), Relation (irrelevant sub-tasks), and Manner (ambiguous decomposition) should produce characteristic failure modes. Validating this would enable pragmatics-informed decomposition design.

Feasible methods: Post-hoc pragmatic analysis of decomposition failures in existing benchmarks (SWE-bench, AgentBench). Corpus annotation of decomposition prompts for Gricean maxim violations. Statistical modelling of violation-failure correlations. Prospective testing of pragmatically optimised decomposition prompts.

Novelty: Moderate. Krause and Vossen (2024) map maxims to NLP generally; application to decomposition failure analysis is novel.
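The statistical modelling step proposed for RQ6 can be sketched in miniature. The helpers below are hypothetical: `violation_failure_table` cross-tabulates annotated maxim violations against observed failures, and `failure_lift` computes a simple association ratio; a real study would use proper regression modelling rather than this toy statistic.

```python
from collections import Counter

MAXIMS = ("quantity", "quality", "relation", "manner")

def violation_failure_table(annotations: list) -> dict:
    """Cross-tabulate maxim violations against decomposition failures.
    Each annotation: {"violations": set of maxim names, "failed": bool}."""
    table = {m: Counter() for m in MAXIMS}
    for a in annotations:
        for m in MAXIMS:
            table[m][(m in a["violations"], a["failed"])] += 1
    return table

def failure_lift(counts: Counter) -> float:
    """P(fail | violation) / P(fail | no violation), when defined."""
    fv, nv = counts[(True, True)], counts[(True, False)]
    fn, nn = counts[(False, True)], counts[(False, False)]
    p_v = fv / (fv + nv) if fv + nv else 0.0
    p_n = fn / (fn + nn) if fn + nn else 0.0
    return p_v / p_n if p_n else float("inf")
```

A lift well above 1 for, say, Quantity violations would be the kind of evidence the question seeks: that pragmatic analysis predicts which decompositions fail before execution.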

RQ7: Does the phenomenological relation humans experience with AI (Ihde's hermeneutic vs. alterity relation) measurably change when interacting through imperative versus interrogative prompting—and do these different phenomenological stances produce different qualities of epistemic engagement?

Disciplines bridged: Phenomenology (Ihde, 1990; González Arocha, 2025) × Experimental psychology × Computational linguistics (speech act analysis of interaction transcripts).

Why it matters: González Arocha (2025) theorises that prompt mode shapes the mediating space of human-AI interaction, and Ihde's framework predicts different phenomenological relations. But these claims have not been empirically tested with human participants. If interrogative prompting produces measurably different phenomenological stances and these stances correlate with epistemically richer engagement, this would provide the strongest possible evidence for the philosophical significance of the instruction-to-inquiry shift.

Feasible methods: Phenomenological interview study with users performing identical tasks through imperative vs. interrogative prompts. Think-aloud protocols coded for hermeneutic vs. alterity language. Evaluation of resulting work products for epistemic quality (depth, nuance, self-awareness of limitations).

Novelty: High. No empirical phenomenological study of prompt-mode effects on user experience exists.

3.3 Exploratory Research Questions (Frontier, Speculative)

RQ8: Can the self-referential optimisation loop in PromptBreeder (Fernando et al., 2024)—where the system evolves its own improvement strategies—be reconceived as a proto-Socratic process, and if so, does explicitly encoding Socratic structures (elenchus, aporia, maieutics) into the mutation operators improve optimisation outcomes?

Disciplines bridged: Classical philosophy (Socratic method) × Technical AI (evolutionary prompt optimisation) × Computational linguistics (dialogue structure).

Why it matters: PromptBreeder's Lamarckian mutation mirrors dialogic feedback; the Socratic tradition provides the richest philosophical model of inquiry-as-improvement. If Socratic structures improve evolutionary prompt optimisation, this would constitute a concrete demonstration that philosophical frameworks have engineering value—collapsing the theory-practice gap.

Feasible methods: Implementation of Socratic mutation operators (elenchus: identifying contradictions in prompt performance; aporia: flagging confident failures; maieutics: eliciting latent prompt improvements through targeted questioning). Comparison with standard PromptBreeder on established benchmarks.

Novelty: Very high. No published work connects Socratic method to evolutionary prompt optimisation.
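What "Socratic mutation operators" might look like can be sketched abstractly. Everything here is an assumption for illustration: `critic` stands in for an LLM call (any string-to-string function), the three operators merely append critic-generated text in the spirit of elenchus, aporia, and maieutics, and `evolve` is a minimal selection loop, not PromptBreeder's actual algorithm.

```python
import random
from typing import Callable

def elenchus(prompt: str, critic: Callable[[str], str]) -> str:
    """Confront the prompt with a contradiction found in its failures."""
    return prompt + "\nAvoid this contradiction: " + critic(prompt)

def aporia(prompt: str, critic: Callable[[str], str]) -> str:
    """Flag confident failure: make the prompt acknowledge its limits."""
    return prompt + "\nState explicitly when you are unsure about: " + critic(prompt)

def maieutics(prompt: str, critic: Callable[[str], str]) -> str:
    """Elicit a latent improvement through a targeted question."""
    return prompt + "\nBefore answering, consider: " + critic(prompt)

OPERATORS = [elenchus, aporia, maieutics]

def evolve(population: list, fitness: Callable[[str], float],
           critic: Callable[[str], str], generations: int = 10) -> str:
    """Minimal mutate-and-select loop: apply a random Socratic operator
    to each prompt, keep the fittest, return the best survivor."""
    for _ in range(generations):
        children = [random.choice(OPERATORS)(p, critic) for p in population]
        population = sorted(population + children, key=fitness,
                            reverse=True)[:len(population)]
    return population[0]
```

The empirical question is whether operators structured this way outperform PromptBreeder's unconstrained mutation prompts on the same benchmarks.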

RQ9: What would a discourse grammar of prompt decomposition look like—a formal specification of well-formed and ill-formed decomposition structures that integrates RST relations (Mann & Thompson, 1988), coherence theory (Hobbs, 1979), and question semantics (Hamblin, 1973)?

Disciplines bridged: Computational linguistics (discourse grammar, RST, question semantics) × Technical AI (decomposition architectures) × Formal language theory.

Why it matters: Despite growing empirical work (Ma et al., 2024), no formal grammar exists for prompt discourse structure. Such a grammar would enable principled evaluation of decomposition quality (Gap 3), automated detection of ill-formed decompositions, and theory-guided decomposition generation. It would also formalise the intuitions that currently guide prompt engineering practice.

Feasible methods: Corpus annotation of decomposed prompts using extended RST and question-semantic categories. Induction of grammar rules from well-formed decompositions. Implementation as a formal validation layer in decomposition engines. Evaluation through correlation between grammatical well-formedness and task performance.

Novelty: Very high. The computational linguistics survey (Survey 02) explicitly identifies this as a critical gap.
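A fragment of what such a discourse grammar's validation layer could look like follows. The rules are invented for illustration, not induced from any corpus: each sub-task must bear a recognised discourse relation to its parent, and a node with exactly one child is treated as ill-formed because it decomposes nothing.

```python
# Hypothetical well-formedness rules for decomposition structures.
# A node is a dict: {"prompt": str, "children": [(relation, node), ...]}.

ALLOWED_RELATIONS = {"elaboration", "sequence", "condition", "background"}

def well_formed(node: dict) -> bool:
    """Recursively check a decomposition tree against the toy grammar:
    valid relation labels everywhere, and no single-child nodes."""
    children = node.get("children", [])
    if len(children) == 1:
        return False  # trivial decomposition: one child restates the parent
    for relation, child in children:
        if relation not in ALLOWED_RELATIONS:
            return False
        if not well_formed(child):
            return False
    return True
```

Evaluating whether grammatical well-formedness in this sense correlates with task performance is exactly the validation step the feasible methods above describe.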

RQ10: How does the temporal phenomenology of human-AI dialogue—the near-instantaneous response of LLMs versus the time-bound, pause-laden rhythm of human conversation—affect the epistemic and relational quality of inquiry-based interaction, and can designed latency improve outcomes?

Disciplines bridged: Phenomenology (temporal experience) × HCI (interaction design) × Philosophy of dialogue (Buber's "between," Levinas's temporality of the Other).

Why it matters: Human dialogue unfolds in time, with pauses, hesitations, and rhythms that carry epistemic and relational meaning. AI responses are typically instantaneous. This temporal asymmetry may undermine the relational benefits of inquiry-based prompting by eliminating the "dwelling time" that genuine dialogue requires. If designed latency (deliberate response delays, pacing mechanisms) improves epistemic engagement, this would have immediate design implications.

Feasible methods: Experimental study comparing standard vs. paced AI responses in inquiry-based interactions. Phenomenological interviews on the experience of temporal rhythm. Evaluation of epistemic output quality as a function of interaction pacing. Analysis through Buber's concept of the "between" as requiring shared temporality.

Novelty: Very high. No published research investigates designed latency as an epistemological intervention in human-AI interaction.


4. Conclusion

This literature review reveals that the evolution of prompt decomposition from instruction to inquiry constitutes a convergent phenomenon visible across technical, linguistic, and philosophical literatures—yet theorised in none of them with the full interdisciplinary breadth it demands.

The technical literature documents an unmistakable trajectory: from static templates through structured reasoning chains, automated optimisation, and dynamic context management toward conversational decomposition systems that negotiate task understanding through dialogue. The computational linguistics literature reveals this trajectory as a shift in illocutionary force from directives to questions—a shift with formal semantic consequences (question decomposition yields richer compositional structures than command sequences) and pragmatic implications (interrogative framing repositions human-AI interaction as knowledge-sharing dialogue). The philosophical literature—with González Arocha's (2025) critical phenomenology as the focal contribution—demonstrates that this shift is not merely technical or linguistic but epistemological (from transmission to co-construction of knowledge), ontological (from tool-relation to quasi-alterity), ethical (from command ethics to relational ethics), and phenomenological (from using to dwelling-with).

Indigenous relational epistemology (Wilson, 2008) recontextualises the entire narrative: what appears from a Western perspective as a progressive discovery—that inquiry yields richer interaction than command—is better understood as a belated recognition of what relational epistemologies have always known. Knowledge is relational. Research is ceremony. The instruction-to-inquiry shift is not progress but recovery.

The most promising research directions lie at the intersections: formal question semantics applied to decomposition architecture (RQ2), relational epistemology translated into system design (RQ3), Gricean pragmatics predicting decomposition failures (RQ6), and phenomenological testing of prompt-mode effects on human epistemic engagement (RQ7). The critical gap—an integrated framework spanning technical, linguistic, philosophical, and relational analyses—remains the field's most urgent need.

González Arocha's insight that prompting is "an inherently philosophical act" has implications that extend far beyond academic philosophy. Every prompt designer, every decomposition engine architect, every user who types a query into an LLM is engaging in a practice that carries epistemological, ethical, and relational weight. The question is whether we will design our systems—and our practices—with adequate awareness of that weight. The shift from instruction to inquiry is one step; the shift from unreflective to reflexive practice, grounded in relational accountability, is the larger journey that lies ahead.


Bibliography

Anthropic. (2025). Effective context engineering for AI agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge University Press.

Austin, J. L. (1962). How to do things with words. Oxford University Press.

Bakhtin, M. M. (1929/1963). Problems of Dostoevsky's poetics (C. Emerson, Trans.). University of Minnesota Press, 1984.

Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., et al. (2024). Graph of Thoughts: Solving elaborate problems with large language models. Proceedings of AAAI 2024. https://arxiv.org/abs/2308.09687

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., ... Amodei, D. (2020). Language models are few-shot learners. Proceedings of NeurIPS 2020. https://arxiv.org/abs/2005.14165

Buber, M. (1923). I and Thou (W. Kaufmann, Trans.). Scribner, 1970.

Bunt, H. (2009). The DIT++ taxonomy for functional dialogue acts. Proceedings of the EDAML 2009 Workshop.

Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043

Chang, E. Y., et al. (2023). Prompting large language models with the Socratic method. IEEE Access, 11, 51156–51167. https://doi.org/10.1109/ACCESS.2023.3267890

Chen, W., Su, Y., Zuo, J., et al. (2023). AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. https://arxiv.org/abs/2308.10848

Coeckelbergh, M. (2012). Growing moral relations: Critique of moral status ascription. Palgrave Macmillan.

Dennett, D. C. (1987). The intentional stance. MIT Press.

Djeffal, C. (2025). Reflexive prompt engineering: A framework for responsible prompt engineering and AI interaction design. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25). https://arxiv.org/abs/2504.16204

Dreyfus, H. L. (1972). What computers can't do: The limits of artificial intelligence. MIT Press.

Dreyfus, H. L. (1992). What computers still can't do: A critique of artificial reason. MIT Press.

Dreyfus, H. L. (2007). Why Heideggerian AI failed and how fixing it would require making it more Heideggerian. Philosophical Psychology, 20(2), 247–268.

Fernando, C., Banarse, D., Michalewski, H., Osindero, S., & Rocktäschel, T. (2024). Promptbreeder: Self-referential self-improvement via prompt evolution. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.16797

Ferrario, A., & Loi, M. (2026). Are large language models intentional? The limits of referential grounding. Philosophy & Technology, 39. https://doi.org/10.1007/s13347-026-01079-4

Floridi, L. (2023). The ethics of artificial intelligence: Principles, challenges, and opportunities. Oxford University Press.

Floridi, L. (2025). AI as agency without intelligence: On artificial intelligence as a new form of agency. Philosophy & Technology, 38. https://doi.org/10.1007/s13347-025-00858-9

Gadamer, H.-G. (1960). Truth and method (J. Weinsheimer & D. G. Marshall, Trans.). Continuum, 2004.

González Arocha, J. (2025). Critical phenomenology of prompting in artificial intelligence. Sophia, 39. https://doi.org/10.17163/soph.n39.2025.04

Gordon, J. (2024). Speech acts and large language models. PhilArchive. https://philarchive.org/archive/GORSAA-12v1

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). Academic Press.

Groenendijk, J., & Stokhof, M. (1984). Studies on the semantics of questions and the pragmatics of answers (Doctoral dissertation). University of Amsterdam.

Gubelmann, R. (2024). Large language models, agency, and why speech acts are beyond them (for now). Philosophy & Technology, 37, 45. https://doi.org/10.1007/s13347-024-00696-1

Guo, Q., Wang, R., Guo, J., Li, B., Song, K., Tan, X., Liu, G., Bian, J., & Yang, Y. (2024). EvoPrompt: Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.08532

Hamblin, C. L. (1973). Questions in Montague English. Foundations of Language, 10(1), 41–53.

Hasse, C. (2017). Rethinking the I-You relation through dialogical philosophy in the age of social robots. AI & Society, 32, 467–479. https://doi.org/10.1007/s00146-017-0703-x

Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3(1), 67–90.

Hong, S., Zhuge, M., Chen, J., et al. (2024). MetaGPT: Meta programming for a multi-agent collaborative framework. Proceedings of ICLR 2024. https://arxiv.org/abs/2308.00352

Hu, J., et al. (2025). Pragmatics in the era of large language models. https://arxiv.org/abs/2502.12378

Ihde, D. (1990). Technology and the lifeworld: From garden to earth. Indiana University Press.

Ivison, H., et al. (2024). From language modeling to instruction following: Understanding the behavior shift in LLMs after instruction tuning. Proceedings of NAACL 2024. https://aclanthology.org/2024.naacl-long.130/

Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., et al. (2024a). DSPy: Compiling declarative language model calls into self-improving pipelines. Proceedings of ICLR 2024 (Spotlight). https://arxiv.org/abs/2310.03714

Khattab, O., et al. (2024b). Fine-tuning and prompt optimization: Two great steps that work better together. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.597.pdf

Khot, T., Trivedi, H., Finlayson, M., et al. (2023). Decomposed prompting: A modular approach for solving complex tasks. Proceedings of ICLR 2023. https://arxiv.org/abs/2210.02406

Krause, L., & Vossen, P. (2024). The Gricean Maxims in NLP—A survey. Proceedings of INLG 2024. https://aclanthology.org/2024.inlg-main.39/

Leidner, J. L., & Plachouras, V. (2023). The language of prompting: What linguistic properties make a prompt successful? Findings of EMNLP 2023. https://arxiv.org/abs/2311.01967

Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Proceedings of EMNLP 2021.

Levinas, E. (1961). Totality and infinity: An essay on exteriority (A. Lingis, Trans.). Duquesne University Press, 1969.

Li, G., Hammoud, H., Itani, H., et al. (2023). CAMEL: Communicative agents for "mind" exploration of large language model society. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2303.17760

Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. Proceedings of ACL 2021.

Li, et al. (2025). A survey of automatic prompt engineering: An optimization perspective. https://arxiv.org/abs/2502.11560

Ma, Y., et al. (2024). The death and life of great prompts: Analyzing the evolution of LLM prompts from the structural perspective. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.1227/

Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

Markl, N. (2025). Taxonomizing representational harms using speech act theory. https://arxiv.org/abs/2504.00928

Press, O., et al. (2023). Measuring and narrowing the compositionality gap in language models. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.378/

Qian, C., Cong, X., Yang, C., et al. (2024). ChatDev: Communicative agents for software development. Proceedings of ACL 2024.

Ramnath, et al. (2025). A systematic survey of automatic prompt optimization techniques. Proceedings of EMNLP 2025. https://arxiv.org/abs/2502.16923

Roberts, C. (2012). Information structure in discourse: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics, 5(6), 1–69.

Russo, F., Schliesser, E., & Wagemans, J. (2023). Connecting ethics and epistemology of AI. AI & Society, 38. https://doi.org/10.1007/s00146-022-01617-6

Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.

Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424.

Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. Proceedings of EMNLP 2020, 4222–4235. https://arxiv.org/abs/2010.15980

Wang, G., Xie, Y., Jiang, Y., et al. (2023). Voyager: An open-ended embodied agent with large language models. https://arxiv.org/abs/2305.16291

Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023a). Plan-and-Solve prompting: Improving zero-shot chain-of-thought reasoning. Proceedings of ACL 2023. https://arxiv.org/abs/2305.04091

Wang, X., Li, C., Wang, Z., Bai, F., Luo, H., Zhang, J., Jojic, N., Xing, E. P., & Hu, Z. (2024a). PromptAgent: Strategic planning with language models enables expert-level prompt optimization. Proceedings of ICLR 2024. https://arxiv.org/abs/2310.16427

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Proceedings of NeurIPS 2022. https://arxiv.org/abs/2201.11903

Wilson, S. (2008). Research is ceremony: Indigenous research methods. Fernwood Publishing.

Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.

Wu, Q., Bansal, G., Zhang, J., et al. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. https://arxiv.org/abs/2308.08155

Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2024). Large language models as optimizers (OPRO). Proceedings of ICLR 2024. https://arxiv.org/abs/2309.03409

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate problem solving with large language models. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2305.10601

Yuksekgonul, M., Bianchi, F., Boen, J., Liu, S., Huang, Z., Guestrin, C., & Zou, J. (2024/2025). TextGrad: Automatic "differentiation" via text. arXiv 2024; Nature 2025. https://arxiv.org/abs/2406.07496

Zeldes, A., et al. (2025). eRST: A signaled graph theory of discourse relations and organization. Computational Linguistics, 51(1), 23–72. https://doi.org/10.1162/coli_a_00538

Zhang, J., & Cao, Y. (2025). Why prompt design matters and works: A complexity analysis of prompt search space in LLMs. Proceedings of ACL 2025. https://aclanthology.org/2025.acl-long.1562/

Zhang, Y., Yuan, Y., & Yao, A. C.-C. (2023b). Meta prompting for AI systems. https://arxiv.org/abs/2311.11482

Zhou, A., Yan, K., Shlapentokh-Rothman, M., et al. (2024). Language Agent Tree Search unifies reasoning, acting, and planning in language models. Proceedings of ICML 2024. https://arxiv.org/abs/2310.04406

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2023a). Least-to-Most prompting enables complex reasoning in large language models. Proceedings of ICLR 2023. https://arxiv.org/abs/2205.10625

Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023b). Large language models are human-level prompt engineers (APE). Proceedings of ICLR 2023. https://arxiv.org/abs/2211.01910


This literature review was produced as part of the IAIP Polyphonic Discussion research protocol (RCH-CTX-Polyphonic-discussion--2604060040). It synthesises four parallel survey tracks: Automated Prompt Engineering Methods, Computational Linguistics, Philosophy of AI, and Prompt Decomposition Engines. April 6, 2026.