From Instruction to Inquiry: A Literature Review of Prompt Decomposition's Philosophical, Linguistic, and Technical Evolution
IAIP Polyphonic Research Context – Revised Version (v2). Date: April 2026
Abstract
The evolution of prompt engineering from static imperative commands to dynamic, inquiry-based conversational systems constitutes an undertheorised development in contemporary artificial intelligence. This literature review examines this trajectory, which we term the "interrogative turn", across four intersecting domains: technical prompt engineering and automated prompt optimisation, computational linguistics applied to human-LLM interaction, the philosophy of AI, and Indigenous relational epistemology. Drawing on González Arocha's (2025) critical phenomenology of prompting and Krause and Vossen's (2024) pragmatic mapping as dual focal lenses, we argue that this emerging trajectory, while not yet empirically established as reliably superior to structured approaches, raises philosophically significant questions about the human-AI relationship with epistemological, ontological, and ethical dimensions. The technical literature documents a chronological progression from manual templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023b), decomposed prompting (Khot et al., 2023), and programmatic compilation (Khattab et al., 2023) toward early conversational decomposition systems (2025–2026). Computational linguistics reveals this progression as a shift in illocutionary force from directives to questions, with consequences analysable through speech act theory (Gordon, 2024), Gricean pragmatics (Krause & Vossen, 2024), relevance theory (Sperber & Wilson, 1986/1995), and formal question semantics (Hamblin, 1973; Groenendijk & Stokhof, 1984). Cognitive science frameworks, including dual process theory (Kahneman, 2011), cognitive load theory (Sweller, 1988), and bounded rationality (Simon, 1956), illuminate why decomposition aids both human and machine reasoning.
The philosophical literature, particularly González Arocha's phenomenological analysis, the "stochastic parrot" critique (Bender et al., 2021), and Suchman's (2007) situated action framework, reveals complex questions about whether the shift constitutes a reconfiguration of human-AI relations or a change in surface linguistic form with limited computational consequence. Indigenous relational epistemology (Wilson, 2008; Smith, 2021; Kovach, 2021; Little Bear, 2000) and Indigenous AI initiatives (Lewis et al., 2020; the FLAIR and Abundant Intelligences programmes) challenge the extractive assumptions embedded in both imperative and interrogative paradigms while cautioning against the co-optation of relational frameworks by corporate AI. We identify critical convergences and gaps across these domains, present honest counter-evidence against the central thesis, and propose original research questions suitable for a graduate-level interdisciplinary research programme. The interrogative turn thesis is best understood as identifying an emerging trajectory and proposing a normative framework, not as documenting an accomplished paradigm shift.
1. Introduction: The Problem Space
When a user types "Summarise this article" into a large language model interface, they perform an act that appears straightforwardly technical: issuing a command to a computational system. When the same user instead types "What are the key arguments this article makes, and where might they be vulnerable?", something potentially different occurs, not merely in the expected output, but in the communicative, epistemological, and phenomenological character of the interaction itself. The first prompt commands execution; the second invites inquiry. Whether this distinction, which might seem trivial from an engineering perspective, carries genuine computational and epistemic consequences is the central question this literature review investigates.
This literature review examines the evolution of prompt decomposition (the process by which complex tasks are broken into manageable sub-tasks for large language models) from instruction-based to inquiry-based paradigms. The interrogative turn thesis proposes that this evolution, traced across technical, linguistic, philosophical, and Indigenous epistemological literatures, may reveal a transformation not merely in how humans communicate with AI systems, but in what kind of epistemic, ethical, and relational practices those communications constitute. We present this as a thesis to be examined, not an accomplished fact.
The problem space is inherently interdisciplinary. Technical researchers have documented the progression from static prompt templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023b), and decomposed prompting (Khot et al., 2023) toward conversational agentic systems that decompose tasks through dialogue (Li et al., 2023; Wu et al., 2023). The instruction-following capabilities that make these systems possible rest on reinforcement learning from human feedback (RLHF; Ouyang et al., 2022), which trained models to respond to both commands and questions in ways that align with human preferences. Computational linguists have begun analysing prompts as discourse units with rhetorical structure (Mann & Thompson, 1988; Zeldes et al., 2025), speech act properties (Gordon, 2024), and pragmatic dimensions (Krause & Vossen, 2024). Philosophers have recognised that the mode of prompting may reconfigure the human-technology relation (Ihde, 1990; González Arocha, 2025) and raises questions about epistemic agency (Floridi, 2025), genuine dialogue (Buber, 1923; Bakhtin, 1929/1963), and the ethics of AI interaction design (Djeffal, 2025). At the same time, the "stochastic parrot" critique (Bender et al., 2021) and analyses of LLM metaphors (Shanahan, 2024) caution against attributing communicative or cognitive capacities to systems that generate text through statistical pattern-matching. Indigenous relational epistemology (Wilson, 2008; Smith, 2021; Kovach, 2021) offers alternative frameworks for understanding knowledge production that challenge the extractive assumptions embedded in current AI systems.
We select González Arocha (2025) as the primary focal lens with acknowledged limitations. Published in Sophia (Universidad Politécnica Salesiana, Ecuador), a regional journal with limited international visibility, the work has not yet accumulated citations or responses. We select it not because of its impact but because it is, to our knowledge, the only published work that treats prompting as an inherently philosophical practice rather than a technical activity to which philosophical reflection is subsequently applied. We complement González Arocha with Krause and Vossen (2024) as an empirical focal work, published at INLG 2024, providing the definitive current mapping between pragmatic theory and NLP practice. Together, these works anchor the review's dual commitment to philosophical depth and empirical grounding. We recognise that this dual focal choice privileges particular interpretive angles (phenomenological and pragmatic) and that alternative focal works (Djeffal, 2025; Press et al., 2023; Khot et al., 2023) would have produced a differently inflected review.
1.1 Search Methodology
This review draws on four parallel disciplinary survey tracks conducted using a multi-agent research protocol. Each track searched Semantic Scholar, Google Scholar, ACM Digital Library, PhilPapers, arXiv, and DBLP using discipline-specific search terms. Technical terms included "prompt engineering," "automated prompt optimisation," "task decomposition LLM," and "multi-agent systems." Linguistic terms included "speech acts AND language models," "Gricean pragmatics AND NLP," and "discourse structure AND prompting." Philosophical terms included "philosophy of prompting," "AI epistemology," and "phenomenology AND artificial intelligence." Indigenous and decolonial terms included "Indigenous AI," "decolonial AI," and "relational epistemology AND technology." Inclusion criteria required English-language publication between 2020 and 2026, though foundational works from earlier periods were included where essential. Grey literature (corporate blogs, technical guides) was included but is explicitly flagged throughout as non-peer-reviewed. Initial screening identified approximately 300 candidate sources; more than 120 were retained after relevance and quality assessment. We acknowledge that this search strategy, while broad, cannot claim systematic review comprehensiveness and may reflect the biases of its disciplinary framing. The multi-agent protocol, in which separate agents produced independent surveys that were subsequently synthesised, risks echo-chamber effects; a future iteration should include a dedicated counter-thesis agent.
The review is structured to build its argument cumulatively while maintaining epistemic honesty about the strength of its claims. Section 2.1 establishes the technical foundations, tracing the chronological evolution of prompt decomposition methods. Section 2.2 analyses the linguistic and cognitive dimensions of this evolution. Section 2.3 develops the philosophical analysis. Section 2.4 introduces Indigenous relational epistemology as a framework that recontextualises the entire narrative. Section 2.5 identifies convergences and gaps across all domains. Section 2.6 presents counter-evidence and boundary conditions for the thesis. Section 3 proposes original research questions. Section 4 concludes with a reframed assessment of the interrogative turn thesis.
2. Literature Review
2.1 Technical Foundations: The Evolution of Prompt Decomposition
The technical literature documents a rapid evolution through at least five paradigmatic phases, each progressively moving toward more dynamic, adaptive, and (in some cases) conversational modes of interaction between humans, prompts, and models. The entire trajectory rests on a foundational technical development: reinforcement learning from human feedback (RLHF). Ouyang et al. (2022) demonstrated that fine-tuning language models with human feedback, through the InstructGPT methodology, transformed base models into systems capable of following both imperative and interrogative instructions. Without RLHF and its successors (Christiano et al., 2017; Ziegler et al., 2019), the instruction-following capabilities that make prompt engineering possible would not exist. This technical substrate is often taken for granted in discussions of prompting but is foundational: the "interrogative turn" presupposes models that can respond appropriately to questions, a capability that is itself an engineering achievement.
Phase 1: Static Templates and Few-Shot Learning (Pre-2022). The earliest prompt engineering relied on hand-crafted templates and few-shot examples (Brown et al., 2020). AutoPrompt (Shin et al., 2020) introduced gradient-guided discrete token search, demonstrating that prompts could be computationally optimised, but the resulting prompts were fixed artefacts deployed without adaptation. Prefix tuning (Li & Liang, 2021) and prompt tuning (Lester et al., 2021) introduced continuous prompt embeddings, moving beyond discrete tokens but sacrificing interpretability. Throughout this phase, prompts functioned as static parameters: set once, used repeatedly.
Phase 2: Structured Reasoning Chains (2022–2023). Chain-of-thought (CoT) prompting (Wei et al., 2022) represented a pivotal development: the demonstration that prompt structure, not merely content, could fundamentally alter model capability. By including intermediate reasoning steps, CoT enabled models to solve complex problems previously beyond their reach, achieving state-of-the-art results on GSM8K with PaLM 540B. Linguistically, CoT transforms implicit inferential processes into explicit discourse, a point to which we return in Section 2.2. Least-to-Most prompting (Zhou et al., 2023a) extended this by decomposing problems from simplest to most complex subproblems, achieving 99% accuracy on SCAN compositional generalisation versus 16% for standard CoT. Plan-and-Solve prompting (Wang et al., 2023a) introduced explicit planning phases before execution. These methods established that decomposition, breaking complex tasks into structured sub-tasks, dramatically improves LLM performance.
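The least-to-most pattern can be made concrete with a minimal sketch. The code below is illustrative only: `call_llm` is a hypothetical placeholder for a real model API, not any published system's interface, and the subproblem list is supplied by hand rather than generated by the model.

```python
# Illustrative sketch of least-to-most prompting (Zhou et al., 2023a):
# answer subproblems in order of difficulty, feeding each answer back
# into the context before attempting the original problem.

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would query a language model here.
    return f"<answer to: {prompt.splitlines()[-1]}>"

def least_to_most(problem: str, subproblems: list[str]) -> str:
    """Solve `problem` by accumulating solved sub-steps in the context."""
    context = f"Problem: {problem}\n"
    for sub in subproblems:                      # simplest to most complex
        answer = call_llm(context + f"Q: {sub}")
        context += f"Q: {sub}\nA: {answer}\n"    # accumulate solved steps
    return call_llm(context + f"Q: {problem}")   # final composition step

result = least_to_most(
    "How many letters does the longest word in this sentence have?",
    ["What are the words?", "Which word is longest?", "How many letters?"],
)
```

The key structural point is that each sub-answer becomes context for the next call, so the final query sees the whole solved chain rather than the bare problem.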
Phase 3: Automated Prompt Optimisation (2023–2024). The year 2023 witnessed an explosion of automated prompt engineering (APE) methods. The foundational APE paper (Zhou et al., 2023b) demonstrated that LLMs could generate and select prompts matching or exceeding human performance, establishing the generate-evaluate-select pipeline. OPRO (Yang et al., 2024) used meta-prompts with trajectory histories, achieving 8–50% improvements over human prompts. PromptBreeder (Fernando et al., 2024) introduced self-referential evolution, where both task-prompts and mutation-prompts evolved simultaneously, the system learning to improve its own improvement process. EvoPrompt (Guo et al., 2024) applied genetic algorithms and differential evolution. PromptAgent (Wang et al., 2024a) employed Monte Carlo Tree Search with error-reflection loops that represent arguably the closest existing approach to conversational prompt optimisation, as the agent iteratively adjusts based on identified failures.
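The generate-evaluate-select pipeline shared by these methods can be sketched in a few lines. Everything below is a stub: `generate_candidates` and `score` stand in for LLM-driven paraphrasing and dev-set evaluation, and the length-based metric is a deliberately artificial placeholder, not a claim about any published scoring function.

```python
# Minimal sketch of the generate-evaluate-select loop behind automated
# prompt engineering (Zhou et al., 2023b). Generation and scoring are
# stubbed; a real system would use an LLM for both.

def generate_candidates(seed: str, n: int = 4) -> list[str]:
    # Placeholder paraphraser: a real APE system asks an LLM for variants.
    return [f"{seed} (variant {i})" for i in range(n)]

def score(prompt: str, dev_set: list[tuple[str, str]]) -> float:
    # Placeholder metric: a real system runs the prompt over a dev set
    # and measures task accuracy. Here prompt length is a stand-in.
    return len(prompt) / 100.0

def ape(seed: str, dev_set: list[tuple[str, str]], rounds: int = 3) -> str:
    best = seed
    for _ in range(rounds):
        candidates = generate_candidates(best) + [best]  # keep incumbent
        best = max(candidates, key=lambda p: score(p, dev_set))
    return best

best_prompt = ape("Answer the question step by step.", dev_set=[])
```

The loop is monologic in exactly the sense discussed below: the optimiser proposes and scores, while the "receiver" of the prompt never talks back.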
DSPy (Khattab et al., 2023, 2024a) represents a significant shift within this phase: the transition from "prompting" to "programming." By treating prompt engineering as compilation of declarative Python programs, DSPy achieves substantial accuracy improvements over manual prompting and decouples prompt quality from specific model versions. The BetterTogether variant (Khattab et al., 2024b) combines prompt optimisation with weight fine-tuning for additional gains. However, DSPy's optimisation occurs at compile-time, not run-time, a limitation for the conversational paradigm, but one that reflects the field's current performance frontier.
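The compile-time idea can be illustrated without the DSPy library itself. The sketch below is self-contained pseudocode in Python and is not the actual DSPy API: `run_model`, the template list, and the trainset are all invented placeholders used only to show what "selecting a prompt at compile time" means.

```python
# Self-contained sketch of compile-time prompt optimisation in the spirit
# of DSPy (Khattab et al., 2023): a task is "compiled" against a small
# trainset by selecting the best-scoring prompt template. Illustrative
# only; this is not the DSPy API.

TEMPLATES = [
    "{question}",
    "Question: {question}\nAnswer:",
    "Question: {question}\nLet's think step by step.",
]

def run_model(prompt: str) -> str:
    # Placeholder: a real compiler would call an LLM here.
    return "42" if "step by step" in prompt else "unsure"

def compile_program(trainset: list[tuple[str, str]]) -> str:
    """Pick the template that maximises exact-match accuracy on trainset."""
    def accuracy(template: str) -> float:
        hits = sum(
            run_model(template.format(question=q)) == gold
            for q, gold in trainset
        )
        return hits / len(trainset)
    return max(TEMPLATES, key=accuracy)

best_template = compile_program([("What is 6 * 7?", "42")])
# The chosen template is then fixed: no run-time adaptation occurs.
```

The final comment marks the limitation noted above: once compiled, the template does not change during a conversation.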
Phase 4: Reasoning Topologies and Dynamic Context (2024–2025). The progression from Chain-of-Thought to Tree of Thoughts (ToT; Yao et al., 2023) to Graph of Thoughts (GoT; Besta et al., 2024) represents an evolution from linear to branching to graph-structured reasoning. GoT achieves higher quality than ToT at lower cost by supporting aggregation, refinement, cycles, and feedback loops, structural features that more closely resemble conversation than linear execution. TextGrad (Yuksekgonul et al., 2024/2025), published in Nature, extended automatic differentiation to text, enabling backpropagation-like optimisation through natural-language feedback: an iterative mechanism where the system and its evaluator engage in critique and revision, though whether this constitutes genuine dialogue or optimisation dressed in natural language is precisely the question at issue. GEPA (Databricks/UC Berkeley, 2025) [preprint] demonstrated that evolutionary prompt optimisation with LLM-driven reflection can achieve strong performance with open-source models at substantially lower cost than proprietary alternatives.
Concurrently, Anthropic's (2025) "context engineering" paradigm [blog post; non-peer-reviewed] argued that the optimisation target should be the entire context window: not just prompt text but dynamically selected conversation history, retrieved documents, tool outputs, and system state. While this represents the field's explicit recognition that prompts exist within dynamic conversational contexts rather than as isolated instructions, its status as corporate technical guidance rather than peer-reviewed research means its specific claims should be treated as practitioner insight rather than empirical findings.
Phase 5: Conversational Decomposition (2025–2026). The most recent phase witnesses the emergence of systems that decompose tasks through dialogue. Decomposed Prompting (Khot et al., 2023) established the theoretical foundation by demonstrating that modular decomposition, where complex tasks are broken into sub-tasks handled by specialised prompts, consistently outperforms monolithic strategies. Building on this foundation, multi-agent architectures introduced decomposition through inter-agent conversation: CAMEL (Li et al., 2023) pioneered "inception prompting" where agents prompt each other in role-play dialogues; AutoGen (Wu et al., 2023) adopted a conversation-centric architecture where decomposition emerges from asynchronous message-passing; ChatDev (Qian et al., 2024) introduced "communicative dehallucination" through clarification dialogues.
By 2025, early evidence suggests that inquiry-based systems are beginning to emerge: ACT (Google, 2025) [preprint] trains agents to ask clarifying questions during task dialogue; FATA (2025) [preprint] generates comprehensive clarification checklists before answering; the Tri-Agent Evaluation Framework (KDD 2025) [workshop paper] measures decomposition quality through dialogue quality. These systems represent what may be an early shift in the locus of agency: from human-designed decomposition executed by machines, to machine-initiated decomposition refined through dialogue. However, as Section 2.6 discusses, this evidence is preliminary, unreplicated, and must be weighed against the continued dominance of structured imperative approaches on standard benchmarks.
A gap persists throughout this evolution: no published APE method yet treats the optimisation loop as itself a conversation. All current methods are monologic: the optimiser generates prompts for a passive receiver. Whether this gap represents a genuine limitation or simply reflects the fact that structured optimisation is more effective than conversational negotiation remains an open empirical question.
2.2 Computational Linguistics: Prompts as Discourse
The computational linguistics literature reveals that the technical evolution documented above corresponds to a linguistically analysable transformation, what we term the "interrogative turn" in prompt engineering. This turn is analysable through multiple established linguistic frameworks, each illuminating a different dimension of its potential significance. However, following Leidner and Plachouras (2023), who established empirically that neither naturalness nor lower perplexity reliably predicts prompt effectiveness, we note at the outset that the relationship between linguistic form and output quality is task- and model-dependent. The linguistic analysis that follows identifies structural properties of prompts, not guaranteed performance outcomes.
Rhetorical Structure Theory and Prompt Architecture. Mann and Thompson's (1988) Rhetorical Structure Theory (RST) provides a framework for analysing prompts as hierarchically organised discourse. The core instruction functions as the nucleus; contextual elements (role specifications, constraints, examples) serve as satellites connected by coherence relations (Elaboration, Background, Condition). Zeldes et al.'s (2025) Enhanced RST (eRST) extends this to graph-based representations supporting non-projective and concurrent discourse relations, directly applicable to multi-component and multi-turn prompts where relations cross turn boundaries. Decomposing a complex prompt can be understood, in RST terms, as flattening a deep rhetorical tree into a sequence of simpler nucleus-satellite pairs, each expressible as a single coherent instruction or question.
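This nucleus-satellite view of a prompt can be encoded directly. The sketch below is a toy data structure of our own devising, not an RST parser or any published tool; it only shows how flattening a one-level rhetorical tree yields simpler single-relation instructions.

```python
from dataclasses import dataclass, field

# Toy encoding of a prompt as an RST-style nucleus with satellites
# (Mann & Thompson, 1988). "Flattening" yields a sequence of simpler
# nucleus-satellite pairs, each usable as a standalone instruction.

@dataclass
class Satellite:
    relation: str   # e.g. "Elaboration", "Condition", "Manner"
    text: str

@dataclass
class PromptUnit:
    nucleus: str
    satellites: list[Satellite] = field(default_factory=list)

def flatten(unit: PromptUnit) -> list[str]:
    """Turn one complex unit into single-relation instructions."""
    if not unit.satellites:
        return [unit.nucleus]
    return [f"{unit.nucleus} [{s.relation}: {s.text}]"
            for s in unit.satellites]

prompt = PromptUnit(
    nucleus="Summarise the key findings",
    satellites=[
        Satellite("Elaboration", "of Smith et al. (2024)"),
        Satellite("Manner", "in three bullet points"),
        Satellite("Condition", "focusing on methodology and results"),
    ],
)
steps = flatten(prompt)   # three simpler nucleus-satellite instructions
```

Each flattened step retains the nucleus and exactly one coherence relation, which is the structural sense in which decomposition simplifies a prompt's discourse tree.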
Speech Act Theory and the Shift in Illocutionary Force. Gordon's (2024) "Speech Acts and Large Language Models" provides the most systematic application of Austin-Searle speech act theory to LLM interaction. Gordon introduces the concept of "conversational zombies": entities that produce utterances with perlocutionary effects (persuading, informing) while lacking the intentionality required for genuine illocutionary force. This analysis reveals that the imperative-to-interrogative shift in prompting corresponds to a shift in illocutionary force from directives (commands whose point is getting the addressee to do something) to questions (requests for information that open a set of acceptable responses). The preparatory conditions change: directives presuppose the hearer can perform the action and the speaker has authority; questions presuppose a knowledge asymmetry and the existence of an answer. Gubelmann (2024) deepens this from a Kantian-pragmatist perspective, arguing that LLMs cannot perform genuine speech acts due to lacking autonomous agency, while Markl (2025) demonstrates that speech act theory productively taxonomises representational harms in LLM output as perlocutionary effects without corresponding illocutionary intention.
For prompt decomposition specifically, the interrogative turn has structural consequences. Imperative decomposition yields a sequence of sub-commands (do A, then B, then C). Interrogative decomposition yields a tree of sub-questions whose answers compose hierarchicallyâa structure directly modelled by formal question semantics.
Formal Question Semantics and the Architecture of Inquiry. Hamblin's (1973) treatment of questions as denoting sets of possible answers, and Groenendijk and Stokhof's (1984) partition semantics, provide the formal machinery for understanding what interrogative prompts mean in ways imperative semantics cannot capture. When a prompt shifts from "Summarise this text" to "What are the key points of this text?", the semantic object changes from a command with a single expected execution to a question with a structured set of acceptable answers. The interrogative mode thus provides a richer semantic framework for structured reasoning, and directly mirrors the tree-search structures that systems like ToT (Yao et al., 2023) and GoT (Besta et al., 2024) implement computationally.
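These two semantic notions can be stated as a toy formalisation. Worlds and propositions are simplified to strings here for illustration; the example is our own reduction of the formal theories, not a claim about how either paper presents them.

```python
# Toy formalisation: a question denotes the set of its possible answers
# (Hamblin, 1973), and partitions the space of worlds by which answer is
# true in each world (Groenendijk & Stokhof, 1984).

# Three simplified "worlds", each mapped to the answer true in it.
worlds = {"w1": "finding_A", "w2": "finding_B", "w3": "finding_A"}

def hamblin_denotation(worlds: dict[str, str]) -> set[str]:
    """The question denotes its set of possible answers."""
    return set(worlds.values())

def partition(worlds: dict[str, str]) -> dict[str, set[str]]:
    """Group worlds into cells that agree on the answer."""
    cells: dict[str, set[str]] = {}
    for w, answer in worlds.items():
        cells.setdefault(answer, set()).add(w)
    return cells

answers = hamblin_denotation(worlds)   # the open answer set
cells = partition(worlds)              # the G&S partition of worlds
```

An imperative, by contrast, has no such denotation: it specifies one expected execution, which is precisely the semantic asymmetry the paragraph above describes.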
The self-ask method (Press et al., 2023) makes this connection explicit: the model generates and answers its own follow-up sub-questions, outperforming imperative linear reasoning. Press et al. identify the "compositionality gap" (LLMs can answer sub-questions correctly but fail to compose them), demonstrating that interrogative self-decomposition addresses a specific structural limitation, not merely a stylistic preference.
Gricean Pragmatics and Cooperative Prompting. Krause and Vossen's (2024) comprehensive survey maps Gricean maxims to NLP tasks, establishing the definitive current bridge between pragmatic theory and NLP practice. Their work reveals systematic patterns of maxim violation in LLM interaction: Quantity violations (over- or under-informing), Quality violations (hallucination as functional falsehood), Relation violations (irrelevant tangents), and Manner violations (ambiguous or disorganised output). Prompt engineering strategies (specifying output length, requesting citations, demanding structured formatting) are, in Gricean terms, explicit reinforcements of cooperative maxims.
The interrogative turn adds a pragmatic dimension: questions trigger different implicature patterns than commands. Questions carry the implicature that the questioner does not know the answer, licensing the respondent to provide information the questioner lacks. This framing positions human-LLM interaction as knowledge-sharing dialogue rather than master-servant execution, a pragmatic shift with potential consequences that extend beyond mere output quality. However, whether LLMs can genuinely participate in Gricean cooperation, which requires shared communicative intentions, is itself contested (see Section 2.3 on Bender et al., 2021).
Relevance Theory as an Alternative Framework. Sperber and Wilson's (1986/1995) Relevance Theory offers an alternative to Gricean pragmatics that may be more applicable to LLM interaction. Where Grice's model posits multiple cooperative maxims, Relevance Theory reduces communication to a single principle: utterances are presumed to be optimally relevant, achieving the greatest cognitive effect for the least processing effort. This framework has two advantages for prompt analysis. First, it provides a cognitive rather than social model of communication, potentially better suited to human-machine interaction where the social dimensions of Gricean cooperation are attenuated. Second, it offers a graduated metric: prompts can be assessed in terms of the relevance they achieve (how much useful information they elicit relative to the processing effort required to formulate them). An imperative prompt like "Summarise X" achieves narrow relevance (one specific output); an interrogative prompt like "What is important about X?" achieves broader relevance (a wider set of potentially useful outputs) at the cost of less predictable formatting. Relevance Theory thus formalises the trade-off between control and epistemic richness that is central to the interrogative turn thesis.
Information-Theoretic Foundations. Zhang and Cao's (2025) analysis at ACL 2025 demonstrates that prompts function as information selectors, determining which slice of the model's internal representation gets verbalised at each reasoning step. The prompt search space grows combinatorially, and task-specific prompts dramatically outperform generic ones by efficiently routing information extraction. This provides formal grounding for why linguistic choices in prompt design matter: each word shapes an information extraction pathway. Combined with Ivison et al.'s (2024) mechanistic analysis showing that instruction tuning shifts model attention to instruction verbs and semantic structure, we see that the linguistic form of prompts has measurable computational consequences, a finding that bridges the technical and linguistic literatures. However, the extent to which syntactic form (imperative vs. interrogative) rather than semantic content drives these effects remains to be empirically determined.
Worked Example: Linguistic Analysis of a Prompt Pair. To demonstrate that these frameworks are analytically productive and not merely decorative, we annotate a concrete prompt pair:
Imperative prompt: "Summarise the key findings of Smith et al. (2024) in three bullet points, focusing on methodology and results."
- RST analysis: The nucleus is "Summarise the key findings"; satellites include "of Smith et al. (2024)" (Elaboration), "in three bullet points" (Manner), "focusing on methodology and results" (Condition).
- Speech act: Directive (Searle, 1969). Illocutionary force: command. Preparatory condition: the model can produce the summary. Sincerity: the user wants it done. Direction of fit: world-to-words.
- Gricean analysis: Quantity is explicitly constrained (three points); Relation is narrowed (methodology and results); Manner is specified (bullet format). The prompt pre-encodes cooperative maxim compliance, leaving minimal interpretive latitude.
Interrogative prompt: "What are the most important findings from Smith et al. (2024), and how robust is their methodology?"
- RST analysis: Two coordinated nuclei linked by a Joint relation; the paper itself is an implicit Background satellite.
- Speech act: Dual question. Illocutionary force: request for information. Preparatory condition: the model has relevant knowledge. Direction of fit: words-to-world. Creates an open answer set.
- Gricean analysis: Quantity, Relation, and Manner are left to the respondent's judgement. The prompt relies on implicature rather than explicit constraint.
- Question semantics (Hamblin, 1973): The first question denotes the set of propositions identifying important findings; the second denotes the set of propositions evaluating methodological robustness. Together they create a partition requiring evaluative and analytical sub-answers.
The imperative prompt produces a constrained, predictable discourse structure; the interrogative prompt produces a richer semantic space requiring judgement, evaluation, and potential follow-up, at the cost of less predictable output format. This trade-off is central to the interrogative turn thesis: interrogative prompts may sacrifice control for epistemic richness. Whether this trade-off is advantageous depends on the task, the user's needs, and the model's capabilities.
Empirical Prompt Linguistics. Leidner and Plachouras (2023) established empirically that neither naturalness nor lower perplexity reliably predicts prompt effectiveness: the relationship between linguistic form and output quality is task- and model-dependent. This finding is a crucial boundary condition for the interrogative turn thesis: even if interrogative prompts have richer semantic structure, they may not reliably produce better outputs. Ma et al.'s (2024) large-scale analysis of 10,538 real-world prompts identified eight structural components and tracked their evolution, finding that "Capability" and "Demonstration" components outperform simple "Role" specifications. Hu et al. (2025) demonstrate through benchmark evaluation that LLMs excel at semantic tasks but systematically underperform on pragmatic phenomena including conversational implicature and presupposition accommodation, a gap with direct implications for the effectiveness of interrogative prompting strategies.
A critical gap in the linguistic literature concerns cross-linguistic analysis: virtually all prompt linguistics has been conducted on English-language prompts and English-centric models. How prompting strategies interact with typologically diverse languages, with their different question formation strategies, honorific systems, and discourse organisation principles, remains virtually unstudied.
2.2.5 Cognitive Dimensions of Decomposition
The cognitive science literature, while not typically engaged in prompt engineering research, provides essential frameworks for understanding why decomposition helps both human users and language models.
Dual Process Theory. Kahneman's (2011) distinction between System 1 (fast, intuitive, heuristic) and System 2 (slow, deliberative, analytical) processing illuminates a core dynamic of prompt decomposition. LLMs, trained on next-token prediction, exhibit something analogous to System 1 processing: rapid pattern-matching that produces fluent but sometimes unreliable outputs. Chain-of-thought prompting and task decomposition can be understood as externalisations of System 2 deliberation, forcing the model to make intermediate reasoning steps explicit rather than leaping directly to conclusions. Wei et al.'s (2022) finding that CoT dramatically improves performance on multi-step reasoning tasks is consistent with this interpretation: the prompt structure compensates for the model's tendency toward heuristic pattern-completion by imposing deliberative structure. This parallel should not be overstated (LLMs do not have cognitive systems in Kahneman's sense), but it provides a useful functional analogy for understanding why structured prompting helps.
Cognitive Load Theory. Sweller's (1988; Sweller et al., 2011) cognitive load theory distinguishes three types of load: intrinsic (inherent to the material), extraneous (imposed by poor instructional design), and germane (invested in schema construction). For human users formulating prompts, complex multi-requirement prompts impose high extraneous cognitive load: the user must simultaneously manage content specification, format requirements, constraint articulation, and quality criteria. Decomposition reduces extraneous load by distributing these demands across multiple simpler interactions. Conversational decomposition may further reduce load by offloading the decomposition itself to the dialogue process: the model helps the user determine what needs to be asked. However, conversational decomposition introduces its own cognitive demands (maintaining conversational coherence across turns, evaluating intermediate outputs), which may impose different forms of load that have not yet been empirically measured.
Bounded Rationality and Satisficing. Simon's (1956) bounded rationality framework suggests that decomposition serves a satisficing function: rather than attempting to optimise a complex task holistically (which exceeds both human and LLM processing capacity), decomposition produces adequate sub-solutions that, when composed, yield acceptable overall results. This perspective reframes decomposition not as a path to optimal performance but as a practical accommodation to cognitive and computational limitations. The interrogative turn can be understood within this framework as a shift from attempting to specify the complete solution space in advance (imperative, optimising) to iteratively narrowing the solution space through dialogue (interrogative, satisficing).
Formal and Functional Competence Dissociation. Mahowald et al. (2024), in "Dissociating Language and Thought in Large Language Models" (Trends in Cognitive Sciences), argue for a crucial distinction between formal linguistic competence (the ability to produce grammatical, fluent, contextually appropriate language) and functional competence (the ability to reason, plan, and solve problems). LLMs exhibit striking formal competence (they produce remarkably fluent text), but their functional competence is unreliable, particularly for multi-step reasoning, causal inference, and novel problem-solving. This dissociation is directly relevant to the interrogative turn thesis because it explains why decomposition helps: by breaking complex reasoning tasks into steps, decomposition externalises the functional reasoning that LLMs perform unreliably, leveraging their formal competence (producing coherent sub-answers) while compensating for their functional limitations (composing those sub-answers into valid reasoning chains). The interrogative mode may be particularly suited to this compensation because questions naturally partition the reasoning space into sub-problems.
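The compensation mechanism described here can be sketched schematically. In this sketch the sub-questions and the `answer` stub are invented placeholders for what a planner and a model call would supply; only the control structure is the point.

```python
from typing import Callable

def decompose(question: str) -> list[str]:
    # Placeholder: in a real system a planner or the model itself
    # would generate these sub-questions.
    return [
        "What quantities does the question involve?",
        "What intermediate value must be computed first?",
        "How does that value determine the final answer?",
    ]

def solve_by_subquestions(question: str,
                          answer: Callable[[str], str]) -> list[tuple[str, str]]:
    """Each sub-question elicits a locally coherent sub-answer (formal
    competence); the explicit chain supplies the composition step that
    the model performs unreliably on its own (functional competence)."""
    return [(q, answer(q)) for q in decompose(question)]

# Stub standing in for a model call.
trace = solve_by_subquestions("How long until the tank is full?",
                              lambda q: f"[answer to: {q}]")
for q, a in trace:
    print(q, "->", a)
```

The composition logic lives in the scaffold, not the model, which is the sense in which decomposition "externalises" functional reasoning.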
2.3 Philosophy of AI: The Meaning of the Shift
The philosophical literature reveals that the instruction-to-inquiry shift, if it proves substantive, raises questions with epistemological, ontological, ethical, and phenomenological dimensions. González Arocha's (2025) critical phenomenology of prompting provides an integrating framework, but must be read alongside the "stochastic parrot" critique (Bender et al., 2021) and analyses of LLM metaphors (Shanahan, 2024), which challenge the attributions of communicative capacity on which some of the strongest philosophical claims rest.
The "Stochastic Parrot" Critique and Its Implications. The "stochastic parrot" critique (Bender et al., 2021) poses the most fundamental challenge to the interrogative turn thesis. If LLMs generate text by statistical pattern-matching without understanding (if they are, in Bender et al.'s formulation, "stochastic parrots"), then attributing communicative capacity, dialogical engagement, or inquiry to them is potentially misleading. Shanahan (2024), in "Talking About Large Language Models" (Communications of the ACM), reinforces this, arguing that we must be precise about the metaphors we use for LLMs and resist anthropomorphising their outputs. When we say an LLM "answers a question" or "engages in dialogue," we use language that implies cognitive capacities the system may not possess. The interrogative turn thesis must therefore be interpreted primarily as a claim about the human side of interaction: how framing engagement as inquiry shapes human epistemic agency, critical engagement, and interpretive authority, rather than as a claim about machine communicative capacity. The technical benefits of interrogative prompting (wider answer sets, compositional structure, richer semantic frameworks) are information-theoretic properties of questions as linguistic objects, not evidence of machine understanding. This distinction, between the structure of interaction and the cognitive capacities of the interactants, is essential for maintaining philosophical honesty about what the interrogative turn does and does not entail.
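The claim that "wider answer sets" are properties of questions as linguistic objects can be stated in the standard terms of the question semantics cited in this review (a compressed gloss, not a full treatment):

```latex
% Hamblin (1973): a question denotes the set of its possible answers
[\![Q]\!] \;=\; \{\, p \mid p \text{ is a possible answer to } Q \,\}

% Groenendijk & Stokhof (1984): a question induces a partition of
% logical space; worlds are equivalent iff they share a complete answer
w \sim_Q w' \;\iff\; \mathrm{ans}_Q(w) = \mathrm{ans}_Q(w')
```

An imperative, by contrast, is standardly assigned a single compliance condition, so the "wider answer set" of a question is literal: its denotation is a structured set of alternatives rather than one target state, independent of any claim about the respondent's understanding.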
González Arocha's Critical Phenomenology of Prompting. Published in Sophia in 2025, González Arocha's "Critical Phenomenology of Prompting in Artificial Intelligence" offers a direct treatment of prompting as a philosophical practice. González Arocha argues that prompts are not neutral technical instructions but discursive practices that embed assumptions, worldviews, and power relations. The prompt functions as a "mediating space" where human intentionality, language, and sociopolitical structures converge. González Arocha describes prompt design as "an inherently philosophical act" (we note that this is one philosopher's interpretive claim, not an established consensus, though we find it productive).
González Arocha's analysis operates at multiple levels. First, at the phenomenological level, he argues that the mode of prompting (imperative versus interrogative) reconfigures the human-AI relationship: commands produce a tool-relation; questions produce something approaching an alterity-relation. Second, at the critical level, he reveals that prompt design is a site of power: who designs prompts, whose assumptions are encoded, and whose epistemologies are privileged are not merely technical questions but questions of justice. Third, at the epistemological level, he argues that the kind of knowledge produced through AI interaction depends on the communicative structure of the prompt: extractive commands yield extracted information; genuine questions yield something closer to collaborative understanding. Read alongside Bender et al.'s critique, González Arocha's analysis is most compelling as a phenomenology of human experience during AI interaction (how the human user's epistemic stance shifts when asking questions versus issuing commands) rather than as a claim about reciprocal machine engagement.
Suchman's Situated Action and the Plan-Execute Critique. Suchman's (2007) Human-Machine Reconfigurations provides a foundational challenge to the decomposition paradigm itself. Suchman argues that plans are not the causes of action but post-hoc rationalisations of situated behaviour: humans do not typically formulate complete plans and then execute them but instead act in response to unfolding circumstances, using plans as loose resources rather than rigid scripts. This critique applies directly to prompt decomposition: the "decompose-then-execute" model treats complex tasks as plannable in advance, when in practice the appropriate decomposition often emerges only through interaction with the task itself. Conversational decomposition, in which the decomposition unfolds through dialogue, may be better aligned with Suchman's situated action framework, but this alignment is speculative and requires empirical investigation.
Wittgensteinian Language Games. The later Wittgenstein's (1953) concept of language games provides a framework for understanding the shift. Prompts are moves within language games: rule-governed, context-dependent communicative practices embedded in "forms of life." The shift from imperative to interrogative prompting is a shift between language games with different rules: in the command game, success equals accurate execution and the AI is an instrument; in the inquiry game, success equals productive dialogue and the AI is a respondent. Recent work applying Wittgenstein to AI (Jolma, 2024; STRV, 2024 [industry analysis; non-peer-reviewed]) suggests that LLMs participate in these games statistically but lack the shared form of life that grounds genuine meaning. The philosophical significance lies in the game-shift being constitutive: it may create a different kind of interaction, not merely a different output.
The Socratic Tradition. The operationalisation of Socratic methods in LLM interaction (Chang et al., 2023; Princeton NLP SocraticAI, 2024) reveals a productive tension. Socratic questioning presupposes a co-inquirer capable of genuine aporia: perplexity, the recognition of one's own ignorance that drives the search for knowledge. LLMs can simulate the role of co-inquirer but cannot experience the aporia that makes Socratic questioning transformative. Yet the structure may matter: when humans adopt a Socratic stance toward AI (asking probing questions, identifying contradictions, pursuing implications), they position themselves as active epistemic agents rather than passive consumers. The Socratic tradition thus provides philosophical justification for the inquiry paradigm primarily as a framework for human epistemic practice, even as it reveals the limits of machine participation.
Searle, Dennett, and the Question of Understanding. Searle's (1980) Chinese Room argument has been reinvigorated by the LLM era (Ferrario & Loi, 2026). LLMs remain, by Searle's criteria, on the syntactic side of the divide: sophisticated symbol manipulation without semantic comprehension. Dennett's (1987) intentional stance offers a pragmatic counterpoint: treating LLMs "as if" they have beliefs is legitimate when their behaviour is best predicted by doing so. The instruction-to-inquiry shift sits at this philosophical junction. When we ask a question of an LLM, we implicitly adopt the intentional stance, treating the machine as an entity capable of responsive dialogue, while the Searlean critique reminds us this attribution is observer-relative, not intrinsic. Task decomposition itself, as Dreyfus (1972, 1992, 2007) would argue, accommodates the machine's lack of holistic understanding by breaking meaning into syntactically manageable chunks.
Floridi's Epistemic Agency. Floridi's (2025) distinction between "agency without intelligence" and genuine epistemic agency provides a rigorous epistemological framework. AI systems participate in knowledge-production processes but lack epistemic responsibility: they are agents in the infosphere without being knowers. The inquiry paradigm may strengthen human epistemic agency: by asking questions rather than issuing commands, the human retains interpretive authority over the AI's outputs. This aligns with Russo, Schliesser, and Wagemans' (2023) argument for an integrated "epistemology-cum-ethics" of AI, where the process of knowledge production, not just its outputs, carries ethical weight.
Postphenomenology and the Alterity Relation. Ihde's (1990) taxonomy of human-technology relations (embodiment, hermeneutic, alterity, and background) provides the conceptual vocabulary for analysing how different prompting modes may produce different phenomenological relations to AI. Under imperative prompting, AI occupies a hermeneutic or embodiment relation (tool-like). Under interrogative prompting, AI may shift toward an alterity relation, appearing as a quasi-other. A TU Delft postphenomenological analysis (2024) suggests that ChatGPT disrupts standard categories by functioning simultaneously as hermeneutic agent and alterity. González Arocha (2025) builds on this, arguing that the mode of prompting determines the character of the "mediating space" and therefore the character of the knowledge, meaning, and experience it produces. However, whether this phenomenological shift in user experience corresponds to any genuine change in the system's properties or lies entirely on the human side remains an open question best addressed through empirical phenomenological research (see RQ7).
Dialogical Philosophy and Bakhtinian Critique. Buber's (1923) I-Thou/I-It framework has been applied to AI interaction by several scholars. Hasse (2017) examined dialogical philosophy in the age of social robots, arguing that Buber's categories illuminate the phenomenological character of human-machine relations without requiring that machines be genuine "Thous." Aguas (2025) extends this analysis, while Ziderman's (2024) treatment of Martin Buber's Dialogical Thought as a Philosophy of Action provides a contemporary philosophical framework for understanding how dialogical structure shapes the character of encounters, even when one party lacks consciousness. Conversational prompting may create conditions that approach I-Thou dynamics: not because the machine becomes a genuine Thou, but because the human's orientation shifts toward openness and mutuality. Levinas's ethics of alterity raises the question of whether this "as-if" dialogue carries moral weight.
Bakhtin's (1929/1963) concept of polyphony introduces a critical counter-argument: genuine polyphony requires irreducible, autonomous consciousnesses in dialogue. LLM "multi-voice" outputs are what we might call "algorithmic monologism": the appearance of multiple voices produced by a single optimising mechanism (see SciELO, 2025 [blog post; non-peer-reviewed] for a Bakhtinian analysis of AI discourse). This critique applies equally to multi-agent systems: CAMEL's role-playing agents (Li et al., 2023) produce the form of dialogue without the substance of genuine otherness. Whether structural dialogue, even without genuine polyphony, has epistemic value for the human participant is an empirical question this review cannot settle.
Djeffal's Reflexive Prompt Engineering. Djeffal's (2025) framework, presented at FAccT 2025, bridges philosophical theory and practical implementation. His five-component framework (prompt design, system selection, system configuration, performance evaluation, and prompt management) is grounded in the principle of "responsibility by design." Djeffal demonstrates that prompt engineering must incorporate ongoing ethical reflection ("reflexivity"), making it a continuous ethical practice rather than a one-time technical task. The framework operationalises González Arocha's philosophical insights: if prompts are mediating spaces with ethical weight, then designing prompts requires the kind of ongoing critical reflection that Djeffal's framework provides.
2.4 Indigenous Epistemology: Relational Alternatives and Decolonial Challenges
The instruction-to-inquiry shift, when viewed through Indigenous relational epistemology, raises a set of questions that the Western philosophical tradition alone cannot address. However, engaging Indigenous epistemology in this context requires careful attention to the risks of instrumentalisation, the qualitative differences between Indigenous relational thought and "relational" AI interaction, and the material conditions of AI production that may be fundamentally incompatible with Indigenous values.
Smith's Decolonising Methodologies. Linda Tuhiwai Smith's (2021) Decolonizing Methodologies (3rd ed., Zed Books) argues that Western research has historically been complicit in colonialism, treating Indigenous knowledge as raw material for extraction. Smith calls for reclaiming Indigenous intellectual traditions and protocols as foundational to knowledge production, not supplementary to it. Her twenty-five decolonising research projects provide practical frameworks for non-extractive research characterised by community ownership, relational accountability, and respect for Indigenous intellectual sovereignty. For prompt decomposition, Smith's critique illuminates how current systems embed Western analytical assumptions (decompose, solve, recompose) that may be fundamentally incompatible with holistic knowledge systems. The very metaphor of "decomposition" (breaking wholes into parts for separate processing) encodes a particular epistemological commitment to analysis over synthesis, parts over wholes, that is not universal.
Wilson's Relational Epistemology. Wilson's (2008) Research is Ceremony articulates an Indigenous epistemology in which knowledge is fundamentally relational: produced, validated, and shared within networks of accountability that include human, more-than-human, and spiritual relations. Research is not the extraction of pre-existing information but a ceremony of honouring relationships. Four principles illuminate the prompt decomposition context:
First, knowledge is relational: it does not exist independently of the relationships in which it is produced. In the inquiry paradigm, the "knowledge" produced through AI interaction is not the AI's output alone but the entire process of questioning, responding, interpreting, and questioning again: the relationship itself. Second, relational accountability: knowledge production carries obligations to all relations affected. The inquiry paradigm may preserve this accountability more effectively than commands, which tend to reduce accountability to output accuracy. Third, context is constitutive: knowledge cannot be separated from its context without distortion. Conversational interaction maintains context through dialogue; isolated commands strip it away. Fourth, research as ceremony: the inquiry paradigm, with its attentiveness, openness to surprise, and requirement for interpretive engagement, may more closely resemble ceremonial practice than the transactional character of command-based interaction.
Kovach's Indigenous Methodologies. Margaret Kovach's (2021) Indigenous Methodologies (2nd ed., University of Toronto Press) positions storytelling as a central epistemological method: stories are not merely narrative devices but frameworks for transmitting values, relationships, and knowledge. Kovach emphasises relational accountability: researchers must be responsible to communities, and the process of knowledge-making is inseparable from its social and ethical context. For AI interaction design, Kovach's work suggests that conversational decomposition's value may lie not in its technical efficiency but in its structural similarity to storytelling: iterative, contextual, relationally situated. However, this parallel must be held carefully: the "stories" exchanged in human-LLM interaction lack the community grounding, spiritual significance, and intergenerational depth that characterise Indigenous storytelling as epistemological practice.
Little Bear's Jagged Worldviews. Leroy Little Bear's (2000) "Jagged Worldviews Colliding" contrasts Western linear, compartmentalised thought with Indigenous relational, holistic, cyclical worldviews. In Blackfoot metaphysics, reality is understood as constant flux and interconnectedness; language encodes relationship and action rather than static objects. Little Bear's analysis challenges the very premise of "decomposition" (breaking wholes into parts) as a Western epistemological assumption that may distort relational knowledge. From this perspective, the entire project of prompt decomposition, whether imperative or interrogative, embeds a particular (Western, analytical) way of knowing that is not neutral but culturally specific. The interrogative turn may be an improvement within this framework, but it does not escape the framework itself.
Indigenous AI Initiatives: FLAIR and Abundant Intelligences. The First Languages AI Reality (FLAIR) project, founded by Michael Running Wolf (Northern Cheyenne) and based at the Mila–Quebec AI Institute, develops automatic speech recognition for endangered Indigenous languages using low-resource AI techniques. FLAIR's distinctive contribution is its insistence on community data sovereignty: Indigenous communities retain full control over their language data and how it is used. The project creates tools for language learning, automated transcription, and voice-controlled applications in Indigenous languages. FLAIR demonstrates that AI can serve Indigenous communities when designed according to Indigenous protocols rather than imposed from outside: a practical refutation of the assumption that AI development must follow Silicon Valley norms.
The Abundant Intelligences programme at Concordia University, co-directed by Jason Edward Lewis (Hawaiian/Samoan), is a six-year research initiative funded by Canada's New Frontiers in Research Fund. Involving eight universities and twelve Indigenous community organisations across Canada, the USA, and New Zealand, the programme develops AI models reflecting Indigenous values and epistemologies. Organised in regional "pods" that collaborate with local communities, it emphasises data sovereignty aligned with OCAP principles (Ownership, Control, Access, Possession). Lewis et al. (2024) published findings in AI & Society, establishing the programme's theoretical framework: that AI development must be decolonised at the epistemological level, not merely at the ethical-guidelines level. The programme demonstrates what Indigenous-led AI development looks like in practice: community-governed, relationally accountable, and epistemologically grounded in Indigenous ways of knowing.
The IP//AI Position Paper. The Indigenous Protocol and Artificial Intelligence Position Paper (Lewis et al., 2020) makes five specific arguments that challenge the foundations of mainstream AI development: (1) Indigenous protocols, values, and epistemologies must be integrated into AI design and governance, not treated as add-on ethical considerations; (2) Indigenous communities have distinct rights over their data (Indigenous data sovereignty); (3) AI should be understood as relational rather than value-neutral; (4) Indigenous ethical frameworks (including ceremonies, consultation processes, and cultural protocols) should inform AI development; (5) the concentration of power in mainstream AI development must be challenged through community-driven and Indigenous-led initiatives. The paper explicitly warns against "using Indigenous knowledge to enhance Western technology" without genuine partnership and benefit-sharing: a warning directly relevant to any attempt, including this review's, to deploy Indigenous epistemological concepts in the analysis of Western AI systems.
The CARE Principles and AI Interaction Design. The CARE Principles for Indigenous Data Governance (Carroll et al., 2020), comprising Collective Benefit, Authority to Control, Responsibility, and Ethics, translate relational philosophy into actionable frameworks. Applied to prompt design: Collective Benefit demands that interaction design serve communities, not just individual users; Authority to Control requires that communities retain agency over how AI engages their knowledge; Responsibility demands ongoing accountability from prompt designers; Ethics requires honouring relational frameworks rather than imposing extractive ones. The interrogative paradigm may better support some of these principles than the imperative paradigm, as inquiry inherently allows for iterative community input, course-correction, and contextual adaptation. However, the CARE Principles were developed for Indigenous data governance specifically; extending them to general AI interaction design requires argumentative work that this review can only begin.
Addressing Critiques of the Relational Framing. We must acknowledge explicitly: LLM-based systems are products of massive data extraction, built on linguistic corpora that systematically underrepresent Indigenous languages, and deployed by corporations operating within capitalist structures fundamentally at odds with Indigenous models of collective ownership and relational accountability. Wilson's relational accountability involves human, more-than-human, and spiritual relations: networks of obligation that cannot be mapped onto the information exchange patterns of human-LLM interaction without trivialising their depth. The "relational" character of conversational prompting concerns data flow patterns; Indigenous relational epistemology concerns ontological relationships. These are qualitatively different kinds of "relational." Drawing parallels between them can be analytically illuminating, but conflating them risks precisely the co-optation that Indigenous scholars warn against: using the language of relationality to legitimise extractive technology.
Reworking "Recovery not Progress." The parallels between conversational AI's emerging emphasis on context, relationship, and iterative engagement and long-established Indigenous relational epistemologies are suggestive but must be held carefully. Indigenous and Western knowledge systems are contemporaneous and qualitatively different, not stages in a single developmental sequence. The risk of framing Western AI's conversational turn as a "return" to Indigenous ways is that it positions Indigenous knowledge as temporally prior rather than contemporaneously distinct, reproducing a sophisticated form of the developmental narrative that decolonial thought critiques. What we can say is that Indigenous relational epistemologies have long articulated principles (context-dependence of knowledge, relational accountability, the inseparability of process and product) that Western AI research is now encountering from within its own tradition. This convergence, if it is genuine, suggests not that Western AI is catching up to Indigenous thought but that these principles may be more fundamental to knowledge production than the extractive paradigm assumed. The appropriate response is not to claim Indigenous knowledge as a predecessor but to recognise Indigenous scholars as contemporary interlocutors whose frameworks offer resources for rethinking AI design, resources that must be engaged on Indigenous terms, through genuine partnership, not through academic appropriation.
Decolonial AI and Epistemic Justice. Recent work on AI and epistemic justice (Springer, 2026) argues that AI built exclusively on Western scientific models reproduces colonial epistemic injustices by marginalising Indigenous knowledge systems. The instruction paradigm, with its assumptions of extractable, decontextualised, individually possessed knowledge, embodies the epistemological framework that decolonial thought critiques. The inquiry paradigm, while not automatically decolonial, creates structural conditions potentially more compatible with relational epistemologies: it positions AI interaction as a dialogue within a context of relationships, rather than as an extraction from a repository. However, structural compatibility at the interaction level does not address the material conditions of AI production: the data extraction, environmental costs, and corporate concentration of power that characterise the industry. A genuinely decolonial approach to AI interaction would need to address both the epistemic level (how interactions are structured) and the material level (who benefits, who is harmed, who controls the technology).
2.5 Convergences and Gaps
The domains reviewed (technical prompt engineering, computational linguistics, cognitive science, philosophy of AI, and Indigenous epistemology) converge on a shared observation but diverge in what they see as its significance and how they propose to address it.
Convergence 1: The Structure of Interaction Matters. All domains recognise that how a human communicates with an AI system is not merely a means to an output but may be constitutive of the interaction's character. Technically, CoT showed that prompt structure shapes model capability (Wei et al., 2022); linguistically, the shift in illocutionary force alters the communicative contract (Gordon, 2024); cognitively, decomposition externalises deliberative processing (Kahneman, 2011); philosophically, the mode of prompting may determine the human-technology relation (González Arocha, 2025; Ihde, 1990). Zhang and Cao's (2025) information-theoretic analysis provides a bridge: prompts are information selectors whose linguistic form determines computational pathways, linking linguistic structure to technical performance to epistemic character. However, the strength of these connections varies: the technical evidence is strong (structure affects performance), the linguistic analysis is well-grounded (form has analysable properties), but the philosophical claims (structure determines the kind of relation) are interpretive and await empirical testing.
Convergence 2: Decomposition Recapitulates Dialogue. The technical progression from linear chains to trees to graphs to conversational multi-agent systems mirrors a progression from monologue to dialogue. Linguistically, this corresponds to the shift from imperative sequences to interrogative trees analysable through question semantics (Hamblin, 1973; Groenendijk & Stokhof, 1984). Cognitively, it corresponds to the externalisation of System 2 processing through structured interaction. Philosophically, it corresponds to the move from monologism to dialogism, though Bakhtin's critique (1929/1963) warns that structural dialogue does not guarantee genuine polyphony. We note that this convergence is structural: these domains identify parallel structures, but whether the parallels reflect a genuine underlying phenomenon or are artefacts of analogical reasoning applied across domains is a question the review cannot definitively answer.
Convergence 3: Reflexivity is Essential. Djeffal's (2025) reflexive prompt engineering, GonzĂĄlez Arocha's (2025) critical phenomenology, and the technical literature's growing emphasis on self-referential optimisation (PromptBreeder, Fernando et al., 2024) all recognise that prompting systems should be capable of reflecting on their own practices. The cognitive science parallel is metacognitionâthe ability to monitor and regulate one's own cognitive processes. The Indigenous parallel is relational accountabilityâthe ongoing obligation to reflect on the impacts of one's knowledge practices on all affected relations.
Gap 1: No Integrated Framework. Despite these convergences, no published work integrates technical, linguistic, cognitive, philosophical, and Indigenous analyses of prompt decomposition into a unified framework. González Arocha (2025) comes closest by treating prompts as simultaneously technical, discursive, and philosophical objects, but does not engage the technical APE or computational linguistics literatures. Djeffal (2025) bridges philosophy and practice but does not address decomposition specifically. Mahowald et al. (2024) bridge cognitive science and AI but do not address prompting. The field lacks a framework that can simultaneously account for the information-theoretic properties of prompts (Zhang & Cao, 2025), their speech act structure (Gordon, 2024), their cognitive effects (Kahneman, 2011; Sweller, 1988), and their phenomenological significance (González Arocha, 2025).
Gap 2: Conversational Prompt Optimisation. The technical literature identifies this as a potential gap: no APE method treats optimisation as an ongoing conversation. All current methods produce static prompt artefacts, even when the optimisation process is iterative. The linguistic and philosophical literatures suggest this gap may exist because the field lacks the theoretical vocabulary for conversational optimisation: a vocabulary that speech act theory, question semantics, and dialogical philosophy could provide. However, it is also possible that the gap exists because conversational optimisation is less effective than structured optimisation (see Section 2.6).
Gap 3: Decomposition Quality Metrics. Existing benchmarks (SWE-bench, GAIA, WebArena, AgentBench) evaluate task completion but not decomposition quality per se. A system that produces an elegant, minimal decomposition scores identically to one using wasteful redundancy, provided both succeed. Linguistic analysis suggests that decomposition quality might be measured through discourse coherence (Hobbs, 1979; Asher & Lascarides, 2003) or rhetorical structure well-formedness (Mann & Thompson, 1988). Cognitive analysis suggests it might be measured through cognitive load reduction. Philosophical analysis suggests that quality might include reflexive adequacy (Djeffal, 2025) and relational appropriateness (Wilson, 2008).
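To make the elegance-versus-redundancy contrast concrete, one candidate metric can be sketched. This is an invented illustration, not a proposal from the cited literature: each subtask is modelled as the set of atomic requirements it addresses, and redundancy is the fraction of total coverage that is duplicated.

```python
def decomposition_redundancy(subtasks: list[set[str]]) -> float:
    """Illustrative redundancy score for a decomposition. 0.0 means no
    overlap between subtasks (a minimal decomposition); higher values
    mean the same requirement is handled repeatedly."""
    total = sum(len(s) for s in subtasks)   # coverage counted with repeats
    covered = len(set().union(*subtasks))   # distinct requirements covered
    return 0.0 if total == 0 else (total - covered) / total

minimal = [{"parse"}, {"plan"}, {"verify"}]
wasteful = [{"parse", "plan"}, {"plan", "verify"}, {"parse", "verify"}]
print(decomposition_redundancy(minimal))   # 0.0
print(decomposition_redundancy(wasteful))  # 0.5
```

Both decompositions "cover" the same three requirements and would tie on a completion-only benchmark; only a structure-sensitive metric of this kind distinguishes them.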
Gap 4: Cross-Cultural and Non-Western Perspectives. Both the linguistic and philosophical literatures are overwhelmingly Western in their theoretical frameworks. Indigenous epistemology (Wilson, 2008; Smith, 2021; Kovach, 2021; Little Bear, 2000) and the CARE Principles (Carroll et al., 2020) point toward alternative frameworks, but sustained application to prompt design, decomposition engines, and conversational AI architectures has not been undertaken. The cross-linguistic gap in computational linguistics compounds this: prompt strategies designed for English may fail or produce culturally inappropriate results in other languages.
Gap 5: The Semantics of Meta-Prompting. Instructions like "think step by step," "you are an expert," or "be concise" are meta-level pragmatic operators that modify how subsequent instructions are interpreted. These have no clear analogue in standard linguistic theory: they are neither standard speech acts nor standard discourse operators. Their semantics and compositional behaviour await formalisation, despite being central to practical prompt engineering.
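One way to see why these operators resist standard treatment is to model them as higher-order functions over prompts. In this sketch (the operator names and phrasings are our own), the open formalisation question becomes visible as a code question: whether such operators commute, distribute, or interact is exactly what lacks a theory.

```python
from typing import Callable

PromptOp = Callable[[str], str]

def step_by_step(prompt: str) -> str:
    """Meta-level operator: changes how the task is to be interpreted,
    without adding task content."""
    return f"Think step by step.\n{prompt}"

def persona(role: str) -> PromptOp:
    """Operator factory: 'you are an expert'-style framing."""
    def op(prompt: str) -> str:
        return f"You are {role}.\n{prompt}"
    return op

def compose(*ops: PromptOp) -> PromptOp:
    """Apply operators right to left, like function composition."""
    def composed(prompt: str) -> str:
        for op in reversed(ops):
            prompt = op(prompt)
        return prompt
    return composed

build = compose(persona("an expert editor"), step_by_step)
print(build("Summarise the argument."))
```

String concatenation makes composition trivially well-defined here, but nothing guarantees the model interprets `persona(...) ∘ step_by_step` the same as the reverse order; that gap between syntactic and interpretive composition is the formalisation problem the paragraph identifies.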
What Each Domain Sees That Others Miss. Technical research sees performance: which methods produce better outputs. It misses the communicative, ethical, and epistemological dimensions of those methods. Computational linguistics sees structure: the discourse, pragmatic, and semantic properties of prompts. It misses the phenomenological character of the interaction and the power relations embedded in communicative choices. Cognitive science sees processing: the demands that different prompt structures place on human and machine cognition. It misses the ethical and relational dimensions. Philosophy sees meaning: the epistemological, ontological, and ethical implications. It often lacks engagement with the empirical details of technical systems and the formal precision of linguistic analysis. Indigenous epistemology sees relation: the ways knowledge-production practices shape and are shaped by networks of accountability. It challenges all Western-grounded domains to recognise both the relational character and the material conditions of what they study.
2.6 Counter-Evidence and Boundary Conditions
Intellectual honesty requires presenting the strongest case against the interrogative turn thesis. The following counter-evidence and boundary conditions constrain the claims this review can defensibly make.
The Performance Frontier is Structured and Imperative. The most successful and widely deployed prompt decomposition systems – DSPy (Khattab et al., 2023, 2024a), LangChain, MetaGPT (Hong et al., 2024), and Language Agent Tree Search (LATS; Zhou et al., 2024) – are overwhelmingly structured and imperative. They achieve the highest scores on standard benchmarks (SWE-bench, AgentBench, WebArena) through carefully designed pipelines, not through conversational inquiry. DSPy explicitly moves away from natural-language prompting toward programmatic compilation. LATS combines tree search with acting, not dialogue. These systems demonstrate that structure, not conversation, currently dominates the performance frontier.
The Evidence for Conversational Superiority is Thin. The systems positioned as evidence for the interrogative turn – ACT (Google, 2025), FATA (2025), and the Tri-Agent Evaluation Framework (KDD 2025) – are early-stage preprints and workshop papers that have not been replicated, widely adopted, or demonstrated to outperform structured approaches on standard benchmarks. They represent promising research directions, not established results. Citing them as evidence for a "paradigm shift" overstates their current significance.
Linguistic Form May Not Predict Quality. Leidner and Plachouras (2023) established that neither naturalness nor lower perplexity reliably predicts prompt effectiveness. Whether a prompt is phrased as "Summarise X" or "What are the key points of X?" may be a surface-level syntactic variation that instruction-tuned LLMs process through the same attention mechanisms, producing functionally similar outputs. The illocutionary force distinction is linguistically real but may have limited computational consequence in practice. Empirical evidence specifically testing whether imperative versus interrogative framing produces measurably different outputs – controlling for semantic content – is largely absent.
Multi-Agent "Dialogues" May Be Optimisation in Disguise. The multi-agent systems cited as evidence of conversational decomposition (CAMEL, AutoGen, ChatDev) use dialogue as an implementation mechanism, not as an epistemological stance. Their inter-agent "conversations" are optimisation procedures dressed in natural language – they are no more genuine inquiries than a genetic algorithm's crossover operations are sexual reproduction. The "conversational" framing may be a metaphor rather than a description. Whether genuine conversational structure (as opposed to the appearance of conversation) contributes to performance beyond what structured optimisation achieves is untested.
The Philosophical Arguments Risk Unfalsifiability. Claims about "phenomenological reorientation" and "epistemic co-construction" are interpretive claims that cannot be straightforwardly tested against technical evidence. They are frameworks for understanding, not empirical predictions. The fact that the philosophy of dialogue has concepts (Buber's I-Thou, Bakhtin's polyphony, Gadamer's hermeneutic circle) that can be mapped onto technical systems does not mean those systems instantiate those concepts. The "convergences" identified in Section 2.5 are suggestive analogies, not demonstrated isomorphisms. Strengthening these claims requires operationalisation and empirical testing – precisely the agenda proposed in Section 3.
The Indigenous Framing Faces Material Contradictions. The invitation to see conversational prompting as "relational" (in Wilson's sense) is in tension with the material conditions of LLM production and deployment. These systems are products of massive data extraction, environmental cost, and corporate concentration of power – conditions fundamentally opposed to Indigenous models of collective benefit and relational accountability. Framing them through Indigenous epistemology without addressing these material conditions risks legitimising extractive technology through the language of relationality.
What Would Falsify the Thesis? If the interrogative turn is a genuine phenomenon, we should expect: (a) interrogative systems consistently outperforming imperative ones on controlled benchmarks; (b) measurable differences in output epistemological quality (not just accuracy) as a function of prompt illocutionary force; (c) user studies showing that inquiry-based interaction produces different epistemic engagement; (d) multi-agent dialogue producing output diversity beyond single-model sampling. None of these has been conclusively demonstrated.
Conclusion: Framing the Thesis Appropriately. The interrogative turn thesis is best understood as identifying an emerging trajectory and proposing a normative framework, not as documenting an accomplished paradigm shift. Early evidence suggests that conversational, inquiry-based approaches to prompt decomposition are emerging as a research direction with theoretical motivation from linguistics, cognitive science, and philosophy. Whether this direction will prove empirically superior to structured approaches, and under what conditions, remains to be established. The thesis's value lies not in its empirical certainty but in its theoretical productivity: it identifies questions worth asking, frameworks worth testing, and research directions worth pursuing. The remainder of this review's contribution – its research questions and proposed methods – should be read in this spirit.
3. Research Questions
The following research questions emerge from the convergences, gaps, and counter-evidence identified in this literature review. They are organised by priority, with each question specifying the disciplines it bridges, its significance, feasible methods, and assessed novelty. Following several of the original questions, we provide sharpened formulations (RQ-A through RQ-E) drawn from peer review that strengthen testability and reduce framing bias.
3.1 Primary Research Questions (Highest Impact, Highest Novelty)
RQ1: How does the illocutionary force of prompts (imperative vs. interrogative vs. mixed) measurably affect the epistemological character of LLM outputs – not merely their accuracy but their explanatory depth, epistemic hedging, and capacity to provoke further inquiry?
Disciplines bridged: Computational linguistics (speech act theory, pragmatics) × Philosophy of AI (epistemology, González Arocha's critical phenomenology) × Technical AI (prompt engineering, evaluation metrics).
Why it matters: González Arocha (2025) argues that prompt mode reconfigures the human-AI relationship, and Zhang and Cao (2025) show that linguistic choices shape information extraction. But no empirical work connects speech act properties of prompts to the epistemological quality (not merely factual accuracy) of outputs. This would ground the philosophical claims in measurable linguistic evidence – or reveal them to be empirically unsupported.
Feasible methods: Controlled experiment varying prompt illocutionary force across tasks, with outputs evaluated by domain experts on dimensions including epistemic hedging, explanatory structure, identification of limitations, and generative capacity (ability to provoke further questions). Corpus-linguistic analysis of output properties as a function of prompt speech act type.
Novelty: High. Existing work evaluates accuracy and fluency; no published study evaluates epistemological quality as a function of prompt illocutionary force.
Sharpened formulation (RQ-A): "Under what task conditions does prompt illocutionary force predict task performance?" Given Leidner and Plachouras's (2023) finding that linguistic form does not reliably predict output quality: For which task types, model architectures, and complexity levels does the choice between imperative and interrogative framing produce statistically significant differences in (a) task accuracy, (b) output diversity, (c) explanation quality, and (d) error type distribution? This explicitly tests the boundary conditions of the thesis rather than assuming its validity. Methods: Factorial experiment: {task type: factual, analytical, creative, multi-step} × {prompt form: imperative, interrogative, mixed} × {model: at least 3 architectures} × {complexity: low, medium, high}. N ≥ 200 prompts per cell.
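The cell structure of the RQ-A design can be made concrete with a short sketch. The factor levels follow the specification above, but the model names are placeholders and the per-cell N is simply the stated minimum; this is an illustrative enumeration of the design, not a registered experimental protocol.

```python
from itertools import product

# Hypothetical factor levels for the RQ-A factorial design (model names
# are placeholders, not actual architectures under test).
TASK_TYPES = ["factual", "analytical", "creative", "multi_step"]
PROMPT_FORMS = ["imperative", "interrogative", "mixed"]
MODELS = ["model_a", "model_b", "model_c"]
COMPLEXITY = ["low", "medium", "high"]

def design_cells():
    """Enumerate every cell of the 4 x 3 x 3 x 3 factorial design."""
    return [
        {"task": t, "form": f, "model": m, "complexity": c}
        for t, f, m, c in product(TASK_TYPES, PROMPT_FORMS, MODELS, COMPLEXITY)
    ]

cells = design_cells()
prompts_needed = len(cells) * 200  # N >= 200 prompts per cell
```

At 108 cells, the minimum corpus is 21,600 prompt instances, which makes clear why the sharpened formulation constrains the design to four factors.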
RQ2: Can Decomposed Prompting (Khot et al., 2023) be formally reinterpreted as compositional question semantics, where sub-task decomposition corresponds to sub-question generation and task composition corresponds to answer-set composition – and does this reinterpretation yield measurably superior decomposition strategies?
Disciplines bridged: Computational linguistics (formal question semantics: Hamblin, Groenendijk & Stokhof) × Technical AI (decomposed prompting, DSPy) × Philosophy of language (compositionality).
Why it matters: The structural parallel between prompt decomposition and question decomposition has been noted but never formalised or empirically tested. If decomposition strategies guided by formal question semantics outperform ad hoc decomposition, this would establish a rigorous theoretical foundation for decomposition engine design – and demonstrate that linguistic theory has direct engineering value.
Feasible methods: Formal mapping between DecomP's modular architecture and partition semantics. Implementation of question-semantics-guided decomposition in a DSPy-like framework. Comparative evaluation against standard decomposition on compositional reasoning benchmarks (SCAN, multi-hop QA, symbolic reasoning).
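The core of the Hamblin-style mapping can be illustrated in a few lines: a question denotes its set of possible answers, and composing sub-questions composes their answer sets. The sub-questions, answer sets, and composition function below are toy assumptions invented for illustration, not part of DecomP or any cited system.

```python
from itertools import product

def compose(sub_answer_sets, combine):
    """Compose sub-question answer sets into the answer set of the complex
    question: every way of combining one answer from each sub-question,
    in the spirit of Hamblin's (1973) question semantics."""
    return {combine(combo) for combo in product(*sub_answer_sets)}

# Toy sub-questions of a complex question (invented example):
q1 = {"Paris", "Tokyo"}    # answers to "Which city?"
q2 = {"Europe", "Asia"}    # answers to "Which continent?"

answers = compose([q1, q2], lambda pair: f"{pair[0]} is in {pair[1]}")
```

The engineering question RQ2 poses is whether decompositions whose sub-tasks correspond to well-formed answer-set partitions of this kind outperform ad hoc sub-task splits.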
Novelty: Very high. No published work formalises the decomposition-as-question-composition mapping or tests it empirically.
RQ3: What would a "relational" prompt decomposition engine look like – one designed according to Indigenous relational epistemology (Wilson, 2008; Smith, 2021) and the CARE Principles (Carroll et al., 2020) – and how would it differ architecturally and behaviourally from existing extractive decomposition systems?
Disciplines bridged: Indigenous epistemology × Technical AI (system architecture) × Philosophy of AI (relational ethics, Coeckelbergh, 2012).
Why it matters: Indigenous epistemology offers alternatives to the extractive assumptions embedded in current AI interaction design, but these have never been translated into concrete system architecture. This question moves beyond critique to construction – asking what relational AI interaction design would actually require.
Feasible methods: Participatory design with Indigenous knowledge holders and communities. Architectural specification grounded in CARE Principles. Comparative analysis with existing systems (DSPy, AutoGen, CAMEL) on dimensions of relational accountability, context preservation, collective benefit, and community authority. This research must be conducted with Indigenous communities, not merely about Indigenous epistemology.
Novelty: Very high. No published system architecture is explicitly grounded in Indigenous relational epistemology.
3.2 Secondary Research Questions (Important, Partially Addressed)
RQ4: To what extent does Bakhtin's critique of monologism apply to multi-agent prompt decomposition systems (CAMEL, AutoGen, ChatDev) – do multiple LLM agents produce genuine dialogical decomposition or merely "algorithmic monologism" with distributed voices?
Disciplines bridged: Philosophy of AI (Bakhtinian dialogism) × Technical AI (multi-agent systems) × Computational linguistics (discourse analysis).
Why it matters: Multi-agent systems are the technical frontier of conversational decomposition, but the philosophical question of whether they achieve genuine dialogue or merely simulate it has not been empirically investigated. Bakhtin's framework (1929/1963) predicts that agents sharing the same underlying model cannot produce genuine polyphony – a testable claim.
Feasible methods: Discourse analysis of multi-agent conversations comparing systems using the same vs. different underlying models. Coding for genuine disagreement, perspective diversity, and irreducible otherness. Information-theoretic analysis of whether multi-agent outputs exceed the diversity achievable by single-model sampling.
Sharpened formulation (RQ-C): "Does multi-agent architectural diversity produce output diversity beyond single-model variance?" Is the output diversity of N agents using the same underlying model statistically greater than N independent samples from that model? If not, multi-agent "dialogue" is computationally equivalent to repeated sampling. If it is greater, what architectural features (role differentiation, communication protocols, disagreement mechanisms) contribute to genuine diversity? Methods: Compare output distributions (semantic similarity, solution strategy diversity, error pattern diversity) across: (a) N independent samples from one model, (b) N agents with different roles but same model, (c) N agents with different models.
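The comparison at the heart of RQ-C can be sketched with a simple diversity statistic. Token-level Jaccard distance here stands in for the semantic embedding distance a real study would use, and the output strings are invented examples; the point is only the shape of the test, not its result.

```python
from itertools import combinations

def jaccard_distance(a: str, b: str) -> float:
    """Toy stand-in for a semantic distance between two outputs."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1 - len(ta & tb) / len(ta | tb)

def mean_pairwise_distance(outputs):
    """Mean pairwise distance across a set of outputs: a crude diversity score."""
    pairs = list(combinations(outputs, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Invented outputs: (a) repeated sampling from one model vs.
# (b) role-differentiated agents on the same task.
single_model = ["sort the list with quicksort",
                "sort the list with quicksort recursively",
                "use quicksort to sort the list"]
multi_agent = ["sort the list with quicksort",
               "a heap gives O(n log n) worst case",
               "check whether the input is already sorted first"]

d_single = mean_pairwise_distance(single_model)
d_multi = mean_pairwise_distance(multi_agent)
```

The thesis's prediction is falsifiable in exactly this form: if d_multi does not statistically exceed d_single for matched N, multi-agent "dialogue" reduces to repeated sampling.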
Novelty: Moderate-to-high.
RQ5: Can Djeffal's (2025) reflexive prompt engineering framework be extended to decomposition engines – creating "reflexive decomposition" systems that monitor and evaluate the ethical, epistemological, and relational quality of their own decomposition strategies in real-time?
Disciplines bridged: Applied ethics (Djeffal, 2025) × Technical AI (self-monitoring systems, TextGrad) × Philosophy (González Arocha's critical phenomenology, Floridi's epistemic agency).
Feasible methods: Extension of TextGrad's feedback mechanism to include ethical and epistemological evaluation criteria alongside task performance. Integration of Djeffal's five-component framework into the optimisation loop of a DSPy-like system. Evaluation through case studies in ethically sensitive domains (healthcare, legal, education).
Sharpened formulation (RQ-E): "Can a formal discourse grammar be induced from successful prompt decompositions, and does compliance predict success?" Extend RST annotation (Zeldes et al., 2025) with question-semantic types (Hamblin, 1973) to create a decomposition-specific annotation scheme. Induce grammar from annotated corpus. Test predictive validity. Methods: (1) Annotate 300+ successful decompositions from diverse benchmarks using extended RST + question-semantic categories. (2) Induce grammar. (3) Annotate 100+ failed decompositions. (4) Test whether grammar violations discriminate success from failure.
Novelty: Moderate.
RQ6: How do Gricean maxim violations in prompt design correlate with specific decomposition failures – and can pragmatic analysis predict which decomposition strategies will fail before execution?
Disciplines bridged: Computational linguistics (Gricean pragmatics, Krause & Vossen, 2024) × Technical AI (decomposition failure analysis, error-reflection methods).
Why it matters: The technical literature documents decomposition failures (error loops, hallucination cascades, inefficient spirals) but lacks a linguistic theory of why they occur. Gricean pragmatics predicts that violations of Quantity (over- or under-specification), Quality (unsupported assumptions), Relation (irrelevant sub-tasks), and Manner (ambiguous decomposition) should produce characteristic failure modes. Validating this would enable pragmatics-informed decomposition design.
Feasible methods: Post-hoc pragmatic analysis of decomposition failures in existing benchmarks (SWE-bench, AgentBench). Corpus annotation of decomposition prompts for Gricean maxim violations. Statistical modelling of violation-failure correlations.
Sharpened formulation (RQ-B): "Can pragmatic well-formedness of decomposition predict task failure before execution?" Hypothesis: Decomposition plans that violate Gricean maxims (as operationalised by Krause & Vossen, 2024) at the sub-task specification level will fail at statistically higher rates than pragmatically well-formed decompositions, controlling for task difficulty and model capability. Methods: Annotate 500+ decomposition traces for Gricean maxim violations at each decomposition step. Build a logistic regression model predicting task failure from pragmatic features.
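The statistical model proposed for RQ-B can be sketched as follows. The annotated traces below are fabricated toy data (four violation counts per trace, one per maxim, plus a failure label), and a real study would use 500+ traces with a statistics package rather than this hand-rolled gradient descent; the sketch only shows the shape of the prediction task.

```python
import math

# Fabricated toy traces: (quantity, quality, relation, manner) violation
# counts per decomposition, and whether the task failed (1) or not (0).
TRACES = [
    ((0, 0, 0, 0), 0), ((0, 0, 0, 1), 0), ((1, 0, 0, 0), 0),
    ((2, 1, 0, 1), 1), ((1, 1, 1, 0), 1), ((0, 2, 1, 1), 1),
    ((0, 0, 1, 0), 0), ((2, 0, 1, 2), 1),
]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit(traces, lr=0.5, epochs=2000):
    """Logistic regression via stochastic gradient descent:
    bias plus one weight per Gricean maxim."""
    w = [0.0] * 5
    for _ in range(epochs):
        for x, y in traces:
            xv = (1.0,) + tuple(float(v) for v in x)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xv))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, xv)]
    return w

def predict(w, x):
    """Predicted probability of task failure for a new decomposition."""
    xv = (1.0,) + tuple(float(v) for v in x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xv)))

weights = fit(TRACES)
```

A pragmatically clean decomposition should score low failure probability and a heavily violating one high; the hypothesis is exactly that this separation survives controls for task difficulty and model capability.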
Novelty: Moderate.
RQ7: Does the phenomenological relation humans experience with AI (Ihde's hermeneutic vs. alterity relation) measurably change when interacting through imperative versus interrogative prompting – and do these different phenomenological stances produce different qualities of epistemic engagement?
Disciplines bridged: Phenomenology (Ihde, 1990; González Arocha, 2025) × Experimental psychology × Computational linguistics (speech act analysis of interaction transcripts).
Feasible methods: Phenomenological interview study with users performing identical tasks through imperative vs. interrogative prompts. Think-aloud protocols coded for hermeneutic vs. alterity language. Evaluation of resulting work products for epistemic quality (depth, nuance, self-awareness of limitations).
Sharpened formulation (RQ-D): "What is the cognitive load profile of conversational vs. structured decomposition for human users?" If conversational decomposition requires multi-turn dialogue while structured decomposition requires completing forms: What are the comparative cognitive load profiles (intrinsic, extraneous, germane) of these interaction modes, and how do they interact with user expertise, task complexity, and time pressure? Methods: Within-subjects experiment with eye tracking, NASA-TLX, and think-aloud protocols.
Novelty: High.
3.3 Exploratory Research Questions (Frontier, Speculative)
RQ8: Can the self-referential optimisation loop in PromptBreeder (Fernando et al., 2024) be reconceived as a proto-Socratic process, and if so, does explicitly encoding Socratic structures (elenchus, aporia, maieutics) into the mutation operators improve optimisation outcomes?
Disciplines bridged: Classical philosophy (Socratic method) × Technical AI (evolutionary prompt optimisation) × Computational linguistics (dialogue structure).
Feasible methods: Implementation of Socratic mutation operators (elenchus: identifying contradictions in prompt performance; aporia: flagging confident failures; maieutics: eliciting latent prompt improvements through targeted questioning). Comparison with standard PromptBreeder on established benchmarks.
Novelty: Very high.
RQ9: What would a discourse grammar of prompt decomposition look like – a formal specification of well-formed and ill-formed decomposition structures that integrates RST relations (Mann & Thompson, 1988), coherence theory (Hobbs, 1979), and question semantics (Hamblin, 1973)?
Disciplines bridged: Computational linguistics (discourse grammar, RST, question semantics) × Technical AI (decomposition architectures) × Formal language theory.
Feasible methods: Corpus annotation of decomposed prompts using extended RST and question-semantic categories. Induction of grammar rules from well-formed decompositions. Implementation as a formal validation layer in decomposition engines.
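A validation layer of the kind RQ9 envisages could, in its simplest form, check structural constraints over a decomposition tree. The sketch below models a decomposition as nested nucleus/satellite nodes and enforces two toy rules; the relation inventory and the one-nucleus-per-level rule are deliberate simplifications (RST also allows multinuclear relations), and the example plan is invented.

```python
# Toy relation inventory (a small subset of Mann & Thompson's relations).
ALLOWED_RELATIONS = {"elaboration", "background", "condition", "sequence"}

def well_formed(node):
    """Check two toy rules recursively: each level has exactly one nucleus,
    and every satellite carries an allowed coherence relation."""
    children = node.get("children", [])
    if not children:
        return True
    nuclei = [c for c in children if c.get("role") == "nucleus"]
    if len(nuclei) != 1:
        return False
    for c in children:
        if c.get("role") == "satellite" and c.get("relation") not in ALLOWED_RELATIONS:
            return False
    return all(well_formed(c) for c in children)

# Invented decomposition plan for illustration.
plan = {
    "text": "Summarise the paper",
    "children": [
        {"role": "nucleus", "text": "Extract the main claims"},
        {"role": "satellite", "relation": "background",
         "text": "Identify the paper's field"},
    ],
}
```

A grammar induced from annotated corpora (as RQ-E proposes) would replace these hand-written rules, but the validation interface would remain the same.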
Novelty: Very high.
RQ10: How does the temporal phenomenology of human-AI dialogue – the near-instantaneous response of LLMs versus the time-bound, pause-laden rhythm of human conversation – affect the epistemic and relational quality of inquiry-based interaction, and can designed latency improve outcomes?
Disciplines bridged: Phenomenology (temporal experience) × HCI (interaction design) × Philosophy of dialogue (Buber's "between," Levinas's temporality of the Other).
Feasible methods: Experimental study comparing standard vs. paced AI responses in inquiry-based interactions. Phenomenological interviews on the experience of temporal rhythm. Evaluation of epistemic output quality as a function of interaction pacing.
Novelty: Very high.
3.4 Additional Research Questions (Boundary and Diversity)
RQ-F: Under what task, model, and user conditions does imperative decomposition outperform interrogative decomposition?
Rationale: The counter-evidence presented in Section 2.6 suggests that structured, imperative approaches dominate the current performance frontier. Systematically mapping the boundary conditions – task types where imperatives excel (e.g., well-defined, low-ambiguity tasks), model architectures that respond differently to prompt form, and user expertise levels that moderate the effect – would strengthen the thesis by bounding it. If the interrogative turn is genuine, it should have specifiable failure conditions.
Feasible methods: Meta-analysis of existing benchmark data, supplemented by controlled experiments systematically varying task type (factual recall, multi-step reasoning, creative generation, code production), model architecture (instruction-tuned, RLHF, base models), and prompt form (imperative, interrogative, mixed). Statistical analysis of interaction effects.
Novelty: High. No systematic mapping of imperative vs. interrogative performance boundaries exists.
RQ-G: How do different users (experts vs. novices, different cultural backgrounds, different languages) experience and benefit from conversational decomposition?
Rationale: The review's analysis is conducted entirely from a researcher's perspective; how actual users experience different prompting modes is unknown. Expert users may prefer structured control; novice users may benefit from conversational scaffolding. Users from different cultural and linguistic backgrounds may have different relationships to the imperative/interrogative distinction – particularly given the cross-linguistic gap identified in Section 2.2. Indigenous and non-Western users may bring relational orientations that interact differently with conversational AI.
Feasible methods: Mixed-methods study with stratified sampling across expertise levels, cultural backgrounds, and languages. Quantitative measures (task performance, efficiency, satisfaction) combined with qualitative interviews. Cross-linguistic component testing prompts in typologically diverse languages.
Novelty: High. No published user study examines prompt mode effects across user populations.
4. Conclusion
This literature review has examined the evolution of prompt decomposition from instruction to inquiry across technical, linguistic, cognitive, philosophical, and Indigenous epistemological literatures. The interrogative turn thesis proposes that this evolution constitutes a convergent phenomenon visible across multiple disciplinesâyet we have been careful to distinguish between what the evidence supports and what remains speculative.
The technical literature documents a trajectory: from static templates through structured reasoning chains, automated optimisation, and dynamic context management toward early conversational decomposition systems that negotiate task understanding through dialogue. However, the performance frontier remains structured and imperative, and the conversational evidence is preliminary. The computational linguistics literature reveals this trajectory as a shift in illocutionary force from directives to questions – a shift with formal semantic consequences (question decomposition yields richer compositional structures than command sequences) and pragmatic implications (interrogative framing repositions human-AI interaction as knowledge-sharing dialogue). Cognitive science frameworks explain why decomposition helps: it externalises System 2 deliberation, reduces extraneous cognitive load, and compensates for the dissociation between formal and functional competence in LLMs.
The philosophical literature presents a more complex picture. González Arocha's (2025) critical phenomenology argues that prompting is a philosophical practice with epistemological and ethical dimensions – a claim we find productive but note is one philosopher's interpretive position, not an established consensus. The "stochastic parrot" critique (Bender et al., 2021) and Shanahan's (2024) analysis of LLM metaphors compel us to interpret the interrogative turn primarily as a claim about human epistemic agency – how the structure of interaction shapes the human user's critical engagement and interpretive authority – rather than as a claim about machine communicative capacity. Suchman's (2007) situated action framework challenges the decompose-then-execute model itself, suggesting that conversational decomposition, where the plan emerges through interaction, may be better aligned with how intelligent action actually unfolds.
Indigenous relational epistemology (Wilson, 2008; Smith, 2021; Kovach, 2021; Little Bear, 2000) recontextualises the entire narrative, but not in the way a triumphalist account would suggest. Indigenous frameworks reveal that the principles the interrogative turn thesis identifies – context-dependence of knowledge, relational accountability, the inseparability of process and product – have long been articulated by Indigenous scholars. However, the parallels between conversational AI and Indigenous relational epistemology must be held carefully: they are suggestive analogies, not identities. The material conditions of AI production – data extraction, corporate control, environmental cost, systematic underrepresentation of Indigenous languages – are in tension with the relational values that Indigenous epistemology articulates. Genuine engagement with Indigenous frameworks requires not academic citation but partnership, benefit-sharing, and the transfer of power over AI development to Indigenous communities.
The most promising research directions lie at the intersections: formal question semantics applied to decomposition architecture (RQ2), pragmatic analysis predicting decomposition failures (RQ6), systematic mapping of imperative vs. interrogative boundary conditions (RQ-F), and empirical testing of phenomenological and cognitive effects of prompt mode on human users (RQ7, RQ-D). The critical gap – an integrated framework spanning technical, linguistic, cognitive, philosophical, and relational analyses – remains the field's most urgent theoretical need.
The interrogative turn thesis, in its revised form, is best understood as follows: early evidence from multiple disciplines suggests that the structure of human-AI interaction – including the choice between imperative and interrogative modes – may matter not only for task performance but for the epistemic, ethical, and relational character of the interaction. Whether this suggestion will be borne out empirically, and under what conditions, is the research programme this review proposes. The thesis's value lies not in its current empirical certainty but in its capacity to generate productive research questions at the intersection of disciplines that too rarely speak to one another.
5. Glossary of Key Terms
Algorithmic monologism. The production of apparently multi-voiced or dialogical outputs by a single optimising mechanism (e.g., one LLM playing multiple roles). Coined in this review by analogy with Bakhtin's monologism. The term identifies a structural limitation: multiple voices do not guarantee genuine polyphony when all voices emerge from the same computational process.
Automated Prompt Engineering (APE). Methods that use computational processesâevolutionary algorithms, reinforcement learning, LLM-based generationâto automatically discover, evaluate, and refine prompts, reducing or eliminating the need for manual prompt crafting.
CARE Principles. Collective Benefit, Authority to Control, Responsibility, Ethics – principles for Indigenous Data Governance developed by Carroll et al. (2020) to ensure that data practices serve Indigenous communities and respect Indigenous authority over their knowledge and data.
Compositionality gap. The empirically observed phenomenon (Press et al., 2023) whereby LLMs can correctly answer individual sub-questions but fail to compose their answers into correct multi-step solutions. Decomposition methods address this gap by making the compositional structure explicit.
Context engineering. The practice of optimising the entire context window – including conversation history, retrieved documents, tool outputs, and system state – rather than just the prompt text. Introduced in an Anthropic (2025) engineering blog post.
Conversational decomposition. Task decomposition that occurs through dialogue between a human and an AI system (or between AI agents), as opposed to decomposition specified in advance by a human designer.
Decomposition. The process of breaking a complex task into simpler sub-tasks that can be addressed individually and then composed into a complete solution.
Illocutionary force. In speech act theory (Austin, 1962; Searle, 1969), the communicative function of an utterance – what the speaker is doing by saying something (commanding, questioning, promising, etc.). The interrogative turn thesis centres on the shift from directive to interrogative illocutionary force.
Interrogative turn. The thesis, proposed in this review, that prompt engineering is undergoing a shift from imperative (command-based) to interrogative (question-based) modes of interaction, with potential consequences for the epistemic, ethical, and relational character of human-AI interaction. Presented as an emerging trajectory and normative proposal, not an accomplished fact.
OCAP Principles. Ownership, Control, Access, Possession – principles of Indigenous data sovereignty developed by the First Nations Information Governance Centre, asserting Indigenous communities' rights over the collection, ownership, and application of their data.
Relational epistemology. An epistemological framework, articulated by Wilson (2008) and other Indigenous scholars, in which knowledge is understood as fundamentally relational – produced, validated, and shared within networks of accountability. Contrasts with extractive epistemologies that treat knowledge as a resource to be discovered and possessed.
Reflexive prompt engineering. Djeffal's (2025) framework requiring that prompt design incorporate ongoing ethical reflection on the power relations, assumptions, and consequences embedded in prompt choices. Makes prompt engineering a continuous ethical practice rather than a one-time technical task.
Rhetorical Structure Theory (RST). A linguistic theory (Mann & Thompson, 1988) that analyses texts as hierarchical structures of nucleus (central content) and satellite (supporting content) units connected by coherence relations (Elaboration, Background, Condition, etc.).
Speech act. An utterance considered as an action (Austin, 1962). Speech act theory analyses utterances in terms of their locutionary (what is said), illocutionary (what is done by saying it), and perlocutionary (what effects result) dimensions.
Stochastic parrot. Bender et al.'s (2021) characterisation of LLMs as systems that "haphazardly stitch together sequences of linguistic forms" from training data without understanding, raising questions about whether attributing communicative competence to such systems is appropriate.
Bibliography
Aguas, J. J. S. (2025). Martin Buber's philosophy of dialogue and its implications for artificial intelligence ethics. Philosophia, 26(1), 1–18.
Anthropic. (2025). Effective context engineering for AI agents [Blog post]. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge University Press.
Austin, J. L. (1962). How to do things with words. Oxford University Press.
Bakhtin, M. M. (1929/1963). Problems of Dostoevsky's poetics (C. Emerson, Trans.). University of Minnesota Press, 1984.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? 🦜 Proceedings of FAccT 2021, 610–623. https://doi.org/10.1145/3442188.3445922
Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., et al. (2024). Graph of Thoughts: Solving elaborate problems with large language models. Proceedings of AAAI 2024. https://arxiv.org/abs/2308.09687
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., ... Amodei, D. (2020). Language models are few-shot learners. Proceedings of NeurIPS 2020. https://arxiv.org/abs/2005.14165
Buber, M. (1923). I and Thou (W. Kaufmann, Trans.). Scribner, 1970.
Bunt, H. (2009). The DIT++ taxonomy for functional dialogue acts. Proceedings of the EDAML 2009 Workshop.
Carroll, S. R., Garba, I., Figueroa-RodrĂguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043
Chang, E. Y., et al. (2023). Prompting large language models with the Socratic method. IEEE Access, 11, 51156–51167. https://doi.org/10.1109/ACCESS.2023.3267890
Chen, W., Su, Y., Zuo, J., et al. (2023). AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. https://arxiv.org/abs/2308.10848
Christiano, P., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Proceedings of NeurIPS 2017.
Coeckelbergh, M. (2012). Growing moral relations: Critique of moral status ascription. Palgrave Macmillan.
Coeckelbergh, M., & Gunkel, D. J. (2025). Communicative AI: A critical introduction to large language models. Polity.
Dennett, D. C. (1987). The intentional stance. MIT Press.
Djeffal, C. (2025). Reflexive prompt engineering: A framework for responsible prompt engineering and AI interaction design. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25). https://arxiv.org/abs/2504.16204
Dreyfus, H. L. (1972). What computers can't do: The limits of artificial intelligence. MIT Press.
Dreyfus, H. L. (1992). What computers still can't do: A critique of artificial reason. MIT Press.
Dreyfus, H. L. (2007). Why Heideggerian AI failed and how fixing it would require making it more Heideggerian. Philosophical Psychology, 20(2), 247–268.
Fernando, C., Banarse, D., Michalewski, H., Osindero, S., & Rocktäschel, T. (2024). Promptbreeder: Self-referential self-improvement via prompt evolution. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.16797
Ferrario, A., & Loi, M. (2026). Are large language models intentional? The limits of referential grounding. Philosophy & Technology, 39. https://doi.org/10.1007/s13347-026-01079-4
Floridi, L. (2023). The ethics of artificial intelligence: Principles, challenges, and opportunities. Oxford University Press.
Floridi, L. (2025). AI as agency without intelligence: On artificial intelligence as a new form of agency. Philosophy & Technology, 38. https://doi.org/10.1007/s13347-025-00858-9
Gadamer, H.-G. (1960). Truth and method (J. Weinsheimer & D. G. Marshall, Trans.). Continuum, 2004.
GonzĂĄlez Arocha, J. (2025). Critical phenomenology of prompting in artificial intelligence. Sophia, 39. https://doi.org/10.17163/soph.n39.2025.04
Gordon, J. (2024). Speech acts and large language models. PhilArchive. https://philarchive.org/archive/GORSAA-12v1
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). Academic Press.
Groenendijk, J., & Stokhof, M. (1984). Studies on the semantics of questions and the pragmatics of answers (Doctoral dissertation). University of Amsterdam.
Gubelmann, R. (2024). Large language models, agency, and why speech acts are beyond them (for now). Philosophy & Technology, 37, 45. https://doi.org/10.1007/s13347-024-00696-1
Guo, Q., Wang, R., Guo, J., Li, B., Song, K., Tan, X., Liu, G., Bian, J., & Yang, Y. (2024). EvoPrompt: Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.08532
Hamblin, C. L. (1973). Questions in Montague English. Foundations of Language, 10(1), 41–53.
Hasse, C. (2017). Rethinking the I-You relation through dialogical philosophy in the age of social robots. AI & Society, 32, 467–479. https://doi.org/10.1007/s00146-017-0703-x
Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3(1), 67–90.
Hong, S., Zhuge, M., Chen, J., et al. (2024). MetaGPT: Meta programming for a multi-agent collaborative framework. Proceedings of ICLR 2024. https://arxiv.org/abs/2308.00352
Hu, J., et al. (2025). Pragmatics in the era of large language models. https://arxiv.org/abs/2502.12378
Ihde, D. (1990). Technology and the lifeworld: From garden to earth. Indiana University Press.
Ivison, H., et al. (2024). From language modeling to instruction following: Understanding the behavior shift in LLMs after instruction tuning. Proceedings of NAACL 2024. https://aclanthology.org/2024.naacl-long.130/
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., et al. (2024a). DSPy: Compiling declarative language model calls into self-improving pipelines. Proceedings of ICLR 2024 (Spotlight). https://arxiv.org/abs/2310.03714
Khattab, O., et al. (2024b). Fine-tuning and prompt optimization: Two great steps that work better together. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.597.pdf
Khot, T., Trivedi, H., Finlayson, M., et al. (2023). Decomposed prompting: A modular approach for solving complex tasks. Proceedings of ICLR 2023. https://arxiv.org/abs/2210.02406
Kovach, M. (2021). Indigenous methodologies: Characteristics, conversations, and contexts (2nd ed.). University of Toronto Press.
Krause, L., & Vossen, P. (2024). The Gricean Maxims in NLP – A survey. Proceedings of INLG 2024. https://aclanthology.org/2024.inlg-main.39/
Leidner, J. L., & Plachouras, V. (2023). The language of prompting: What linguistic properties make a prompt successful? Findings of EMNLP 2023. https://arxiv.org/abs/2311.01967
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Proceedings of EMNLP 2021.
Levinas, E. (1961). Totality and infinity: An essay on exteriority (A. Lingis, Trans.). Duquesne University Press, 1969.
Lewis, J. E., Abdilla, A., Arista, N., Baker, K., Benesiinaabandan, S., Brown, M., Cheung, M., Coleman, M., Collings, J., Duarte, M., ... Running Wolf, M. (2020). Indigenous Protocol and Artificial Intelligence Position Paper. https://doi.org/10.11573/spectrum.library.concordia.ca.00986506
Lewis, J. E., et al. (2024). Abundant Intelligences: Toward Indigenous-led AI development. AI & Society. https://doi.org/10.1007/s00146-024-01936-6
Li, G., Hammoud, H., Itani, H., et al. (2023). CAMEL: Communicative agents for "mind" exploration of large language model society. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2303.17760
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. Proceedings of ACL 2021.
Li et al. (2025). A survey of automatic prompt engineering: An optimization perspective. https://arxiv.org/abs/2502.11560
Little Bear, L. (2000). Jagged worldviews colliding. In M. Battiste (Ed.), Reclaiming Indigenous voice and vision (pp. 77–85). University of British Columbia Press.
Ma, Y., et al. (2024). The death and life of great prompts: Analyzing the evolution of LLM prompts from the structural perspective. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.1227/
Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011
Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Markl, N. (2025). Taxonomizing representational harms using speech act theory. https://arxiv.org/abs/2504.00928
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., ... Lowe, R. (2022). Training language models to follow instructions with human feedback. Proceedings of NeurIPS 2022. https://arxiv.org/abs/2203.02155
Press, O., et al. (2023). Measuring and narrowing the compositionality gap in language models. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.378/
Qian, C., Cong, X., Yang, C., et al. (2024). ChatDev: Communicative agents for software development. Proceedings of ACL 2024.
Ramnath et al. (2025). A systematic survey of automatic prompt optimization techniques. Proceedings of EMNLP 2025. https://arxiv.org/abs/2502.16923
Roberts, C. (2012). Information structure in discourse: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics, 5(6), 1–69.
Russo, F., Schliesser, E., & Wagemans, J. (2023). Connecting ethics and epistemology of AI. AI & Society, 38. https://doi.org/10.1007/s00146-022-01617-6
SciELO. (2025). Bakhtinian dialogism and artificial intelligence [Blog post]. SciELO in Perspective.
Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424.
Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79. https://doi.org/10.1145/3624724
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. Proceedings of EMNLP 2020, 4222–4235. https://arxiv.org/abs/2010.15980
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138.
Smith, L. T. (2021). Decolonizing methodologies: Research and Indigenous peoples (3rd ed.). Zed Books.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and cognition (2nd ed.). Blackwell.
Springer, J. (2026). AI and epistemic justice: Decolonial perspectives. AI & Society.
STRV. (2024). Wittgenstein and the language of AI [Industry analysis; non-peer-reviewed]. STRV Blog.
Suchman, L. (2007). Human-machine reconfigurations: Plans and situated actions (2nd ed.). Cambridge University Press.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer.
Wang, G., Xie, Y., Jiang, Y., et al. (2023). Voyager: An open-ended embodied agent with large language models. https://arxiv.org/abs/2305.16291
Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023a). Plan-and-Solve prompting: Improving zero-shot chain-of-thought reasoning. Proceedings of ACL 2023. https://arxiv.org/abs/2305.04091
Wang, X., Li, C., Wang, Z., Bai, F., Luo, H., Zhang, J., Jojic, N., Xing, E. P., & Hu, Z. (2024a). PromptAgent: Strategic planning with language models enables expert-level prompt optimization. Proceedings of ICLR 2024. https://arxiv.org/abs/2310.16427
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Proceedings of NeurIPS 2022. https://arxiv.org/abs/2201.11903
Wilson, S. (2008). Research is ceremony: Indigenous research methods. Fernwood Publishing.
Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.
Wu, Q., Bansal, G., Zhang, J., et al. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. https://arxiv.org/abs/2308.08155
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2024). Large language models as optimizers (OPRO). Proceedings of ICLR 2024. https://arxiv.org/abs/2309.03409
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate problem solving with large language models. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2305.10601
Yuksekgonul, M., Bianchi, F., Boen, J., Liu, S., Huang, Z., Guestrin, C., & Zou, J. (2024/2025). TextGrad: Automatic "differentiation" via text. arXiv 2024; Nature 2025. https://arxiv.org/abs/2406.07496
Zeldes, A., et al. (2025). eRST: A signaled graph theory of discourse relations and organization. Computational Linguistics, 51(1), 23–72. https://doi.org/10.1162/coli_a_00538
Zhang, J., & Cao, Y. (2025). Why prompt design matters and works: A complexity analysis of prompt search space in LLMs. Proceedings of ACL 2025. https://aclanthology.org/2025.acl-long.1562/
Zhang, Y., Sreedharan, S., & Kambhampati, S. (2023b). Meta prompting for AI systems. https://arxiv.org/abs/2311.11482
Zhou, A., Yan, K., Shlapentokh-Rothman, M., et al. (2024). Language Agent Tree Search unifies reasoning, acting, and planning in language models. Proceedings of ICML 2024. https://arxiv.org/abs/2310.04406
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2023a). Least-to-Most prompting enables complex reasoning in large language models. Proceedings of ICLR 2023. https://arxiv.org/abs/2205.10625
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023b). Large language models are human-level prompt engineers (APE). Proceedings of ICLR 2023. https://arxiv.org/abs/2211.01910
Ziderman, O. (2024). Martin Buber's dialogical thought as a philosophy of action. Springer.
Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., Christiano, P., & Irving, G. (2019). Fine-tuning language models from human preferences. https://arxiv.org/abs/1909.08593
This literature review (v2) was produced as part of the IAIP Polyphonic Discussion research protocol (RCH-CTX-Polyphonic-discussion--2604060040). It synthesises four parallel survey tracks: Automated Prompt Engineering Methods, Computational Linguistics, Philosophy of AI, and Prompt Decomposition Engines. Revised in response to peer review (composite score: 5.6/10), addressing citation integrity, epistemic humility, counter-evidence engagement, cognitive science integration, Indigenous epistemology depth, and search methodology documentation. April 2026.