From Instruction to Inquiry: A Cross-Disciplinary Survey of Prompt Decomposition's Linguistic, Cognitive, Philosophical, and Relational Dimensions
Revised Version 2.0 – IAIP Polyphonic Research Context. Date: April 2026
Abstract
The evolution of prompt engineering from static imperative commands toward dynamic, inquiry-based conversational systems constitutes an undertheorised development in contemporary artificial intelligence. This survey examines this trajectory, which we term the "interrogative turn", across technical prompt engineering, computational linguistics, cognitive science, and the philosophy of AI. We identify an emerging movement toward conversational decomposition, supported by early evidence from specific systems (ACT, FATA, AutoGen), theoretically motivated by linguistic and philosophical arguments, but not yet empirically established as reliably superior to structured approaches. Drawing on González Arocha's (2025) critical phenomenology of prompting and Krause and Vossen's (2024) Gricean analysis of NLP as complementary focal lenses, we argue that this trajectory, if it consolidates, would represent a reconfiguration of aspects of human-AI interaction with epistemological, ethical, and relational dimensions. The technical literature documents a progression from manual templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023b), and decomposed prompting (Khot et al., 2023) toward conversational decomposition systems (2025–2026), all enabled by the instruction-following capacities unlocked by reinforcement learning from human feedback (Ouyang et al., 2022). Computational linguistics reveals this progression as a shift in illocutionary force from directives to questions, analysable through speech act theory (Gordon, 2024), Gricean pragmatics (Krause & Vossen, 2024), Relevance Theory (Sperber & Wilson, 1986/1995), and formal question semantics (Hamblin, 1973). Cognitive science frames decomposition as enforcing deliberative processing (Kahneman, 2011), managing cognitive load (Sweller, 1988), and implementing satisficing strategies (Simon, 1956).
The philosophical literature, particularly González Arocha's phenomenological analysis, the "stochastic parrots" critique (Bender et al., 2021), and Shanahan's (2024) analysis of LLM metaphors, reveals tensions between interpreting the shift as communicative reconfiguration and recognising the limits of attributing genuine understanding to statistical models. Indigenous relational epistemology (Wilson, 2008; Smith, 2021; Kovach, 2021) offers a parallel but qualitatively different framework that challenges extractive assumptions in AI knowledge production while raising critical questions about whether "relational" framings can be appropriately applied to systems built on extractive data practices. We present counter-evidence demonstrating that structured imperative systems currently dominate benchmarks, identify critical gaps, and propose twelve research questions spanning the intersections of these domains.
1. Introduction: The Problem Space
When a user types "Summarize this article" into a large language model interface, they perform an act that appears straightforwardly technical: issuing a command to a computational system. When the same user instead types "What are the key arguments this article makes, and where might they be vulnerable?", something different occurs: not merely in the expected output, but in the communicative and epistemological character of the interaction. The first prompt commands execution; the second invites inquiry. Whether this distinction has measurable consequences for output quality, and whether it reflects a deeper transformation in human-AI interaction, are open empirical questions that this survey examines.
This survey traces the evolution of prompt decompositionâthe process by which complex tasks are broken into manageable sub-tasks for large language modelsâfrom instruction-based to inquiry-based paradigms. We argue that this evolution, visible across technical, linguistic, cognitive, and philosophical literatures, suggests a trajectory worth theorising even as its empirical basis remains thin. The interrogative turn, as we define it here, is an interpretive framework supported by converging evidence across multiple levels of analysis, though whether it describes a robust phenomenon or an artefact of observer bias requires the kind of empirical investigation we call for in our research questions.
The problem space is inherently interdisciplinary. Technical researchers have documented the progression from static prompt templates through chain-of-thought reasoning (Wei et al., 2022), automated prompt engineering (Zhou et al., 2023b), and decomposed prompting (Khot et al., 2023) toward conversational agentic systems that decompose tasks through dialogue (Li et al., 2023; Wu et al., 2023). This entire paradigm rests on the instruction-following capabilities produced by reinforcement learning from human feedback (RLHF): InstructGPT (Ouyang et al., 2022) demonstrated that alignment with human intent, not scale alone, produces the instruction-following behaviour that prompt engineering exploits. Computational linguists have begun analysing prompts as discourse units with rhetorical structure (Mann & Thompson, 1988; Zeldes et al., 2025), speech act properties (Gordon, 2024), and pragmatic dimensions (Krause & Vossen, 2024). Cognitive scientists provide models of why decomposition aids human reasoning through dual process theory (Kahneman, 2011) and cognitive load frameworks (Sweller, 1988). Philosophers have recognised that the mode of prompting reconfigures the human-technology relation (Ihde, 1990; González Arocha, 2025) and raises questions about epistemic agency (Floridi, 2025), genuine dialogue (Buber, 1923; Bakhtin, 1929/1963), and the limits of attributing understanding to language models (Bender et al., 2021; Shanahan, 2024).
1.1 Search Methodology
This survey was produced through a structured multi-agent research protocol in which four disciplinary survey agents independently searched overlapping literatures. Search targets included Semantic Scholar, Google Scholar, ACM Digital Library, PhilPapers, arXiv, and DBLP. Search terms included "prompt engineering," "prompt decomposition," "automated prompt optimisation," "speech acts AND LLM," "Gricean pragmatics AND NLP," "philosophy of prompting," "Indigenous AI," "cognitive load AND language models," "dual process theory AND AI," and variants. Inclusion criteria: published 2020–2026, English-language, peer-reviewed or preprints at recognised venues (arXiv, PhilArchive). Grey literature (corporate blogs, technical guides) was included where it introduced concepts not yet represented in the peer-reviewed literature but is explicitly flagged throughout. Initial screening yielded approximately 300 sources; 130+ were retained after relevance assessment. The survey protocol included no explicit counter-thesis agent (an acknowledged limitation), though Section 9 presents counter-evidence as a partial corrective.
1.2 Focal Work Justification
González Arocha's (2025) "Critical Phenomenology of Prompting in Artificial Intelligence," published in Sophia, serves as one focal lens for this review. We acknowledge that Sophia (Universidad Politécnica Salesiana, Ecuador) is a regional journal with limited international visibility, and the paper has not yet accumulated citations or scholarly responses. The selection is justified because González Arocha's work is, to our knowledge, the only published treatment that positions prompting as an inherently philosophical practice: not a technical task to which philosophical reflection is subsequently applied, but an act that already carries ontological, epistemological, and ethical commitments. This distinctive claim, whether or not it is ultimately accepted by the philosophical community, provides a productive focal point for interdisciplinary analysis.
To balance philosophical ambition with empirical grounding, we elevate Krause and Vossen's (2024) "The Gricean Maxims in NLP – A Survey" (INLG 2024) as a complementary focal work. Published at a peer-reviewed NLP venue, Krause and Vossen provide the most systematic current mapping between pragmatic theory and NLP practice, offering the linguistic precision that González Arocha's philosophical framework needs. Together, these focal works anchor the survey's two methodological commitments: philosophical depth and empirical testability.
1.3 Structure
The survey proceeds as follows. Section 2 provides a glossary of key terms. Section 3 establishes the technical foundations. Section 4 analyses the linguistic dimensions. Section 5 introduces the cognitive science perspective. Section 6 develops the philosophical analysis. Sections 7 and 8 identify cross-disciplinary convergences and tensions. Section 9 presents counter-evidence. Section 10 examines Indigenous relational epistemology. Section 11 proposes research questions. Section 12 maps the research landscape. Section 13 concludes.
2. Glossary of Key Terms
Algorithmic monologism. The production of an appearance of multiple voices or perspectives by a single optimising mechanism. Coined here to describe multi-agent LLM systems (e.g., CAMEL, AutoGen) that simulate dialogue without achieving what Bakhtin (1929/1963) called genuine polyphony: irreducible, autonomous consciousnesses in dialogue.
Interrogative turn. The proposed shift in prompt engineering practice from imperative commands ("Summarize X") toward interrogative framings ("What are the key points of X?"). Used in this survey as an interpretive framework for organising converging evidence, not as a confirmed empirical phenomenon.
Context engineering. The practice of optimising the entire context window (including prompt text, conversation history, retrieved documents, tool outputs, and system state) rather than isolated prompt strings. Introduced by Anthropic (2025; non-peer-reviewed blog post).
Illocutionary force. The communicative function of an utterance (e.g., commanding, questioning, asserting), as distinct from its literal content. From speech act theory (Austin, 1962; Searle, 1969).
Compositionality gap. The empirical finding that language models can correctly answer individual sub-questions but fail to compose those answers into a correct response to the original complex question (Press et al., 2023).
Relational epistemology. An epistemic framework, articulated by Wilson (2008) within Indigenous Cree tradition, in which knowledge is fundamentally relational: produced, validated, and shared within networks of accountability that include human, more-than-human, and spiritual relations.
CARE Principles. Collective Benefit, Authority to Control, Responsibility, Ethics – principles for Indigenous Data Governance (Carroll et al., 2020), designed to complement the FAIR principles by centring Indigenous peoples' rights and interests.
Reflexive prompt engineering. Djeffal's (2025) framework in which prompt design incorporates ongoing ethical reflection as a constitutive element rather than an afterthought. Five components: prompt design, system selection, system configuration, performance evaluation, and prompt management.
Epistemic agency. The capacity to participate meaningfully in knowledge-production processes with responsibility for one's epistemic contributions. Floridi (2025) describes AI as exercising "agency without intelligence": participating in knowledge production without epistemic responsibility.
3. The Technical Evolution of Prompt Decomposition
The technical literature documents a rapid evolution through at least five phases, each progressively moving toward more dynamic and adaptive modes of interaction.
3.1 From Static Templates to Structured Reasoning (Pre-2023)
The earliest prompt engineering relied on hand-crafted templates and few-shot examples (Brown et al., 2020). AutoPrompt (Shin et al., 2020) introduced gradient-guided discrete token search, and prefix tuning (Li & Liang, 2021) introduced continuous prompt embeddings. Throughout this phase, prompts functioned as static parameters. The entire paradigm was made possible by a critical technical development: Ouyang et al.'s (2022) InstructGPT demonstrated that RLHF (reinforcement learning from human feedback) could train language models to follow instructions reliably. This alignment work, building on foundations by Christiano et al. (2017), is the technical substrate on which the entire prompt engineering field rests. Without RLHF, models would not reliably respond to either imperative or interrogative prompts.
Chain-of-thought (CoT) prompting (Wei et al., 2022) was pivotal in showing that prompt structure, not merely content, could fundamentally alter model capability. Least-to-Most prompting (Zhou et al., 2023a) achieved 99% accuracy on SCAN versus 16% for standard CoT. Plan-and-Solve prompting (Wang et al., 2023a) introduced explicit planning phases. These methods established that decomposition can dramatically improve LLM performance.
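The least-to-most pattern can be sketched as a two-stage loop: first reduce a complex task to an ordered list of simpler sub-problems, then solve them in sequence, with each answer added to the context for the next. The `decompose` and `solve` helpers below are hypothetical stand-ins for model calls, hard-coded for illustration:

```python
# Sketch of the least-to-most pattern. `decompose` and `solve` are
# hypothetical stand-ins for LLM calls; their contents are illustrative.

def decompose(task: str) -> list[str]:
    # A real system would ask the model which sub-problems must be
    # solved first; here we hard-code an example decomposition.
    return [
        "How many letters are in 'prompt'?",
        "How many letters are in 'decomposition'?",
        "What is their sum?",
    ]

def solve(subproblem: str, context: list[str]) -> str:
    # Stand-in for a model call that sees all earlier Q/A pairs.
    if "prompt" in subproblem:
        return "6"
    if "decomposition" in subproblem:
        return "13"
    # Final step composes the earlier answers recorded in the context.
    return str(sum(int(entry.split(": ")[-1]) for entry in context))

def least_to_most(task: str) -> str:
    context: list[str] = []
    for sub in decompose(task):
        answer = solve(sub, context)
        context.append(f"{sub} Answer: {answer}")
    return answer  # the last sub-answer resolves the original task

print(least_to_most("Total letters in 'prompt' and 'decomposition'?"))
```

The point of the pattern is that each `solve` call faces a simpler problem than the original task, and the accumulated context carries sub-answers forward for composition.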
3.2 Automated Prompt Optimisation and Programmatic Compilation (2023–2024)
The year 2023 witnessed an explosion of automated prompt engineering (APE) methods. The foundational APE paper (Zhou et al., 2023b) demonstrated that LLMs could generate and select prompts matching or exceeding human performance, establishing the generate-evaluate-select pipeline that subsequent work has elaborated. OPRO (Yang et al., 2024) used meta-prompts with trajectory histories, achieving substantial improvements over human-designed prompts. PromptBreeder (Fernando et al., 2024) introduced self-referential evolution, in which both task-prompts and mutation-prompts evolve simultaneously, so the system learns to improve its own improvement process. EvoPrompt (Guo et al., 2024) applied genetic algorithms and differential evolution to prompt spaces. PromptAgent (Wang et al., 2024) employed Monte Carlo Tree Search with error-reflection loops, arguably the closest existing approach to conversational optimisation, since the agent reflects on failures and adjusts strategies.
DSPy (Khattab et al., 2023, 2024a) represents a paradigm shift within this phase: treating prompt engineering as compilation of declarative Python programs, DSPy achieves accuracy improvements over manual prompting and decouples prompt quality from specific model versions. The BetterTogether variant (Khattab et al., 2024b) combines prompt optimisation with weight fine-tuning for additional gains. However, DSPy's optimisation occurs at compile-time, not run-time: a critical limitation for conversational paradigms. From the perspective of the interrogative turn thesis, DSPy is significant as a counter-example: the most influential recent framework moves away from natural-language prompting entirely, toward programmatic compilation.
Robino's (2025) Conversation Routines framework takes a different approach, proposing structured task-oriented dialogue systems as a prompt engineering paradigm. This work is notable for explicitly framing prompt engineering as dialogue design, providing a bridge between the technical APE literature and conversational interaction design.
3.3 Reasoning Topologies and Dynamic Context (2024–2025)
The progression from CoT to Tree of Thoughts (Yao et al., 2023) to Graph of Thoughts (Besta et al., 2024) represents evolution from linear to graph-structured reasoning. TextGrad (Yuksekgonul et al., 2024/2025), published in Nature, extended automatic differentiation to text, enabling backpropagation-like optimisation through natural-language feedback. GEPA (Databricks/UC Berkeley, 2025 [preprint; extraordinary performance claims require independent verification]) demonstrated evolutionary prompt optimisation with LLM-driven reflection.
Concurrently, Anthropic's corporate blog (2025; non-peer-reviewed) introduced "context engineering," arguing that the optimisation target should be the entire context window. IBM's technical guide (2026; non-peer-reviewed) elaborated similar ideas. These represent the field's recognition that prompts exist within dynamic contexts, though the claims originate in grey literature and await peer-reviewed validation.
3.4 Conversational Decomposition (2025–2026)
The most recent phase witnesses systems that decompose tasks through dialogue. Decomposed Prompting (Khot et al., 2023) established that modular decomposition consistently outperforms monolithic strategies. Multi-agent architectures introduced decomposition through inter-agent conversation: CAMEL (Li et al., 2023) pioneered "inception prompting"; AutoGen (Wu et al., 2023) adopted a conversation-centric architecture; ChatDev (Qian et al., 2024) introduced "communicative dehallucination."
By 2025, inquiry-oriented systems had emerged: ACT (Google, 2025 [preprint]) trains agents to ask clarifying questions; FATA (2025 [preprint]) generates clarification checklists before answering; the Tri-Agent Evaluation Framework (KDD 2025 [preprint]) measures decomposition quality through dialogue quality. All three, however, are recent preprints that have not yet demonstrated benchmark superiority over structured approaches, a point we develop in Section 9.
3.5 Worked Example: Linguistic Analysis of Prompt Pairs
To ground the theoretical frameworks in concrete analysis, consider this prompt pair:
Imperative: "Summarize the key findings of Smith et al. (2024) in three bullet points, focusing on methodology and results."
- RST analysis: Nucleus = "Summarize key findings"; Satellites: "of Smith et al. (2024)" (Elaboration), "in three bullet points" (Manner), "focusing on methodology and results" (Condition).
- Speech act analysis (Searle): Directive; illocutionary force = command; preparatory condition = hearer can perform the action; sincerity condition = speaker wants it done.
- Gricean analysis: Explicitly encodes Quantity (three points), Relation (methodology and results), Manner (bullet format). Leaves little to respondent's judgment.
Interrogative: "What are the most important findings from Smith et al. (2024), and how robust is their methodology?"
- RST analysis: Two coordinated nuclei linked by Joint relation; implicit satellite (the paper context) via Background relation.
- Speech act analysis (Searle): Question (dual); illocutionary force = request for information; preparatory condition = hearer knows the answer; creates an open answer set.
- Gricean analysis: Does NOT pre-specify Quantity or Manner; relies on respondent's judgment about relevance; invites conversational implicature.
- Question semantics (Hamblin, 1973): Denotes a partition of logical space into possible answers. "Most important" creates an evaluative sub-question requiring judgment; "how robust" creates an analytic sub-question requiring methodological assessment.
The imperative prompt constrains the response space through explicit encoding of cooperative maxims; the interrogative prompt opens it by delegating Quantity and Manner decisions to the respondent. Whether this openness produces better outputs is an empirical question: Leidner and Plachouras (2023) demonstrated that linguistic form does not reliably predict output quality. The structural difference, however, is linguistically real and analytically significant.
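The contrast drawn in this worked example can be made concrete with a small sketch recording which Gricean maxims each prompt form pre-specifies and which it delegates to the respondent. The feature sets are taken from the analysis above; the encoding itself is illustrative:

```python
# Illustrative encoding of the worked example: which Gricean maxims each
# prompt form fixes in advance, and which it leaves to the respondent.
MAXIMS = {"quantity", "quality", "relation", "manner"}

imperative = {
    "text": "Summarize the key findings ... in three bullet points, "
            "focusing on methodology and results.",
    "pre_specified": {"quantity", "relation", "manner"},  # count, focus, format
}

interrogative = {
    "text": "What are the most important findings ..., and how robust "
            "is their methodology?",
    "pre_specified": {"relation"},  # topic fixed; quantity/manner left open
}

def delegated(prompt: dict) -> set[str]:
    """Maxims the prompt leaves to the respondent's judgment."""
    return MAXIMS - prompt["pre_specified"]

print(sorted(delegated(imperative)))
print(sorted(delegated(interrogative)))
```

The imperative form leaves only Quality implicit; the interrogative form additionally delegates Quantity and Manner, which is precisely the "openness" the analysis identifies.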
4. The Linguistic Lens
4.1 Rhetorical Structure Theory and Prompt Architecture
Mann and Thompson's (1988) Rhetorical Structure Theory (RST) provides a framework for analysing prompts as hierarchically organised discourse. The core instruction functions as the nucleus; contextual elements (role specifications, constraints, examples) serve as satellites connected by coherence relations (Elaboration, Background, Condition, Manner). Zeldes et al.'s (2025) Enhanced RST (eRST) extends this to graph-based representations supporting non-projective and concurrent discourse relations, directly applicable to multi-component and multi-turn prompts where relations cross turn boundaries. Decomposing a complex prompt can be understood, in RST terms, as flattening a deep rhetorical tree into a sequence of simpler nucleus-satellite pairs, each expressible as a single coherent instruction or question. The worked example in Section 3.5 demonstrates this concretely: the imperative prompt exhibits a single nucleus with three satellite relations, while the interrogative prompt exhibits two coordinated nuclei with an implicit background satellite, a structurally different discourse architecture with different compositional properties.
4.2 Speech Act Theory and Illocutionary Force
Gordon's (2024) "Speech Acts and Large Language Models" provides the most systematic application of Austin-Searle speech act theory to LLM interaction, introducing "conversational zombies": entities producing utterances with perlocutionary effects (persuading, informing, misleading) while lacking the intentionality required for genuine illocutionary force. This concept is analytically powerful: it captures the asymmetry of human-LLM interaction in which one party performs genuine speech acts and the other produces speech-act-shaped outputs without the mental states those acts conventionally require. Gubelmann (2024) deepens this from a Kantian-pragmatist perspective, arguing that LLMs cannot perform genuine speech acts because they lack the autonomous agency that speech act theory presupposes. Markl (2025) demonstrates a productive application, showing that speech act theory can taxonomise representational harms in LLM output as perlocutionary effects (real impacts on audiences) without requiring the attribution of illocutionary intention to the model.
The shift from imperative to interrogative prompting is a shift in illocutionary force from directives to questions, with fundamentally changed preparatory conditions. Directives presuppose the hearer can perform the action and the speaker has authority to request it; questions presuppose a knowledge asymmetry and the existence of an answer. For prompt decomposition, this has structural consequences: imperative decomposition yields a sequence of sub-commands (do A, then B, then C), while interrogative decomposition yields a tree of sub-questions whose answers compose hierarchically, a structure directly modelled by formal question semantics.
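The structural contrast (a flat sequence of sub-commands versus a tree of sub-questions whose answers compose upward) can be sketched with two minimal data shapes. The task content here is illustrative, not drawn from any particular system:

```python
from dataclasses import dataclass, field

# Imperative decomposition: an ordered list of sub-commands, run in turn.
imperative_plan = ["extract claims", "check each claim", "draft summary"]

# Interrogative decomposition: a tree of sub-questions; a parent's answer
# is composed from its children's answers. Content is illustrative.
@dataclass
class Question:
    text: str
    children: list["Question"] = field(default_factory=list)

    def answer(self, leaf_answers: dict[str, str]) -> str:
        if not self.children:
            return leaf_answers[self.text]  # leaf: answered directly
        # Interior node: compose child answers (joined here for simplicity;
        # a real system would synthesise them with another model call).
        return "; ".join(c.answer(leaf_answers) for c in self.children)

root = Question("Is the article's argument sound?", [
    Question("What does it claim?"),
    Question("What evidence supports each claim?"),
])

leaf_answers = {
    "What does it claim?": "X causes Y",
    "What evidence supports each claim?": "one observational study",
}
print(root.answer(leaf_answers))
```

The list executes in order with no composition step; the tree makes the composition of sub-answers an explicit, recursive operation, which is the property formal question semantics captures.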
4.3 Formal Question Semantics
Hamblin's (1973) treatment of questions as denoting sets of possible answers, and Groenendijk and Stokhof's (1984) partition semantics, in which a question partitions logical space into equivalence classes corresponding to possible complete answers, provide formal machinery for understanding what interrogative prompts mean in ways that imperative semantics cannot capture. When a prompt shifts from "Summarize this text" to "What are the key points of this text?", the semantic object changes from a command with a single expected execution to a question with a structured set of acceptable answers. The interrogative mode thus provides a richer semantic framework for structured reasoning, one that directly mirrors the tree-search structures implemented computationally by ToT (Yao et al., 2023) and GoT (Besta et al., 2024).
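A toy model makes the partition idea concrete. Worlds here are truth-value assignments to two propositions, and a question is represented by the function that maps each world to its complete answer; grouping worlds by that answer yields the partition:

```python
from itertools import product

# Toy Groenendijk-Stokhof partition semantics: a "world" is a (p, q)
# valuation; a question partitions the worlds into cells that agree
# on its complete answer. Purely illustrative.
worlds = list(product([True, False], repeat=2))  # four (p, q) valuations

def partition(worlds, answer_of):
    """Group worlds into equivalence classes under the same complete answer."""
    cells: dict = {}
    for w in worlds:
        cells.setdefault(answer_of(w), []).append(w)
    return cells

# "Is p true?" distinguishes worlds only by the value of p: two cells.
whether_p = partition(worlds, lambda w: w[0])
# "What are the values of p and q?" distinguishes every world: four cells.
which_pq = partition(worlds, lambda w: w)

print(len(whether_p), len(which_pq))
```

A coarser partition corresponds to a less demanding question; a command, by contrast, has no such answer-set structure at all, which is the asymmetry the paragraph above exploits.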
The self-ask method (Press et al., 2023) makes this connection explicit: the model generates and answers its own follow-up sub-questions, outperforming imperative linear reasoning. Press et al. identify the "compositionality gap" (LLMs can answer sub-questions correctly but fail to compose the answers), demonstrating that interrogative self-decomposition is not merely a stylistic preference but addresses a structural limitation. Roberts's (2012) Questions Under Discussion (QUD) framework offers additional analytical resources: a discourse is organised around implicit or explicit questions, with each assertion evaluated for its relevance to the current QUD. Prompt decomposition, in this framework, is the explicit articulation of QUDs that the model must address sequentially.
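The control flow of self-ask can be sketched as a loop in which the model repeatedly appends follow-up questions and intermediate answers to a scratchpad until a final answer appears. The `mock_model` below is a hand-written stand-in for an LLM, and its outputs are illustrative; only the loop structure reflects the published pattern:

```python
# Control-flow sketch of the self-ask pattern (Press et al., 2023).
# `mock_model` stands in for an LLM; its canned outputs are illustrative.

def mock_model(scratchpad: str) -> str:
    if "Follow up: Who directed Jaws?" not in scratchpad:
        return "Follow up: Who directed Jaws?"
    if "Intermediate answer:" not in scratchpad:
        return "Intermediate answer: Steven Spielberg."
    return "So the final answer is: Steven Spielberg."

def self_ask(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = mock_model(scratchpad)      # model extends the scratchpad
        scratchpad += step + "\n"
        if step.startswith("So the final answer is:"):
            return step.removeprefix("So the final answer is:").strip()
    return "no answer within budget"

print(self_ask("Who directed the film in which a shark terrorises Amity?"))
```

The scratchpad is the mechanism that addresses the compositionality gap: sub-answers remain visible in context, so the final composition step can draw on them explicitly.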
4.4 Gricean Pragmatics and Beyond
Krause and Vossen's (2024) comprehensive survey maps Gricean maxims to NLP tasks, revealing systematic patterns of maxim violation in LLM interaction: Quantity violations (over- or under-informing), Quality violations (hallucination), Relation violations (irrelevant tangents), Manner violations (ambiguous output). The interrogative turn adds a pragmatic dimension: questions trigger different implicature patterns than commands, positioning interaction as knowledge-sharing rather than command-execution.
Sperber and Wilson's (1986/1995) Relevance Theory offers a potentially more applicable alternative to Gricean pragmatics for LLM interaction. Where Grice's model assumes cooperative agents following conversational maxims, Relevance Theory provides a cognitive model based on relevance maximisation: communicators aim to achieve the greatest cognitive effect for the least processing effort. This framework may better describe human-LLM interaction because it does not require attributing cooperative intentions to the model: it requires only that the model's outputs be relevant to the human's cognitive environment. The distinction matters: Gricean analysis presupposes rational agents following social norms; relevance-theoretic analysis focuses on information processing and cognitive effects, which may be more appropriate for interaction with statistical systems.
4.5 Formal and Functional Competence
Mahowald et al. (2024) provide a critical distinction for evaluating linguistic claims about LLMs: the dissociation between formal linguistic competence (phonology, morphology, syntax, word meaning) and functional linguistic competence (reasoning, world knowledge, social cognition, theory of mind). Their review of the evidence suggests LLMs show strong formal competence but systematically lack functional competence. This distinction is essential for the interrogative turn thesis: even if LLMs process interrogative prompts through sophisticated formal mechanisms, they may lack the functional competence (the understanding of communicative intent, the capacity for genuine pragmatic reasoning) that would make "dialogue" or "inquiry" appropriate descriptions of their processing. Hu et al. (2025) confirm this empirically, demonstrating that LLMs underperform on pragmatic phenomena including conversational implicature and presupposition accommodation.
4.6 Information-Theoretic Foundations and Empirical Prompt Linguistics
Zhang and Cao's (2025) ACL analysis demonstrates that prompts function as information selectors, determining which slice of the model's representation gets verbalised. Combined with Ivison et al.'s (2024) finding that instruction tuning shifts attention to instruction verbs and semantic structure, we see that the linguistic form of prompts has measurable computational consequences.
Leidner and Plachouras (2023) established empirically that neither naturalness nor lower perplexity reliably predicts prompt effectiveness. Ma et al.'s (2024) large-scale analysis of 10,538 prompts identified eight structural components and tracked their evolution. A critical gap concerns cross-linguistic analysis: virtually all prompt linguistics has been conducted on English-language prompts.
5. The Cognitive Science Lens: Decomposition and Human Reasoning
The cognitive science literature, largely absent from prior treatments of prompt engineering, provides frameworks that ground the interrogative turn thesis in models of human cognition.
5.1 Dual Process Theory
Kahneman's (2011) dual process framework distinguishes System 1 (fast, automatic, heuristic) from System 2 (slow, deliberative, analytical) cognitive processing. Prompt decomposition can be understood as a strategy that enforces System 2 processing on what would otherwise be System 1 model responses. Chain-of-thought prompting (Wei et al., 2022) literally externalises deliberation: by requiring the model to produce intermediate reasoning steps, it forces sequential, step-by-step processing rather than direct pattern-matching. The interrogative form strengthens this effect: questions demand evaluation and assessment, while commands permit rote execution.
This framing suggests that the effectiveness of CoT and decomposition may derive not from any intrinsic superiority of "conversation" but from the computational analogue of forcing deliberative processing: slowing down inference, introducing intermediate representations, and requiring explicit justification of reasoning steps. The cognitive science perspective thus provides a deflationary alternative to the philosophical interpretation: decomposition works because it exploits the architecture of attention-based models in ways analogous to how deliberative thinking exploits human cognitive architecture.
5.2 Cognitive Load Theory
Sweller's (1988; Sweller et al., 2011) cognitive load theory distinguishes three types of cognitive load: intrinsic (inherent task complexity), extraneous (load imposed by poor instruction design), and germane (load contributing to schema acquisition). Applied to prompt engineering, a complex monolithic prompt imposes high intrinsic load on both the model (dense context to process) and the human (complex specification to formulate). Decomposition reduces intrinsic load by segmenting information into manageable chunks, while well-designed decomposition strategies minimise extraneous load by presenting sub-tasks in a coherent, well-structured sequence.
The interrogative form may further manage cognitive load by distributing the specification burden: instead of the human encoding all task parameters upfront (high extraneous load on the formulator), a question invites the model to identify what is relevant, shifting some of the information-retrieval and relevance-assessment work to the model. Whether this shift improves outcomes depends on the model's capacity for genuine relevance assessment, a capacity that Mahowald et al.'s (2024) formal/functional competence distinction gives us reason to question.
5.3 Bounded Rationality and Satisficing
Simon's (1956; 1996) bounded rationality framework suggests that agents with limited cognitive resources do not optimise but satisfice: they seek solutions that are "good enough" rather than optimal. Prompt decomposition is, in Simon's terms, a satisficing strategy: it produces acceptable sub-solutions rather than attempting to optimise the whole. The compositionality gap (Press et al., 2023) can be reframed as a bounded rationality problem: models can satisfice on sub-problems but fail to integrate sub-solutions into globally optimal compositions. Decomposition accommodates this limitation by breaking integration demands into manageable steps.
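The optimise/satisfice contrast can be illustrated with a minimal search over candidate solutions whose quality check is costly. The candidates, scores, and aspiration threshold are all illustrative:

```python
# Minimal contrast between optimising and satisficing search. `quality`
# stands in for an expensive evaluation; all values are illustrative.
candidates = ["draft A", "draft B", "draft C", "draft D"]
scores = {"draft A": 0.55, "draft B": 0.72, "draft C": 0.91, "draft D": 0.80}

def quality(c: str) -> float:
    return scores[c]

def optimise(cands):
    """Evaluate every candidate; return the global best (4 evaluations)."""
    return max(cands, key=quality)

def satisfice(cands, aspiration: float):
    """Stop at the first candidate that is 'good enough' (2 evaluations here)."""
    for c in cands:
        if quality(c) >= aspiration:
            return c
    return None

print(optimise(candidates))        # global optimum
print(satisfice(candidates, 0.7))  # acceptable solution, found sooner
```

The satisficer accepts "draft B" after two evaluations even though "draft C" is better, trading solution quality for bounded search effort, which is the accommodation the paragraph above describes.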
This perspective moderates grand claims about the interrogative turn: decomposition may be less about achieving richer epistemic engagement and more about accommodating the cognitive limitations of both humans and language models: a practical strategy for bounded agents, not a philosophical transformation. The cognitive science lens thus provides a deflationary counterweight to the philosophical lens: where González Arocha sees the prompt as a mediating space charged with philosophical significance, Simon would see it as a practical tool for managing bounded rationality. Both perspectives have merit; the tension between them is productive rather than resolvable.
5.4 Implications for the Interrogative Turn
The cognitive science perspective provides both support for and challenges to the interrogative turn thesis. Supporting evidence: questions may be more effective than commands at triggering deliberative processing because they demand evaluation rather than execution. Challenging evidence: the benefits of decomposition may derive entirely from structural properties (breaking complex tasks into simpler ones) rather than from the illocutionary mode (questions vs. commands). If cognitive load reduction is the primary mechanism, the interrogative form is incidentalâany well-structured decomposition should work equally well regardless of whether sub-tasks are phrased as questions or commands. This is an empirical question that RQ-A and RQ-F are designed to address.
6. The Philosophical Lens
6.1 GonzĂĄlez Arocha's Critical Phenomenology of Prompting
González Arocha's (2025) paper in Sophia constitutes the most direct philosophical treatment of prompting as philosophical practice. González Arocha argues that prompts are discursive practices embedding assumptions, worldviews, and power relations. The prompt functions as a "mediating space" where human intentionality, language, and sociopolitical structures converge.
At the phenomenological level, the mode of prompting reconfigures the human-AI relationship: commands produce a tool-relation; questions produce something approaching an alterity-relation. At the critical level, prompt design is a site of power: who designs prompts, whose assumptions are encoded, and whose epistemologies are privileged are questions of justice. At the epistemological level, the kind of knowledge produced depends on the communicative structure of the prompt. It should be noted that this is one philosopher's claim, published in a regional journal, not yet a consensus position. Its value lies in providing a theoretically ambitious framework that generates testable hypotheses, not in representing established philosophical consensus.
6.2 The Stochastic Parrots Critique and the Limits of "Dialogue"
The "stochastic parrots" critique (Bender, Gebru, McMillan-Major & Shmitchell, 2021) is the most-cited critical perspective on LLM capabilities and must be confronted directly by any claim about "machine dialogue" or "inquiry." Bender et al. argue that LLMs are sophisticated statistical models that generate plausible text by pattern-matching over training data without any understanding of meaning. If this characterisation is correctâand substantial evidence supports it (Mahowald et al., 2024; Bender & Koller, 2020)âthen claims about "the interrogative turn" are potentially misleading: there is no genuine inquiry when one interlocutor has no capacity for understanding.
The interrogative turn thesis must confront this head-on. Even if interrogative prompts produce better outputs than imperative ones, this may reflect the information-theoretic properties of questionsâwider answer sets, compositional structure, implicit relevance constraints (Sperber & Wilson, 1986/1995)ârather than any genuine communicative engagement. The thesis is strongest when interpreted as a claim about the human side of the interaction: how framing interaction as inquiry shapes human epistemic agency, critical evaluation, and interpretive engagementârather than about machine capabilities. The question is not whether the model "inquires" but whether the human, by adopting an inquiring stance, produces better epistemic outcomes.
Shanahan (2024) reinforces this point with precision. In "Talking About Large Language Models," he argues that we should be scrupulously careful about the metaphors we use for LLMsâavoiding attributions of belief, understanding, or communicative intent unless we are explicit that these are useful fictions (in Dennett's (1987) "intentional stance" sense). "Conversation," "dialogue," and "inquiry" are such metaphors. This survey uses them as analytical frameworks for describing interaction patterns, not as claims about machine mentality.
6.3 Wittgensteinian Language Games and the Socratic Tradition
The later Wittgenstein's (1953) concept of language games provides a framework for understanding the shift. Prompts are moves within language games: rule-governed, context-dependent communicative practices embedded in "forms of life." The shift from imperative to interrogative prompting is a shift between language games with different rules: in the command game, success equals accurate execution; in the inquiry game, success equals productive exploration. Recent work applying Wittgenstein to AI (Jolma, 2024; STRV, 2024 [blog post]) suggests that LLMs participate in these games statistically but lack the shared form of life that grounds genuine meaning.
The Socratic tradition deepens the philosophical significance of the interrogative mode. Socratic questioning presupposes a co-inquirer capable of genuine aporia: perplexity, the recognition of one's own ignorance that drives the search for knowledge. The operationalisation of Socratic methods in LLM interaction (Chang et al., 2023) reveals a productive tension: LLMs can simulate the role of co-inquirer but cannot experience the aporia that makes Socratic questioning transformative. Yet the structure matters: when humans adopt a Socratic stance toward AI (asking probing questions, identifying contradictions, pursuing implications), they position themselves as active epistemic agents rather than passive consumers. The Socratic tradition thus provides the deepest philosophical justification for the inquiry paradigm even as it reveals its limits: the benefit accrues to the human questioner, not to the machine "respondent."
6.4 Dialogical Philosophy and Its Limits
Buber's (1923) I-Thou/I-It framework reveals that current AI interactions remain fundamentally I-It (Hasse, 2017; Aguas, 2025). Conversational prompting may create conditions that approach I-Thou dynamics: not because the machine becomes a genuine Thou, but because the human's orientation shifts toward openness. Bakhtin's (1929/1963) concept of polyphony introduces a critical counter: genuine polyphony requires irreducible, autonomous consciousnesses. LLM multi-voice outputs are "algorithmic monologism": the appearance of multiple voices produced by a single optimising mechanism (SciELO, 2025 [blog post]). This critique applies equally to multi-agent systems: CAMEL's agents (Li et al., 2023) produce the form of dialogue without the substance of genuine otherness.
6.5 Situated Action and the Decomposition Problem
Suchman's (2007) Human-Machine Reconfigurations challenges the plan-then-execute model that underlies most prompt decomposition systems. Suchman's situated action framework argues that human actions are fundamentally situated, responsive to the specifics of the moment rather than executions of pre-formulated plans. The decomposition paradigm assumes that a complex task can be decomposed into sub-tasks in advance, an assumption Suchman's work suggests is problematic for many real-world tasks. The interrogative turn may partially address this concern: conversational decomposition allows plans to be revised mid-execution in response to intermediate results. But the deeper challenge remains: if effective action is fundamentally situated, no amount of pre-decomposition, whether imperative or interrogative, can substitute for genuine contextual responsiveness. This tension connects to Dreyfus's (1972, 1992, 2007) critique that AI fails precisely where holistic, context-sensitive understanding is required.
6.6 Hermeneutics and Postphenomenology
Gadamer's (1960) hermeneutic philosophy offers the concept of the "hermeneutic circle": understanding proceeds through iterative movement between part and whole, where pre-understanding (Vorverständnis) is revised through encounter with the text. Iterative prompting structurally resembles this circle: each exchange revises the human's understanding of both the task and the model's capabilities. However, Gadamer's hermeneutics presupposes a shared "effective-historical consciousness" (wirkungsgeschichtliches Bewußtsein) and the possibility of "horizon fusion" (Horizontverschmelzung) between interpreter and text. Whether these concepts meaningfully apply to human-LLM interaction, where the "text" is generated by statistical pattern-matching, is a genuine philosophical question, not an obvious extension.
Ihde's (1990) postphenomenological taxonomy provides vocabulary for different human-technology relations. Under imperative prompting, AI occupies a hermeneutic or embodiment relation (tool-like). Under interrogative prompting, it may shift toward an alterity relation, appearing as a quasi-other. González Arocha (2025) builds on this to argue that the mode of prompting determines the character of the "mediating space."
6.7 Djeffal's Reflexive Prompt Engineering
Djeffal's (2025) framework, presented at FAccT 2025, bridges philosophical theory and practical implementation. His five-component framework is grounded in "responsibility by design," demonstrating that prompt engineering must incorporate ongoing ethical reflection. Together, González Arocha and Djeffal establish a philosophical-practical continuum: the former demonstrates why prompt design may be a philosophical practice; the latter shows how to conduct it responsibly.
6.8 Epistemic Agency
Floridi's (2025) distinction between "agency without intelligence" and genuine epistemic agency provides the most rigorous epistemological framework. AI systems participate in knowledge-production processes but lack epistemic responsibility. The inquiry paradigm may strengthen human epistemic agency by positioning the human as an active questioner rather than a passive command-issuer. Russo, Schliesser, and Wagemans' (2023) argument for an integrated "epistemology-cum-ethics" of AI supports this: the process of knowledge production, not just its outputs, carries ethical weight.
7. Cross-Disciplinary Convergences
7.1 The Structure of Interaction Matters
All domains recognise that how a human communicates with an AI system is constitutive of the interaction's character. Technically, CoT showed that structure shapes capability (Wei et al., 2022); linguistically, illocutionary force alters the communicative contract (Gordon, 2024); cognitively, decomposition enforces System 2 processing (Kahneman, 2011); philosophically, the mode of prompting shapes the human-technology relation (González Arocha, 2025). Zhang and Cao's (2025) information-theoretic analysis bridges these: prompts are information selectors whose form determines computational pathways.
7.2 Decomposition Recapitulates Dialogue, Up to a Point
The technical progression from chains to trees to graphs to multi-agent systems mirrors a progression from monologue to dialogue. Linguistically, this corresponds to the shift from imperative sequences to interrogative trees (Hamblin, 1973; Groenendijk & Stokhof, 1984). Cognitively, it mirrors the shift from automatic to deliberative processing (Kahneman, 2011). However, this convergence must be stated carefully: the structural parallel between computational tree-search and dialogical inquiry is suggestive but does not establish that tree-search is dialogue. The isomorphism may be formal rather than substantive (see Section 8, Tension 1). A tree-search algorithm exploring solution branches is not asking questions in any meaningful sense; it is evaluating possibilities through an optimisation procedure. The parallel is that both processes explore structured spaces of alternatives, but the exploration mechanisms are fundamentally different: one involves communicative acts with preparatory conditions and implicature; the other involves numerical evaluation functions and branching heuristics.
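The contrast can be made concrete with a toy sketch. The best-first search below makes no communicative moves at all: it ranks candidate branches with a numerical evaluation function and a priority queue. The `score` function and the string-growing task are invented purely for illustration; they stand in for the evaluation functions and branching heuristics that systems such as LATS employ at far greater scale.

```python
import heapq

def best_first_search(root, expand, score, budget=100):
    """Minimal best-first search over solution branches.

    `expand` maps a partial solution to child candidates;
    `score` is a numerical evaluation function (higher is better).
    Nothing here is a communicative act: each branch is ranked,
    never questioned.
    """
    frontier = [(-score(root), 0, root)]  # max-heap via negated scores
    counter = 1  # tie-breaker so heap never compares nodes directly
    best = root
    while frontier and counter < budget:
        _, _, node = heapq.heappop(frontier)
        if score(node) > score(best):
            best = node
        for child in expand(node):
            heapq.heappush(frontier, (-score(child), counter, child))
            counter += 1
    return best

# Toy task: grow a string toward the target "abc", scoring
# positional character matches.
target = "abc"
score = lambda s: sum(1 for a, b in zip(s, target) if a == b)
expand = lambda s: [s + c for c in "abc"] if len(s) < len(target) else []
print(best_first_search("", expand, score))  # prints "abc"
```

The point of the sketch is structural: the algorithm explores a space of alternatives, as a dialogue does, but its moves are heap operations over scores, not speech acts.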
7.3 Context-Sensitivity as First Principle
All domains converge on the recognition that context is constitutive, not merely additive. Technically, Anthropic's context engineering paradigm (2025; non-peer-reviewed) and DSPy's dynamic compilation both treat context as a primary optimisation target. Linguistically, Gricean pragmatics and Relevance Theory both ground meaning in contextual interpretation: an utterance's meaning depends on the context in which it occurs (Sperber & Wilson, 1986/1995). Philosophically, González Arocha's (2025) "mediating space" and Gadamer's (1960) hermeneutic situation both insist that understanding is always situated. Cognitively, Suchman's (2007) situated action framework argues that effective action is responsive to contextual particulars, not to decontextualised plans. This convergence suggests that prompt engineering's evolution toward richer context management is not merely an engineering improvement but a recognition of what linguistic, philosophical, and cognitive theories have long established: that meaning is context-dependent.
7.4 Reflexivity Across Domains
Djeffal's (2025) reflexive prompt engineering, González Arocha's (2025) critical phenomenology, and the technical literature's self-referential optimisation (PromptBreeder; Fernando et al., 2024) all recognise that prompting systems must reflect on their own practices. The cognitive science parallel is metacognition, the capacity to monitor and regulate one's own reasoning processes. The linguistic parallel is meta-pragmatic awareness, the ability to reflect on the communicative rules one is following. That four independent disciplinary traditions converge on reflexivity as essential suggests it is not merely a desirable property but a structural requirement for effective complex interaction.
8. Cross-Disciplinary Tensions
Tension 1: Is the Interrogative Turn Real or Metaphorical?
The technical evidence is ambiguous. The most successful deployed systems (DSPy, LangChain, MetaGPT) remain overwhelmingly structured and imperative. The systems positioned as "conversational" use dialogue as implementation mechanism, not epistemological stance. The "interrogative turn" may be an interpretive framework that productively organises diverse evidence, or it may impose coherence on developments that are better understood as heterogeneous engineering improvements. We take the position that it is a useful analytical lens whose empirical status remains to be established. What would falsify the thesis? If controlled experiments show no systematic performance or quality differences between imperative and interrogative decomposition across a range of tasks and models, the thesis would be disconfirmed as an empirical claim (though it might retain value as an analytical or normative framework).
Tension 2: Cooperation Without Cooperators
Gricean pragmatics assumes cooperative agents following shared communicative norms. Can "cooperation" meaningfully describe human-LLM interaction when one party has no intentions, no beliefs, and no awareness of communicative norms? The model produces cooperative-seeming outputs because it was trained on cooperative human discourse (via RLHF; Ouyang et al., 2022), not because it follows cooperative principles. Relevance Theory (Sperber & Wilson, 1986/1995) offers a partial resolution by grounding communication in cognitive effects rather than cooperative intentions: what matters is whether the output is relevant to the human's cognitive environment, not whether the model "intends" to cooperate. But the deeper tension, between treating LLMs as communicative partners and recognising them as statistical systems, cannot be resolved by theoretical reframing alone. It requires the kind of careful metaphorical discipline that Shanahan (2024) advocates.
Tension 3: The False Equivalence Problem
Several convergences risk conflating structures that are formally similar but substantively different. TextGrad's gradient-like updates and Gadamer's hermeneutic circle are structurally parallel but epistemologically distinct. Grice's cooperative principle and technical fitness functions are different kinds of "cooperation." The analogy between Wilson's relational accountability and conversational prompting risks trivialising Indigenous epistemology (see Section 10). We flag these as analogies, not identities.
Tension 4: Philosophical Unfalsifiability
Claims about "phenomenological reorientation" and "epistemic co-construction" are interpretive, not empirical. They could be applied to any change in AI interaction design. The convergences identified in Section 7 are suggestive, but the philosophical claims require the kind of empirical testing proposed in our research questions (Section 11, particularly RQ1 and RQ7).
9. Counter-Evidence: When Imperative Decomposition Outperforms
Intellectual honesty requires confronting evidence that challenges the interrogative turn thesis. This section presents the strongest case against it.
9.1 Benchmark Dominance of Structured Systems
The most successful deployed prompt decomposition systems are overwhelmingly structured and imperative. DSPy (Khattab et al., 2023, 2024a) compiles declarative programs into optimised prompts without any conversational component. LangChain and LangGraph orchestrate structured pipelines. MetaGPT (Hong et al., 2024) uses Standard Operating Procedures (detailed, pre-specified workflows) to decompose software development tasks into waterfall-style phases. Language Agent Tree Search (LATS; Zhou et al., 2024) achieves state-of-the-art results through Monte Carlo tree search, a mathematical optimisation algorithm. ToolChain* similarly uses tree-search for tool-augmented reasoning. These are optimisation procedures, not dialogues, and they represent the performance frontier.
The highest benchmark scores on SWE-bench (software engineering), AgentBench (general agent tasks), and WebArena (web interaction) are consistently achieved by structured pipeline systems with carefully engineered decomposition strategies. No conversational system has demonstrated benchmark superiority on any of these evaluations. The "inquiry-based" systems cited as evidence for the interrogative turn, namely ACT (Google, 2025), FATA (2025), and the Tri-Agent Framework (KDD 2025), are all 2025 preprints that have not yet demonstrated superiority on standard benchmarks, have not been independently replicated, and in some cases report results on bespoke evaluation tasks rather than established benchmarks.
9.2 Linguistic Form and Output Quality
Leidner and Plachouras (2023) established empirically that neither naturalness nor lower perplexity reliably predicts prompt effectiveness. The relationship between linguistic form and output quality is task- and model-dependent. This finding directly challenges the core linguistic claim of the interrogative turn thesis: if illocutionary force does not reliably predict output quality, the "shift" from imperative to interrogative may be linguistically interesting but computationally inconsequential.
9.3 "Conversational" Systems as Implementation Mechanism
The systems most frequently cited as evidence for conversational decomposition use dialogue as an implementation mechanism, not an epistemological stance. CAMEL's (Li et al., 2023) role-playing agents exchange messages to coordinate task completion; the "conversation" is a coordination protocol. AutoGen's (Wu et al., 2023) multi-agent conversations are asynchronous message-passing: a distributed computing paradigm, not a Socratic dialogue. Tree-search algorithms like LATS (Zhou et al., 2024) and ToolChain* are optimisation procedures with branching search, not inquiries. Calling these "conversations" applies a metaphor that may obscure more than it reveals (Shanahan, 2024).
9.4 The Steel-Man Counter-Thesis
The strongest argument against the interrogative turn thesis, adapted from the peer review process that informed this revision: the "shift" may be an artefact of observer bias, not a phenomenon in the field. The authors identified a real progression from static to dynamic systems and interpreted it through the lens of "instruction vs. inquiry", but the actual technical evolution is toward greater automation and optimisation, not toward more question-asking. DSPy moves away from natural-language prompting toward programmatic compilation. The performance evidence favours structure, not inquiry. The philosophical arguments, while intellectually stimulating, are interpretive overlays that could be applied to any change in AI interaction design.
9.5 Our Response
We accept much of this counter-evidence. The interrogative turn is best understood as an emerging trajectory visible in the most recent systems, theoretically motivated by linguistic and philosophical analysis, but not yet empirically dominant. Our contribution is not to claim that conversational decomposition has replaced structured approaches (it manifestly has not) but to identify why the trajectory toward conversational interaction is theoretically significant, what empirical evidence would confirm or disconfirm it, and how the linguistic and philosophical dimensions of prompt design matter regardless of which paradigm ultimately dominates. The research questions in Section 11 are designed to generate precisely the evidence that would adjudicate this debate.
10. The Relational Turn: Indigenous Epistemology
10.1 Framing and Limitations
Before applying Indigenous epistemological frameworks to AI interaction, we must acknowledge fundamental tensions. LLM systems are products of extractive data practices, built on corpora that predominantly represent colonial languages and Western knowledge systems, and deployed by technology corporations whose interests may diverge from Indigenous communities. The "relational" framing of conversational prompting risks legitimising extractive technology through the language of relationality, precisely the kind of co-optation that Indigenous scholars have long warned against. We proceed with this awareness, not to instrumentalise Indigenous knowledge as a "lens" for validating Western AI developments, but to take seriously the challenge that Indigenous epistemologies pose to the assumptions embedded in current AI design.
10.2 Indigenous Epistemological Foundations
Wilson's (2008) Research is Ceremony articulates a Cree epistemology in which knowledge is fundamentally relational, produced within networks of accountability that include human, more-than-human, and spiritual relations. Wilson's relational accountability involves dimensions that cannot simply be mapped onto human-LLM interaction: it encompasses spiritual relations, ancestral obligations, and reciprocal responsibilities to land and community that have no analogue in prompt engineering.
Linda Tuhiwai Smith's (2021) Decolonizing Methodologies, now in its third edition, provides the foundational critique of Western research as complicit in colonialism. Smith argues that the Western research paradigm, with its assumptions of objectivity, extractability, and individual ownership of knowledge, has been an instrument of colonial power. This critique applies directly to AI: systems designed to extract and recombine knowledge according to Western epistemological assumptions reproduce these power dynamics at computational scale.
Margaret Kovach's (2009/2021) Indigenous Methodologies emphasises storytelling as epistemological framework and relational accountability as research ethics. Kovach argues that Indigenous research methodologies centre relationship: between researcher and participants, between knowledge and community, between inquiry and responsibility. The conversational form of inquiry resonates with storytelling traditions, but the resonance must not be mistaken for identity: Indigenous storytelling occurs within webs of accountability and spiritual connection that commercial AI systems neither possess nor honour.
Leroy Little Bear (2000) contrasts Western linear, compartmentalised thought with Indigenous relational, holistic, and cyclical worldviews. In Blackfoot metaphysics, reality is characterised by constant flux and interconnectedness, a perspective that challenges the decomposition paradigm itself. From this worldview, breaking a complex task into independent sub-tasks may be not merely a simplification but a distortion, because the relationships between parts are constitutive of meaning. This philosophical challenge applies to both imperative and interrogative decomposition.
10.3 Indigenous AI in Practice
Several initiatives demonstrate what Indigenous-led AI development looks like in practice.
FLAIR (First Languages AI Reality) was founded by Michael Running Wolf (Northern Cheyenne) and is based at Mila-Quebec AI Institute. FLAIR develops automatic speech recognition (ASR) for endangered Indigenous languages, employing low-resource AI techniques adapted to languages with limited digital text corpora. The initiative creates tools for language learning, transcription, and voice-controlled applications in Indigenous languages, with community data sovereignty as a foundational principle. FLAIR demonstrates that AI can serve Indigenous language revitalisation when designed by and for Indigenous communities.
Abundant Intelligences is a six-year research programme funded by the New Frontiers in Research Fund (NFRF) at Concordia University, co-directed by Jason Edward Lewis (Hawaiian/Samoan). The programme involves eight universities and twelve Indigenous community organisations across Canada, the USA, and New Zealand. Organised in regional "pods" collaborating with local communities, the programme develops AI models reflecting Indigenous values, exploring what "intelligence" means from diverse Indigenous perspectives. The programme has published in AI & Society (2024), establishing a scholarly record for community-centred AI development.
10.4 The IP//AI Position Paper
The Indigenous Protocol and Artificial Intelligence Position Paper (Lewis et al., 2020) advances six interconnected arguments that challenge mainstream AI development: (1) Indigenous protocols should be integrated into AI design and governance, not as afterthought "ethics checks" but as foundational design principles; (2) Indigenous data sovereignty must be respected: communities retain authority over their data, including the right to refuse participation in training datasets; (3) AI should be understood as relational rather than value-neutral: it is always embedded in social, cultural, and political contexts; (4) Indigenous ethical frameworks should inform AI development, drawing on millennia of relational ethics rather than relying solely on Western consequentialist and deontological traditions; (5) the concentration of AI power in a small number of corporations and nations must be challenged through distributed, community-controlled development; (6) AI should be designed for Indigenous benefit, not merely to avoid Indigenous harm, a distinction between passive non-maleficence and active beneficence. These arguments extend far beyond the conversational prompting question to challenge the power structures within which all AI development occurs.
10.5 The CARE Principles Applied
The CARE Principles for Indigenous Data Governance (Carroll et al., 2020) translate relational philosophy into actionable frameworks. Applied to AI interaction design: Collective Benefit demands that systems serve communities, not just individual users or corporate shareholders; Authority to Control requires that communities retain agency over how AI engages their knowledge, including the right to exclude certain knowledge domains from AI processing; Responsibility demands ongoing accountability from designers, not one-time ethical review; Ethics requires honouring relational frameworks rather than imposing extractive ones. The CARE Principles were developed for Indigenous data governance specifically; extending them to general AI interaction design is suggestive but requires substantial argumentative work. The principles point toward a design paradigm fundamentally different from user-centred design, one centred on community, relation, and accountability rather than individual efficiency.
10.6 Parallels and Differences
Indigenous and Western knowledge systems are contemporaneous and qualitatively different. The parallels between conversational prompting and Indigenous relational epistemology are suggestive but the epistemological foundations differ profoundly. Wilson's relational accountability concerns networks that include spiritual beings, ancestors, land, and non-human beings: a metaphysical framework fundamentally different from information exchange between a human and a language model. The convergence is not that Western AI is "returning" to Indigenous ways of knowing (which would imply Indigenous knowledge is a temporally prior version that Western technology eventually rediscovers, a sophisticated form of the "noble savage" trope). Rather, both traditions, from their different positions, recognise that extractive, decontextualised approaches to knowledge are impoverished. Indigenous epistemologies have articulated this for millennia; Western prompt engineering is discovering it through engineering failures: the compositionality gap, context collapse, and hallucination cascades that result from treating knowledge as context-free.
The critical question is whether recognising these parallels leads to genuine engagement with Indigenous epistemologies (centring Indigenous voices, respecting data sovereignty, supporting community-led AI development) or whether it becomes another form of intellectual extraction, mining Indigenous concepts for their rhetorical value while leaving colonial power structures intact.
11. Gap Analysis and Research Questions
11.1 Identified Gaps
Gap 1: No Integrated Framework. Despite convergences, no published work integrates technical, linguistic, cognitive, and philosophical analyses of prompt decomposition into a unified framework. González Arocha (2025) comes closest by treating prompts as simultaneously technical, discursive, and philosophical objects, but does not engage the technical APE or computational linguistics literatures. Djeffal (2025) bridges philosophy and practice but does not address decomposition specifically. Krause and Vossen (2024) bridge linguistics and NLP but do not engage philosophy. The field lacks a framework that simultaneously accounts for the information-theoretic properties of prompts (Zhang & Cao, 2025), their speech act structure (Gordon, 2024), their cognitive load implications (Sweller, 1988), and their phenomenological significance (González Arocha, 2025).
Gap 2: Conversational Prompt Optimisation. No APE method treats optimisation as an ongoing conversation. All current methods produce static prompt artefacts, even when the optimisation process is iterative. The linguistic and philosophical literatures suggest this gap exists partly because the field lacks the theoretical vocabulary for conversational optimisation, a vocabulary that speech act theory, question semantics, and dialogical philosophy could provide.
Gap 3: Decomposition Quality Metrics. Existing benchmarks (SWE-bench, GAIA, WebArena, AgentBench) evaluate task completion but never decomposition quality per se. A system producing an elegant, minimal decomposition scores identically to one using wasteful redundancy, provided both succeed. Linguistic analysis suggests decomposition quality might be measured through discourse coherence (Hobbs, 1979; Asher & Lascarides, 2003) or rhetorical structure well-formedness (Mann & Thompson, 1988). Philosophical analysis suggests quality might include reflexive adequacy (Djeffal, 2025) and relational appropriateness (Wilson, 2008).
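One component of such a metric, the minimality of a decomposition, can be given a crude operationalisation. The sketch below, an illustrative proposal rather than a validated measure from the literature, scores a decomposition by the average pairwise lexical overlap (Jaccard similarity) between its sub-tasks; higher overlap suggests redundancy of the kind current benchmarks ignore.

```python
def redundancy(subtasks):
    """Illustrative decomposition-quality signal: mean pairwise Jaccard
    overlap between sub-task token sets. Lower values suggest a more
    minimal decomposition. A sketch only; real metrics would need
    semantic similarity, not surface lexical overlap.
    """
    sets = [set(s.lower().split()) for s in subtasks]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Two hypothetical decompositions of the same reporting task.
minimal = ["Parse the input file",
           "Compute summary statistics",
           "Render the report"]
wasteful = ["Parse the input file and compute statistics",
            "Compute statistics from the parsed input file",
            "Render the report from the computed statistics"]
assert redundancy(minimal) < redundancy(wasteful)
```

A benchmark scoring only task completion would treat both decompositions identically; a redundancy-style signal is one way to make the difference visible, though it captures none of the discourse-coherence or relational dimensions noted above.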
Gap 4: Cross-Cultural and Non-Western Perspectives. Both the linguistic and philosophical literatures are overwhelmingly Western in their theoretical frameworks. Indigenous epistemology (Wilson, 2008; Smith, 2021) and the CARE Principles (Carroll et al., 2020) point toward alternative frameworks, but sustained application to prompt design and conversational AI architectures has not been undertaken. The cross-linguistic gap in computational linguistics compounds this problem.
Gap 5: Empirical Testing of Philosophical Claims. The phenomenological and epistemological claims about prompt mode effects have not been empirically tested with human participants. If the interrogative turn produces measurably different phenomenological stances and these correlate with epistemically richer engagement, this would provide strong evidence for the thesis. If not, the thesis reduces to an interesting interpretive framework without practical consequence.
Gap 6: Cognitive Science of Prompting. Despite the relevance of dual process theory, cognitive load theory, and bounded rationality to prompt decomposition, no published research programme systematically applies cognitive science models to prompt engineering practice.
11.2 Research Questions
The following research questions are presented in original form, followed by sharpened formulations derived from the peer review process.
RQ1: How does the illocutionary force of prompts (imperative vs. interrogative vs. mixed) measurably affect the epistemological character of LLM outputs?
Sharpened formulation (RQ-A): Under what task conditions does prompt illocutionary force predict task performance? Given Leidner and Plachouras's (2023) finding that linguistic form does not reliably predict output quality, for which task types, model architectures, and complexity levels does the choice between imperative and interrogative framing produce statistically significant differences in (a) task accuracy, (b) output diversity, (c) explanation quality, and (d) error type distribution? This formulation explicitly tests the boundary conditions of the interrogative turn thesis rather than assuming its validity.
Disciplines bridged: Computational linguistics (speech act theory, pragmatics) × Philosophy of AI (epistemology) × Technical AI (evaluation).
Methods: Factorial experiment: {task type: factual, analytical, creative, multi-step} × {prompt form: imperative, interrogative, mixed} × {model: at least 3 architectures} × {complexity: low, medium, high}. N ≥ 200 prompts per cell. Evaluate on standard metrics plus epistemological dimensions (hedging, limitation acknowledgement, generative capacity).
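The scale of this design is easy to misjudge; a minimal Python sketch enumerates the cells of the proposed factorial design (factor and level names follow the methods description above; the model placeholders are illustrative):

```python
from itertools import product

# Factor levels from the proposed RQ-A design; the model names are
# placeholders, not specific architectures.
FACTORS = {
    "task_type": ["factual", "analytical", "creative", "multi_step"],
    "prompt_form": ["imperative", "interrogative", "mixed"],
    "model": ["model_a", "model_b", "model_c"],  # at least 3 architectures
    "complexity": ["low", "medium", "high"],
}

def design_cells(factors):
    """Enumerate every cell of the full factorial design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

cells = design_cells(FACTORS)
print(len(cells))        # 4 * 3 * 3 * 3 = 108 cells
print(len(cells) * 200)  # 21,600 prompts at N = 200 per cell
```

Even this modest four-factor design therefore requires over twenty thousand evaluated prompts, which is why the epistemological dimensions (hedging, limitation acknowledgement) would need automated rather than manual scoring.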
Novelty: High. Existing work evaluates accuracy; no published study evaluates epistemological quality as a function of illocutionary force.
RQ2: Can Decomposed Prompting be formally reinterpreted as compositional question semantics, and does this reinterpretation yield measurably superior decomposition strategies?
Sharpened formulation (RQ-E): Can a context-free or mildly context-sensitive grammar be induced from a corpus of successful prompt decompositions, and does compliance with this grammar predict decomposition success?
Methods: Formal mapping between DecomP's modular architecture and partition semantics. Implementation and comparative evaluation against standard decomposition on compositional reasoning benchmarks.
RQ3: What would a relational prompt decomposition engine look like, one designed according to Indigenous relational epistemology (Wilson, 2008) and the CARE Principles (Carroll et al., 2020)?
Note: This research must be conducted with Indigenous communities through participatory design, not merely about Indigenous epistemology.
Methods: Participatory design with Indigenous knowledge holders. Architectural specification grounded in CARE Principles. Comparative analysis with existing systems on relational accountability, context preservation, and community authority.
RQ4: To what extent does Bakhtin's critique of monologism apply to multi-agent systems?
Sharpened formulation (RQ-C): Is the output diversity of N agents using the same underlying model statistically greater than that of N independent samples from that model? If not, multi-agent "dialogue" is computationally equivalent to repeated sampling.
Methods: Compare output distributions (semantic similarity, strategy diversity, error diversity) across independent sampling, same-model agents with different roles, and different-model agents.
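The core RQ-C comparison can be sketched in miniature; a minimal Python sketch, using token-level Jaccard distance as a stand-in for the semantic similarity metrics named above (the toy output strings are invented for illustration):

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Lexical distance between two outputs (crude proxy for semantic distance)."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def mean_pairwise_distance(outputs):
    """Average pairwise distance: higher means a more diverse output set."""
    pairs = list(combinations(outputs, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Toy stand-ins; in the real study these would be model generations.
independent_samples = [
    "the plan splits the task into three steps",
    "the plan splits the task into three stages",
    "the plan divides the task into three steps",
]
role_conditioned = [
    "as a critic i would question the third step",
    "as a planner i propose three sequential steps",
    "as a verifier i check each step against the spec",
]

d_ind = mean_pairwise_distance(independent_samples)
d_agents = mean_pairwise_distance(role_conditioned)
print(d_agents > d_ind)  # True on this toy data
```

In the actual study, the distance metric would be an embedding-based semantic similarity, the comparison would be a significance test over distributions rather than a single inequality, and a null result would support the "repeated sampling" interpretation.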
RQ5: Can Djeffal's reflexive framework be extended to decomposition engines?
Sharpened formulation: Can decomposition engines monitor their own epistemic adequacy using automated feedback mechanisms?
Methods: Extension of TextGrad to include ethical and epistemological evaluation criteria. Integration of Djeffal's framework into DSPy-like systems.
RQ6: How do Gricean maxim violations in prompt design correlate with decomposition failures?
Sharpened formulation (RQ-B): Do decomposition plans that violate Gricean maxims (Krause & Vossen, 2024) fail at statistically higher rates than pragmatically well-formed decompositions?
Methods: Annotate 500+ decomposition traces from AgentBench/SWE-bench for maxim violations. Build predictive models.
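The basic analysis behind RQ-B can be sketched as follows; a minimal Python sketch, assuming a trace annotation schema of this shape (the field names and toy traces are illustrative, not drawn from AgentBench or SWE-bench):

```python
# Toy annotated traces: each records which Gricean maxims the decomposition
# plan violates and whether the task ultimately failed. Schema is hypothetical.
traces = [
    {"violations": ["quantity"], "failed": True},
    {"violations": [], "failed": False},
    {"violations": ["relation", "manner"], "failed": True},
    {"violations": [], "failed": False},
    {"violations": ["manner"], "failed": False},
    {"violations": ["quantity", "relation"], "failed": True},
]

def failure_rate(traces, violating):
    """Failure rate among traces with (violating=True) or without violations."""
    group = [t for t in traces if bool(t["violations"]) == violating]
    return sum(t["failed"] for t in group) / len(group)

print(failure_rate(traces, violating=True))   # 0.75 on this toy data
print(failure_rate(traces, violating=False))  # 0.0 on this toy data
```

At the proposed 500+ trace scale, this rate comparison would be replaced by a proper predictive model (e.g., logistic regression with per-maxim features), allowing the contribution of each maxim violation to be estimated separately.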
RQ7: Does the phenomenological relation humans experience with AI change when interacting through imperative versus interrogative prompting?
Methods: Phenomenological interview study. Think-aloud protocols. Evaluation of epistemic quality as a function of interaction mode.
RQ8: Can Socratic structures encoded in PromptBreeder's mutation operators improve optimisation outcomes?
Methods: Implementation of Socratic mutation operators (elenchus, aporia, maieutics). Comparison with standard PromptBreeder on benchmarks.
RQ9: What would a formal discourse grammar of prompt decomposition look like?
Methods: Corpus annotation using extended RST + question-semantic categories. Grammar induction. Validation through correlation with task performance.
RQ10: How does the temporal phenomenology of human-AI dialogue affect epistemic quality?
Methods: Experimental comparison of standard vs. paced AI responses. Phenomenological interviews.
RQ-F (Failure Conditions): Under what task, model, and user conditions does imperative decomposition outperform interrogative decomposition?
This question is essential for bounding the interrogative turn thesis. If the thesis is correct, there should be identifiable task conditions where interrogative framing yields measurably better outcomes, and equally identifiable conditions where it does not. Without clear boundary conditions, the thesis is unfalsifiable. Preliminary evidence suggests that highly structured tasks with well-defined outputs (code generation, mathematical proofs, data transformation) may favour imperative decomposition, while open-ended tasks (research, creative writing, analysis) may favour interrogative approaches. But this hypothesis requires systematic testing.
Methods: Systematic comparison across task taxonomies (factual recall, analytical reasoning, creative generation, multi-step problem solving), model architectures (dense transformers, mixture-of-experts, retrieval-augmented), and user expertise levels. Identification of moderating variables through regression analysis. Pre-registration of hypotheses about which conditions favour each mode.
RQ-G (User Diversity): How do different users (experts vs. novices, different cultural backgrounds, different languages) experience and benefit from conversational decomposition?
The entire prompt engineering literature assumes an English-speaking, technically literate user. How decomposition strategies interact with user expertise, cultural communication norms, and typologically diverse languages is virtually unstudied. Languages with different question formation strategies (e.g., languages using particles rather than word-order inversion for questions), different politeness systems (e.g., Japanese honorifics), and different discourse organisation principles may interact with prompt strategies in ways that current English-centric research cannot predict.
Methods: Cross-cultural user studies with participants varying in technical expertise, cultural background, and primary language. Mixed-methods design combining performance metrics with qualitative interviews. At minimum, studies should include languages from different typological families.
Sharpened formulation (RQ-D): What are the comparative cognitive load profiles (intrinsic, extraneous, germane) of conversational vs. structured decomposition for human users, and how do they interact with expertise, task complexity, and time pressure?
Methods: Within-subjects experiment with eye tracking, NASA-TLX, and think-aloud protocols. Users perform identical tasks through (a) structured interface (form-based), (b) conversational interface (dialogue-based), (c) hybrid. Measure completion time, error rate, cognitive load, and user satisfaction.
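The NASA-TLX measure named above reduces to a simple computation; a minimal Python sketch of the unweighted ("Raw TLX") variant, which averages the six standard subscale ratings (the example ratings are invented):

```python
# The six NASA-TLX subscales, each rated 0-100 by the participant.
SUBSCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def raw_tlx(ratings):
    """Raw (unweighted) TLX workload score: mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

# Hypothetical ratings for one participant in one interface condition.
ratings = {"mental": 70, "physical": 10, "temporal": 55,
           "performance": 30, "effort": 65, "frustration": 40}
print(raw_tlx(ratings))  # 45.0
```

In the proposed within-subjects design, each participant would yield one such score per interface condition (structured, conversational, hybrid), and the subscale profile, not just the aggregate, would be used to separate intrinsic, extraneous, and germane load.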
12. Research Landscape
The research landscape for prompt decomposition spans at least six domains that rarely communicate: (1) technical AI/ML, advancing through benchmark-driven engineering; (2) computational linguistics, providing formal frameworks for analysing communicative structure; (3) cognitive science, modelling the human reasoning processes that decomposition exploits and supports; (4) philosophy of AI/technology, interrogating the epistemological and ethical dimensions; (5) Indigenous studies, challenging foundational assumptions about knowledge, relation, and extraction; and (6) human-computer interaction, studying how people actually use these systems.
Current institutional concentration is notable. The technical literature is dominated by a small number of research labs (Google DeepMind, Meta AI, Microsoft Research, Stanford NLP, Princeton NLP, Databricks) with substantial computational resources. The philosophical literature is more distributed but thin: González Arocha (Ecuador), Djeffal (Germany), Gordon (independent), Gubelmann (Switzerland), and Coeckelbergh and Gunkel (Belgium/USA) represent nearly the entirety of published philosophical engagement with prompting specifically. The Indigenous AI space includes FLAIR (Mila, Canada), Abundant Intelligences (Concordia, Canada), and a growing network of scholars connected through the IP//AI initiative. The cognitive science perspective on prompt engineering is virtually absent as an organised research programme, despite the relevance of established frameworks.
The most productive research directions lie at the intersections: formal question semantics applied to decomposition architecture (RQ2), Gricean pragmatics predicting decomposition failures (RQ6), cognitive load analysis of interaction modes (RQ-D/RQ-G), and empirical phenomenological testing (RQ7). The integrative framework spanning all six domains (the field's most urgent theoretical need) remains unbuilt. Interdisciplinary venues such as AI & Society, Minds and Machines, Philosophy & Technology, FAccT, and the ACL "Bridging HCI and NLP" workshop are the most natural publication targets for the kind of cross-cutting research this survey identifies as needed.
13. Conclusion
This survey has examined the evolution of prompt decomposition from instruction to inquiry across technical, linguistic, cognitive, philosophical, and relational perspectives. The interrogative turn is an interpretive framework supported by converging evidence across multiple levels of analysis, though its empirical basis remains thin. The strongest technical systems remain structured and imperative; the "conversational" evidence is recent, largely unreplicated, and may reflect observer bias rather than a robust phenomenon.
What the evidence does support is more modest but still significant. First, the linguistic form of prompts has measurable computational consequences (Zhang & Cao, 2025; Ivison et al., 2024), even if the relationship between form and output quality is complex and task-dependent (Leidner & Plachouras, 2023). Second, the shift from commands to questions is a real linguistic shift with formal semantic consequences: interrogative prompts create different answer sets and different compositional structures than imperative ones (Hamblin, 1973; Press et al., 2023). Third, cognitive science provides principled explanations for why decomposition works: it enforces deliberative processing, manages cognitive load, and accommodates bounded rationality (Kahneman, 2011; Sweller, 1988; Simon, 1956), though these explanations apply to decomposition generally, not to the interrogative mode specifically. Fourth, philosophical analysis reveals that the choice between command and question modes is not epistemically neutral: it shapes human engagement, interpretive authority, and epistemic agency (González Arocha, 2025; Floridi, 2025), even when the machine's "understanding" remains that of a stochastic parrot (Bender et al., 2021). Fifth, Indigenous epistemologies pose challenges that go beyond the conversational turn to question the extractive foundations of AI knowledge production itself (Wilson, 2008; Smith, 2021; Kovach, 2021; Little Bear, 2000).
González Arocha's claim that prompting is "an inherently philosophical act" is one philosopher's provocative thesis, not established consensus. But the convergence of evidence across five domains suggests something in its vicinity: every prompt design choice (imperative or interrogative, monolithic or decomposed, extractive or relational) carries assumptions about knowledge, agency, and the relationship between human and machine. Whether these choices are best understood as philosophical acts or pragmatic engineering decisions is itself one of the questions this survey leaves open.
The most productive next steps are empirical: testing whether illocutionary force predicts output quality beyond accuracy (RQ-A), whether pragmatic well-formedness predicts decomposition success (RQ-B), whether multi-agent "dialogue" produces genuine diversity beyond single-model sampling (RQ-C), what the cognitive load profiles of different interaction modes are (RQ-D), and under what conditions imperative decomposition outperforms interrogative decomposition (RQ-F). Until such evidence is available, the interrogative turn remains a compelling analytical framework: theoretically motivated, linguistically grounded, philosophically significant, but not yet empirically validated.
The deeper contribution of this survey may be methodological: demonstrating that prompt engineering cannot be adequately understood from within any single disciplinary perspective. The technical literature sees performance but misses meaning. The linguistics literature sees structure but misses phenomenology. The philosophy literature sees significance but misses engineering constraints. The cognitive science literature sees processing but misses social and ethical dimensions. Indigenous epistemology sees relation but challenges the entire framework within which the other disciplines operate. An adequate understanding of what happens when a human types a prompt into a language model requires all of these perspectives, and the willingness to let them challenge one another rather than simply juxtaposing them.
14. Consolidated Bibliography
Aguas, J. (2025). Buber's I-Thou philosophy and its implications for human-AI relations. Sophia, 38. [Peer-reviewed]
Anthropic. (2025). Effective context engineering for AI agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents [Blog post]
Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge University Press.
Austin, J. L. (1962). How to do things with words. Oxford University Press.
Bakhtin, M. M. (1929/1963). Problems of Dostoevsky's poetics (C. Emerson, Trans.). University of Minnesota Press, 1984.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? 🦜 Proceedings of FAccT 2021, 610–623. https://doi.org/10.1145/3442188.3445922
Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of ACL 2020, 5185–5198.
Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., et al. (2024). Graph of Thoughts: Solving elaborate problems with large language models. Proceedings of AAAI 2024. https://arxiv.org/abs/2308.09687
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., ... Amodei, D. (2020). Language models are few-shot learners. Proceedings of NeurIPS 2020. https://arxiv.org/abs/2005.14165
Buber, M. (1923). I and Thou (W. Kaufmann, Trans.). Scribner, 1970.
Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043
Chang, E. Y., et al. (2023). Prompting large language models with the Socratic method. IEEE Access, 11, 51156–51167. https://doi.org/10.1109/ACCESS.2023.3267890
Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Proceedings of NeurIPS 2017. https://arxiv.org/abs/1706.03741
Coeckelbergh, M. (2012). Growing moral relations: Critique of moral status ascription. Palgrave Macmillan.
Coeckelbergh, M., & Gunkel, D. J. (2025). Communicative AI: A critical introduction to large language models. Polity.
Dennett, D. C. (1987). The intentional stance. MIT Press.
Djeffal, C. (2025). Reflexive prompt engineering: A framework for responsible prompt engineering and AI interaction design. Proceedings of FAccT 2025. https://arxiv.org/abs/2504.16204
Dreyfus, H. L. (1972). What computers can't do: The limits of artificial intelligence. MIT Press.
Dreyfus, H. L. (1992). What computers still can't do: A critique of artificial reason. MIT Press.
Dreyfus, H. L. (2007). Why Heideggerian AI failed and how fixing it would require making it more Heideggerian. Philosophical Psychology, 20(2), 247–268.
Fernando, C., Banarse, D., Michalewski, H., Osindero, S., & Rocktäschel, T. (2024). Promptbreeder: Self-referential self-improvement via prompt evolution. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.16797
Ferrario, A., & Loi, M. (2026). Are large language models intentional? The limits of referential grounding. Philosophy & Technology, 39. https://doi.org/10.1007/s13347-026-01079-4
Floridi, L. (2023). The ethics of artificial intelligence: Principles, challenges, and opportunities. Oxford University Press.
Floridi, L. (2025). AI as agency without intelligence: On artificial intelligence as a new form of agency. Philosophy & Technology, 38. https://doi.org/10.1007/s13347-025-00858-9
Gadamer, H.-G. (1960). Truth and method (J. Weinsheimer & D. G. Marshall, Trans.). Continuum, 2004.
González Arocha, J. (2025). Critical phenomenology of prompting in artificial intelligence. Sophia, 39. https://doi.org/10.17163/soph.n39.2025.04
Gordon, J. (2024). Speech acts and large language models. PhilArchive. https://philarchive.org/archive/GORSAA-12v1
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). Academic Press.
Groenendijk, J., & Stokhof, M. (1984). Studies on the semantics of questions and the pragmatics of answers (Doctoral dissertation). University of Amsterdam.
Gubelmann, R. (2024). Large language models, agency, and why speech acts are beyond them (for now). Philosophy & Technology, 37, 45. https://doi.org/10.1007/s13347-024-00696-1
Guo, Q., Wang, R., Guo, J., Li, B., Song, K., Tan, X., Liu, G., Bian, J., & Yang, Y. (2024). EvoPrompt: Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. Proceedings of ICLR 2024. https://arxiv.org/abs/2309.08532
Hamblin, C. L. (1973). Questions in Montague English. Foundations of Language, 10(1), 41–53.
Hasse, C. (2017). Rethinking the I-You relation through dialogical philosophy in the age of social robots. AI & Society, 32, 467–479. https://doi.org/10.1007/s00146-017-0703-x
Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3(1), 67–90.
Hong, S., Zhuge, M., Chen, J., et al. (2024). MetaGPT: Meta programming for a multi-agent collaborative framework. Proceedings of ICLR 2024. https://arxiv.org/abs/2308.00352
Hu, J., et al. (2025). Pragmatics in the era of large language models. https://arxiv.org/abs/2502.12378
IBM. (2026). What is context engineering? IBM Think. [Blog post / technical guide]
Ihde, D. (1990). Technology and the lifeworld: From garden to earth. Indiana University Press.
Ivison, H., et al. (2024). From language modeling to instruction following: Understanding the behavior shift in LLMs after instruction tuning. Proceedings of NAACL 2024. https://aclanthology.org/2024.naacl-long.130/
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., et al. (2024a). DSPy: Compiling declarative language model calls into self-improving pipelines. Proceedings of ICLR 2024 (Spotlight). https://arxiv.org/abs/2310.03714
Khattab, O., et al. (2024b). Fine-tuning and prompt optimization: Two great steps that work better together. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.597.pdf
Khot, T., Trivedi, H., Finlayson, M., et al. (2023). Decomposed prompting: A modular approach for solving complex tasks. Proceedings of ICLR 2023. https://arxiv.org/abs/2210.02406
Kovach, M. (2021). Indigenous methodologies: Characteristics, conversations, and contexts (2nd ed.). University of Toronto Press. (Original work published 2009)
Krause, L., & Vossen, P. (2024). The Gricean Maxims in NLP: A survey. Proceedings of INLG 2024. https://aclanthology.org/2024.inlg-main.39/
Leidner, J. L., & Plachouras, V. (2023). The language of prompting: What linguistic properties make a prompt successful? Findings of EMNLP 2023. https://arxiv.org/abs/2311.01967
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Proceedings of EMNLP 2021.
Levinas, E. (1961). Totality and infinity: An essay on exteriority (A. Lingis, Trans.). Duquesne University Press, 1969.
Lewis, J. E., Arista, N., Pechawis, A., & Kite, S. (2020). Making kin with the machines. Journal of Design and Science, 6. https://doi.org/10.21428/7808da6b.7484b0e0
Li, G., Hammoud, H., Itani, H., et al. (2023). CAMEL: Communicative agents for "mind" exploration of large language model society. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2303.17760
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. Proceedings of ACL 2021.
Little Bear, L. (2000). Jagged worldviews colliding. In M. Battiste (Ed.), Reclaiming Indigenous voice and vision (pp. 77–85). UBC Press.
Ma, Y., et al. (2024). The death and life of great prompts: Analyzing the evolution of LLM prompts from the structural perspective. Proceedings of EMNLP 2024. https://aclanthology.org/2024.emnlp-main.1227/
Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540. https://doi.org/10.1016/j.tics.2024.01.011
Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Markl, N. (2025). Taxonomizing representational harms using speech act theory. https://arxiv.org/abs/2504.00928
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Proceedings of NeurIPS 2022. https://arxiv.org/abs/2203.02155
Press, O., et al. (2023). Measuring and narrowing the compositionality gap in language models. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.378/
Qian, C., Cong, X., Yang, C., et al. (2024). ChatDev: Communicative agents for software development. Proceedings of ACL 2024.
Roberts, C. (2012). Information structure in discourse: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics, 5(6), 1–69.
Robino, G. (2025). Conversation Routines: A prompt engineering framework for task-oriented dialog systems. arXiv:2501.11613. https://arxiv.org/abs/2501.11613
Russo, F., Schliesser, E., & Wagemans, J. (2023). Connecting ethics and epistemology of AI. AI & Society, 38. https://doi.org/10.1007/s00146-022-01617-6
SciELO. (2025). Bakhtin and machine-generated polyphony. SciELO en Perspectiva. [Blog post]
Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424.
Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79. https://doi.org/10.1145/3624724
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. Proceedings of EMNLP 2020, 4222–4235. https://arxiv.org/abs/2010.15980
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138.
Simon, H. A. (1996). The sciences of the artificial (3rd ed.). MIT Press.
Smith, L. T. (2021). Decolonizing methodologies: Research and Indigenous peoples (3rd ed.). Zed Books. (Original work published 1999)
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Blackwell. (Original work published 1986)
Springer, A. (2026). AI and epistemic justice: Decolonial perspectives. AI & Society. [Peer-reviewed]
STRV. (2024). AI and Wittgenstein's language games. STRV Blog. [Blog post]
Suchman, L. (2007). Human-machine reconfigurations: Plans and situated actions (2nd ed.). Cambridge University Press.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer.
Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023a). Plan-and-Solve prompting: Improving zero-shot chain-of-thought reasoning. Proceedings of ACL 2023. https://arxiv.org/abs/2305.04091
Wang, X., Li, C., Wang, Z., Bai, F., Luo, H., Zhang, J., Jojic, N., Xing, E. P., & Hu, Z. (2024). PromptAgent: Strategic planning with language models enables expert-level prompt optimization. Proceedings of ICLR 2024. https://arxiv.org/abs/2310.16427
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Proceedings of NeurIPS 2022. https://arxiv.org/abs/2201.11903
Wilson, S. (2008). Research is ceremony: Indigenous research methods. Fernwood Publishing.
Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.
Wu, Q., Bansal, G., Zhang, J., et al. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. https://arxiv.org/abs/2308.08155
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2024). Large language models as optimizers (OPRO). Proceedings of ICLR 2024. https://arxiv.org/abs/2309.03409
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate problem solving with large language models. Proceedings of NeurIPS 2023. https://arxiv.org/abs/2305.10601
Yuksekgonul, M., Bianchi, F., Boen, J., Liu, S., Huang, Z., Guestrin, C., & Zou, J. (2024/2025). TextGrad: Automatic "differentiation" via text. arXiv 2024; Nature 2025. https://arxiv.org/abs/2406.07496
Zeldes, A., et al. (2025). eRST: A signaled graph theory of discourse relations and organization. Computational Linguistics, 51(1), 23–72. https://doi.org/10.1162/coli_a_00538
Zhang, J., & Cao, Y. (2025). Why prompt design matters and works: A complexity analysis of prompt search space in LLMs. Proceedings of ACL 2025. https://aclanthology.org/2025.acl-long.1562/
Zhou, A., Yan, K., Shlapentokh-Rothman, M., et al. (2024). Language Agent Tree Search unifies reasoning, acting, and planning in language models. Proceedings of ICML 2024. https://arxiv.org/abs/2310.04406
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2023a). Least-to-Most prompting enables complex reasoning in large language models. Proceedings of ICLR 2023. https://arxiv.org/abs/2205.10625
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023b). Large language models are human-level prompt engineers (APE). Proceedings of ICLR 2023. https://arxiv.org/abs/2211.01910
Grey literature and unverifiable references:
- DTPA (Deep Thinking Prompting Agents): [Grey literature; no formal publication identified. Referenced in technical discussions but could not be verified against peer-reviewed or preprint sources.]
- GEPA (arXiv:2507.19457): [Preprint with extraordinary performance claims (open-source models outperforming frontier models at 90× lower cost). Independent verification required before relying on these results.]
This survey was produced as part of the IAIP Polyphonic Discussion research protocol (RCH-CTX-Polyphonic-discussion--2604060040). It is a revised version (v2) incorporating extensive revisions in response to peer review (composite score 5.6/10 on v1). Major revisions include: reframed central thesis, citation audit, strengthened Indigenous engagement, new counter-evidence section, cognitive science perspective, explicit engagement with Bender et al. (2021), search methodology, worked linguistic examples, and additional literature. April 2026.