Deep-Thinking Ratio and Its Implications for Developing Indigenous Technologies: An Academic Survey and Literature Review
Part I: The Deep-Thinking Ratio — Technical Foundations
1.1 Current Reality Statement: The Failure of "Longer Is Better"
Chain-of-Thought (CoT) reasoning has been the dominant paradigm for scaling LLM inference-time performance. The prevailing assumption holds that more reasoning tokens yield better answers. However, a February 2026 paper by Chen et al. from the University of Virginia and Google — "Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens" — demonstrates that this assumption is fundamentally flawed. Across eight model variants and four reasoning benchmarks, raw token count exhibits an average Pearson correlation of r = −0.59 with accuracy. Longer chains of thought frequently signal overthinking, where models amplify flawed heuristics, repeat redundant steps, or fixate on irrelevant details rather than engaging in productive reasoning.[^1][^2][^3]
This finding challenges the entire infrastructure of test-time compute scaling. If verbosity is not a proxy for reasoning quality, the field requires a mechanistically grounded metric for measuring genuine cognitive effort in language models.
1.2 Deep-Thinking Tokens: Mechanism and Formalization
The core contribution of the DTR paper is the identification of deep-thinking tokens — tokens whose internal probability distributions undergo significant revision across the deeper transformer layers before converging. The method operates by projecting intermediate hidden states into the vocabulary space using the model's unembedding matrix, producing a probability distribution at each layer. Jensen-Shannon Divergence (JSD) between each intermediate layer's distribution and the final layer's distribution quantifies how much revision occurs.[^2][^3]
A token's settling depth is defined as the first layer at which its cumulative minimum divergence falls below a threshold g. A token qualifies as a deep-thinking token if it settles only within the late regime, defined by a depth fraction ρ. With default hyperparameters g = 0.5 and ρ = 0.85, this means the token's prediction only stabilized in the final 15% of layers.[^1][^2]
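In symbols (our notation, which may differ from the paper's): let $p_\ell$ denote the vocabulary distribution obtained by unembedding the layer-$\ell$ hidden state for a given token, and $p_L$ the final-layer distribution. The settling depth is then

$$
d \;=\; \min\Big\{\, \ell \;:\; \min_{\ell' \le \ell} \mathrm{JSD}\big(p_{\ell'} \,\big\|\, p_L\big) < g \,\Big\},
$$

and the token qualifies as a deep-thinking token iff $d > \rho L$, i.e. its prediction settles only within the last $(1-\rho)$ fraction of the $L$ layers.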
The Deep-Thinking Ratio (DTR) for a sequence is the proportion of tokens that settle in this late regime. Across the benchmarks AIME 2024/2025, HMMT 2025, and GPQA-Diamond, and models including GPT-OSS (20B and 120B variants), DeepSeek-R1-70B, and Qwen3-30B-Thinking, DTR achieves an average positive correlation with accuracy of r = 0.683, with positive correlations in 30 out of 32 model-benchmark settings.[^3][^2]
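A minimal sketch of this computation, assuming access to per-layer logits from a logit-lens projection (function and argument names are ours, not the paper's; base-2 JSD is assumed so that the divergence lies in [0, 1] and the threshold g = 0.5 is on a sensible scale):

```python
import numpy as np

def _jsd(p, q):
    """Base-2 Jensen-Shannon divergence between two distributions.
    Softmax outputs are strictly positive, so the logs are safe."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def deep_thinking_ratio(layer_logits, g=0.5, rho=0.85):
    """Fraction of tokens whose prediction settles only in the late layers.

    layer_logits: array of shape (L layers, T tokens, vocab) holding each
    layer's hidden states projected through the model's unembedding matrix
    (the "logit lens").
    """
    L, T, _ = layer_logits.shape
    z = layer_logits - layer_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    deep = 0
    for t in range(T):
        # Divergence of every layer's distribution from the final layer's.
        jsd = np.array([_jsd(probs[l, t], probs[-1, t]) for l in range(L)])
        # Settling depth: first layer where the running-minimum
        # divergence drops below the threshold g.
        settled = int(np.argmax(np.minimum.accumulate(jsd) < g))
        if settled >= rho * L:  # settles only in the late regime
            deep += 1
    return deep / T
```

In practice the per-layer distributions would come from hooking intermediate activations of a real model; the sketch only fixes the arithmetic of the metric.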
1.3 Think@n: Efficient Test-Time Scaling via Early Halting
Leveraging DTR, the authors introduce Think@n, a test-time scaling strategy that replaces standard self-consistency (Cons@n), which generates all n candidate answers in full and takes a majority vote. Think@n operates by:[^2]
- Sampling n = 48 candidate responses per problem
- Computing DTR from only the first 50 prefix tokens of each candidate
- Immediately halting "unpromising" candidates with low DTR
- Completing and majority-voting only the top-η% candidates (η = 50%)
| Method | AIME 2025 Accuracy | Cost (k tokens) | Cost Reduction |
|---|---|---|---|
| Cons@n (Majority Vote) | 92.7% | 307.6 | — |
| Think@n (DTR Selection) | 94.7% | 155.4 | −49% |
| Self-Certainty@n | 87.3% | 150.6 | −51% |
| Short@n | 87.3% | 255.7 | −17% |
Results on OSS-120B-medium.[^1][^2]
Think@n achieves higher accuracy than standard voting while reducing inference cost by approximately 50%. This holds across all four benchmarks and on both OSS-120B-medium and Qwen3-4B-Thinking models. A prefix of just 50 tokens is sufficient to estimate DTR; longer prefixes do not improve selection quality.[^2]
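The selection procedure described above can be sketched as follows (the generation, completion, and scoring callables are our own abstraction of the pipeline, not the paper's actual API):

```python
import numpy as np

def think_at_n(problem, generate_prefix, complete, answer_of, dtr,
               n=48, prefix_tokens=50, keep_frac=0.5):
    """Sketch of Think@n selection.

    generate_prefix(problem, k) -> partial response of ~k tokens
    complete(prefix)            -> full response continuing the prefix
    answer_of(response)         -> extracted final answer
    dtr(prefix)                 -> Deep-Thinking Ratio of the prefix
    """
    # 1. Sample n candidate prefixes of ~50 tokens each.
    prefixes = [generate_prefix(problem, prefix_tokens) for _ in range(n)]
    # 2. Score each prefix by its Deep-Thinking Ratio.
    scores = np.array([dtr(p) for p in prefixes])
    # 3. Halt low-DTR candidates; keep only the top-eta fraction.
    keep = int(n * keep_frac)
    top = np.argsort(scores)[::-1][:keep]
    # 4. Complete only the surviving candidates and majority-vote.
    answers = [answer_of(complete(prefixes[i])) for i in top]
    values, counts = np.unique(answers, return_counts=True)
    return values[np.argmax(counts)]
```

The savings come from step 3: roughly half the candidates are abandoned after only 50 tokens, so their full reasoning chains are never paid for.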
1.4 Contextual Research Landscape: Inference Efficiency
DTR sits within a broader movement toward adaptive computation. Token-Budget-Aware LLM Reasoning (TALE) dynamically adjusts reasoning token budgets based on problem complexity, reducing cost with minimal accuracy loss. SelfBudgeter trains models to pre-estimate required reasoning budgets, achieving 61% average response length compression on math tasks. FrugalGPT demonstrated LLM cascades that match GPT-4 performance with up to 98% cost reduction. Collectively, this body of work establishes that intelligent allocation of compute — directing resources toward genuinely difficult reasoning and away from trivial or counterproductive token generation — is a more principled path than uniform scaling.[^4][^5][^6][^7][^8]
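To make the cascade idea concrete, here is a deliberately simplified sketch in the spirit of FrugalGPT, not its actual implementation; the accept callable stands in for FrugalGPT's learned answer-scoring model:

```python
def cascade(query, models, accept):
    """Query models cheap-to-expensive, stopping at the first
    acceptable answer.

    models: list of (call, cost) pairs, ordered cheap -> expensive.
    accept(answer) -> True if the answer is trustworthy enough to
    stop escalating.
    """
    spent = 0.0
    for call, cost in models:
        answer = call(query)
        spent += cost
        if accept(answer):
            return answer, spent
    # Fall through: no answer was accepted; return the most
    # expensive model's answer anyway.
    return answer, spent
```

Most queries stop at a cheap model, so the expected cost per query is far below the cost of always calling the largest model.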
Part II: Indigenous Knowledge Systems, AI, and Technology Development — Literature Review
2.1 Framing: The Intersection of IKS and AI
A 2025 systematic review by Perera et al. in Big Data & Society provides the most current mapping of the intersection between Indigenous Knowledge Systems (IKS) and artificial intelligence. The review identifies four overlapping categories in existing literature: (1) AI assisting the promotion and preservation of IKS, (2) AI supporting Indigenous community needs, (3) risks arising from AI for Indigenous peoples — including erosion of cultural knowledge and data-grabbing — and (4) how IKS can enrich the development of AI itself. This final category is the most underexplored and the most relevant to the DTR paradigm.[^9][^10]
A parallel 2025 systematic review by Neira-Dàvila, Aguirre-Rivera, and Marín Yusti examines generative AI and IKS in Latin America through the lens of data colonialism and epistemic justice. Their PRISMA-protocol analysis warns that when GenAI technologies are deployed over IKS, they "intervene in broader disputes over data coloniality, epistemic sovereignty, and the risks of digital epistemicide".[^11]
2.2 Foundational Governance Frameworks
2.2.1 CARE Principles for Indigenous Data Governance
The CARE Principles — Collective Benefit, Authority to Control, Responsibility, and Ethics — were developed by the International Indigenous Data Sovereignty Interest Group within the Research Data Alliance. They are explicitly people- and purpose-oriented, designed to complement the FAIR Principles (Findable, Accessible, Interoperable, Reusable) with collective Indigenous rights and interests. The goal is that data stewards and users will "Be FAIR and CARE". The CARE Principles build on earlier work by Te Mana Raraunga Māori Data Sovereignty Network, the US Indigenous Data Sovereignty Network, and the Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective.[^12][^13]
A critical tension exists between CARE's emphasis on collective benefit and authority to control, and the open data / machine learning paradigm's appetite for unencumbered data access. Operationalizing FAIR alongside CARE requires institutional mechanisms that currently do not exist at scale.[^14][^12]
2.2.2 OCAP® Principles
The First Nations principles of Ownership, Control, Access, and Possession (OCAP®), established in 1998 by Canadian First Nations leadership and trademarked by the First Nations Information Governance Centre (FNIGC), provide a data governance standard for how First Nations data should be collected, protected, used, and shared. Ownership asserts that communities own information collectively; Control affirms rights over all research and information management processes; Access requires that First Nations have access to data about themselves regardless of where it is held; and Possession refers to physical control of data as the mechanism by which ownership is asserted.[^15][^16][^17]
2.2.3 Māori Algorithmic Sovereignty
Extending data sovereignty to algorithmic contexts, research on Māori algorithmic sovereignty re-works sub-principles of Māori data sovereignty to address responsible algorithm design. Because algorithms are a particular use of data, existing data governance frameworks can be extended to cover them. The framework establishes that Māori communities retain authority over algorithms trained on, applied to, or affecting Māori data and people.[^18][^19]
2.3 Indigenous Protocol and Artificial Intelligence
The Indigenous Protocol and Artificial Intelligence Position Paper (Lewis et al., 2020) is the foundational document articulating a multiplicity of IKS in conversation with AI practices. Produced through workshops with Indigenous researchers from Aotearoa, Australia, North America, and the Pacific, the paper refuses to present a unified "Indigenous perspective" and instead offers a heterogeneous collection of design guidelines, essays, artworks, and technology prototypes.[^20][^21]
Five core themes emerged from the workshops:[^21]
- Hardware and Software Sovereignty: Asserting control over AI systems to ensure they support community responsibilities
- How to Build Anything Ethically: Using kinship protocols (e.g., Lakota sweat lodge building protocols applied to hardware construction) as models for ethical AI design
- Language, Landscape, and Culture: Ensuring territory-specific understanding is foundational to AI systems
- Art Practice as Value Practice: Affirming art's role in producing and sharing knowledge, enabling communities to envision AI futures
- AI as Skabe (Helper): Positioning AI in reciprocal relationships of care and support, rejecting both the slave and tyrant models
The Guidelines for Indigenous-Centred AI Design v.1 specify seven principles: locality, relationality and reciprocity, responsibility and accountability, governance from Indigenous protocols, recognition of computation's cultural nature, ethical design across the extended stack, and respect for data sovereignty.[^22][^21]
2.4 The Abundant Intelligences Research Program
The Abundant Intelligences program, affiliated with the Indigenous AI initiative, proposes reconceptualizing and designing AI based on IKS. Grounded in Indigenous epistemologies, the program aims to develop "culturally-grounded AI systems that support Indigenous ways of knowing and that recognize the abundant multiplicity of ways of being intelligent in the world". The framework is explicitly optimized for abundance rather than scarcity — a philosophical orientation that contrasts sharply with the efficiency-maximization framing dominant in mainstream AI research.[^23]
A related 2024 paper in the context of Indigenous healthcare proposes a "Two-Eyed AI" framework, emphasizing co-creation with Indigenous communities and multidisciplinary development teams.[^24][^25]
2.5 Language Revitalization and NLP for Indigenous Languages
2.5.1 Te Hiku Media and Te Reo Māori
Te Hiku Media, a Māori-led broadcaster, developed automatic speech recognition (ASR) models for te reo Māori using NVIDIA NeMo and A100 GPUs, achieving 92% accuracy for monolingual transcription and 82% for bilingual speech. The data was collected through the Kōrero Māori crowdsourcing campaign under the Kaitiakitanga license, which ensures data is used only for the benefit of the Māori people. CTO Keoni Mahelona has stated that "data is the last frontier of colonization," framing this work as an act of technological sovereignty.[^26][^27][^28][^29]
Te Hiku Media's approach has inspired ASR projects by Native Hawaiians and the Mohawk people in Canada. The model demonstrates that Indigenous-led AI development can produce state-of-the-art tools while maintaining strict data sovereignty protocols.[^28]
2.5.2 Small Language Models for Indigenous Languages
A 2025 Brookings Institution analysis argues that small language models (SLMs) offer a cost-effective and resource-efficient solution for Indigenous communities by reducing computational and data requirements. SLMs can be trained on smaller, language-specific datasets, facilitating tools like spellcheckers, word predictors, machine translation, and digital documentation platforms. Their real-time processing capabilities and efficient deployment on affordable hardware make them suitable for low- and middle-income countries where bandwidth and computational resources are constrained.[^30]
Specific examples include:
| Model | Language(s) | Architecture / Size | Key Achievement |
|---|---|---|---|
| LakotaBERT | Lakota | Transformer-based | 51% language modeling accuracy, comparable to English baselines[^31] |
| IndT5 | 10 Indigenous languages + Spanish | T5-based | First transformer model for Indigenous languages, machine translation[^32] |
| InkubaLM | Low-resource African languages | 0.4B | Comparable to much larger models on translation and QA[^33] |
| Adi Vaani | Santali, Mundari, Bhili, Gondi | Multiple tools | Text-to-speech, translation, OCR for tribal languages[^34] |
The Stanford HAI report "Mind the (Language) Gap" confirms that LLM development suffers from a fundamental digital divide: most major LLMs underperform for non-English and especially low-resource languages, are not attuned to relevant cultural contexts, and are not accessible in parts of the Global South.[^35]
2.5.3 AI for Endangered Languages in Brazil
Pinhanez et al. (2024) describe an alternative AI development cycle based on community engagement for Indigenous languages in Brazil, including Guarani Mbya, Nheengatu, and other endangered languages. Fine-tuning state-of-the-art translation models on small datasets produces promising results, with prototypes co-developed with Indigenous communities.[^36][^37]
2.6 Institutional and Policy Landscape
The United Nations declared the theme for the 2025 International Day of the World's Indigenous Peoples as "Indigenous Peoples and AI: Defending Rights, Shaping Futures". The UN Permanent Forum on Indigenous Issues at its 24th session in 2025 recommended meaningful inclusion of Indigenous Peoples in AI development, governance, and application. A 2024 UN General Assembly resolution emphasized that human rights must be respected throughout the life cycle of AI systems.[^38][^39]
UNESCO launched a report on "Indigenous People-Centered Artificial Intelligence: Perspectives from Latin America and the Caribbean," urging participatory inclusion and proposing public policies to integrate Indigenous perspectives in all phases of AI development. The report addresses a region that is home to more than 10% of the world's Indigenous population, where nearly 30% live in extreme poverty and only 40% have basic computer skills.[^40]
Arizona State University's Center for Tribal Digital Sovereignty positions tribes not as passive participants but as sovereign nations with inherent authority to determine how AI technologies fit within their governance systems. The Cherokee Nation has adopted tribal AI policies creating governance committees and cultural protections.[^41]
Canada's CIFAR requires researchers and AI professionals to complete training in Indigenous perspectives. A 2025 TELUS/IEEE paper found that AI development must reflect the distinct cultures, governance structures, and worldviews of First Nations, Inuit, and Métis peoples, and that 35% of Indigenous respondents reported concerns about AI's impact.[^42][^43]
2.7 Community-Engaged AI/ML Frameworks
The Alaska Tribal Health System has developed a community-engaged framework for AI/ML that integrates community-based participatory research methods with machine learning. The framework allows emerging AI/ML technologies to align with Alaska Native communities' unique worldviews, community strengths, and healthcare goals, navigating the legacy of historical research abuses.[^44]
In Australia, an action co-research project in Kakadu National Park developed knowledge coproduction mechanisms to weave Indigenous knowledge, AI, and technical data sources for monitoring culturally significant wetlands. This demonstrates practical integration of IKS with AI-driven environmental monitoring under Indigenous governance.[^45]
2.8 Risks: Algorithmic Colonialism and Data Extraction
The concept of algorithmic colonialism describes the imposition of Western-centric ideological frameworks on global knowledge systems through AI. Indigenous communities' oral traditions remain largely unrepresented in digital archives, meaning their histories and epistemologies rarely appear in AI training datasets. Many Indigenous languages remain low-resource in AI development, and language models struggle to support even basic text processing for these communities.[^46]
Carroll et al. (2023) frame data mining as a colonial practice, stressing the need for Indigenous Data Sovereignty and the inclusion of Indigenous rights in data-reliant technology. The Common Rule governing human subjects research requires revision to address AI/ML applications with Indigenous Peoples, particularly regarding collective harms.[^47][^48]
The creative data justice framework proposed by scholars in the Global South calls for reimagining creative value with AI by broadening who and what counts as creative in data-driven systems, drawing on Indigenous systems of care as a counterforce to neoliberal efficiency values.[^49]
Part III: Synthesis — What DTR Implies for Developing Indigenous Technologies
3.1 The Resource-Efficiency Thesis
The DTR paper's central finding — that inference quality can be maintained or improved while cutting compute costs by approximately 50% — has direct structural implications for Indigenous technology development. Indigenous communities and institutions face persistent resource constraints: limited computational infrastructure, constrained budgets, sparse digital data, and geographic remoteness. Every reduction in the compute required per inference cycle narrows the gap between what well-funded labs can deploy and what community-level organizations can sustain.[^35][^30][^1][^2]
The Think@n strategy specifically demonstrates that a short 50-token prefix is sufficient to determine whether a reasoning chain is productive. This has immediate engineering implications for on-device and edge deployments. If DTR-guided early halting is implemented in SLMs — models already identified as the most promising architecture for Indigenous language and knowledge applications — the energy, latency, and hardware requirements drop further. Combined with existing research on deploying LLMs to resource-constrained edge devices, DTR-based inference could enable capable reasoning systems on affordable hardware accessible to remote communities.[^50][^51][^30][^2]
3.2 Reframing "Depth" Through Indigenous Epistemology
The DTR metric's distinction between shallow tokens (predictions that stabilize early in the transformer stack) and deep-thinking tokens (predictions requiring sustained revision through deeper layers) resonates with Indigenous epistemological frameworks in non-trivial ways.
The Indigenous Protocol and AI Position Paper articulates that Indigenous knowledge systems value depth of understanding over breadth of coverage — knowledge embedded in relationship, territory, and intergenerational experience. The Abundant Intelligences program's optimization for abundance rather than scarcity explicitly rejects the "more tokens = more intelligence" paradigm that DTR empirically debunks. Where mainstream AI research discovered through mechanistic analysis that shallow verbosity degrades performance, Indigenous epistemologies have long maintained that genuine understanding requires deep relational engagement rather than expansive surface production.[^23][^21]
This convergence is not mere metaphor. DTR operationalizes a distinction between performative output (generating many tokens) and genuine computational engagement (sustained internal revision). Indigenous design principles similarly distinguish between extractive data accumulation and meaningful knowledge production that serves community well-being.[^12][^22]
3.3 Data Sovereignty and Compute Sovereignty
The CARE and OCAP® frameworks establish that Indigenous communities must control the collection, analysis, and use of their data. DTR extends the sovereignty question from data to compute. If 50% of inference compute is being wasted on unproductive reasoning chains, communities paying for API access or operating local infrastructure are spending resources on tokens that not only fail to help but actively degrade quality.[^15][^12][^1]
DTR-guided inference becomes a tool of compute sovereignty: the ability to allocate computational resources according to community-defined priorities rather than accepting the default assumption that more compute equals better outcomes. Te Hiku Media's model — building bespoke tools under the Kaitiakitanga license using targeted, efficient approaches — already embodies this principle. DTR provides a technical mechanism for extending it to reasoning-intensive LLM applications.[^29][^28]
3.4 Implications for Indigenous Language NLP
Indigenous language NLP is structurally constrained by data scarcity: most Indigenous languages have minimal digital corpora, and the data that does exist is often sensitive and governed by community protocols. Current approaches to building language tools for these contexts include fine-tuning on small datasets, training small language models, and developing community-sourced corpora under sovereignty-respecting licenses.[^33][^37][^36][^30][^28][^35]
DTR's implications for these efforts are threefold:
- Efficient inference for low-resource models: If reasoning quality depends on depth of internal processing rather than volume of output, small models that think deeply on fewer tokens may outperform larger models that generate verbose but shallow reasoning. This aligns with Brookings' finding that SLMs with fine-tuned, context-specific datasets "yield models that are not only resource-efficient but also more accurate in their target domains".[^30]
- Quality metrics for culturally-situated evaluation: Standard NLP benchmarks fail to capture the cultural nuance required for Indigenous language applications. DTR offers a language-agnostic metric that can signal reasoning quality without requiring large labeled evaluation datasets — a persistent bottleneck for low-resource languages.[^52][^53]
- Cost reduction for community deployment: The 50% cost reduction demonstrated by Think@n directly maps to reduced API costs, lower hardware requirements, and extended battery life for edge devices — all critical factors for deployment in remote Indigenous communities.[^50][^1]
3.5 Ethical AI Design and the "Extended Stack"
The Indigenous Protocol and AI Guidelines call for applying ethical design "to the extended stack" — from mineral extraction for hardware to the cultural assumptions encoded in software. DTR contributes to this by making visible the internal mechanics of model reasoning, enabling communities to assess not just what an AI system outputs but how it arrives at its conclusions.[^22][^21]
This transparency aligns with the CARE Principle of Responsibility — those working with Indigenous data bear a responsibility to Indigenous peoples and their rights. If DTR can distinguish productive reasoning from counterproductive overthinking, it provides a mechanism for Indigenous governance bodies to evaluate whether AI systems deployed in their communities are engaging genuinely with difficult problems or merely generating confident-sounding verbosity.[^12]
3.6 Adaptive Computation as Relational Practice
The broader literature on adaptive token allocation — TALE's dynamic budget adjustment, SelfBudgeter's self-estimation of reasoning cost, and DTR's early halting — conceptualizes inference as a variable resource allocation problem. This framing resonates with Indigenous relational frameworks that emphasize reciprocity and proportional engagement.[^5][^7][^54][^21]
In Lakota building protocols described by Suzanne Kite in the IP AI Position Paper, ethical construction requires asking at each step whether the process serves the relationship being materialized. Adaptive inference can be understood analogously: rather than uniformly applying maximum compute to every query, the system allocates effort proportional to the genuine difficulty of the task. This is not just efficiency — it is a form of relational accountability, ensuring computational resources are directed toward meaningful engagement rather than wasteful production.[^21]
3.7 The Two-Eyed AI Framework and DTR
The Two-Eyed AI framework — drawing on Mi'kmaw Elder Albert Marshall's concept of Etuaptmumk (Two-Eyed Seeing), which involves learning to see with one eye from the strengths of Indigenous knowledge and with the other from the strengths of Western knowledge — offers a bridge between DTR's technical insights and Indigenous technology design.[^25][^24]
DTR is a product of Western computational analysis: it uses information-theoretic measures to probe internal model states. But its core finding — that depth of processing matters more than volume of output — validates a principle that Indigenous epistemologies articulate through different vocabularies. A Two-Eyed AI approach would integrate DTR's mechanistic insights with Indigenous evaluation criteria: Does the system's reasoning serve the community? Does it respect the relationships embedded in the data? Does it allocate its attention and resources in ways that are proportional and reciprocal?
3.8 Toward Indigenous Technological Sovereignty
The concept of Indigenous technological sovereignty extends beyond data ownership to control over the entire technological lifecycle — from conceptualization through deployment and governance. Tribal nations are asserting their place not as users or subjects of AI regulation but as innovators and rights-holders.[^54][^41]
DTR-class research provides technical capabilities that support this sovereignty agenda:
- Efficient local deployment: 50% compute reduction makes on-premises inference feasible for tribal IT infrastructure[^1]
- Transparent reasoning assessment: DTR metrics enable governance committees to evaluate AI system quality without relying on external benchmarks[^2]
- Community-appropriate scaling: Think@n's early halting allows communities to set compute budgets aligned with their resources and values rather than accepting externally determined inference costs[^2]
- Model selection guidance: The finding that smaller models can achieve productive deep thinking supports Indigenous-led development of purpose-built SLMs[^33][^30][^2]
Indigenomics AI — described as a "modern Indigenous economic design platform" grounded in principles of relational accountability and intergenerational responsibility — represents one vision of how these technical capabilities might be integrated into Indigenous-led economic systems.[^55]
3.9 Gaps and Future Research Directions
Several critical gaps emerge from this synthesis:
- Empirical validation on Indigenous language models: DTR has been validated only on English-language mathematical and scientific reasoning benchmarks. Whether its correlation with accuracy holds for low-resource language tasks, culturally-situated knowledge queries, and oral tradition processing remains untested.
- Community-defined metrics of reasoning quality: DTR measures internal computational effort, but communities need evaluation frameworks that also capture cultural appropriateness, relational accountability, and alignment with community values. Developing hybrid metrics that integrate DTR-style mechanistic signals with community-defined quality criteria is an open research challenge.
- Hardware and infrastructure realities: While DTR reduces inference cost, Indigenous communities in rural and remote areas face constraints far more fundamental than per-token pricing — including internet connectivity, electricity reliability, and access to appropriate hardware.[^40][^35]
- Governance protocols for adaptive inference: If DTR-guided systems dynamically allocate compute, governance frameworks must address who controls these allocation decisions. The CARE and OCAP® principles provide directional guidance, but specific protocols for governing adaptive inference in Indigenous contexts do not yet exist.
- Risk of techno-solutionism: Efficiency gains in AI inference do not automatically translate to benefits for Indigenous communities. Without explicit grounding in community-led design, data sovereignty frameworks, and participatory governance, DTR-enabled systems risk replicating the extractive patterns that characterize mainstream AI development.[^48][^11]
- Cross-pollination between IKS and mechanistic interpretability: The DTR paper's probing of internal model states connects to a broader mechanistic interpretability research program. Indigenous epistemologies that emphasize interiority, relational depth, and multilayered knowledge transmission may offer conceptual frameworks for understanding what it means for a model to "think deeply" — frameworks currently absent from the Western interpretability literature.[^23][^21]
3.10 Convergence Table: DTR Concepts and Indigenous Technology Principles
| DTR / Inference Efficiency Concept | Aligned Indigenous Technology Principle | Implication for Development |
|---|---|---|
| Deep-thinking tokens (sustained internal revision) | Depth of relational engagement over surface production[^21] | Quality over quantity in model evaluation |
| Think@n early halting (reject unproductive chains) | Proportional reciprocity in resource allocation[^22] | Community-governed compute budgets |
| 50% cost reduction at equal or better accuracy | Abundance optimization, not scarcity maximization[^23] | Feasible deployment on constrained infrastructure |
| DTR as language-agnostic metric | Cultural evaluation beyond Western benchmarks[^53] | Hybrid metrics integrating community criteria |
| Prefix-based quality estimation (50 tokens) | Data minimization aligned with CARE/OCAP®[^12][^15] | Less data needed for quality assessment |
| Overthinking as performance degradation | Extractive verbosity vs. meaningful engagement[^46] | Distinguishing productive from colonial AI behavior |
| Adaptive computation allocation | Relational accountability in technology use[^54] | AI that asks permission before consuming resources |
Part IV: Annotated Bibliography — Key Sources
4.1 DTR and Inference Efficiency
Chen, W.-L., Peng, L., Tan, T., et al. (2026). "Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens." arXiv:2602.13517. The originating paper introducing DTR and Think@n. Demonstrates that internal layer-wise prediction revision is a more faithful measure of reasoning effort than token count, and that early halting based on DTR reduces inference cost by ~50% with no accuracy loss.[^3][^2]
Han, T., Wang, Z., Fang, C., et al. (2025). "Token-Budget-Aware LLM Reasoning." ACL Findings. Proposes TALE, a framework that dynamically adjusts reasoning token budgets per problem. Shows that LLM reasoning can be substantially compressed by specifying budgets, but the choice of budget is critical. Cited 232 times as of 2025.[^6][^5]
Li, Z., Dong, Q., Ma, J., et al. (2025). "SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning." Trains models to self-estimate reasoning budgets via reinforcement learning. Achieves 61% response compression on math tasks while maintaining accuracy.[^7]
Chen, L., Zaharia, M., & Zou, J. (2023). "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance." arXiv:2305.05176. Foundational work on LLM cascades for cost reduction, demonstrating up to 98% cost savings while matching GPT-4.[^4]
4.2 Indigenous AI Governance and Data Sovereignty
Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., et al. (2020). "The CARE Principles for Indigenous Data Governance." Data Science Journal, 19(1), 43. Establishes the CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics) as complement to FAIR for Indigenous data contexts. Foundation for all subsequent Indigenous data governance work.[^12]
Carroll, S. R., et al. (2021). "Operationalizing the CARE and FAIR Principles for Indigenous Data Futures." Scientific Data. Addresses the practical mechanisms for implementing CARE alongside FAIR in machine-learning and big-data environments.[^14]
First Nations Information Governance Centre. (1998/ongoing). "The First Nations Principles of OCAP®." Establishes Ownership, Control, Access, and Possession as the data governance standard for Canadian First Nations. OCAP® is a registered trademark of FNIGC.[^16][^15]
Kukutai, T. & Cormack, D. (2023). "Māori Algorithmic Sovereignty: Idea, Principles, and Use." Data Science Journal. Extends Māori data sovereignty principles to algorithmic contexts, recognizing algorithms as data uses requiring governance.[^19][^18]
4.3 Indigenous Protocol, Frameworks, and AI Design
Lewis, J. E., ed. (2020). Indigenous Protocol and Artificial Intelligence Position Paper. Honolulu: Initiative for Indigenous Futures / CIFAR. The foundational position paper articulating multiple IKS perspectives on AI. Includes Guidelines for Indigenous-Centred AI Design v.1 and prototypes.[^20][^21]
Perera, M., Vidanaarachchi, R., Chandrashekeran, S., et al. (2025). "Indigenous Peoples and Artificial Intelligence: A Systematic Review and Future Directions." Big Data & Society. Most current systematic mapping of IKS-AI intersection literature (2012–2023). Identifies four categories and critical gaps.[^9]
Neira-Dàvila, C. D., Aguirre-Rivera, J. C., & Marín Yusti, J. P. (2025). "Generative Artificial Intelligence and Indigenous Knowledge Systems in the Global South." Letters in High Energy Physics. PRISMA-protocol systematic review addressing GenAI's role in data colonialism and epistemic justice in Latin America.[^11]
"Towards Abundant Intelligences: Considerations for Indigenous Perspectives in Adopting AI Technology." (2024). Journal of the Canadian Health Libraries Association. Proposes Two-Eyed AI framework integrating Indigenous and Western perspectives for healthcare AI.[^24][^25]
4.4 Indigenous Language Technology
NVIDIA & Te Hiku Media. (2024). "Māori Speech AI Model Helps Preserve and Promote New Zealand's Indigenous Language." Documents Te Hiku Media's automatic speech recognition model for te reo Māori, which transcribes with 92% accuracy using data crowdsourced under the Kaitiakitanga license; the project has inspired replication by Native Hawaiian and Mohawk communities.[^28]
Brookings Institution. (2025). "Can Small Language Models Revitalize Indigenous Languages?" Argues SLMs offer the most viable path for Indigenous language technology due to reduced compute/data requirements, deployment on affordable hardware, and community-driven development.[^30]
Pinhanez, C., et al. (2024). "Harnessing the Power of AI to Vitalize Endangered Indigenous Languages." arXiv:2407.12620. Reports on AI tools for Indigenous languages in Brazil, proposing an alternative development cycle based on community engagement.[^37][^36]
Nagoudi, E. M. B., et al. (2021). "IndT5: A Text-to-Text Transformer for 10 Indigenous Languages." AmericasNLP Workshop. The first transformer language model pre-trained specifically for Indigenous languages of the Americas, with the accompanying IndCorpus covering ten languages.[^32]
Tonja, A. L., et al. (2024). "InkubaLM: A Small Language Model for Low-Resource African Languages." Demonstrates that a 0.4B-parameter model can achieve competitive performance on translation and QA for low-resource African languages.[^33]
4.5 Policy and Institutional Frameworks
United Nations. (2025). "Ensuring Indigenous Peoples' Rights in the Age of AI." Establishes the international policy frame: AI systems reflect biases in training data that exclude Indigenous voices; Indigenous Peoples must play an active role in shaping AI's future.[^39][^38]
UNESCO. (2023). "Indigenous People-Centered Artificial Intelligence: Perspectives from Latin America and the Caribbean." Urges participatory inclusion and appropriate data operation respecting Indigenous autonomy in all phases of AI development.[^40]
Arizona State University Center for Tribal Digital Sovereignty. (2025/2026). Conference proceedings and interviews. Positions tribal nations as sovereign AI innovators and rights-holders, not passive subjects.[^41]
Policy Options / IRPP. (2025). "AI Threatens Indigenous Data Sovereignty and Digital Self-Determination." Canada-Taiwan comparative analysis recommending Indigenous data sovereignty be embedded in national AI strategies.[^42]
TELUS & IEEE. (2025). "Indigenous Involvement in Emerging Technologies." Workshop findings that AI development must reflect distinct First Nations, Inuit, and Métis cultures, with 35% of Indigenous respondents expressing concern.[^43]
4.6 Critical Perspectives
Kepp, S. (2025). "Decolonizing AI Systems: Addressing Power Disparities and Indigenous Knowledge." Frozen Light AI. Analysis of algorithmic colonialism, arguing that Western-optimized AI frameworks fail to accommodate non-linear Indigenous epistemologies.[^46]
Carroll, S. R. (2023). "In Consideration of Indigenous Data Sovereignty: Data Mining as a Colonial Practice." arXiv:2309.10215. Argues for CARE Principles integration in data-reliant technology to protect Indigenous rights.[^48]
Chapman, A., et al. (2025). "Common Rule Revisions to Govern Machine Learning on Indigenous Data." PubMed Central. Calls for revisions to human subjects research governance to address AI/ML's capacity to circumvent Indigenous self-determination.[^47]
Stanford HAI. (2025). "Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts." Documents the digital divide in LLM development: data scarcity, poor quality, and lack of cultural representativeness for low-resource languages.[^35]
References
- "A New Google AI Research Proposes Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total In…"
- Chen, W.-L., Peng, L., Tan, T., et al. (2026). "Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens." arXiv:2602.13517.
- Chen, L., Zaharia, M., & Zou, J. (2023). "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance." arXiv:2305.05176.
- Han, T., Wang, Z., Fang, C., Zhao, S., Ma, S., & Chen, Z. (2025). "Token-Budget-Aware LLM Reasoning." Findings of the Association for Computational Linguistics.
- Li, Z., Dong, Q., Ma, J., et al. (2025). "SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning."
- Perera, M., Vidanaarachchi, R., Chandrashekeran, S., Kennedy, M., Kennedy, B., & Halgamuge, S. (2025). "Indigenous Peoples and Artificial Intelligence: A Systematic Review and Future Directions." Big Data & Society.
- Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., et al. (2020). "The CARE Principles for Indigenous Data Governance." Data Science Journal, 19(1), 43.
- Carroll, S. R., et al. (2021). "Operationalizing the CARE and FAIR Principles for Indigenous Data Futures." Scientific Data.
- First Nations Information Governance Centre. "The First Nations Principles of OCAP®."
- Kukutai, T., & Cormack, D. (2023). "Māori Algorithmic Sovereignty: Idea, Principles, and Use." Data Science Journal.
- Lewis, J. E., ed. (2020). Indigenous Protocol and Artificial Intelligence Position Paper. Honolulu: Initiative for Indigenous Futures / CIFAR.
- "Towards Abundant Intelligences: Considerations for Indigenous Perspectives in Adopting Artificial Intelligence Technology." (2024). Journal of the Canadian Health Libraries Association.
- "AI Reflections: Indigenous Data Sovereignty and Artificial …"
- "Reviving Te Reo Māori: Unleashing the Power of AI …"
- NVIDIA & Te Hiku Media. (2024). "Māori Speech AI Model Helps Preserve and Promote New Zealand's Indigenous Language."
- Brookings Institution. (2025). "Can Small Language Models Revitalize Indigenous Languages?"
- "LakotaBERT: A Transformer-Based Model for Low Resource Lakota Language."
- "IndT5: A Text-to-Text Transformer for 10 Indigenous Languages." AmericasNLP Workshop.
- Tonja, A. L., et al. (2024). "InkubaLM: A Small Language Model for Low-Resource African Languages."
- "Indigenous Futures in Artificial Intelligence: From Language …"
- Pinhanez, C., et al. (2024). "Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences." arXiv:2407.12620.
- United Nations. (7 August 2025). "Ensuring Indigenous Peoples' Rights in the Age of AI."
- "New Report and Guidelines for Indigenous Data Sovereignty in Artificial Intelligence Developments."
- Arizona State University. "Tribal Nations Put Sovereignty at the Center of Future with AI."
- Policy Options / IRPP. (2025). "AI Threatens Indigenous Data Sovereignty and Digital Self-Determination."
- "Prioritizing Indigenous Involvement in Emerging Technology."
- "Community-Engaged Artificial Intelligence: An Upstream, Participatory Design, Development, Testing, Validation, Use and Monitoring Framework for Artificial Intelligence and Machine Learning Models in the Alaska Tribal Health System."
- "Coproduction Mechanisms to Weave Indigenous Knowledge, Artificial Intelligence, and Technical Data to Enable Indigenous-Led Adaptive Decision Making: Lessons from Australia's Joint Managed Kakadu National Park."
- Kepp, S. (2025). "Decolonizing AI Systems: Addressing Power Disparities and Indigenous Knowledge." Frozen Light AI.
- Chapman, A., et al. (2025). "Common Rule Revisions to Govern Machine Learning on Indigenous Data: Implementing the Expectations."
- Carroll, S. R. (2023). "In Consideration of Indigenous Data Sovereignty: Data Mining as a Colonial Practice." arXiv:2309.10215.
- "Creative Data Justice: A Decolonial and Indigenous Framework to Assess Creativity and Artificial Intelligence."
- "Deploying LLMs on Resource-Constrained Devices."
- Qin, R., et al. (2024). "Empirical Guidelines for Deploying LLMs onto Resource …"
- Ochieng, M., et al. (2025). "Evaluating LLMs' Effectiveness in Culturally Nuanced, Low …"
- "Ethical Frameworks for AI and Indigenous Knowledge."
- "Reclaiming Modern Indigenous Economic Intelligence …"