
Australia's Missing GPT: Inside Kangaroo LLM and the Local AI Gap

Australia relies on global LLMs while Kangaroo LLM aims to build a sovereign, Australian-tailored model; as of August 2025 the project remains in its early data-collection and governance phases.

State of play: no flagship local LLM

Australia has not yet produced a flagship, globally competitive large language model (LLM) comparable to GPT-4, Claude 3.5, or LLaMA 3.1. Research groups, companies and government bodies in Australia largely rely on international LLMs. These models are widely used, but they show measurable gaps when handling Australian English, local slang, cultural references and legal or regulatory context.

Kangaroo LLM: scope, partners and current status

Kangaroo LLM is the most prominent domestic initiative aiming to build an open-source model tailored to Australian English and cultural nuance. It is led by a nonprofit consortium with partners including Katonic AI, RackCorp, NEXTDC, Hitachi Vantara and Hewlett Packard Enterprise. The project is designed to prioritise data sovereignty and local alignment, but as of August 2025 it remains in its early stages:

  • Data collection: The project has identified 4.2 million Australian websites as potential sources, with an initial focus on 754,000 sites. A large-scale crawl was planned but delayed in late 2024 over legal and privacy concerns.
  • Technical pipeline: Data ingestion uses a "Kangaroo Bot" crawler that respects robots.txt and site opt-outs (a minimal sketch of this kind of robots.txt-aware fetch follows this list). Collected content is processed into a so-called "VegeMighty Dataset" and refined through a "Great Barrier Reef Pipeline" intended for model training. The model architecture, weight releases, training methodology and benchmarks have not been published.
  • Governance and resourcing: Kangaroo operates as a nonprofit effort with roughly 100 volunteers and about 10 full-time equivalent contributors. Funding has been sought from corporate clients and government grants, but no major public or private investment has been confirmed.
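
The project has not published its crawler code, so the following is only a minimal sketch of the general technique described above: checking a site's robots.txt before fetching a page. The "KangarooBot" user-agent string and the helper function are illustrative assumptions, not published project details.

```python
# Minimal sketch of a robots.txt-aware fetch using only the standard library.
# The real Kangaroo Bot crawler has not been published; the user-agent string
# below is a hypothetical placeholder for illustration.
from urllib import robotparser
from urllib.parse import urljoin, urlparse
import urllib.request

USER_AGENT = "KangarooBot"  # assumed UA string, not the project's actual one

def fetch_if_allowed(url: str, timeout: int = 10) -> bytes | None:
    """Fetch `url` only if the site's robots.txt permits this user agent."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(root, "/robots.txt"))
    try:
        rp.read()  # download and parse robots.txt
    except OSError:
        return None  # robots.txt unreachable: skip the site conservatively
    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site has opted out for this agent/path
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

if __name__ == "__main__":
    page = fetch_if_allowed("https://example.com.au/")
    print("skipped (disallowed or unreachable)" if page is None else f"{len(page)} bytes fetched")
```

In practice a production crawler would add crawl-delay handling, deduplication and per-domain rate limiting, but the opt-out check itself stays this simple.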

Taken together, Kangaroo LLM is an important step toward AI sovereignty in Australia, but it is not yet a technical alternative to established global models. Its success will depend on sustained funding, legal clarity, technical execution and adoption by local developers and enterprises.

How international LLMs are used in Australia

Claude 3.5 Sonnet (Anthropic), GPT-4 (OpenAI) and Meta's LLaMA 2 are widely accessible and actively used across Australian research, government and industry. Their adoption is supported by cloud availability (AWS, Azure, Google Cloud) and integration into enterprise workflows.

A notable change came in February 2025, when Claude 3.5 Sonnet became available in AWS's Sydney region, enabling regional data residency options. Australian teams use these models for tasks ranging from customer service automation to scientific research, and many deployments rely on fine-tuning or adaptation to local datasets to improve relevance.
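For teams that want regional data residency, calling a hosted model from the Sydney region is largely a configuration choice. The snippet below is a minimal sketch assuming access to Claude 3.5 Sonnet via Amazon Bedrock in ap-southeast-2 with boto3 installed and AWS credentials configured; the model identifier is indicative and should be confirmed against the provider's current catalogue.

```python
# Minimal sketch: invoking a hosted Claude model from the AWS Sydney region
# (ap-southeast-2) via Amazon Bedrock's Converse API. Assumes boto3 and valid
# AWS credentials with Bedrock access; the model ID is an assumed identifier.
import boto3

client = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed identifier
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarise this customer complaint in plain Australian English: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

Keeping both the endpoint and any stored prompts or logs in the same region is what makes the residency claim meaningful; the API call itself is unchanged from other regions.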

Case study: the University of Sydney used Claude to analyse whale acoustic data, achieving 89.4% accuracy in detecting minke whales versus 76.5% for traditional methods. This example shows how global models can be repurposed effectively for local science, while also underlining Australia’s dependence on external providers.

Academic strengths: evaluation, fairness and adaptation

Australian universities and research organisations are active in LLM-related work, but generally focus on evaluation, fairness, domain adaptation and specialized applications rather than on creating new foundational architectures.

  • UNSW's BESSTIE benchmark offers systematic evaluation for sentiment and sarcasm across Australian, British and Indian English. It shows that global LLMs underperform on Australian English, particularly for sarcasm detection (F-score 0.59 on Reddit versus 0.81 for sentiment); a sketch of this kind of F-score comparison follows this list.
  • Macquarie University researchers have fine-tuned BERT-family models (BioBERT, ALBERT) for biomedical QA, achieving high scores in competitions and demonstrating strength in domain-specific adaptation.
  • CSIRO Data61 publishes practical research on agent-based systems with LLMs, privacy-preserving AI and model risk management, focusing on policy and applied research rather than on foundational model building.
  • The University of Adelaide and CommBank partnership (the CommBank Centre for Foundational AI) aims to advance machine learning in financial services but emphasises applied research and fine-tuning rather than building large general-purpose LLMs from scratch.
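
Benchmark comparisons of this kind usually come down to computing an F-score over model predictions for each task and language variety. The snippet below is a minimal sketch using scikit-learn, with toy labels standing in for real annotations; it is not BESSTIE's own evaluation code.

```python
# Minimal sketch of a per-task F-score comparison in the style of
# variety-aware benchmarks such as BESSTIE. Toy binary labels stand in for
# real sentiment/sarcasm annotations; this is not the benchmark's own code.
from sklearn.metrics import f1_score

# Gold labels and model predictions for two tasks on the same posts
# (1 = positive sentiment / sarcastic, 0 = negative / literal).
gold = {
    "sentiment": [1, 0, 1, 1, 0, 1, 0, 0],
    "sarcasm":   [1, 1, 0, 0, 1, 0, 0, 1],
}
pred = {
    "sentiment": [1, 0, 1, 0, 0, 1, 0, 1],  # mostly correct
    "sarcasm":   [0, 1, 0, 1, 0, 0, 0, 1],  # misses several sarcastic posts
}

for task in ("sentiment", "sarcasm"):
    score = f1_score(gold[task], pred[task])  # binary F1, positive class = 1
    print(f"{task:9s} F1 = {score:.2f}")
```

With these toy labels the sarcasm F1 comes out well below the sentiment F1, mirroring the gap the benchmark reports for Australian English on Reddit data.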

Policy, investment and infrastructure constraints

Australia has active policy development and growing investment in AI, but gaps remain in sovereign compute and a commercial ecosystem for training large-scale LLMs:

  • Policy: A risk-based AI policy framework mandates transparency, testing and accountability for high-risk applications. Privacy law reforms in 2024 added new requirements for AI transparency that affect model choice and deployment.
  • Investment: Venture capital investment in Australian AI startups reached AUD 1.3 billion in 2024, with AI representing a large share of deals in early 2025. Most funding targets application-layer companies rather than foundational model R&D.
  • Infrastructure: Australia lacks large-scale domestic computational infrastructure for training general-purpose LLMs. Training and inference at scale typically rely on international cloud providers, although some capabilities are emerging in regional cloud services (for example, AWS's Sydney region supporting Claude 3.5 Sonnet).

What this means for AI sovereignty

The Australian ecosystem shows strong capabilities in adapting and evaluating LLMs, and in building domain-specific solutions. However, creating a sovereign, large-scale foundational model requires major, coordinated investment in compute, talent and data governance. Kangaroo LLM represents a symbolic and practical attempt to close that gap, but as of August 2025 it remains an early-stage effort facing legal, technical and resourcing hurdles.

Until a trained, benchmarked and publicly available local model appears, Australian organisations will continue to rely on international LLMs while working to mitigate local shortcomings through fine-tuning, benchmarks and policy-driven safeguards.
