Bits & Orbits Weekly

Issue 4: Agentic AI for EO is becoming productive

Last week the orbital data center category was defined in filings and partnerships; this week it showed up as hardware, and the AI4Space side answered with the scaffolding on which EO models will be built, governed, and measured.

On the Space4AI side, SpaceX took the wraps off AI1, its first dedicated orbital AI-compute satellite, in a pre-IPO technical update from Bastrop. Astronomers' interference concerns opened a new front of opposition, and a multi-million-dollar pre-seed round for a distributed inference constellation shows the category now has room for challengers as well as incumbents.

On the AI4Space side, the week reads as a stack filling in. Tilebox launched verifiable AI workflows for Earth observation, India's IN-SPACe put about $2.6 million behind a sovereign EO foundation model, and a new arXiv benchmark, TerraBench, proposes how to evaluate agents reasoning across gridded physical data, imagery, and simulator outputs — and finds today's best models clear only part of the bar.

The hardware going up and the models being trained are converging on the same workload — agents that reason over Earth data from orbit.

Specifics below.

Space4AI

SpaceX unveils AI1, its first orbital AI-compute satellite

On June 9, 2026, SpaceX unveiled AI1, its first dedicated orbital AI-compute satellite, in a technical update delivered from its Bastrop, Texas factory ahead of the company's planned IPO. Elon Musk and engineer Ian Dahl presented the design as a solar-powered computing node that radiates waste heat to space for cooling.

The pitch leans on reuse rather than a clean-sheet bus: SpaceX says AI1 carries over much of the existing Starlink V3 technology, with meaningful production volumes targeted by the end of 2027 out of the same Bastrop line. Framing the satellite as an orbital computing node — power from the sun, heat rejection to space, comms inherited from Starlink — positions the program as an incremental extension of an already-flying platform rather than a new vehicle class.

Orbital data centers gain momentum as SpaceX prepares launches and astronomers warn of interference

Following the AI1 unveiling, SpaceNews reported on June 12 that SpaceX is preparing to begin launching orbital data center spacecraft as soon as next year, drawing warnings from astronomers that the satellites could cause serious interference with ground-based observations.

The reaction lands alongside a broader industry stocktake. At a recent SpaceNews event dedicated to orbital data centers — covered in the last Bits & Orbits issue — journalists, industry leaders, and analysts mapped what is driving commercial interest in moving compute to orbit and what the next phase of deployments looks like.

For astronomers, the concern is concrete: SpaceX's planned 2027 launches would add another class of bright, reflective hardware to low Earth orbit at a moment when the optical and radio observation communities are already contending with large constellations.

Orbital raises $5M pre-seed to build distributed ODC constellation for AI inference

On June 10, Orbital announced a $5M pre-seed round to build an orbital data center constellation for AI inference workloads.

Rather than concentrating compute on a single large platform, Orbital's architecture distributes the workload across a network of smaller satellites that operate together to handle AI inference at scale. The company says the distributed topology is designed to keep the constellation in constant contact with the ground.

AI4Space

Tilebox launches verifiable AI workflows for Earth observation data

On June 11, Tilebox launched what it calls verifiable AI workflows for Earth observation — an agent-facing layer built on top of the metadata catalog and workflow orchestration the company already runs. The release targets a specific failure mode: AI over satellite data tends to behave like a black box, where an analyst submits a prompt, gets an answer back, and has no way to see how it was reached. Tilebox's update lets an AI agent discover the right satellite data, run the processing where the data already lives, and leave behind an inspectable record of exactly which datasets and which steps produced a result.

The practical payoff is traceability: an agent's output can be reproduced and fact-checked rather than taken on trust — the agent "shows its work" instead of stitching together a plausible-sounding answer from fragmented sources. Tilebox is aiming this at teams moving EO from experimentation into operational use in defense, infrastructure, climate, insurance, and energy, where buyers need results they can audit and repeat across cloud, on-premise, and sovereign environments. The new capabilities are available for free through Tilebox Labs.
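To make the idea of an inspectable record concrete, here is a minimal Python sketch of a provenance log that an agent could fill in as it works. It is not Tilebox's API: the class names, the `log` and `fingerprint` methods, and the dataset identifier are all hypothetical, chosen only to show which datasets and steps might be captured so a result can be replayed and compared.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """One processing step: what ran, on which datasets, with which parameters."""
    name: str
    datasets: list
    params: dict


@dataclass
class ProvenanceRecord:
    """Hypothetical audit trail for a single agent answer (not a real Tilebox type)."""
    query: str
    steps: list = field(default_factory=list)

    def log(self, name, datasets, **params):
        # Append a step in execution order; params must be JSON-serializable.
        self.steps.append(Step(name, list(datasets), params))

    def fingerprint(self):
        # Hash the whole record in a canonical key order, so two runs that used
        # the same datasets, steps, and parameters produce the same digest.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative usage with placeholder names and values.
record = ProvenanceRecord(query="flood extent for a placeholder area of interest")
record.log("discover", ["example-sar-collection"], start="2026-06-01", end="2026-06-07")
record.log("threshold", ["example-sar-collection"], cutoff_db=-18.0)
print(json.dumps(asdict(record), indent=2))
print(record.fingerprint())
```

Comparing fingerprints across two runs is one simple way to check that a result was reproduced from the same inputs and steps; a production system would likely also record dataset versions and software versions.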

SatSure wins $2.6 million IN-SPACe TAF grant to build Dhaarini Earth-observation foundation models

On June 11, SatSure announced it had won a $2.57 million grant from IN-SPACe under the Technology Adoption Fund to build Dhaarini, described as India's first sovereign Earth intelligence backbone.

The program is structured around sovereign Large Earth Observation Models and Vision Foundation Models for India, positioning Dhaarini as the country's first publicly funded sovereign geospatial foundation-AI effort. The award routes through IN-SPACe's Technology Adoption Fund, the regulator's vehicle for co-financing private space-tech development, and is one of three startup grants in the fund's first cohort. SatSure says the models will be trained on satellite and drone data tailored to India — monsoon patterns, agricultural landscapes, urban expansion — to improve accuracy over global models that struggle with local conditions.

TerraBench paper introduces benchmark for agents reasoning over heterogeneous Earth-system data

On June 12, researchers posted TerraBench — a benchmark paired with an executable agent framework, TerraAgent — for testing whether AI agents can reason across heterogeneous Earth-system inputs: gridded physical data, satellite imagery, geospatial context, simulator outputs, and document-grounded evidence. The benchmark holds 403 human-verified items spanning three tracks and eight application domains, wired to 77 specialized sub-tools. Most items demand genuine multi-step, multi-tool workflows rather than one-shot question answering — many task traces run 50–60 steps and draw on five or six distinct tool groups.

What sharpens the test is how it scores: TerraBench grades not just whether an agent selects the right tools but whether its final numbers land within tolerance, pairing process-level tool-use metrics with tolerance-aware numeric grading rather than leaning on an LLM-as-judge. On that bar, today's strongest models fall short. The top frontier model, Claude Sonnet 4.6, reaches only 59.2 on tool use and 22.9 on tolerance-bounded answer accuracy (Hit@tol), while the best open-weight model, Qwen3.5-35B, trails at 40.0 and 5.9. The shortfall comes less from picking the wrong tool than from grounding failures — feeding tools the wrong arguments and producing numbers that miss — with more than 84% of numerical answers falling outside acceptable error margins across every model tested. The authors' takeaway: reliable Earth-science agents will need to coordinate heterogeneous workflows, parameterize tools precisely, and preserve provenance; tool access alone is not enough.
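For readers who want a feel for tolerance-aware numeric grading, the sketch below shows one plausible way to compute a Hit@tol-style score in Python. The paper's exact tolerance rule is not described here, so the relative tolerance of 5%, the absolute floor, and the function names `within_tolerance` and `hit_at_tol` are assumptions for illustration, not TerraBench's implementation.

```python
def within_tolerance(predicted, reference, rel_tol=0.05, abs_tol=1e-9):
    """Return True if predicted falls inside the tolerance band around reference.

    Assumed rule: the allowed error is rel_tol times the reference magnitude,
    with a small absolute floor so a reference of zero is still gradable.
    """
    if predicted is None or predicted != predicted:  # missing answer or NaN
        return False
    return abs(predicted - reference) <= max(rel_tol * abs(reference), abs_tol)


def hit_at_tol(predictions, references, rel_tol=0.05):
    """Percentage of items whose numeric answer lands within tolerance."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        within_tolerance(p, r, rel_tol) for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Illustrative values only: two answers are close enough, one is off by 25%,
# and one is missing because the agent produced no number.
score = hit_at_tol([10.2, 50.0, None, 3.0], [10.0, 40.0, 1.0, 3.0])
print(score)  # 50.0
```

Grading this way is deterministic and cheap to rerun, which is part of the appeal over an LLM-as-judge: an answer either lands inside the band or it does not.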

Till next time,

This week's Meta-beat column

Also read about the AI pipeline at the core of this newsletter, including this week's ups and downs:

Behind the pipeline — Issue 004 Meta-beat
