AI Training Progress

Dataset Training Roadmap

Track our progress on training the OceanLens AI with new marine life datasets. We're building this gradually to ensure the highest identification accuracy.

Behind the Identification Engine

How Our AI Thinks

Identifying blackwater larvae and midnight plankton isn't just a database lookup problem — it requires judgment. Here's how we built that judgment through four iterations.

V1 · The Foundation

The Strict Database

Our first approach was simple: the AI searched its database for the closest visual match and returned the answer. Fast and consistent — until the creature didn't exist in the database. Faced with the unknown, the system would confidently map it to the nearest wrong answer. In the vast, underdocumented world of blackwater diving, this was a recurring problem.

Hallucination by omission — confidently wrong when the species was absent from training data.
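The V1 behavior described above can be illustrated with a minimal nearest-match sketch. Everything here is an assumption made for illustration: the tiny three-dimensional vectors, the `REFERENCE_DB` contents, and the use of cosine similarity are hypothetical stand-ins, not the actual OceanLens feature pipeline.

```python
import math

# Hypothetical reference embeddings (illustrative only; a real system
# would use high-dimensional visual features).
REFERENCE_DB = {
    "Scyphozoa": (1.0, 0.0, 0.0),
    "Pteropoda": (0.0, 1.0, 0.0),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_v1(query):
    """Return the closest label, no matter how poor the match is."""
    scores = {label: cosine_similarity(query, vec)
              for label, vec in REFERENCE_DB.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A creature that resembles nothing in the database still gets a label.
label, score = identify_v1((0.1, 0.2, 0.97))
print(label, round(score, 2))
```

Because the function always returns the top-scoring label, a weak match is reported in the same way as a strong one, which is the failure mode summarized above.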

V2 · The Overcorrection

The Vision-First AI

We gave the AI permission to override the database whenever its visual analysis disagreed. This helped with unknown larvae — but created a new failure mode we called 'academic aggression.' When the database correctly identified something as Mollusca based on multiple verified references, the AI would spot a translucent wing-like structure and argue it was a ribbon worm instead. Opinionated when it should have been humble.

Overconfident visual reasoning that overruled solid database evidence.
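A minimal sketch of the V2 override logic, assuming the database and the visual model each produce a single label. The function name and parameters are hypothetical; the point is that `db_confidence` is accepted but never consulted.

```python
def identify_v2(db_label: str, db_confidence: float, vision_label: str) -> str:
    """Vision-first: the visual opinion wins whenever the two disagree."""
    # db_confidence is deliberately unused, which is the flaw: even a
    # well-verified database match is overruled by any visual disagreement.
    if vision_label != db_label:
        return vision_label
    return db_label

# A strongly supported Mollusca match is still overruled (Nemertea = ribbon worms).
print(identify_v2("Mollusca", 0.97, "Nemertea"))
```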

V3 · The Logic Tree

Rules Over Intuition

V3 introduced a strict priority chain — defining exactly when the database wins and when visual reasoning overrules it. This eliminated both V1 and V2 failure modes for known species. One blind spot remained: when the target was completely absent from the training distribution, the system had no way to know — it still returned the nearest wrong match with some confidence.

🔒

Exact Match Rule: When database confidence is high, the DB Phylum/Class is treated as factual, and the AI interprets visual evidence to fit that taxonomy.

🐛

Larval Exception: When the DB assigns a Phylum, the AI may clarify the life stage (for example, Veliger for Mollusca or Zoea for crustaceans) without overriding the Phylum.

🛑

Veto Power: When there is a glaring anatomical contradiction, the AI must output Unknown. An honest unknown beats a confident wrong answer.

No detection for out-of-distribution images — truly unknown species still received the nearest wrong answer.
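The three V3 rules can be read as a small decision function. The sketch below is one possible ordering under stated assumptions: the `HIGH_CONFIDENCE` cutoff, the `Evidence` fields, the choice to check the veto first, and the fallback to the visual label when database confidence is low are all hypothetical, since the rules above do not spell out those details.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_CONFIDENCE = 0.85  # assumed cutoff for "database confidence is high"

@dataclass
class Evidence:
    db_phylum: str
    db_confidence: float
    vision_phylum: str
    vision_life_stage: Optional[str] = None   # e.g. "Veliger"
    glaring_contradiction: bool = False       # anatomy clearly contradicts the DB

def identify_v3(e: Evidence) -> str:
    # Veto Power: an honest Unknown beats a confident wrong answer.
    if e.glaring_contradiction:
        return "Unknown"
    # Exact Match Rule: a high-confidence DB phylum is treated as factual.
    if e.db_confidence >= HIGH_CONFIDENCE:
        # Larval Exception: refine the life stage, keep the phylum.
        if e.vision_life_stage:
            return f"{e.db_phylum} ({e.vision_life_stage})"
        return e.db_phylum
    # Assumed fallback: with weak DB evidence, visual reasoning decides.
    return e.vision_phylum

print(identify_v3(Evidence("Mollusca", 0.95, "Nemertea", vision_life_stage="Veliger")))
```

Note that nothing in this chain asks whether the best database candidate is close enough to trust at all, which is the blind spot noted above.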

V3.5 · Current System

The Threshold Guard

V3.5 adds a similarity gate before any rule fires. The system first asks: are any of these candidates close enough to be worth evaluating? If none pass the minimum threshold, the AI enters an unrecognized species mode — it must treat the species as unknown rather than force the nearest wrong answer. The UI surfaces this as 'Uncharted Waters.'

🚫

Similarity Gate: Candidates below a minimum visual similarity threshold are dropped before any rule logic runs.

📡

Unknown Species Mode: If all candidates are dropped, the AI enters a special mode and must output Unknown for Genus/Species, with no forced guessing.

🗺️

Uncharted Waters UX: Users see a clear 'not in database' state instead of a confident wrong answer.

🧭

Enriched Analysis Output: Analysis now includes Diver's Eye (encounter notes & behavior), Photographer's Intel (approach & light tips), and Conservation & Rarity, turning raw taxonomy into field-ready insight for underwater photographers.
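The similarity gate and the unknown-species fallback can be sketched as a filter that runs before any rule logic. The threshold value, the function names, and the shape of the returned dictionary are assumptions made for illustration; only the 'Uncharted Waters' label comes from the description above.

```python
from typing import Dict, List, Tuple

MIN_SIMILARITY = 0.60  # assumed value; the real threshold is not published here

Candidate = Tuple[str, float]  # (label, visual similarity in [0, 1])

def apply_similarity_gate(candidates: List[Candidate],
                          threshold: float = MIN_SIMILARITY) -> List[Candidate]:
    """Drop every candidate whose similarity is below the threshold."""
    return [(label, sim) for label, sim in candidates if sim >= threshold]

def identify_v35(candidates: List[Candidate]) -> Dict[str, object]:
    survivors = apply_similarity_gate(candidates)
    if not survivors:
        # Unknown Species Mode: no forced guess for Genus/Species.
        return {"mode": "unrecognized_species",
                "genus": "Unknown",
                "species": "Unknown",
                "ui_state": "Uncharted Waters"}
    # Otherwise hand the surviving candidates to the V3-style rule logic;
    # here the best survivor is simply reported.
    best_label, best_sim = max(survivors, key=lambda c: c[1])
    return {"mode": "rules", "candidate": best_label, "similarity": best_sim}

print(identify_v35([("Scyphozoa", 0.21), ("Pteropoda", 0.34)])["ui_state"])
```

Placing the gate first means the rule logic never sees a candidate set made up entirely of poor matches, so it is not asked to choose among wrong answers.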

Ready to test the threshold guard?

Upload a photo and watch V3.5 decide — threshold, rules, and all.

Try the AI Now →

AI Database Training Status

🐠

Reef & Pelagic

Refining Data

General identification is live, but we are actively curating base data to reduce noise and achieve expert-level accuracy.

Data Quality: Getting Smarter

High Volume · Medium Precision

👽

Blackwater & Plankton

Alpha Training

AI is learning! We need more photo contributions to improve accuracy.

Seed Data: 1,257 · Goal: 2,000

Phase 1 · Active

Foundational Taxonomy & Blackwater Focus

We are curating a foundational dataset of 100,000+ images. By learning from all clear underwater environments (reef, muck, pelagic), the AI builds a strong baseline. Our ultimate goal and specialized focus, however, remains mastering Blackwater and Pelagic identification, where traditional data is scarce.

Cnidaria · Ctenophora

Scyphozoa (True Jellyfish)

Imported

Hydrozoa (Hydromedusae / Siphonophores)

Imported

Cubozoa (Box Jellies)

Imported

Ctenophora (Comb Jellies)

Imported

Mollusca

Nudibranchia (Pelagic / Sea Slugs)

Imported

Cephalopoda (Octopus / Squid)

Imported

Pteropoda (Sea Butterflies)

Imported

Arthropoda · Annelida

Amphipoda

Imported

Stomatopoda (Mantis Shrimp Larvae)

Imported

Polychaeta (Bristle Worms)

Imported

Pelagic Decapoda

Imported

Chordata

Thaliacea (Salps / Doliolids)

Imported

Micro-organisms

Radiolaria

Imported

Chaetognatha (Arrow Worms)

Imported

Foraminifera

Imported

Phase 2 · Planned

Large-Scale Taxa

Larger taxonomic groups with higher species diversity, planned for the next expansion phase.

Actinopterygii

Imported

Chondrichthyes (Sharks / Rays / Skates)

Imported

Blackwater Fish

Imported

Decapoda

Imported

Isopoda

Queued

Phase 3 · Future

Dual Mode: Reef & Daylight

A dedicated mode switch that adapts the AI's identification context between blackwater / pelagic and reef environments — tailored for each style of diving.

UI Switch: Blackwater / Reef Mode

Queued

What's Next

Beyond the Database

Opening the identification engine that powers OceanLens to the world.

Coming Soon

For Researchers & Developers

Marine Vision API

The identification engine that powers OceanLens will soon be open to the world. Batch-process thousands of field specimens, integrate deep-sea intelligence into your research platform, or build the next tool for ocean science.

Batch-process entire collections of lab or field photos in a single API call
Built for marine biologists, research NGOs, and developers shipping biodiversity tools

OceanLens · Marine Intelligence Platform