Dataset Training Roadmap
Track our progress on training the OceanLens AI with new marine life datasets. We're building this gradually to ensure the highest identification accuracy.
Behind the Identification Engine
How Our AI Thinks
Identifying blackwater larvae and midnight plankton isn't just a database lookup problem — it requires judgment. Here's how we built that judgment through four iterations.
The Strict Database
Our first approach, V1, was simple: the AI searched its database for the closest visual match and returned the answer. It was fast and consistent until the creature didn't exist in the database. Faced with the unknown, the system would confidently map it to the nearest wrong answer. In the vast, underdocumented world of blackwater diving, this was a recurring problem.
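To make that failure concrete, here is a minimal sketch of a closest-match lookup. It assumes each image has already been reduced to a small feature vector and that similarity is measured with cosine similarity. The vectors, labels, and function names are invented for illustration and are not taken from the OceanLens codebase.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_match(query, database):
    """Return (label, similarity) for the closest database entry.

    There is no 'unknown' outcome here: some label is always returned.
    """
    scored = ((label, cosine_similarity(query, vec)) for label, vec in database.items())
    return max(scored, key=lambda pair: pair[1])

# Toy feature vectors, invented for this example only.
database = {
    "larval flounder": [0.9, 0.1, 0.0],
    "pteropod": [0.1, 0.9, 0.1],
}

# A query that resembles neither entry still comes back with a label.
label, score = nearest_match([0.0, 0.1, 0.9], database)
print(label, round(score, 2))
```

The query above is a poor match for both entries, yet the lookup still names one of them. Nothing in this design lets the system say "I don't know."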
The Vision-First AI
In V2, we gave the AI permission to override the database whenever its visual analysis disagreed. This helped with unknown larvae, but it created a new failure mode we called 'academic aggression.' When the database correctly identified something as Mollusca based on multiple verified references, the AI would spot a translucent wing-like structure and argue it was a ribbon worm instead. Opinionated when it should have been humble.
Rules Over Intuition
V3 introduced a strict priority chain that defines exactly when the database wins and when visual reasoning overrules it. This eliminated both the V1 and V2 failure modes for known species. One blind spot remained: when the target was completely absent from the training distribution, the system had no way to know, so it still returned the nearest wrong match with some confidence.
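A priority chain of this kind might look like the sketch below. The actual V3 rules are not spelled out here, so the three rules, the `Evidence` fields, and both cutoff values are assumptions chosen only to show the shape of the idea: rules are checked in a fixed order and the first one that applies decides.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Evidence:
    db_label: str             # best database match
    db_verified_refs: int     # verified references backing that match
    vision_label: str         # label proposed by visual analysis
    vision_confidence: float  # 0.0 to 1.0

# Hypothetical cutoffs, not real OceanLens settings.
MIN_VERIFIED_REFS = 3
MIN_VISION_CONFIDENCE = 0.9

def resolve(e: Evidence) -> Tuple[str, str]:
    """Return (label, rule_that_fired), checking rules in priority order."""
    # Rule 1: a well-referenced database match always wins.
    if e.db_verified_refs >= MIN_VERIFIED_REFS:
        return e.db_label, "database: verified references"
    # Rule 2: vision may overrule a weakly supported match, but only when confident.
    if e.vision_label != e.db_label and e.vision_confidence >= MIN_VISION_CONFIDENCE:
        return e.vision_label, "vision: confident override"
    # Rule 3: otherwise fall back to the database.
    return e.db_label, "database: default"

# The 'academic aggression' case: vision is confident, but the database is well supported.
print(resolve(Evidence("Mollusca", 5, "ribbon worm", 0.95)))
```

Under these assumed rules, the well-referenced Mollusca match stands even though the visual analysis is confident. Note that every branch still returns a label, which is the blind spot described above.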
The Threshold Guard
V3.5 adds a similarity gate before any rule fires. The system first asks: are any of these candidates close enough to be worth evaluating? If none pass the minimum threshold, the AI enters an unrecognized species mode — it must treat the species as unknown rather than force the nearest wrong answer. The UI surfaces this as 'Uncharted Waters.'
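The gate can be sketched as a filter that runs before any rule is evaluated. The threshold value, the function names, and the stand-in rule function below are assumptions for illustration. Only the overall flow (gate first, rules second, 'Uncharted Waters' when nothing passes) comes from the description above.

```python
MIN_SIMILARITY = 0.6  # hypothetical threshold, not the production value
UNCHARTED = "Uncharted Waters"

def gate(candidates, min_similarity=MIN_SIMILARITY):
    """Keep only (label, similarity) pairs at or above the threshold."""
    return [(label, sim) for label, sim in candidates if sim >= min_similarity]

def identify(candidates, apply_rules):
    """Run the similarity gate first; only surviving candidates reach the rules."""
    passed = gate(candidates)
    if not passed:
        # Nothing is close enough to evaluate: report unknown instead of guessing.
        return UNCHARTED
    return apply_rules(passed)

def pick_best(candidates):
    """Stand-in for the rule chain: take the most similar surviving candidate."""
    return max(candidates, key=lambda pair: pair[1])[0]

weak = [("pteropod", 0.22), ("larval flounder", 0.01)]
strong = [("pteropod", 0.85), ("larval flounder", 0.40)]

print(identify(weak, pick_best))    # no candidate passes the gate
print(identify(strong, pick_best))  # one candidate passes and is evaluated
```

Putting the gate ahead of the rules means the rule chain never sees a candidate set that is uniformly poor, so it cannot be pushed into choosing the least-bad wrong answer.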
Ready to test the threshold guard?
Upload a photo and watch V3.5 decide — threshold, rules, and all.
AI Database Training Status
Reef & Pelagic
General identification is live, but we are actively curating base data to reduce noise and achieve expert-level accuracy.
Blackwater & Plankton
AI is learning! We need more photo contributions to improve accuracy.
Foundational Taxonomy & Blackwater Focus
We are curating a massive foundational dataset of 100,000+ images. By learning from all clear underwater environments (reef, muck, pelagic), the AI builds a strong baseline. However, our ultimate goal and specialized focus remain mastering Blackwater and Pelagic identification, where traditional data is scarce.
Large-Scale Taxa
Larger taxonomic groups with higher species diversity, planned for the next expansion phase.
Dual Mode: Reef & Daylight
A dedicated mode switch that adapts the AI's identification context between blackwater / pelagic and reef environments — tailored for each style of diving.
What's Next
Beyond the Database
Opening the identification engine that powers OceanLens to the world.
Marine Vision API
The identification engine that powers OceanLens — now open to the world. Batch-process thousands of field specimens, integrate deep-sea intelligence into your research platform, or build the next tool for ocean science.
- Batch-process entire collections of lab or field photos in a single API call
- Built for marine biologists, research NGOs, and developers shipping biodiversity tools