MongoDB Agent Skills... wow.
How agentic AI and MongoDB Agent Skills are changing the way developers work with Atlas Stream Processing
I remember it distinctly. Studying for the sqlplus module of my Oracle 8i OCP exam, circa 2000. The dot-com era was on fire, Oracle was the most common backend database on the planet, and I was completely enthralled with it. To get certified you had to learn sqlplus top to bottom, every nuance, every flag. And I did. And I passed.
I sit here now in a Claude Code session thinking back on those days and how far we have come.
Back then, and really until just recently, querying a database meant writing SQL or learning a built-in DSL. The MongoDB query language and aggregation pipeline are genuinely expressive and powerful, but they are still a declarative language you have to learn, remember, and get exactly right. One wrong brace and you are staring at a parse error at 11pm. We have all been there.
Those days are done.
The agentic shift
The rise of agentic coding tools (think Claude Code, Codex, and others) has completely changed the game. We used to dream of an English-to-SQL paradigm where you could just ask for data in plain language and get a query back. That felt revolutionary at the time.
This is so much more than that.
These agents are beyond what I would have imagined even a year ago, and certainly 26 years ago when I was poking at every sqlplus option and editing SQL in vi on Solaris. These agents work at the level of outcomes. Tell one what you need, and it figures out the syntax, the structure, and the execution. No syntax errors. No Stack Overflow rabbit holes (in 2000 there was no Stack Overflow at all, just the manuals, gasp). Just ask for what you need.
MongoDB Agent Skills
With the launch of MongoDB Agent Skills, you now have deep control of your MongoDB environment directly from your agentic workflow. We are talking connection pool management, data modeling guidance, query optimization, natural language querying, Search index creation, and yes, full Atlas Stream Processing pipeline development.
Part developer. Part DBA. Part DevOps. A whole team at your fingertips.
Let me walk through a few examples of what this actually looks like in practice. I am going to focus on Atlas Stream Processing because, well, that is my product area. But the same interaction model applies across the board: query optimization, data modeling, index management, connection pooling, and more. Pick your area; the agent has it covered.
Example 1: Creating and starting a stream processor
Here is a full session. I start by checking what processors are already running, then ask it to build a new one from scratch, then iterate on the pipeline live.
❯ what stream processors do I have running?
Called mongodb (ctrl+o to expand)
Two processors are currently STARTED (billing active):
┌───────────────────────┬─────────┐
│ Processor │ State │
├───────────────────────┼─────────┤
│ solar_basic_processor │ STARTED │
├───────────────────────┼─────────┤
│ solar_power_manager │ STARTED │
└───────────────────────┴─────────┘
Also worth noting: airplanes_processor is in FAILED state — worth
investigating if that's unexpected.
❯ create a stream processor that reads from sample stream solar and writes
to a collection called test_out12. Show me a few documents.
⏺ I'll create the processor, start it, then query for sample docs.
Let me load the build tool.
Called mongodb
⏺ Processor is running. Let me give it a moment to land some docs, then query.
Called mongodb
⏺ Need to connect to the cluster first. Let me find it.
Called mongodb
⏺ Two clusters: MyCluster01 and MyCluster02. Let me check which one maps
to the right connection.
Called mongodb 3 times
⏺ Processor solar_test_out12 is running and writing to test.test_out12.
Here are 5 sample docs (66 already collected):
┌───────────┬───────────┬──────────┬──────────────────────┐
│ device_id │ obs.watts │ obs.temp │ timestamp │
├───────────┼───────────┼──────────┼──────────────────────┤
│ device_0 │ 135 │ 5 │ 2026-04-20T16:13:09Z │
├───────────┼───────────┼──────────┼──────────────────────┤
│ device_3 │ 428 │ 10 │ 2026-04-20T16:13:09Z │
├───────────┼───────────┼──────────┼──────────────────────┤
│ device_1 │ 1 │ 24 │ 2026-04-20T16:13:10Z │
├───────────┼───────────┼──────────┼──────────────────────┤
│ device_6 │ 63 │ 22 │ 2026-04-20T16:13:10Z │
├───────────┼───────────┼──────────┼──────────────────────┤
│ device_3 │ 305 │ 24 │ 2026-04-20T16:13:11Z │
└───────────┴───────────┴──────────┴──────────────────────┘
Each doc includes device_id, group_id, max_watts, obs.watts, obs.temp,
and a _stream_meta field from the sample stream.
❯ let's compute amps using obs.watts and a static key 'volts' = 48 for all
panels then apply electric power formula to calculate amps. add that as a stage.
⏺ Stopping, inserting $addFields with volts: 48 and amps = obs.watts / 48,
then restarting.
Called mongodb 3 times
⏺ Done. Pipeline is now:
1. $source — sample_stream_solar
2. $addFields — volts: 48, amps: obs.watts / 48 ← new
3. $merge — test.test_out12
New docs will have e.g. obs.watts: 135 → amps: 2.8125.
❯ show me the pipeline
Called mongodb
⏺ [
{
"$source": {
"connectionName": "sample_stream_solar"
}
},
{
"$addFields": {
"volts": 48,
"amps": { "$divide": ["$obs.watts", 48] }
}
},
{
"$merge": {
"into": {
"connectionName": "MyCluster",
"db": "test",
"coll": "test_out12"
}
}
}
]
Running on SP5, 90 messages processed, p50 latency ~7ms.
That whole session (create, start, query, iterate on the pipeline, restart) took about two minutes and zero trips to the docs. A new era of working with computers, for sure.
Example 2: Monitoring and tuning
This is where it gets interesting. I asked for a full status report, then followed up with a sizing question for production. The agent caught a duplicate pipeline stage and a JS function I should have replaced with native MQL, and fixed both.
❯ show me the status of the solar stream processors including latency
and documents processed per sec.
⏺ Fetching stats for all solar processors in parallel.
Called mongodb 6 times (ctrl+o to expand)
⏺ Here's the status of all 6 solar processors:
Running (billing active)
┌───────────────────────┬─────────┬──────┬─────────┬──────────┬───────────┬─────────────┬─────────────┐
│ Processor │ State │ Tier │ Docs In │ Docs Out │ Docs/sec │ p50 Latency │ p99 Latency │
├───────────────────────┼─────────┼──────┼─────────┼──────────┼───────────┼─────────────┼─────────────┤
│ solar_basic_processor │ STARTED │ SP2 │ 509,614 │ 509,614 │ ~2.0/s │ 7.7ms │ 57.0ms │
├───────────────────────┼─────────┼──────┼─────────┼──────────┼───────────┼─────────────┼─────────────┤
│ solar_power_manager │ STARTED │ SP30 │ 492,872 │ 323,047* │ ~2.0/s in │ 8.6ms │ 68.1ms │
├───────────────────────┼─────────┼──────┼─────────┼──────────┼───────────┼─────────────┼─────────────┤
│ solar_test_out12 │ STARTED │ SP5 │ 2,874 │ 2,874 │ ~2.0/s │ 7.4ms │ 20.5ms │
└───────────────────────┴─────────┴──────┴─────────┴──────────┴───────────┴─────────────┴─────────────┘
* solar_power_manager drops ~34% of docs due to the $match: obs.watts > 100 filter.
Not running
┌─────────────────────────────┬─────────┬──────┬─────────────────────┐
│ Processor │ State │ Tier │ Notes │
├─────────────────────────────┼─────────┼──────┼─────────────────────┤
│ solar_boosted_processor │ CREATED │ SP30 │ Never started │
├─────────────────────────────┼─────────┼──────┼─────────────────────┤
│ solar_efficiency_view │ CREATED │ SP30 │ Never started │
├─────────────────────────────┼─────────┼──────┼─────────────────────┤
│ solar_watts_boost_processor │ STOPPED │ SP30 │ Last ran 2026-03-02 │
└─────────────────────────────┴─────────┴──────┴─────────────────────┘
The sample stream produces ~2 docs/sec. solar_power_manager has the
highest p99 latency (68ms) likely due to the JS $function for amps
calculation — that's expected overhead vs. native MQL operators.
✻ Brewed for 30s
❯ are the running processors on the right tiers for optimal performance?
⏺ Based on the stats (~2 docs/sec from the sample stream, low memory
usage, no DLQ errors):
solar_basic_processor — SP2 ✅
Simple $source → $merge, 2 docs/sec, p99 57ms. SP2 is the right fit.
solar_test_out12 — SP5 ✅
Simple $source → $addFields → $merge, p99 only 20.5ms. SP5 is fine;
SP2 would also work here but not worth changing.
solar_power_manager — SP30 ⚠️ Over-provisioned
This was bumped to SP30 earlier in this session, but at 2 docs/sec
it's significant overkill. The JS $function for amps adds CPU overhead,
but nothing that SP10 couldn't handle. SP30 is designed for
high-throughput workloads (hundreds to thousands of docs/sec). You're
paying roughly 3x the cost of SP10 for no benefit at this volume.
Recommendation: Drop solar_power_manager back to SP10. That means
stop → start with tier: SP10. Want me to do that?
Also worth noting: solar_boosted_processor and solar_efficiency_view
are both CREATED on SP30 — if you start those, you'd have three SP30
processors running simultaneously on ~2 docs/sec total. Consider
setting those to SP10 before starting them.
❯ we plan the workload to be about 12000 docs/sec on solar_power_manager
when we go production, is it sized for that?
⏺ Let me think through this carefully based on the current pipeline
and expected load.
Pipeline complexity factors for solar_power_manager:
- $match — cheap, filters ~34% of docs → ~7,900 docs/sec reach subsequent stages
- $addFields (power, volts) — native, fast
- $addFields (power, volts) — duplicate stage, doing redundant work
- $addFields with JS $function for amps — this is the bottleneck.
JS execution is significantly slower than native MQL operators
- $merge with parallelism: 8, whenMatched: replace — reasonable for
high write throughput
SP30 can handle ~30K docs/sec on simple pipelines, but the JS $function
reduces effective throughput significantly — roughly 40-60% depending on
function complexity. That puts real capacity closer to ~15K-18K docs/sec
for this pipeline. At 12K in, ~7.9K actually hit the JS stage, so SP30
should technically handle it, but you'll have little headroom for spikes.
Two changes would meaningfully improve this before go-live:
1. Replace the JS $function with native $divide — you already did this
in solar_test_out12. Native operators are 10-50x faster than JS.
This alone could drop the required tier to SP10.
2. Remove the duplicate $addFields (power, volts) — there are two
identical stages after $match. One is redundant.
With those changes, SP30 would have ample headroom at 12K docs/sec.
Without them, I'd recommend SP50 to be safe in production.
Want me to apply both fixes now?
✻ Brewed for 35s
❯ yes
⏺ Stopping, then applying both fixes: remove duplicate $addFields and
replace JS $function with native $divide.
Called mongodb 3 times (ctrl+o to expand)
⏺ Done. Updated pipeline:
1. $source — sample_stream_solar
2. $addFields — power: "DC", volts: 40
3. $match — obs.watts > 100
4. $addFields — amps: { $divide: ["$obs.watts", 40] } ← native, no JS
5. $merge — parallelism: 8, whenMatched: replace
With JS removed and the duplicate stage gone, SP30 should comfortably
handle 12K docs/sec with solid headroom for spikes.
✻ Crunched for 1m 11s
❯ show me the before and after pipelines
⏺ Before
[
{ $source: { connectionName: 'sample_stream_solar',
timeField: { $dateFromString: { dateString: '$timestamp' } } } },
{ $addFields: { power: 'DC', volts: 40 } },
{ $match: { 'obs.watts': { $gt: 100 } } },
{ $addFields: { power: 'DC', volts: 40 } }, // ← duplicate
{ $addFields: { amps: { $function: { // ← JS function
args: ['$obs.watts', '$volts'],
body: 'function(watts, volts) { return watts / volts; }',
lang: 'js'
} } } },
{ $merge: { into: { connectionName: 'MyCluster', db: 'test',
coll: 'test99_auto' }, parallelism: 8, whenMatched: 'replace' } }
]
After
[
{ $source: { connectionName: 'sample_stream_solar',
timeField: { $dateFromString: { dateString: '$timestamp' } } } },
{ $addFields: { power: 'DC', volts: 40 } },
{ $match: { 'obs.watts': { $gt: 100 } } },
{ $addFields: { amps: { $divide: ['$obs.watts', 40] } } }, // ← native MQL
{ $merge: { into: { connectionName: 'MyCluster', db: 'test',
coll: 'test99_auto' }, parallelism: 8, whenMatched: 'replace' } }
]
Two changes: removed the duplicate $addFields and replaced the JS
$function with native $divide — 5 stages down from 6, no JS overhead.
A duplicate stage and a slow JS function, both caught before go-live. That used to be a war story you’d tell after a painful production incident.
Summary
Twenty-six years ago I was grinding through Oracle certification exams, memorizing every flag in sqlplus, editing SQL in vi on Solaris. The idea that I’d one day have a conversation with a computer that could inspect my running stream processors, catch a duplicate pipeline stage, identify a JS performance anti-pattern, and fix all of it in plain English would have sounded like science fiction. It is not science fiction. It is a Tuesday in April 2026.
LLMs and agentic workflows have crossed a threshold that matters. A system that understands intent, knows your infrastructure, reasons about tradeoffs, and acts on them — that is a different category of tool. For data engineers and developers working with MongoDB, Agent Skills puts that capability directly into your workflow.
Getting started is straightforward. Install the MongoDB MCP server, connect it to your agent of choice, and point it at your Atlas cluster. From there your agent has full visibility into your environment and can create, monitor, tune, and iterate on Atlas Stream Processing pipelines in plain English.
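For reference, wiring the MongoDB MCP server into an MCP-capable agent usually means a client config entry along these lines. Treat it as a sketch: the npm package is mongodb-mcp-server, but check the current MCP server documentation for the exact environment variables and auth options before relying on it, and never commit real credentials.

```json
{
  "mcpServers": {
    "MongoDB": {
      "command": "npx",
      "args": ["-y", "mongodb-mcp-server"],
      "env": {
        "MDB_MCP_CONNECTION_STRING": "mongodb+srv://<user>:<password>@<cluster>.mongodb.net/"
      }
    }
  }
}
```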
I keep thinking back to that sqlplus exam. All those hours learning syntax that a machine now handles in seconds. Some of it feels like wasted time in hindsight. But honestly, understanding the fundamentals made it possible to know when the agent is right and when to push back. That part does not change. The rest? Fair game.
Wow.