MongoDB stores data as flexible BSON documents. Use insertOne/find/updateOne/deleteOne for CRUD, the aggregation pipeline ($match, $group, $lookup) for analytics, compound indexes following the ESR rule for performance, and embed-vs-reference decisions for schema design. Multi-document transactions provide ACID guarantees since v4.0. Use explain() and the database profiler to optimize queries.
1. The Document Model and BSON Types
MongoDB stores data as documents — flexible, JSON-like structures that map naturally to objects in most programming languages. Unlike relational databases, where you define rigid table schemas upfront, MongoDB documents in the same collection can have different fields. Internally, documents are stored as BSON (Binary JSON), which extends JSON with types such as Date, ObjectId, Decimal128, and BinData. Each document has a maximum size of 16 MB.
A collection is analogous to a table, and a database holds multiple collections. The _id field is required and automatically generated as an ObjectId if not provided. ObjectId is a 12-byte value containing a 4-byte timestamp, 5-byte random value, and 3-byte counter — making it sortable by creation time.
// A MongoDB document — JSON-like but with rich types
{
_id: ObjectId("507f1f77bcf86cd799439011"), // 12-byte unique ID
name: "Alice Johnson", // String (UTF-8)
age: 29, // Int32
balance: NumberDecimal("1249.99"), // Decimal128
tags: ["developer", "speaker"], // Array
address: { // Embedded document
city: "San Francisco",
state: "CA",
zip: "94102"
},
createdAt: ISODate("2026-01-15T09:00:00Z"), // UTC datetime
metadata: BinData(0, "c3VyZQ==") // Binary data
}

| BSON Type | Example | Use Case |
|---|---|---|
| ObjectId | ObjectId("507f...") | Default _id, embeds timestamp |
| String | "hello" | UTF-8 text |
| Int32 / Int64 | 42 / NumberLong(42) | Integer values |
| Decimal128 | NumberDecimal("9.99") | Financial / exact decimal |
| Date | ISODate("2026-01-15") | Timestamps |
| Array | [1, 2, 3] | Lists, multikey indexes |
| Boolean | true / false | Flags |
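Because the leading 4 bytes of an ObjectId are a big-endian Unix timestamp, the creation time can be recovered from the hex string alone. A plain-JavaScript sketch of what driver helpers like `ObjectId.getTimestamp()` do:

```javascript
// Recover the creation time from an ObjectId's hex representation.
// The first 8 hex characters encode seconds since the Unix epoch.
function objectIdToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id from the sample document above decodes to an October 2012 date,
// which is why sorting by _id approximates sorting by creation time.
const created = objectIdToDate("507f1f77bcf86cd799439011");
```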
2. CRUD Operations
Every data interaction maps to Create, Read, Update, or Delete. All operations accept a filter document — a JSON-like object describing which documents to target. MongoDB provides both single-document and multi-document variants of each operation. Single-document operations are atomic; multi-document operations process each document independently unless wrapped in a transaction.
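To make the filter-document semantics concrete, here is a toy matcher in plain JavaScript covering bare equality, $in, and $gte. This is an illustration only, not the server's matching algorithm:

```javascript
// Minimal filter matcher: bare equality, $in, $gte — enough to mimic
// how a filter document selects targets for find/update/delete.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$in") return arg.includes(value);
        if (op === "$gte") return value >= arg;
        if (op === "$eq") return value === arg;
        return false; // operators beyond this sketch
      });
    }
    return value === cond; // a bare value means equality match
  });
}

matches({ role: "admin", age: 30 },
        { role: { $in: ["admin", "editor"] }, age: { $gte: 18 } }); // true
```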
Create — insertOne & insertMany
// Insert a single document — returns insertedId
db.users.insertOne({
name: "Alice",
email: "alice@example.com",
role: "admin",
createdAt: new Date()
});
// Insert multiple documents
// ordered: false = continue inserting on duplicate key errors
db.users.insertMany([
{ name: "Bob", email: "bob@example.com", role: "user" },
{ name: "Carol", email: "carol@example.com", role: "editor" }
], { ordered: false });

Read — find & findOne
// Find with query operators, projection (second argument), sort, limit
db.users.find(
  {
    role: { $in: ["admin", "editor"] },
    createdAt: { $gte: ISODate("2026-01-01") }
  },
  { name: 1, email: 1, _id: 0 } // projection
).sort({ createdAt: -1 }).limit(20);
// Find one document by exact match
db.users.findOne({ email: "alice@example.com" });
// Nested field query
db.users.find({ "address.city": "San Francisco" });
// Array query with $elemMatch
db.orders.find({ items: { $elemMatch: { qty: { $gt: 5 }, price: { $lt: 20 } } } });
// Query operators cheat sheet:
// $eq $ne $gt $gte $lt $lte — comparison
// $in $nin — set membership
// $and $or $not $nor — logical
// $exists $type — element
// $regex $text — string matching
// $elemMatch $size $all — array

Update — updateOne & updateMany
// Update with multiple operators
db.users.updateOne(
{ email: "alice@example.com" },
{
$set: { role: "superadmin" },
$inc: { loginCount: 1 },
$push: { tags: "verified" },
$currentDate: { lastModified: true }
}
);
// Upsert — insert if not found
db.metrics.updateOne(
{ date: "2026-02-28", page: "/home" },
{ $inc: { views: 1 } },
{ upsert: true }
);
// Array update operators
db.posts.updateOne(
{ _id: postId },
{ $addToSet: { likes: userId }, // add only if not present
$pull: { dislikes: userId } } // remove from array
);
// Update operators:
// $set, $unset, $rename — field
// $inc, $mul, $min, $max — numeric
// $push, $pull, $addToSet, $pop — array

Delete — deleteOne & deleteMany
db.users.deleteOne({ email: "bob@example.com" });
db.sessions.deleteMany({
lastAccess: { $lt: ISODate("2025-01-01") }
});
// findOneAndDelete — returns the deleted document
const deleted = db.queue.findOneAndDelete(
{ status: "pending" },
{ sort: { priority: -1 } }
);

3. Aggregation Pipeline
The aggregation pipeline processes documents through sequential stages. Each stage transforms documents as they pass through — like a Unix pipe. Place $match early to filter documents before expensive stages, and use $project to reduce document size mid-pipeline. The pipeline can use indexes for $match and $sort stages at the beginning.
// Revenue report by category — last 30 days
db.orders.aggregate([
{ $match: {
status: "completed",
orderDate: { $gte: ISODate("2026-01-29") }
}},
{ $lookup: {
from: "products", localField: "productId",
foreignField: "_id", as: "product"
}},
{ $unwind: "$product" },
{ $group: {
_id: "$product.category",
totalRevenue: { $sum: "$amount" },
avgOrder: { $avg: "$amount" },
count: { $sum: 1 }
}},
{ $project: {
category: "$_id", totalRevenue: { $round: ["$totalRevenue", 2] },
avgOrder: { $round: ["$avgOrder", 2] }, count: 1, _id: 0
}},
{ $sort: { totalRevenue: -1 } }
]);

| Stage | Purpose | SQL Equivalent |
|---|---|---|
| $match | Filter documents | WHERE |
| $group | Aggregate values | GROUP BY |
| $project | Reshape fields | SELECT |
| $lookup | Left outer join | LEFT JOIN |
| $unwind | Flatten array to one doc per element | UNNEST |
| $sort | Order results | ORDER BY |
| $facet | Multiple pipelines in parallel | Multiple queries |
| $addFields | Add computed fields | SELECT expr AS alias |
| $bucket | Group into value ranges | CASE WHEN + GROUP BY |
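To build intuition for $group, the $sum / $avg accumulation can be mimicked over a plain array in JavaScript. Illustrative only — the real server streams documents and can spill large groups to disk:

```javascript
// Group docs by a key and accumulate — mirrors
// { $group: { _id: "$category", total: { $sum: "$amount" }, count: { $sum: 1 } } }
function groupBy(docs, key) {
  const groups = new Map();
  for (const doc of docs) {
    const id = doc[key];
    const g = groups.get(id) ?? { _id: id, total: 0, count: 0 };
    g.total += doc.amount;
    g.count += 1;
    groups.set(id, g);
  }
  // avg is derived per group, as $avg would compute it
  return [...groups.values()].map(g => ({ ...g, avg: g.total / g.count }));
}

groupBy(
  [{ category: "a", amount: 10 }, { category: "a", amount: 20 },
   { category: "b", amount: 5 }],
  "category"
);
// → [{ _id: "a", total: 30, count: 2, avg: 15 }, { _id: "b", total: 5, count: 1, avg: 5 }]
```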
$facet — Multiple Aggregations in One Query
// Get both paginated results and total count in one query
db.products.aggregate([
{ $match: { category: "electronics" } },
{ $facet: {
results: [
{ $sort: { price: -1 } },
{ $skip: 20 },
{ $limit: 10 },
{ $project: { name: 1, price: 1 } }
],
totalCount: [
{ $count: "count" }
],
priceRange: [
{ $group: {
_id: null,
minPrice: { $min: "$price" },
maxPrice: { $max: "$price" }
}}
]
}}
]);

4. Indexing Strategies
Indexes are the most critical factor for query performance. Without an index, every query performs a collection scan (COLLSCAN) — reading every document. Follow the ESR rule: Equality fields first, Sort fields next, Range fields last. Use db.collection.getIndexes() to list existing indexes and db.collection.totalIndexSize() to check disk usage. Over-indexing wastes storage and slows writes, so create only the indexes your queries need.
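The ESR rule can be expressed as a tiny sort: classify how each field is used in the query, then order keys Equality before Sort before Range. A planning sketch, not an official tool:

```javascript
// Order candidate index fields by the ESR rule.
// usage: map of field -> "equality" | "sort" | "range"
function esrOrder(usage) {
  const rank = { equality: 0, sort: 1, range: 2 };
  return Object.entries(usage)
    .sort(([, a], [, b]) => rank[a] - rank[b]) // stable sort keeps ties in place
    .map(([field]) => field);
}

// For: find({ status: "active", createdAt: { $gte: d } }).sort({ name: 1 })
esrOrder({ createdAt: "range", name: "sort", status: "equality" });
// → ["status", "name", "createdAt"]
```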
Compound and Single-Field Indexes
// Single-field index with unique constraint
db.users.createIndex({ email: 1 }, { unique: true });
// Compound index — ESR rule
db.users.createIndex({ status: 1, name: 1, createdAt: 1 });
// Partial index — smaller and faster
db.users.createIndex(
{ email: 1 },
{ partialFilterExpression: { status: "active" } }
);
// Covered query — index has all fields, no doc fetch
db.users.createIndex({ email: 1, name: 1 });
db.users.find({ email: "alice@example.com" }, { name: 1, email: 1, _id: 0 });

Specialized Index Types
// Text index — full-text search
db.articles.createIndex(
{ title: "text", body: "text" },
{ weights: { title: 10, body: 1 } }
);
db.articles.find({ $text: { $search: "mongodb aggregation" } });
// TTL index — auto-expire documents after 7 days
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 604800 });
// 2dsphere geospatial index
db.places.createIndex({ location: "2dsphere" });
db.places.find({
location: { $near: {
$geometry: { type: "Point", coordinates: [-122.4, 37.8] },
$maxDistance: 5000
}}
});
// Wildcard index — flexible schema
db.events.createIndex({ "metadata.$**": 1 });

| Index Type | Best For | Limitation |
|---|---|---|
| Single-field | Simple equality / range | One field only |
| Compound | Multi-field queries (ESR) | Max 32 fields; order matters |
| Multikey | Array fields | Max 1 array per compound index |
| Text | Full-text search | One text index per collection |
| 2dsphere | Geospatial queries | GeoJSON format required |
| TTL | Auto-expire documents | Single Date field only |
| Hashed | Hash-based sharding | Equality only, no range |
| Wildcard | Dynamic schemas | No compound queries |
5. Schema Design Patterns
MongoDB schema design is driven by how your application queries data, not by normalization rules. Unlike relational databases where you normalize to third normal form and join at query time, MongoDB encourages denormalization to reduce the need for joins. The key decision is whether to embed related data inside a document or reference it via an ObjectId in another collection. Consider your read/write ratio, data access patterns, and the expected document size (max 16 MB).
| Criteria | Embed | Reference |
|---|---|---|
| Relationship | One-to-few (1-5) | One-to-many, many-to-many |
| Read pattern | Always read together | Read independently |
| Update pattern | Rarely updated alone | Updated independently |
| Data size | Small sub-document | Large or growing unbounded |
| Doc limit | Under 16 MB total | Could exceed 16 MB |
// EMBEDDED: Blog post with comments (one-to-few)
{
_id: ObjectId("..."), title: "Schema Design",
comments: [
{ user: "Bob", text: "Great!", date: ISODate("2026-02-01") },
{ user: "Carol", text: "Helpful", date: ISODate("2026-02-02") }
]
}
// REFERENCED: Orders referencing products (one-to-many)
{
_id: ObjectId("..."), userId: ObjectId("..."),
items: [
{ productId: ObjectId("..."), qty: 2, price: 29.99 },
{ productId: ObjectId("..."), qty: 1, price: 49.99 }
], total: 109.97
}

Common Patterns: Bucket, Polymorphic, Computed
// BUCKET — time-series data batched into hourly/daily buckets
{
sensorId: "temp-001", date: ISODate("2026-02-28"),
count: 60,
measurements: [
{ ts: ISODate("...T00:00:00Z"), value: 22.1 },
{ ts: ISODate("...T00:01:00Z"), value: 22.3 }
],
summary: { min: 21.5, max: 23.1, avg: 22.2 }
}
// POLYMORPHIC — different shapes in same collection
{ type: "car", make: "Toyota", doors: 4, mpg: 32 }
{ type: "truck", make: "Ford", payload: 2000, axles: 2 }
// COMPUTED — pre-calculate expensive aggregations
{
_id: "product-42", name: "Widget Pro",
reviewStats: { count: 287, avgRating: 4.3,
distribution: { 5: 142, 4: 89, 3: 31, 2: 15, 1: 10 } }
}

6. Multi-Document Transactions
MongoDB provides ACID transactions across multiple documents and collections since version 4.0 (replica sets) and 4.2 (sharded clusters). While single-document operations are already atomic, transactions let you coordinate writes across multiple documents with all-or-nothing semantics — just like in a relational database. Keep transactions short (under 60 seconds by default) and avoid them when single-document atomicity suffices, as they add latency and lock overhead.
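Drivers label retryable transaction failures (for example the TransientTransactionError label), and the Node.js driver's session.withTransaction retries them for you. The retry shape itself is simple; a synchronous plain-JavaScript sketch of the pattern:

```javascript
// Retry a callback while failures carry a transient marker — the pattern
// withTransaction applies to errors labeled TransientTransactionError.
function withRetry(fn, isTransient, maxAttempts = 3) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      if (!isTransient(err)) throw err; // permanent errors propagate at once
      lastErr = err;
    }
  }
  throw lastErr; // transient error persisted past the retry budget
}

// Demo: fails twice with a "transient" error, succeeds on attempt 3
let calls = 0;
const result = withRetry(
  () => {
    calls++;
    if (calls < 3) {
      const e = new Error("write conflict");
      e.transient = true; // stand-in for a driver error label
      throw e;
    }
    return "committed";
  },
  err => err.transient === true
);
// result === "committed" after 3 calls
```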
const session = client.startSession();
try {
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" }
});
const accounts = client.db("bank").collection("accounts");
// Debit source
const debit = await accounts.updateOne(
{ _id: "acct-001", balance: { $gte: 500 } },
{ $inc: { balance: -500 } }, { session }
);
if (debit.modifiedCount === 0) throw new Error("Insufficient funds");
// Credit destination
await accounts.updateOne(
{ _id: "acct-002" },
{ $inc: { balance: 500 } }, { session }
);
// Record in ledger
await client.db("bank").collection("ledger").insertOne(
{ from: "acct-001", to: "acct-002", amount: 500, date: new Date() },
{ session }
);
await session.commitTransaction();
} catch (err) {
await session.abortTransaction();
} finally {
session.endSession();
}

7. Change Streams
Change streams subscribe to real-time data changes — inserts, updates, deletes, and replacements — without polling. Built on the oplog, they work with replica sets and sharded clusters. Use them for real-time notifications, cache invalidation, data synchronization, and event-driven microservices. Each change event includes a resume token so you can restart the stream from where you left off after a disconnect.
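The resume-token workflow is: persist each event's token only after the event is fully processed, then on restart skip everything up to that token. A plain-JavaScript sketch of the bookkeeping (the real resumeAfter option does the skipping server-side; the `_id` token values here are placeholders):

```javascript
// Persist the token after each successfully handled event...
function processEvents(events, handle, saveToken) {
  for (const event of events) {
    handle(event);
    saveToken(event._id); // only after success, so a crash never skips work
  }
}

// ...and on restart, drop events up to and including the saved token.
function resumeFrom(events, lastToken) {
  if (lastToken == null) return events;
  const i = events.findIndex(e => e._id === lastToken);
  return i === -1 ? events : events.slice(i + 1);
}

const events = [{ _id: "t1" }, { _id: "t2" }, { _id: "t3" }];
let saved = null;
processEvents(events.slice(0, 2), e => {}, t => { saved = t; }); // "crash" after t2
resumeFrom(events, saved); // → [{ _id: "t3" }] — picks up where we left off
```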
const pipeline = [
{ $match: { "fullDocument.status": "urgent",
operationType: { $in: ["insert", "update"] } }}
];
const stream = db.collection("tickets").watch(pipeline, {
fullDocument: "updateLookup"
});
stream.on("change", (event) => {
console.log("Op:", event.operationType);
console.log("Doc:", event.fullDocument);
// Trigger notification, update cache, etc.
});
// Resume after disconnect
const token = loadTokenFromStorage();
const resumed = collection.watch(pipeline, { resumeAfter: token });

8. Mongoose ODM Basics
Mongoose is the most popular MongoDB ODM for Node.js, providing schema validation, type casting, middleware hooks, virtual properties, and a fluent query builder on top of the native driver. It enforces structure at the application level while MongoDB itself stays schema-flexible. Use .lean() on queries when you only need plain objects (skips Mongoose hydration for better performance).
import mongoose from "mongoose";
await mongoose.connect("mongodb://localhost:27017/myapp", {
maxPoolSize: 10, serverSelectionTimeoutMS: 5000
});
const userSchema = new mongoose.Schema({
name: { type: String, required: true, trim: true },
email: { type: String, required: true, unique: true, lowercase: true },
age: { type: Number, min: 0, max: 150 },
role: { type: String, enum: ["user", "admin", "editor"], default: "user" },
tags: [String],
profile: { bio: String, avatar: String }
}, { timestamps: true });
// Virtual property
userSchema.virtual("isAdmin").get(function() {
return this.role === "admin";
});
// Pre-save middleware
userSchema.pre("save", function(next) {
if (this.isModified("email")) { /* validate */ }
next();
});
const User = mongoose.model("User", userSchema);
// CRUD with Mongoose
const user = await User.create({ name: "Alice", email: "alice@dev.com" });
const admins = await User.find({ role: "admin" }).sort({ name: 1 }).lean();
await User.findByIdAndUpdate(user._id, { $push: { tags: "verified" } });
// Population — resolve references
const postSchema = new mongoose.Schema({
title: String,
author: { type: mongoose.Schema.Types.ObjectId, ref: "User" }
});
const Post = mongoose.model("Post", postSchema);
const posts = await Post.find().populate("author", "name email").lean();

9. Replica Sets and Sharding
Replica sets provide high availability with automatic failover. A typical set has three members: one primary (accepts writes) and two secondaries (replicate data asynchronously). If the primary fails, an election promotes a secondary within seconds. Sharding distributes data across multiple replica sets using a shard key. Choose your shard key carefully — it determines data distribution and query routing, and cannot easily be changed after sharding.
Replica Set Setup
// Connection string
"mongodb://mongo1:27017,mongo2:27017,mongo3:27017/mydb?replicaSet=rs0"
// Initiate replica set
rs.initiate({
_id: "rs0",
members: [
{ _id: 0, host: "mongo1:27017", priority: 2 },
{ _id: 1, host: "mongo2:27017", priority: 1 },
{ _id: 2, host: "mongo3:27017", priority: 1 }
]
});
db.getMongo().setReadPref("secondaryPreferred");

Sharding
// Enable sharding on a database
sh.enableSharding("analytics");
// Shard a collection — choose shard key carefully!
sh.shardCollection("analytics.events", {
tenantId: 1, timestamp: 1
});
// Shard key strategies:
// Ranged: { timestamp: 1 } — good for range scans, hot shard risk
// Hashed: { userId: "hashed" } — even distribution, no range queries
// Compound: { region: 1, _id: 1 } — zone-based + high cardinality
// Good shard key properties:
// - High cardinality (many unique values)
// - Even distribution (no hot spots)
// - Query isolation (most queries target a single shard)
// - Hard to change later (refinable since 4.4, reshardable since 5.0 — both costly)

| Feature | Replica Set | Sharded Cluster |
|---|---|---|
| Purpose | HA, read scaling | Horizontal write/storage scaling |
| Data | Identical copies | Partitioned across shards |
| Failover | Automatic election | Per-shard automatic |
| When to use | Always (production baseline) | Data exceeds single server |
| Components | Primary + secondaries | Shards + config servers + mongos |
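The hot-shard risk noted in the shard-key strategies can be demonstrated: monotonically increasing keys under ranged partitioning all land in the last chunk, while a hashed key spreads them. A toy illustration — the multiplier hash is a stand-in, not MongoDB's 64-bit hash, and the chunk boundaries are invented:

```javascript
// Ranged partitioning with fixed chunk boundaries: [-inf,100), [100,200), [200,inf)
function rangedShard(key) {
  if (key < 100) return 0;
  if (key < 200) return 1;
  return 2;
}

// Toy stand-in hash (NOT MongoDB's) just to show the spreading behavior
function hashedShard(key, shards = 3) {
  return (key * 2654435761 % 2 ** 32) % shards;
}

const newKeys = [200, 201, 202, 203, 204, 205]; // monotonically increasing inserts
const ranged = new Set(newKeys.map(rangedShard));        // every insert hits shard 2
const hashed = new Set(newKeys.map(k => hashedShard(k))); // inserts spread out
```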
10. Performance Optimization
Performance tuning in MongoDB starts with understanding your query patterns. Use explain() to analyze query execution plans, the database profiler to discover slow queries, and connection pooling to manage driver resources efficiently. The most common performance issue is missing indexes — a single missing index can turn a 2ms query into a multi-second collection scan. Always check explain() output for COLLSCAN stages and high totalDocsExamined relative to nReturned.
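The two red flags named above — a COLLSCAN stage and totalDocsExamined far above nReturned — can be checked mechanically. A plain-JavaScript sketch over a mocked explain document (field names follow the explain() output; the 10x scan-ratio threshold is an arbitrary choice, and real winning plans are often nested FETCH/IXSCAN trees):

```javascript
// Flag inefficient plans from explain("executionStats") output.
function auditPlan(explain, maxScanRatio = 10) {
  const stats = explain.executionStats;
  const stage = explain.queryPlanner.winningPlan.stage;
  const warnings = [];
  if (stage === "COLLSCAN") warnings.push("collection scan — add an index");
  if (stats.nReturned > 0 &&
      stats.totalDocsExamined / stats.nReturned > maxScanRatio) {
    warnings.push("examined/returned ratio too high — index not selective");
  }
  return warnings;
}

auditPlan({
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 20, totalDocsExamined: 100000 }
});
// flags both problems
```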
explain() and Database Profiler
// Analyze query execution plan
db.orders.find({ status: "pending" }).explain("executionStats");
// Key fields in explain output:
// executionStats.nReturned — docs returned to client
// executionStats.totalDocsExamined — docs scanned (want close to nReturned)
// executionStats.totalKeysExamined — index keys scanned
// executionStats.executionTimeMillis — total query time
// winningPlan.stage — IXSCAN (good) vs COLLSCAN (bad)
// winningPlan.inputStage.indexName — which index was chosen
// Enable database profiler for slow queries (> 100ms)
db.setProfilingLevel(1, { slowms: 100 });
// Query the profiler output
db.system.profile.find({
millis: { $gt: 200 },
ns: "mydb.orders"
}).sort({ ts: -1 }).limit(5);
// Disable profiler when done
db.setProfilingLevel(0);

Connection Pooling and Driver Configuration
const { MongoClient } = require("mongodb");
const client = new MongoClient(uri, {
maxPoolSize: 50, // max concurrent connections
minPoolSize: 5, // keep warm connections
maxIdleTimeMS: 60000, // close idle after 60s
serverSelectionTimeoutMS: 5000,
socketTimeoutMS: 45000,
compressors: ["zstd", "snappy"], // network compression
readPreference: "secondaryPreferred",
readConcern: { level: "majority" },
writeConcern: { w: "majority", j: true }
});
// IMPORTANT: Create client ONCE and reuse across requests
// Do NOT create a new MongoClient per request!

Performance Best Practices Summary
| Practice | Why |
|---|---|
| Create compound indexes (ESR) | Avoid COLLSCAN, support queries efficiently |
| Project only needed fields | Reduce network transfer and memory |
| Use covered queries | Avoid document fetch entirely |
| Reuse MongoClient | Connection pooling, avoid TCP overhead |
| Enable compression (zstd) | 30-70% less network bandwidth |
| Use readPreference secondary | Distribute read load |
| Avoid unbounded arrays | Prevent 16MB limit and slow updates |
| Monitor with profiler | Identify slow queries proactively |
11. Atlas Features and MongoDB Compass
MongoDB Atlas is the fully managed cloud database service available on AWS, GCP, and Azure. It handles provisioning, patching, automated backups with point-in-time recovery, and auto-scaling. The free M0 tier (512 MB) is ideal for development and learning. MongoDB Compass is the official desktop GUI for visualizing schemas, building aggregation pipelines, managing indexes, and analyzing query performance with visual explain plans.
// Atlas connection (SRV format — auto-discovers replica set)
"mongodb+srv://user:pass@cluster0.abc123.mongodb.net/mydb"
// Key Atlas features:
// - Automated backups with point-in-time recovery
// - Auto-scaling storage and compute
// - Global clusters with zone-based sharding
// - Performance Advisor (automatic index recommendations)
// - Atlas Charts (built-in data visualization)
// - Atlas Data Federation (query S3 + Atlas together)
// Atlas Search — Lucene-based full-text via aggregation
db.products.aggregate([
{ $search: {
index: "product_search",
compound: {
must: [{ text: { query: "wireless headphones", path: "title" } }],
filter: [{ range: { path: "price", lte: 100 } }]
}
}},
{ $limit: 10 },
{ $project: { title: 1, price: 1, score: { $meta: "searchScore" } } }
]);
// MongoDB Compass features:
// - Visual schema analyzer
// - Aggregation pipeline builder (drag-and-drop stages)
// - Visual explain plan for query optimization
// - Index management and performance metrics
// - CRUD operations with a graphical interface

12. Security: Authentication, Authorization & Encryption
Production MongoDB deployments require multiple security layers: authentication (who can connect), authorization (what they can do), encryption (protecting data in transit and at rest), and network controls (who can reach the server). Always enable authentication — a self-managed deployment does not require it by default, which is a common source of exposed databases.
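Once users exist (created as below), clients must say which database holds their credentials via the authSource option. A typical authenticated connection string — host, password, and database names are placeholders:

```javascript
// Authenticated, TLS-enabled connection string.
// authSource=admin: the user was created in the admin database,
// even though the application reads and writes myapp.
const uri =
  "mongodb://appAdmin:s3cret@db1.example.com:27017/myapp" +
  "?authSource=admin&tls=true&retryWrites=true";
```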
// Create an admin user
use admin
db.createUser({
user: "appAdmin", pwd: passwordPrompt(),
roles: [
{ role: "readWrite", db: "myapp" },
{ role: "dbAdmin", db: "myapp" }
]
});
// Read-only user for reporting
db.createUser({
user: "reporter", pwd: passwordPrompt(),
roles: [{ role: "read", db: "myapp" }]
});
// Built-in roles:
// read, readWrite — data access
// dbAdmin, dbOwner — database admin
// clusterAdmin — cluster ops
// userAdminAnyDatabase — user management
// TLS/SSL connection:
// mongod --tlsMode requireTLS \
// --tlsCertificateKeyFile /etc/ssl/mongodb.pem \
// --tlsCAFile /etc/ssl/ca.pem
// Encryption at rest (Atlas or Enterprise):
// Atlas: AES-256 enabled by default
// Enterprise: KMIP or local key provider
// Client-Side Field Level Encryption (CSFLE)
// Encrypts fields BEFORE they leave the driver
// Even DBAs cannot read encrypted fields

| Security Layer | Mechanism | Protects Against |
|---|---|---|
| Authentication | SCRAM-SHA-256, x.509, LDAP | Unauthorized connections |
| Authorization | Role-based access control | Privilege escalation |
| Encryption in transit | TLS/SSL | Network eavesdropping |
| Encryption at rest | AES-256, KMIP | Disk theft |
| Field-level encryption | CSFLE / Queryable Encryption | Admin seeing sensitive data |
| Network | IP whitelist, VPC peering | External attacks |
- MongoDB stores documents as BSON with rich types: ObjectId, Date, Decimal128, embedded docs, arrays
- CRUD uses insertOne, find, updateOne, deleteOne with operators like $gt, $in, $regex
- The aggregation pipeline chains stages ($match, $group, $lookup, $unwind) for analytics and joins
- Follow the ESR rule for compound indexes: Equality, Sort, Range field order
- Embed for one-to-few read-together data; reference for one-to-many or independent access
- Multi-document transactions provide ACID — but prefer single-document atomicity when possible
- Change streams enable real-time event-driven architectures without polling
- Use explain("executionStats") to verify index usage and avoid COLLSCAN
- Replica sets for HA (production baseline); add sharding when data exceeds single-server capacity
- Always enable authentication, use TLS, apply RBAC, and consider CSFLE for sensitive fields