数据结构与算法:开发者完全指南(含代码示例)
掌握数组、链表、树、图、哈希表、堆、栈、队列、Big O 表示法、排序与搜索算法,包含 JavaScript/TypeScript 和 Python 的实际代码示例。
- 数组 O(1) 随机访问,链表 O(1) 插入/删除——根据操作频率选择
- 哈希表是最常用的数据结构:平均 O(1) 查找、插入、删除
- 栈(LIFO)用于撤销/DFS/解析,队列(FIFO)用于BFS/调度/缓冲
- 二叉搜索树保持有序数据,支持 O(log n) 范围查询
- 堆实现优先队列:O(log n) 插入,O(1) 查看最小/最大值
- 图用邻接表或邻接矩阵表示,BFS 求最短路径,DFS 求连通分量
- 掌握 Big O:O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(2ⁿ)
- 没有"最好的"数据结构——只有最适合当前问题的数据结构
- Big O 分析帮助你在代码运行前预测性能瓶颈
- 排序是许多算法的基础:排序后可用二分搜索将 O(n) 降至 O(log n)
- 递归和迭代通常可以互相转换——栈是连接两者的桥梁
- 面试和生产代码都需要数据结构知识:选对结构,代码更快更简洁
- 空间和时间通常是权衡关系——哈希表用空间换时间就是经典例子
1. Big O 表示法:算法复杂度分析
Big O 表示法描述算法的时间或空间复杂度如何随输入规模增长。它关注的是增长趋势,忽略常数因子和低阶项。理解 Big O 是选择正确数据结构和算法的基础。
| 复杂度 | 名称 | 示例 | n=1000 |
|---|---|---|---|
| O(1) | 常数 | 数组索引访问 | 1 |
| O(log n) | 对数 | 二分搜索 | ~10 |
| O(n) | 线性 | 遍历数组 | 1,000 |
| O(n log n) | 线性对数 | 归并排序 | ~10,000 |
| O(n²) | 平方 | 冒泡排序 | 1,000,000 |
| O(2ⁿ) | 指数 | 递归斐波那契 | 天文数字 |
Big O 实际示例
// O(1) — Constant time: same speed regardless of input size
function getFirst(arr: number[]): number {
return arr[0]; // always one operation
}
// O(log n) — Logarithmic: halves the search space each step
function binarySearch(arr: number[], target: number): number {
let left = 0, right = arr.length - 1;
while (left <= right) {
const mid = Math.floor((left + right) / 2);
if (arr[mid] === target) return mid;
if (arr[mid] < target) left = mid + 1;
else right = mid - 1;
}
return -1;
}
// O(n) — Linear: visits every element once
function findMax(arr: number[]): number {
let max = arr[0];
for (let i = 1; i < arr.length; i++) {
if (arr[i] > max) max = arr[i];
}
return max;
}
// O(n²) — Quadratic: nested loops over the same data
function hasDuplicate(arr: number[]): boolean {
for (let i = 0; i < arr.length; i++) {
for (let j = i + 1; j < arr.length; j++) {
if (arr[i] === arr[j]) return true;
}
}
return false;
}
// O(n) with hash set — optimized duplicate check
function hasDuplicateOptimized(arr: number[]): boolean {
const seen = new Set<number>();
for (const num of arr) {
if (seen.has(num)) return true;
seen.add(num);
}
return false;
}2. 数组:最基础的数据结构
数组在连续内存中存储元素,提供 O(1) 的索引访问。它们是大多数编程语言中最常用的数据结构。JavaScript 的数组是动态的,Python 的 list 也是动态数组的实现。
| 操作 | 时间复杂度 | 说明 |
|---|---|---|
| 按索引访问 | O(1) | 直接计算内存地址 |
| 搜索(无序) | O(n) | 线性扫描 |
| 搜索(有序) | O(log n) | 二分搜索 |
| 末尾插入 | 均摊 O(1) | 可能触发扩容 |
| 中间插入 | O(n) | 需要移动元素 |
| 删除 | O(n) | 需要移动元素填补空位 |
常用数组技巧
// Two-pointer technique — find pair that sums to target
function twoSum(sorted: number[], target: number): [number, number] | null {
let left = 0, right = sorted.length - 1;
while (left < right) {
const sum = sorted[left] + sorted[right];
if (sum === target) return [left, right];
if (sum < target) left++;
else right--;
}
return null;
}
// Sliding window — max sum subarray of size k
function maxSumSubarray(arr: number[], k: number): number {
let windowSum = 0;
for (let i = 0; i < k; i++) windowSum += arr[i];
let maxSum = windowSum;
for (let i = k; i < arr.length; i++) {
windowSum += arr[i] - arr[i - k]; // slide the window
maxSum = Math.max(maxSum, windowSum);
}
return maxSum;
}
// Prefix sum — range sum queries in O(1)
function buildPrefixSum(arr: number[]): number[] {
const prefix = [0];
for (let i = 0; i < arr.length; i++) {
prefix.push(prefix[i] + arr[i]);
}
return prefix;
}
function rangeSum(prefix: number[], left: number, right: number): number {
return prefix[right + 1] - prefix[left];
}
// Example usage:
const arr = [1, 3, 5, 2, 8, 4];
const prefix = buildPrefixSum(arr);
console.log(rangeSum(prefix, 1, 3)); // 3+5+2 = 103. 链表:动态数据的灵活存储
链表由节点组成,每个节点包含数据和指向下一个节点的指针。单链表只能向前遍历,双链表可以双向遍历。链表的优势在于 O(1) 的插入和删除(已知位置),但访问需要 O(n) 遍历。
// Singly Linked List implementation
class ListNode<T> {
val: T;
next: ListNode<T> | null = null;
constructor(val: T) { this.val = val; }
}
class SinglyLinkedList<T> {
head: ListNode<T> | null = null;
size: number = 0;
// O(1) — prepend to head
prepend(val: T): void {
const node = new ListNode(val);
node.next = this.head;
this.head = node;
this.size++;
}
// O(n) — append to tail
append(val: T): void {
const node = new ListNode(val);
if (!this.head) { this.head = node; }
else {
let curr = this.head;
while (curr.next) curr = curr.next;
curr.next = node;
}
this.size++;
}
// O(n) — delete first occurrence
delete(val: T): boolean {
if (!this.head) return false;
if (this.head.val === val) {
this.head = this.head.next;
this.size--;
return true;
}
let curr = this.head;
while (curr.next) {
if (curr.next.val === val) {
curr.next = curr.next.next;
this.size--;
return true;
}
curr = curr.next;
}
return false;
}
// O(n) — convert to array for display
toArray(): T[] {
const result: T[] = [];
let curr = this.head;
while (curr) {
result.push(curr.val);
curr = curr.next;
}
return result;
}
}
// Classic interview problem: reverse a linked list
function reverseList<T>(head: ListNode<T> | null): ListNode<T> | null {
let prev: ListNode<T> | null = null;
let curr = head;
while (curr) {
const next = curr.next;
curr.next = prev;
prev = curr;
curr = next;
}
return prev;
}
// Detect cycle with Floyd's algorithm (fast/slow pointers)
function hasCycle<T>(head: ListNode<T> | null): boolean {
let slow = head, fast = head;
while (fast && fast.next) {
slow = slow!.next;
fast = fast.next.next;
if (slow === fast) return true;
}
return false;
}4. 栈与队列:受限但强大的线性结构
栈和队列是带有访问限制的线性数据结构。栈遵循 LIFO(后进先出),队列遵循 FIFO(先进先出)。这些限制反而使它们在特定场景中极其高效和有用。
栈的实现与应用
// Stack implementation
class Stack<T> {
private items: T[] = [];
push(item: T): void { this.items.push(item); } // O(1)
pop(): T | undefined { return this.items.pop(); } // O(1)
peek(): T | undefined { return this.items[this.items.length - 1]; }
isEmpty(): boolean { return this.items.length === 0; }
get size(): number { return this.items.length; }
}
// Application: validate balanced parentheses
function isValidParentheses(s: string): boolean {
const stack = new Stack<string>();
const pairs: Record<string, string> = {
")": "(", "]": "[", "}": "{"
};
for (const char of s) {
if ("([{".includes(char)) {
stack.push(char);
} else if (pairs[char]) {
if (stack.isEmpty() || stack.pop() !== pairs[char]) {
return false;
}
}
}
return stack.isEmpty();
}
console.log(isValidParentheses("({[]})")); // true
console.log(isValidParentheses("({[}])")); // false
// Application: evaluate Reverse Polish Notation (postfix)
function evalRPN(tokens: string[]): number {
const stack = new Stack<number>();
for (const token of tokens) {
if ("+-*/".includes(token)) {
const b = stack.pop()!, a = stack.pop()!;
switch (token) {
case "+": stack.push(a + b); break;
case "-": stack.push(a - b); break;
case "*": stack.push(a * b); break;
case "/": stack.push(Math.trunc(a / b)); break;
}
} else {
stack.push(Number(token));
}
}
return stack.pop()!;
}
// "3 4 + 2 *" = (3 + 4) * 2 = 14
console.log(evalRPN(["3", "4", "+", "2", "*"])); // 14队列的实现与应用
// Queue implementation (using linked list for O(1) dequeue)
class Queue<T> {
private head: ListNode<T> | null = null;
private tail: ListNode<T> | null = null;
private _size = 0;
enqueue(val: T): void {
const node = new ListNode(val);
if (this.tail) { this.tail.next = node; }
else { this.head = node; }
this.tail = node;
this._size++;
}
dequeue(): T | undefined {
if (!this.head) return undefined;
const val = this.head.val;
this.head = this.head.next;
if (!this.head) this.tail = null;
this._size--;
return val;
}
peek(): T | undefined { return this.head?.val; }
isEmpty(): boolean { return this._size === 0; }
get size(): number { return this._size; }
}
// Application: recent counter (sliding window with queue)
class RecentCounter {
private queue = new Queue<number>();
ping(t: number): number {
this.queue.enqueue(t);
while (this.queue.peek()! < t - 3000) {
this.queue.dequeue();
}
return this.queue.size;
}
}5. 哈希表:最实用的数据结构
哈希表通过哈希函数将键映射到桶(数组索引),实现平均 O(1) 的查找、插入和删除。JavaScript 中的 Map/Object 和 Python 中的 dict 都是哈希表实现。冲突处理是哈希表设计的关键——常见方法有链地址法和开放地址法。
// Hash table implementation with chaining
class HashTable<K, V> {
private buckets: Array<Array<[K, V]>>;
private count = 0;
constructor(private capacity = 16) {
this.buckets = new Array(capacity).fill(null).map(() => []);
}
private hash(key: K): number {
const str = String(key);
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = (hash * 31 + str.charCodeAt(i)) | 0;
}
return Math.abs(hash) % this.capacity;
}
set(key: K, value: V): void {
const idx = this.hash(key);
const bucket = this.buckets[idx];
const existing = bucket.findIndex(([k]) => k === key);
if (existing >= 0) {
bucket[existing] = [key, value]; // update
} else {
bucket.push([key, value]);
this.count++;
if (this.count / this.capacity > 0.75) this.resize();
}
}
get(key: K): V | undefined {
const idx = this.hash(key);
const pair = this.buckets[idx].find(([k]) => k === key);
return pair ? pair[1] : undefined;
}
delete(key: K): boolean {
const idx = this.hash(key);
const bucket = this.buckets[idx];
const i = bucket.findIndex(([k]) => k === key);
if (i === -1) return false;
bucket.splice(i, 1);
this.count--;
return true;
}
private resize(): void {
const old = this.buckets;
this.capacity *= 2;
this.buckets = new Array(this.capacity).fill(null).map(() => []);
this.count = 0;
for (const bucket of old) {
for (const [k, v] of bucket) this.set(k, v);
}
}
}
// Common hash table patterns
// 1. Frequency counter
function charFrequency(s: string): Map<string, number> {
const freq = new Map<string, number>();
for (const ch of s) {
freq.set(ch, (freq.get(ch) || 0) + 1);
}
return freq;
}
// 2. Two-sum using hash map — O(n) vs O(n²) brute force
function twoSumHash(nums: number[], target: number): [number, number] | null {
const map = new Map<number, number>(); // value -> index
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
if (map.has(complement)) {
return [map.get(complement)!, i];
}
map.set(nums[i], i);
}
return null;
}
// 3. Group anagrams
function groupAnagrams(strs: string[]): string[][] {
const map = new Map<string, string[]>();
for (const s of strs) {
const key = s.split("").sort().join("");
if (!map.has(key)) map.set(key, []);
map.get(key)!.push(s);
}
return Array.from(map.values());
}6. 树:层次化数据的自然表示
树是一种非线性数据结构,由节点通过边连接,有一个根节点,每个节点最多有一个父节点。二叉搜索树(BST)是最常用的树结构,它的每个节点的左子树值都小于该节点,右子树值都大于该节点,支持 O(log n) 的搜索、插入和删除(平衡时)。
二叉搜索树实现
class TreeNode<T> {
val: T;
left: TreeNode<T> | null = null;
right: TreeNode<T> | null = null;
constructor(val: T) { this.val = val; }
}
class BST<T> {
root: TreeNode<T> | null = null;
// O(log n) average, O(n) worst (unbalanced)
insert(val: T): void {
const node = new TreeNode(val);
if (!this.root) { this.root = node; return; }
let curr = this.root;
while (true) {
if (val < curr.val) {
if (!curr.left) { curr.left = node; return; }
curr = curr.left;
} else {
if (!curr.right) { curr.right = node; return; }
curr = curr.right;
}
}
}
// O(log n) average
search(val: T): boolean {
let curr = this.root;
while (curr) {
if (val === curr.val) return true;
curr = val < curr.val ? curr.left : curr.right;
}
return false;
}
// In-order traversal: left -> root -> right (sorted output)
inOrder(node: TreeNode<T> | null = this.root): T[] {
if (!node) return [];
return [
...this.inOrder(node.left),
node.val,
...this.inOrder(node.right)
];
}
// Pre-order: root -> left -> right (copy/serialize)
preOrder(node: TreeNode<T> | null = this.root): T[] {
if (!node) return [];
return [
node.val,
...this.preOrder(node.left),
...this.preOrder(node.right)
];
}
// Post-order: left -> right -> root (delete/free)
postOrder(node: TreeNode<T> | null = this.root): T[] {
if (!node) return [];
return [
...this.postOrder(node.left),
...this.postOrder(node.right),
node.val
];
}
}
// Level-order traversal (BFS) using a queue
function levelOrder<T>(root: TreeNode<T> | null): T[][] {
if (!root) return [];
const result: T[][] = [];
const queue: TreeNode<T>[] = [root];
while (queue.length > 0) {
const levelSize = queue.length;
const level: T[] = [];
for (let i = 0; i < levelSize; i++) {
const node = queue.shift()!;
level.push(node.val);
if (node.left) queue.push(node.left);
if (node.right) queue.push(node.right);
}
result.push(level);
}
return result;
}
// Maximum depth of binary tree (DFS)
function maxDepth<T>(root: TreeNode<T> | null): number {
if (!root) return 0;
return 1 + Math.max(maxDepth(root.left), maxDepth(root.right));
}7. 堆与优先队列
堆是一种特殊的完全二叉树,满足堆属性:最小堆中父节点总是小于或等于子节点,最大堆则相反。堆通常用数组实现,是优先队列的标准底层结构。它在 Dijkstra 最短路径、中位数维护、Top-K 问题中至关重要。
// Min-Heap implementation (array-based)
class MinHeap {
private heap: number[] = [];
private parent(i: number) { return Math.floor((i - 1) / 2); }
private leftChild(i: number) { return 2 * i + 1; }
private rightChild(i: number) { return 2 * i + 2; }
private swap(i: number, j: number) {
[this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
}
// O(log n) — insert and bubble up
insert(val: number): void {
this.heap.push(val);
let i = this.heap.length - 1;
while (i > 0 && this.heap[i] < this.heap[this.parent(i)]) {
this.swap(i, this.parent(i));
i = this.parent(i);
}
}
// O(log n) — extract min and bubble down
extractMin(): number | undefined {
if (this.heap.length === 0) return undefined;
const min = this.heap[0];
const last = this.heap.pop()!;
if (this.heap.length > 0) {
this.heap[0] = last;
this.bubbleDown(0);
}
return min;
}
private bubbleDown(i: number): void {
const n = this.heap.length;
while (true) {
let smallest = i;
const left = this.leftChild(i);
const right = this.rightChild(i);
if (left < n && this.heap[left] < this.heap[smallest]) smallest = left;
if (right < n && this.heap[right] < this.heap[smallest]) smallest = right;
if (smallest === i) break;
this.swap(i, smallest);
i = smallest;
}
}
peek(): number | undefined { return this.heap[0]; } // O(1)
get size(): number { return this.heap.length; }
}
// Application: find K largest elements
function kLargest(arr: number[], k: number): number[] {
const minHeap = new MinHeap();
for (const num of arr) {
minHeap.insert(num);
if (minHeap.size > k) minHeap.extractMin();
}
const result: number[] = [];
while (minHeap.size > 0) result.push(minHeap.extractMin()!);
return result;
}
console.log(kLargest([3, 1, 5, 12, 2, 11], 3)); // [5, 11, 12]8. 图:关系与网络建模
图由顶点(节点)和边组成,用于表示实体之间的关系。社交网络、路线规划、依赖管理都是图的应用。图可以是有向或无向的,加权或无权的。两种主要表示方法是邻接表(稀疏图高效)和邻接矩阵(密集图高效)。
图的实现与遍历
// Adjacency List representation
class Graph {
private adjacencyList = new Map<string, string[]>();
addVertex(v: string): void {
if (!this.adjacencyList.has(v)) {
this.adjacencyList.set(v, []);
}
}
addEdge(v1: string, v2: string): void {
this.addVertex(v1);
this.addVertex(v2);
this.adjacencyList.get(v1)!.push(v2);
this.adjacencyList.get(v2)!.push(v1); // undirected
}
// BFS — finds shortest path in unweighted graph
bfs(start: string): string[] {
const visited = new Set<string>();
const queue: string[] = [start];
const result: string[] = [];
visited.add(start);
while (queue.length > 0) {
const vertex = queue.shift()!;
result.push(vertex);
for (const neighbor of this.adjacencyList.get(vertex) || []) {
if (!visited.has(neighbor)) {
visited.add(neighbor);
queue.push(neighbor);
}
}
}
return result;
}
// DFS — recursive
dfs(start: string): string[] {
const visited = new Set<string>();
const result: string[] = [];
const traverse = (vertex: string) => {
visited.add(vertex);
result.push(vertex);
for (const neighbor of this.adjacencyList.get(vertex) || []) {
if (!visited.has(neighbor)) traverse(neighbor);
}
};
traverse(start);
return result;
}
// DFS — iterative with explicit stack
dfsIterative(start: string): string[] {
const visited = new Set<string>();
const stack: string[] = [start];
const result: string[] = [];
while (stack.length > 0) {
const vertex = stack.pop()!;
if (visited.has(vertex)) continue;
visited.add(vertex);
result.push(vertex);
for (const neighbor of this.adjacencyList.get(vertex) || []) {
if (!visited.has(neighbor)) stack.push(neighbor);
}
}
return result;
}
// Shortest path (BFS) between two nodes
shortestPath(start: string, end: string): string[] | null {
const visited = new Set<string>();
const queue: string[][] = [[start]];
visited.add(start);
while (queue.length > 0) {
const path = queue.shift()!;
const vertex = path[path.length - 1];
if (vertex === end) return path;
for (const neighbor of this.adjacencyList.get(vertex) || []) {
if (!visited.has(neighbor)) {
visited.add(neighbor);
queue.push([...path, neighbor]);
}
}
}
return null;
}
}
// Usage
const g = new Graph();
g.addEdge("A", "B");
g.addEdge("A", "C");
g.addEdge("B", "D");
g.addEdge("C", "D");
g.addEdge("D", "E");
console.log(g.bfs("A")); // [A, B, C, D, E]
console.log(g.shortestPath("A", "E")); // [A, B, D, E]拓扑排序(有向无环图)
// Topological sort using Kahn's algorithm (BFS)
// Used for: build systems, task scheduling, course prerequisites
function topologicalSort(
numNodes: number,
edges: [number, number][]
): number[] {
const adjList: number[][] = Array.from({ length: numNodes }, () => []);
const inDegree = new Array(numNodes).fill(0);
for (const [from, to] of edges) {
adjList[from].push(to);
inDegree[to]++;
}
// Start with nodes that have no incoming edges
const queue: number[] = [];
for (let i = 0; i < numNodes; i++) {
if (inDegree[i] === 0) queue.push(i);
}
const result: number[] = [];
while (queue.length > 0) {
const node = queue.shift()!;
result.push(node);
for (const neighbor of adjList[node]) {
inDegree[neighbor]--;
if (inDegree[neighbor] === 0) queue.push(neighbor);
}
}
// If result doesn't include all nodes, there's a cycle
if (result.length !== numNodes) {
throw new Error("Graph has a cycle — no valid topological order");
}
return result;
}
// Example: course prerequisites
// 0=Math, 1=Physics, 2=CS101, 3=CS201, 4=ML
const order = topologicalSort(5, [
[0, 1], // Math -> Physics
[0, 2], // Math -> CS101
[2, 3], // CS101 -> CS201
[1, 4], // Physics -> ML
[3, 4], // CS201 -> ML
]);
console.log(order); // [0, 1, 2, 3, 4] or [0, 2, 1, 3, 4]9. 排序算法:从基础到高级
排序是最基本的算法操作之一。理解不同排序算法的特性——时间复杂度、空间复杂度、稳定性——有助于根据场景选择最佳方案。大多数语言内置的排序使用 Timsort(归并排序和插入排序的混合),它在实际数据上表现优异。
| 算法 | 最优 | 平均 | 最差 | 空间 | 稳定 |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | 是 |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | 是 |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | 否 |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | 是 |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | 否 |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | 否 |
| Counting Sort | O(n+k) | O(n+k) | O(n+k) | O(k) | 是 |
归并排序(分治法)
// Merge Sort — O(n log n) guaranteed, stable, O(n) space
function mergeSort(arr: number[]): number[] {
if (arr.length <= 1) return arr;
const mid = Math.floor(arr.length / 2);
const left = mergeSort(arr.slice(0, mid));
const right = mergeSort(arr.slice(mid));
return merge(left, right);
}
function merge(left: number[], right: number[]): number[] {
const result: number[] = [];
let i = 0, j = 0;
while (i < left.length && j < right.length) {
if (left[i] <= right[j]) {
result.push(left[i++]);
} else {
result.push(right[j++]);
}
}
return result.concat(left.slice(i)).concat(right.slice(j));
}
console.log(mergeSort([38, 27, 43, 3, 9, 82, 10]));
// [3, 9, 10, 27, 38, 43, 82]快速排序(原地排序)
// Quick Sort — O(n log n) average, O(n²) worst, in-place
function quickSort(
arr: number[],
low = 0,
high = arr.length - 1
): number[] {
if (low < high) {
const pivotIdx = partition(arr, low, high);
quickSort(arr, low, pivotIdx - 1);
quickSort(arr, pivotIdx + 1, high);
}
return arr;
}
function partition(
arr: number[],
low: number,
high: number
): number {
// Median-of-three pivot selection for better performance
const mid = Math.floor((low + high) / 2);
if (arr[mid] < arr[low]) [arr[low], arr[mid]] = [arr[mid], arr[low]];
if (arr[high] < arr[low]) [arr[low], arr[high]] = [arr[high], arr[low]];
if (arr[mid] < arr[high]) [arr[mid], arr[high]] = [arr[high], arr[mid]];
const pivot = arr[high];
let i = low - 1;
for (let j = low; j < high; j++) {
if (arr[j] <= pivot) {
i++;
[arr[i], arr[j]] = [arr[j], arr[i]];
}
}
[arr[i + 1], arr[high]] = [arr[high], arr[i + 1]];
return i + 1;
}
const data = [10, 7, 8, 9, 1, 5];
console.log(quickSort([...data])); // [1, 5, 7, 8, 9, 10]10. 搜索算法:高效查找数据
搜索是编程中最常见的操作。线性搜索适用于无序数据(O(n)),二分搜索适用于有序数据(O(log n))。掌握二分搜索的变体——查找左边界、右边界、旋转数组搜索——是技术面试的核心技能。
二分搜索变体
// Standard binary search
function binarySearch(arr: number[], target: number): number {
let left = 0, right = arr.length - 1;
while (left <= right) {
const mid = left + Math.floor((right - left) / 2); // avoid overflow
if (arr[mid] === target) return mid;
if (arr[mid] < target) left = mid + 1;
else right = mid - 1;
}
return -1;
}
// Find leftmost (first) occurrence
function lowerBound(arr: number[], target: number): number {
let left = 0, right = arr.length;
while (left < right) {
const mid = left + Math.floor((right - left) / 2);
if (arr[mid] < target) left = mid + 1;
else right = mid;
}
return left; // first index where arr[i] >= target
}
// Find rightmost (last) occurrence
function upperBound(arr: number[], target: number): number {
let left = 0, right = arr.length;
while (left < right) {
const mid = left + Math.floor((right - left) / 2);
if (arr[mid] <= target) left = mid + 1;
else right = mid;
}
return left - 1; // last index where arr[i] <= target
}
// Search in rotated sorted array
function searchRotated(arr: number[], target: number): number {
let left = 0, right = arr.length - 1;
while (left <= right) {
const mid = left + Math.floor((right - left) / 2);
if (arr[mid] === target) return mid;
// Left half is sorted
if (arr[left] <= arr[mid]) {
if (arr[left] <= target && target < arr[mid]) {
right = mid - 1;
} else {
left = mid + 1;
}
}
// Right half is sorted
else {
if (arr[mid] < target && target <= arr[right]) {
left = mid + 1;
} else {
right = mid - 1;
}
}
}
return -1;
}
// Example
console.log(searchRotated([4, 5, 6, 7, 0, 1, 2], 0)); // 4
console.log(searchRotated([4, 5, 6, 7, 0, 1, 2], 3)); // -111. Python 中的数据结构实战
Python 内置了丰富的数据结构支持。list 是动态数组,dict 是哈希表,set 是哈希集合,collections 模块提供 deque(双端队列)、Counter(频率计数器)、defaultdict(默认字典)等高级结构,heapq 模块提供堆操作。
# Python data structure essentials
from collections import deque, Counter, defaultdict
import heapq
# ---- list (dynamic array) ----
arr = [3, 1, 4, 1, 5, 9]
arr.sort() # in-place sort: [1, 1, 3, 4, 5, 9]
sorted_arr = sorted(arr) # returns new sorted list
arr.append(2) # O(1) amortized
arr.pop() # O(1) remove last
arr.insert(0, 99) # O(n) insert at index
# List comprehension (Pythonic way)
squares = [x**2 for x in range(10)] # [0, 1, 4, 9, ..., 81]
evens = [x for x in arr if x % 2 == 0] # filter even numbers
# ---- dict (hash table) ----
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
graph.get("X", []) # safe access with default
# ---- set (hash set) ----
seen = set()
seen.add(1) # O(1)
print(1 in seen) # O(1) membership test: True
# ---- deque (double-ended queue) ----
q = deque([1, 2, 3])
q.append(4) # O(1) right
q.appendleft(0) # O(1) left
q.pop() # O(1) right
q.popleft() # O(1) left — unlike list.pop(0) which is O(n)
# ---- Counter (frequency counting) ----
text = "hello world"
freq = Counter(text)
print(freq.most_common(3)) # [('l', 3), ('o', 2), ('h', 1)]
# ---- defaultdict ----
adj = defaultdict(list)
adj["A"].append("B") # no KeyError if key missing
adj["A"].append("C")
# ---- heapq (min-heap) ----
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
print(heapq.heappop(heap)) # 1 (min element)
# Top-K largest
nums = [3, 1, 5, 12, 2, 11]
print(heapq.nlargest(3, nums)) # [12, 11, 5]
print(heapq.nsmallest(3, nums)) # [1, 2, 3]
# ---- BFS in Python ----
def bfs(graph, start):
visited = set()
queue = deque([start])
visited.add(start)
result = []
while queue:
vertex = queue.popleft()
result.append(vertex)
for neighbor in graph.get(vertex, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return result12. 如何选择正确的数据结构
选择数据结构时,要考虑你最频繁的操作、数据量大小、是否需要排序、空间限制等因素。下面的对比表可以帮助你快速做出决策。
| 需求 | 推荐结构 | 原因 |
|---|---|---|
| 快速查找(键→值) | Hash Table | O(1) 平均查找 |
| 有序数据 + 范围查询 | BST / TreeMap | O(log n) 有序操作 |
| FIFO 处理 | Queue / Deque | O(1) 两端操作 |
| LIFO / 撤销 | Stack | O(1) push/pop |
| 优先级处理 | Heap | O(log n) 插入/提取 |
| 去重 | Set | O(1) 成员测试 |
| 关系/网络建模 | Graph | 灵活表示实体关系 |
| 频繁中间插入/删除 | Linked List | O(1) 已知位置操作 |
数据结构复杂度速查表
| 数据结构 | 访问 | 搜索 | 插入 | 删除 | 空间 |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1)* | O(1)* | O(n) |
| Hash Table | N/A | O(1) | O(1) | O(1) | O(n) |
| BST (平衡) | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| Heap | O(n) | O(n) | O(log n) | O(log n) | O(n) |
| Stack | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
* 链表的 O(1) 插入/删除需要已知节点位置;查找位置需 O(n)。所有哈希表复杂度为平均情况。
总结
数据结构和算法是每个开发者的核心技能。数组和哈希表覆盖了大部分日常需求,栈和队列解决特定的访问模式问题,树和图处理层次化和网络关系数据。掌握 Big O 分析让你能在编码前预判性能,选择排序和搜索算法时考虑数据特征。记住:没有万能的数据结构,只有最适合当前问题的数据结构。
常见问题
What is the difference between an array and a linked list?
Arrays store elements contiguously in memory, providing O(1) random access by index but O(n) insertion/deletion in the middle. Linked lists store elements as nodes with pointers, offering O(1) insertion/deletion at known positions but O(n) access since you must traverse from the head. Use arrays when you need fast lookups and linked lists when you need frequent insertions and deletions.
What is Big O notation and why does it matter?
Big O notation describes the upper bound of an algorithm's time or space complexity as input size grows. It matters because it lets you predict how an algorithm scales — O(1) is constant time, O(log n) is logarithmic, O(n) is linear, O(n log n) is linearithmic, and O(n^2) is quadratic. Choosing the right algorithm can mean the difference between milliseconds and hours for large datasets.
When should I use a hash table versus a binary search tree?
Hash tables (Map/Object in JS, dict in Python) provide average O(1) lookup, insert, and delete, making them ideal for fast key-value access. Binary search trees maintain sorted order and support O(log n) range queries, finding min/max, and in-order traversal. Use hash tables for fast lookups when order does not matter, and BSTs when you need sorted data or range queries.
What is the best sorting algorithm for general use?
For general-purpose sorting, merge sort and quicksort are the most common choices with O(n log n) average time complexity. Merge sort guarantees O(n log n) worst-case and is stable but uses O(n) extra space. Quicksort is typically faster in practice due to cache locality but has O(n^2) worst-case. Most language standard libraries use Timsort (a hybrid of merge sort and insertion sort) which is stable and optimized for real-world data.
How do stacks and queues differ and when should I use each?
A stack follows LIFO (Last In, First Out) — the last element added is the first removed, like a stack of plates. A queue follows FIFO (First In, First Out) — the first element added is the first removed, like a line at a store. Use stacks for undo operations, expression parsing, DFS traversal, and function call management. Use queues for BFS traversal, task scheduling, and buffering.
What is a heap and how is it used in priority queues?
A heap is a complete binary tree where each parent node satisfies the heap property — in a min-heap, every parent is smaller than its children; in a max-heap, every parent is larger. Heaps are the standard implementation for priority queues, supporting O(log n) insert and O(log n) extract-min/max, with O(1) peek at the min/max element. They are used in Dijkstra's algorithm, median finding, and task scheduling.
What is the difference between BFS and DFS for graph traversal?
BFS (Breadth-First Search) explores all neighbors at the current depth before moving deeper, using a queue. It finds the shortest path in unweighted graphs. DFS (Depth-First Search) explores as far as possible along each branch before backtracking, using a stack or recursion. DFS uses less memory for deep graphs and is useful for topological sorting, cycle detection, and finding connected components.
How do I choose the right data structure for my problem?
Consider these factors: (1) What operations do you need most — lookups, insertions, deletions, or ordered traversal? (2) What are the time complexity requirements? (3) How much memory can you use? For fast lookups use hash tables; for sorted data use BSTs or sorted arrays with binary search; for FIFO processing use queues; for LIFO use stacks; for priority-based processing use heaps; for relationships between entities use graphs.