Kubernetes Storage Best Practices

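Choosing Storage

Match the storage type to the application's characteristics

Application type                 Recommended storage             Rationale
──────────────────────────────────────────────────────────────────────────────────────────────────────
Databases (MySQL/PostgreSQL)     Local SSD / block storage       Low latency, high IOPS
                                 (Local PV / EBS io2)

NoSQL (MongoDB/Cassandra)        Local SSD                       Low latency, high throughput
                                 (Local PV)

Caches (Redis/Memcached)         Local storage / RAM disk        Very low latency
                                 (Local PV / emptyDir)

Message queues (Kafka)           Local SSD / block storage       Optimized for sequential writes
                                 (Local PV / EBS gp3)

Object storage (MinIO)           General-purpose block storage   Large capacity, cost-optimized
                                 (EBS gp3 / st1)

File sharing (NFS server)        Network file system             Multi-node access
                                 (NFS / CephFS)

Log aggregation (Elasticsearch)  General-purpose block storage   Large capacity, moderate performance
                                 (EBS gp3)

CI/CD build cache                Local storage                   Ephemeral data, fast access
                                 (Local PV / hostPath)

Static assets (images/video)     Object storage                  Large capacity, low cost
                                 (S3 / OSS)

Backup and archive               Cold storage                    Long-term retention, low cost
                                 (S3 Glacier / HDD)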

Storage Performance Comparison

# High-performance databases: Local PV on NVMe SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

---
# High-performance applications: provisioned-IOPS cloud block storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance
provisioner: ebs.csi.aws.com
parameters:
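  type: io2
  iops: "64000"        # high provisioned IOPS; io2 ties the allowed IOPS to volume size
  # "throughput" takes effect only for gp3 volumes, so it is not set for io2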
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

---
# General-purpose applications: standard SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

---
# Cold data and archives: HDD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold-storage
provisioner: ebs.csi.aws.com
parameters:
  type: sc1  # Cold HDD
  encrypted: "false"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

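Capacity Planning

1. Assess storage requirements

# List PVCs with their requested and provisioned sizes.
# (.status.capacity is the provisioned size, not the space actually used;
# actual usage comes from the kubelet_volume_stats_* metrics.)
kubectl get pvc --all-namespaces -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
REQUESTED:.spec.resources.requests.storage,\
CAPACITY:.status.capacity.storage,\
STORAGECLASS:.spec.storageClassName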

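# Total provisioned capacity per StorageClass.
# awk keeps only the leading number of each quantity, so the totals are
# meaningful only when every PV uses the same unit (for example Gi).
kubectl get pv -o json | jq -r '.items[] |
  [.spec.storageClassName // "<none>", .spec.capacity.storage] | @tsv' |
  awk -F'\t' '{sum[$1]+=$2} END {for (sc in sum) print sc, sum[sc]}'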
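
Quantities such as 100Gi and 1Ti cannot be added as plain numbers, so a total that mixes units needs a conversion step first. The sketch below shows one way to do that in Bash. The helper names `to_bytes` and `sum_by_class` are hypothetical, and only whole-number quantities with the common binary and decimal suffixes are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of kubectl.

# to_bytes QUANTITY: print the value in bytes, or fail for unsupported input.
to_bytes() {
  local q=$1 num unit
  num=${q%%[!0-9]*}        # leading digits
  unit=${q#"$num"}         # suffix after the digits
  [ -n "$num" ] || return 1
  num=$(( 10#$num ))       # force base 10 so "08" is not read as octal
  case $unit in
    "") echo "$num" ;;
    Ki) echo $(( num * 1024 )) ;;
    Mi) echo $(( num * 1024 ** 2 )) ;;
    Gi) echo $(( num * 1024 ** 3 )) ;;
    Ti) echo $(( num * 1024 ** 4 )) ;;
    Pi) echo $(( num * 1024 ** 5 )) ;;
    k)  echo $(( num * 1000 )) ;;
    M)  echo $(( num * 1000 ** 2 )) ;;
    G)  echo $(( num * 1000 ** 3 )) ;;
    T)  echo $(( num * 1000 ** 4 )) ;;
    P)  echo $(( num * 1000 ** 5 )) ;;
    *)  return 1 ;;        # fractional or unknown suffix
  esac
}

# sum_by_class: read "class<TAB>quantity" lines on stdin, print GiB totals per class.
sum_by_class() {
  local sc qty bytes
  while IFS=$'\t' read -r sc qty; do
    bytes=$(to_bytes "$qty") || continue   # skip quantities that cannot be parsed
    printf '%s\t%s\n' "$sc" "$bytes"
  done |
    awk -F'\t' '{ sum[$1] += $2 }
                END { for (sc in sum) printf "%s\t%.1f GiB\n", sc, sum[sc] / 1073741824 }' |
    sort
}

# Sample input standing in for real cluster data.
printf 'gp3\t100Gi\ngp3\t1Ti\nlocal-nvme\t500Gi\n' | sum_by_class
```

On a live cluster, the tab-separated output of the jq command above could be piped into `sum_by_class` in place of the sample input.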

# Estimate the growth rate:
# use historical data to forecast storage needs for the next 3-6 months
2. Set resource quotas

# Namespace-level storage quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: production
spec:
  hard:
    # Cap on total requested storage
    requests.storage: "1Ti"

    # Cap on the number of PVCs
    persistentvolumeclaims: "100"

    # Per-StorageClass limits
    fast-ssd.storageclass.storage.k8s.io/requests.storage: "500Gi"
    fast-ssd.storageclass.storage.k8s.io/persistentvolumeclaims: "10"
    
    standard.storageclass.storage.k8s.io/requests.storage: "500Gi"
    standard.storageclass.storage.k8s.io/persistentvolumeclaims: "90"

---
# LimitRange: bound the size of each individual PVC
apiVersion: v1
kind: LimitRange
metadata:
  name: storage-limit
  namespace: production
spec:
  limits:
  - type: PersistentVolumeClaim
    max:
      storage: 100Gi  # largest allowed PVC
    min:
      storage: 1Gi    # smallest allowed PVC

3. Reserve storage

# Reserve a pre-created PV for a critical application
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reserved-pv-database
  labels:
    app: database
    reserved: "true"
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
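  # Must match the PVC's storageClassName. The class name acts only as a
  # matching key here: a local PV is created statically, not by the EBS provisioner.
  storageClassName: high-performance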
  local:
    path: /mnt/database-disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - database-node-1

---
# PVC that binds to the reserved PV through its label selector
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  storageClassName: high-performance
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
  selector:
    matchLabels:
      app: database
      reserved: "true"

Data Persistence Strategy

1. StatefulSet best practices

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      # Anti-affinity spreads the replicas across nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mysql
            topologyKey: kubernetes.io/hostname
      
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        
        # Resource requests and limits
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        
        # Volume mounts
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
          readOnly: true
        
        # Liveness probe
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        
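        # Readiness probe. It authenticates with the root password from the
        # Secret; without credentials the query is rejected and the Pod never
        # becomes ready.
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - mysql -h localhost -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"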
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 1
      
      volumes:
      - name: config
        configMap:
          name: mysql-config
  
  # volumeClaimTemplates: each Pod gets its own PVC
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: mysql
    spec:
      storageClassName: fast-ssd
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi

2. Initialize data with an init container

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init-data
spec:
  # Init container: seed the volume from a remote source or another volume
  initContainers:
  - name: init-data
    image: busybox
    command:
    - sh
    - -c
    - |
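      # Seed the volume only once. A marker file is used instead of an
      # emptiness check because a freshly formatted ext4 volume already
      # contains a lost+found directory.
      if [ ! -f /data/.initialized ]; then
        echo "Initializing data..."
        # Restore from a backup or download the initial data
        wget -O /data/init.sql https://example.com/init.sql || exit 1
        # Or copy from another volume
        # cp -r /backup/* /data/
        touch /data/.initialized
      else
        echo "Data already exists, skipping initialization"
      fi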
    volumeMounts:
    - name: data
      mountPath: /data
  
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: data
      mountPath: /data
  
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-pvc

3. Warm up data

# For large datasets, a Job can read the data once before the application starts
apiVersion: batch/v1
kind: Job
metadata:
  name: data-warmup
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: warmup
        image: busybox
        command:
        - sh
        - -c
        - |
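          # Read every file once. This fills the page cache of the node the Job
          # runs on, so it helps only if the application runs on that same node.
          find /data -type f -exec cat {} + > /dev/null
          echo "Data warmup completed"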
        volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: large-dataset-pvc

Performance Optimization

1. Tune mount options

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: optimized-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
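mountOptions:
  # Skip access-time updates to reduce writes (noatime already implies nodiratime)
  - noatime

  # Online TRIM for SSDs. It can add latency on some devices, in which case a
  # periodic fstrim is the usual alternative.
  - discard

  # ext4: relaxed journaling (faster, but files may hold stale data after a crash)
  # - data=writeback

  # ext4: longer journal commit interval (fewer syncs, more data at risk on a crash)
  # - commit=60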
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

2. Block devices vs. filesystems

# Scenario 1: raw block device, for software that manages its own on-disk layout
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-block-pvc
spec:
  volumeMode: Block  # raw block device
  storageClassName: high-performance
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

---
apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  containers:
  - name: mysql
    image: mysql:8.0
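    volumeDevices:  # raw devices use volumeDevices, not volumeMounts
    - name: data
      devicePath: /dev/xvda
    # The device appears unformatted at devicePath. The application must be
    # configured to open it directly; a MySQL data directory cannot simply be
    # pointed at a device node.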
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: database-block-pvc

---
# Scenario 2: filesystem volume (more flexible, suits most applications)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-filesystem-pvc
spec:
  volumeMode: Filesystem  # filesystem mode (the default)
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

---
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:  # filesystem volumes use volumeMounts
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-filesystem-pvc

3. Local storage performance

# Local SSDs give the lowest latency
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

---
# Create a local PV by hand (one per disk, on each node)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1-disk1
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain  # a hand-made local PV has no provisioner to clean it up
  storageClassName: local-ssd
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1

---
# Or let the local static provisioner discover disks automatically
# https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-ssd:
       hostDir: /mnt/disks
       mountDir: /mnt/disks
       blockCleanerCommand:
         - "/scripts/shred.sh"
         - "2"
       volumeMode: Filesystem

Data Protection

1. Backup strategy

# Back up PVCs with a Velero Schedule (a custom resource, applied directly)
# Full backup every day at 02:00
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    ttl: 720h  # keep for 30 days
    storageLocation: default
    volumeSnapshotLocations:
    - aws-snapshot

---
# Snapshot volumes through the CSI VolumeSnapshot API
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Retain  # keep the backing snapshot when the VolumeSnapshot object is deleted

---
# Take a snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: mysql-pvc

---
# CronJob that takes snapshots on a schedule
apiVersion: batch/v1
kind: CronJob
metadata:
  name: snapshot-creator
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator
          containers:
          - name: create-snapshot
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              TIMESTAMP=$(date +%Y%m%d-%H%M%S)
              cat <<EOF | kubectl apply -f -
              apiVersion: snapshot.storage.k8s.io/v1
              kind: VolumeSnapshot
              metadata:
                name: mysql-snapshot-${TIMESTAMP}
              spec:
                volumeSnapshotClassName: csi-snapclass
                source:
                  persistentVolumeClaimName: mysql-pvc
              EOF
          restartPolicy: OnFailure

2. Data recovery

# Restore a PVC from a snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    name: mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

---
# Clone an existing PVC (same namespace; requires a CSI driver that supports cloning)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc-clone
spec:
  storageClassName: fast-ssd
  dataSource:
    name: mysql-pvc
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

3. Disaster recovery testing

# CronJob that exercises the restore procedure on a schedule
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dr-test
spec:
  schedule: "0 3 * * 0"  # every Sunday at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dr-tester
          containers:
          - name: test-restore
            image: custom/dr-test:v1
            command:
            - /bin/sh
            - -c
            - |
              # 1. Take a test snapshot
              # 2. Restore from the snapshot
              # 3. Verify data integrity
              # 4. Clean up the test resources
              # 5. Send the test report
              echo "DR test started"
              # ... test script ...
              echo "DR test completed"
          restartPolicy: OnFailure

Monitoring and Alerting

1. Storage usage monitoring

# Prometheus rules
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-storage-rules
  namespace: monitoring
data:
  storage-rules.yaml: |
    groups:
    - name: storage
      interval: 30s
      rules:
      
      # PVC usage ratio
      - record: pvc:usage:ratio
        expr: |
          kubelet_volume_stats_used_bytes /
          kubelet_volume_stats_capacity_bytes

      # PVC usage above 80%
      - alert: PVCHighUsage
        expr: pvc:usage:ratio > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} is more than 80% full"
          description: "Current usage: {{ $value | humanizePercentage }}"

      # PVC usage above 90%
      - alert: PVCCriticalUsage
        expr: pvc:usage:ratio > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} is more than 90% full"
          description: "Current usage: {{ $value | humanizePercentage }}; expand the volume now"
      
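      # The kubelet exposes only capacity and inode statistics per volume, not
      # IOPS or latency. The two rules below use node_exporter disk metrics
      # instead, which are reported per node and device rather than per PVC.
      - alert: HighDiskIOPS
        expr: |
          rate(node_disk_reads_completed_total[5m]) +
          rate(node_disk_writes_completed_total[5m]) > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk {{ $labels.device }} on {{ $labels.instance }} has high IOPS"

      # Average time per completed I/O above 10 ms
      - alert: HighDiskLatency
        expr: |
          (rate(node_disk_read_time_seconds_total[5m]) + rate(node_disk_write_time_seconds_total[5m]))
          /
          (rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m]))
          > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk {{ $labels.device }} on {{ $labels.instance }} has high I/O latency"

      # Storage quota usage from kube-state-metrics, overall and per StorageClass
      - alert: StorageQuotaNearLimit
        expr: |
          kube_resourcequota{type="used", resource=~".*requests.storage"}
            / ignoring(type)
          kube_resourcequota{type="hard", resource=~".*requests.storage"}
            > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Quota {{ $labels.resource }} in namespace {{ $labels.namespace }} is more than 90% used"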

2. Grafana dashboard

{
  "dashboard": {
    "title": "Kubernetes Storage Monitoring",
    "panels": [
      {
        "title": "PVC usage, top 10 (%)",
        "targets": [{
          "expr": "topk(10, kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100)"
        }]
      },
      {
        "title": "Disk IOPS (per node and device)",
        "targets": [{
          "expr": "rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])"
        }]
      },
      {
        "title": "Disk throughput (per node and device)",
        "targets": [{
          "expr": "rate(node_disk_read_bytes_total[5m]) + rate(node_disk_written_bytes_total[5m])"
        }]
      },
      {
        "title": "PV phase distribution",
        "targets": [{
          "expr": "sum by (phase) (kube_persistentvolume_status_phase)"
        }]
      }
    ]
  }
}

3. Automatic expansion

# Example rules for a PVC auto-resize operator (the exact schema depends on the operator used)
apiVersion: v1
kind: ConfigMap
metadata:
  name: auto-resize-config
data:
  config.yaml: |
    rules:
    - name: database-auto-resize
      # Grow by 20% once usage exceeds 85%
      thresholds:
        usage: 0.85
      actions:
        resize:
          increase: 20%  # grow by 20%
          max: 1Ti       # never beyond 1Ti
      filters:
        labels:
          app: database
    
    - name: logs-auto-resize
      thresholds:
        usage: 0.90
      actions:
        resize:
          increase: 50Gi  # grow by a fixed 50Gi
          max: 500Gi
      filters:
        labels:
          app: logging
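
The decision each rule encodes is small enough to express directly. The sketch below is one possible reading of the rules above, not the logic of any particular operator. `next_size_gib` is a hypothetical helper, sizes are whole GiB, and the threshold is a whole percentage.

```bash
#!/usr/bin/env bash
# next_size_gib CURRENT_GIB USED_GIB THRESHOLD_PCT INCREASE MAX_GIB
# Prints the new size in GiB when usage exceeds the threshold and there is
# still room below MAX_GIB; prints nothing otherwise.
# INCREASE is a percentage ("20%") or a fixed step ("50Gi").
next_size_gib() {
  local current=$1 used=$2 threshold=$3 increase=$4 max=$5
  local pct step target

  # used / current > threshold / 100, written without floating point
  [ $(( used * 100 )) -gt $(( current * threshold )) ] || return 0

  case $increase in
    *%)
      pct=${increase%'%'}
      step=$(( (current * pct + 99) / 100 ))   # round up to a whole GiB
      ;;
    *Gi)
      step=${increase%Gi}
      ;;
    *)
      return 1                                 # unsupported increase format
      ;;
  esac

  target=$(( current + step ))
  [ "$target" -le "$max" ] || target=$max      # clamp to the rule's maximum
  if [ "$target" -gt "$current" ]; then
    echo "$target"
  fi
  return 0
}

# database rule: 100 GiB volume, 90 GiB used, 85% threshold, +20%, max 1Ti
next_size_gib 100 90 85 20% 1024
# logging rule: 480 GiB volume, 470 GiB used, 90% threshold, +50Gi, max 500Gi
next_size_gib 480 470 90 50Gi 500
```

The resulting size could then be applied with `kubectl patch pvc <name> -p '{"spec":{"resources":{"requests":{"storage":"120Gi"}}}}'`, which succeeds only when the StorageClass sets allowVolumeExpansion: true.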

Cost Optimization

1. Storage tiering

# Hot data: high-performance storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot-tier
  labels:
    tier: hot
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

---
# Warm data: standard storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: warm-tier
  labels:
    tier: warm
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

---
# Cold data: low-cost storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold-tier
  labels:
    tier: cold
provisioner: ebs.csi.aws.com
parameters:
  type: sc1  # Cold HDD
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

2. Clean up unused PVCs

#!/bin/bash
# Find PVCs that no Pod references

echo "=== PVCs not used by any Pod ==="

for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  # Fetch the namespace's Pods once instead of once per PVC
  pods_json=$(kubectl get pods -n "$ns" -o json)
  for pvc in $(kubectl get pvc -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    # jq -e exits non-zero when no volume references this claim
    if ! jq -e --arg pvc "$pvc" \
         '.items[].spec.volumes[]? | select(.persistentVolumeClaim.claimName == $pvc)' \
         <<<"$pods_json" > /dev/null; then
      echo "Namespace: $ns, PVC: $pvc"
    fi
  done
done

echo ""
echo "=== PVs in the Released phase (candidates for reclaim) ==="
kubectl get pv | grep Released

3. Storage cost report

# Generate a storage cost report on a schedule
apiVersion: batch/v1
kind: CronJob
metadata:
  name: storage-cost-report
spec:
  schedule: "0 0 * * 0"  # every Sunday
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report-generator
            image: custom/cost-reporter:v1
            command:
            - /bin/sh
            - -c
            - |
              # Collect storage usage data
              # Calculate cost (from the cloud provider's pricing)
              # Generate the report
              # Send it by email

              cat > /tmp/report.md <<EOF
              # Storage Cost Report

              ## Overview
              - Total capacity (GiB, counting Gi-sized PVs only): $(kubectl get pv -o json | jq '[.items[].spec.capacity.storage | select(endswith("Gi")) | rtrimstr("Gi") | tonumber] | add // 0')
              - PV count: $(kubectl get pv --no-headers | wc -l)
              - PVC count: $(kubectl get pvc --all-namespaces --no-headers | wc -l)

              ## By StorageClass
              $(kubectl get pv -o json | jq -r '.items[] | [.spec.storageClassName, .spec.capacity.storage] | @tsv')

              ## By namespace
              $(kubectl get pvc --all-namespaces -o json | jq -r '.items[] | [.metadata.namespace, .spec.resources.requests.storage] | @tsv')
              EOF

              # Send the report
              # mail -s "Storage Cost Report" admin@example.com < /tmp/report.md
          restartPolicy: OnFailure

Security Best Practices

1. Encrypt storage

# StorageClass with encryption enabled
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"  # enable encryption at rest
  kmsKeyId: "arn:aws:kms:us-east-1:123456789:key/xxx"  # customer-managed KMS key (placeholder ARN)
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

2. Access control

# Grant PVC creation only to specific service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-creator
  namespace: production
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["create", "get", "list", "watch"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims/status"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvc-creator-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: production
roleRef:
  kind: Role
  name: pvc-creator
  apiGroup: rbac.authorization.k8s.io

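3. PVC protection

# Kubernetes adds the pvc-protection finalizer to every PVC automatically;
# it is listed here only to show where it appears.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: protected-pvc
  finalizers:
  - kubernetes.io/pvc-protection  # added automatically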
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

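# Deleting a PVC that a Pod still uses leaves it in Terminating until the last
# such Pod is gone. A PVC that no Pod uses is deleted immediately.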

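Failure Recovery

1. Recovering a lost PV

# Scenario: the PV object was deleted by mistake, but the underlying volume still exists

# 1. Check the PVC status
kubectl get pvc
# STATUS: Lost

# 2. Find the name of the deleted PV and the ID of the underlying volume
kubectl get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'
# The PVC still records the PV name. The volume ID is not stored on the PVC;
# look it up on the storage backend (for example in the cloud console).

# 3. Recreate the PV by hand, under the name the PVC expects
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <original-pv-name>
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: <storage-class-of-the-pvc>
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-xxxxx  # ID of the original volume
  claimRef:
    name: <pvc-name>
    namespace: <namespace>
EOF

# 4. Verify that the PVC is bound again
kubectl get pvc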

2. Data migration

# Copy data to a new volume while the application keeps running
apiVersion: v1
kind: Pod
metadata:
  name: data-migration
spec:
  containers:
  - name: rsync
    image: instrumentisto/rsync-ssh
    command:
    - /bin/sh
    - -c
    - |
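      # Incremental sync with rsync. Stop writes and run one final pass before
      # switching the application to the new PVC.
      while true; do
        rsync -av --delete /source/ /dest/
        sleep 300  # sync every 5 minutes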
      done
    volumeMounts:
    - name: source
      mountPath: /source
      readOnly: true
    - name: dest
      mountPath: /dest
  volumes:
  - name: source
    persistentVolumeClaim:
      claimName: old-pvc
  - name: dest
    persistentVolumeClaim:
      claimName: new-pvc
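
Before the application is switched to the new PVC, it is worth confirming that the copy is complete. The sketch below is a minimal check that could run in a Pod with both volumes mounted. `verify_copy` is a hypothetical helper, and it compares file names and contents only, not ownership or permissions.

```bash
#!/usr/bin/env bash
# verify_copy SRC DST
# Succeeds only if both directory trees contain the same files with the same
# contents. Returns 2 when either argument is not a directory.
verify_copy() {
  local src=$1 dst=$2
  if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
    return 2
  fi
  diff -rq "$src" "$dst" > /dev/null
}

# Example with the mount paths used in the migration Pod above.
if verify_copy /source /dest; then
  echo "copy verified, safe to cut over"
else
  echo "source and destination differ (or are not mounted)"
fi
```

Run the check only after writes to the source have stopped; otherwise differences may simply reflect data written since the last rsync pass.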

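Summary

Key points of Kubernetes storage best practices:

Selection

  • Choose the storage type based on application characteristics
  • Balance performance against cost
  • Consider data persistence requirements

Capacity management

  • Set sensible quotas and limits
  • Clean up unused resources regularly
  • Enable automatic expansion

Performance

  • Use delayed binding (WaitForFirstConsumer)
  • Tune mount options
  • Consider local storage

Data protection

  • Back up regularly
  • Test the restore procedure
  • Use volume snapshots

Monitoring and alerting

  • Monitor usage and performance
  • Set alert thresholds
  • Generate cost reports

Security

  • Enable storage encryption
  • Enforce access control
  • Protect critical PVCs

Following these practices helps you build a stable, efficient, and secure storage system.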