# 📚 透過 OpenTelemetry Operator 學習 Kubernetes Operator 開發指南

# 📚 透過 OpenTelemetry Operator 深度學習 Kubernetes Operator 開發

> **本教程基於生產級專案** [**OpenTelemetry Operator**](https://github.com/open-telemetry/opentelemetry-operator) **的實際代碼**
> 
> 涵蓋從基礎概念到高級實戰的完整學習路徑

---

## 目錄

1. [Kubernetes Operator 核心概念](#%E4%B8%80kubernetes-operator-%E6%A0%B8%E5%BF%83%E6%A6%82%E5%BF%B5)
    
2. [OpenTelemetry Operator 架構深度解析](#%E4%BA%8Copentelemetry-operator-%E6%9E%B6%E6%A7%8B%E6%B7%B1%E5%BA%A6%E8%A7%A3%E6%9E%90)
    
3. [CRD 完整剖析與實戰](#%E4%B8%89crd-%E5%AE%8C%E6%95%B4%E5%89%96%E6%9E%90%E8%88%87%E5%AF%A6%E6%88%B0)
    
4. [Controller/Reconciler 深度實現](#%E5%9B%9Bcontrollerreconciler-%E6%B7%B1%E5%BA%A6%E5%AF%A6%E7%8F%BE)
    
5. [Manifest 構建器詳解](#%E4%BA%94manifest-%E6%A7%8B%E5%BB%BA%E5%99%A8%E8%A9%B3%E8%A7%A3)
    
6. [Webhook 機制深度解析](#%E5%85%ADwebhook-%E6%A9%9F%E5%88%B6%E6%B7%B1%E5%BA%A6%E8%A7%A3%E6%9E%90)
    
7. [開發環境完整設置](#%E4%B8%83%E9%96%8B%E7%99%BC%E7%92%B0%E5%A2%83%E5%AE%8C%E6%95%B4%E8%A8%AD%E7%BD%AE)
    
8. [測試策略與實踐](#%E5%85%AB%E6%B8%AC%E8%A9%A6%E7%AD%96%E7%95%A5%E8%88%87%E5%AF%A6%E8%B8%90)
    
9. [實戰專案：Nginx Operator](#%E4%B9%9D%E5%AF%A6%E6%88%B0%E5%B0%88%E6%A1%88nginx-operator)
    
10. [進階主題與最佳實踐](#%E5%8D%81%E9%80%B2%E9%9A%8E%E4%B8%BB%E9%A1%8C%E8%88%87%E6%9C%80%E4%BD%B3%E5%AF%A6%E8%B8%90)
    
11. [常見問題與調試技巧](#%E5%8D%81%E4%B8%80%E5%B8%B8%E8%A6%8B%E5%95%8F%E9%A1%8C%E8%88%87%E8%AA%BF%E8%A9%A6%E6%8A%80%E5%B7%A7)
    

---

# 一、Kubernetes Operator 核心概念

## 1.1 什麼是 Operator？

Operator 是 Kubernetes 的一種**擴展模式**，用於**自動化複雜應用的部署和管理**。它將人類運維知識編碼到軟體中。

### 核心組成部分

```bash
┌─────────────────────────────────────────────────────────┐
│                     Kubernetes Operator                 │
│                                                          │
│  ┌────────────────┐  ┌──────────────┐  ┌─────────────┐ │
│  │  Custom        │  │  Controller  │  │   Domain    │ │
│  │  Resource      │◄─┤  (Reconciler)├─►│  Knowledge  │ │
│  │  Definition    │  │              │  │             │ │
│  └────────────────┘  └──────────────┘  └─────────────┘ │
│         │                    │                 │        │
│         │                    │                 │        │
│         ▼                    ▼                 ▼        │
│    擴展 K8s API        監控&調和狀態      運維邏輯編碼  │
└─────────────────────────────────────────────────────────┘
```

### 1.2 工作原理

```bash
User (kubectl apply)
    │
    ▼
┌─────────────────────────────────────┐
│  Custom Resource (CR)                │
│  例如: OpenTelemetryCollector        │
│                                      │
│  apiVersion: opentelemetry.io/v1beta1│
│  kind: OpenTelemetryCollector        │
│  spec:                               │
│    mode: deployment                  │
│    config: {...}                     │
└─────────────────────────────────────┘
    │ (儲存到 etcd)
    ▼
┌─────────────────────────────────────┐
│  Kubernetes API Server               │
│  (發送 Watch 事件)                   │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Controller (Reconciler)             │
│                                      │
│  1. 接收事件                         │
│  2. 讀取 CR                          │
│  3. 計算期望狀態                     │
│  4. 調和當前狀態 → 期望狀態          │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Kubernetes Resources                │
│  - Deployment                        │
│  - Service                           │
│  - ConfigMap                         │
│  - ServiceAccount                    │
│  - ...                               │
└─────────────────────────────────────┘
```

### 1.3 Operator 能力等級

| 等級 | 能力 | OpenTelemetry Operator 支持 |
| --- | --- | --- |
| 1 | 基本安裝 | ✅ |
| 2 | 無縫升級 | ✅ (自動版本升級) |
| 3 | 完整生命週期 | ✅ (Finalizer、備份配置) |
| 4 | 深度洞察 | ✅ (Metrics、Events) |
| 5 | 自動調優 | ✅ (HPA、Target Allocator) |

---

# 二、OpenTelemetry Operator 架構深度解析

## 2.1 專案完整結構

```bash
opentelemetry-operator/
├── main.go                         # 程式入口點 (200+ 行)
│
├── apis/                           # CRD API 定義
│   ├── v1alpha1/                   # Alpha API
│   │   ├── instrumentation_types.go       # 自動埋點 CRD
│   │   ├── opampbridge_types.go          # OpAMP Bridge CRD
│   │   └── targetallocator_types.go      # Target Allocator CRD (deprecated)
│   └── v1beta1/                    # Beta API (穩定版)
│       ├── opentelemetrycollector_types.go  # Collector CRD (主要)
│       ├── targetallocator_types.go         # Target Allocator CRD
│       └── config.go                        # 配置解析
│
├── internal/                       # 內部實現
│   ├── controllers/                # 控制器實現
│   │   ├── opentelemetrycollector_controller.go  # 主控制器 (400+ 行)
│   │   ├── targetallocator_controller.go
│   │   ├── opampbridge_controller.go
│   │   └── reconcile_test.go              # 測試 (56KB!)
│   │
│   ├── manifests/                  # K8s 資源構建器
│   │   ├── collector/              # Collector 資源構建
│   │   │   ├── deployment.go      # Deployment 構建
│   │   │   ├── daemonset.go       # DaemonSet 構建
│   │   │   ├── statefulset.go     # StatefulSet 構建
│   │   │   ├── container.go       # Container 構建 (核心邏輯)
│   │   │   ├── service.go         # Service 構建
│   │   │   ├── configmap.go       # ConfigMap 構建
│   │   │   └── ...
│   │   ├── targetallocator/        # Target Allocator 資源構建
│   │   └── manifestutils/          # 工具函數
│   │
│   ├── webhook/                    # Admission Webhooks
│   │   ├── podmutation/            # Pod Mutation Webhook
│   │   │   └── webhookhandler.go  # Sidecar/Instrumentation 注入
│   │   └── validation/             # Validation Webhook
│   │
│   ├── config/                     # Operator 配置管理
│   ├── rbac/                       # RBAC 工具
│   ├── autodetect/                 # 環境自動檢測
│   │   ├── prometheus/             # Prometheus Operator 檢測
│   │   ├── openshift/              # OpenShift 檢測
│   │   └── certmanager/            # cert-manager 檢測
│   └── status/                     # Status 更新邏輯
│
├── pkg/                            # 公開包
│   ├── collector/                  # Collector 工具
│   │   └── upgrade/                # 版本升級邏輯
│   ├── sidecar/                    # Sidecar 注入邏輯
│   ├── featuregate/                # Feature Gate 管理
│   └── constants/                  # 常量定義
│
├── config/                         # Kustomize 部署配置
│   ├── crd/                        # CRD YAML 文件
│   │   └── bases/                  # 基礎 CRD
│   ├── rbac/                       # RBAC 規則
│   ├── manager/                    # Operator Deployment
│   ├── webhook/                    # Webhook 配置
│   └── samples/                    # CR 範例
│
├── tests/                          # 測試套件
│   ├── e2e/                        # 基礎 E2E 測試
│   ├── e2e-instrumentation/        # 自動埋點測試
│   ├── e2e-targetallocator/        # Target Allocator 測試
│   ├── e2e-upgrade/                # 升級測試
│   └── test-e2e-apps/              # 測試應用
│
├── cmd/                            # 額外的可執行程序
│   ├── otel-allocator/             # Target Allocator 服務
│   ├── operator-opamp-bridge/      # OpAMP Bridge 服務
│   └── gather/                     # 故障排查工具
│
├── Makefile                        # 構建腳本
├── go.mod                          # Go 依賴管理
└── versions.txt                    # 版本管理文件
```

## 2.2 四大核心 CRD 詳解

### 2.2.1 OpenTelemetryCollector (主要 CRD)

**檔案**: `apis/v1beta1/opentelemetrycollector_types.go`

**用途**: 管理 OpenTelemetry Collector 的部署

**支持的部署模式**:

```go
const (
    ModeDeployment  Mode = "deployment"   // 標準部署
    ModeDaemonSet   Mode = "daemonset"    // 每個節點一個實例
    ModeStatefulSet Mode = "statefulset"  // 有狀態部署
    ModeSidecar     Mode = "sidecar"      // 注入到其他 Pod
)
```

**功能特性**:

* ✅ 多種部署模式
    
* ✅ 自動配置管理（ConfigMap 版本控制）
    
* ✅ Target Allocator 集成
    
* ✅ HPA 自動擴縮容
    
* ✅ 版本自動升級
    
* ✅ Ingress/Route 支持
    
* ✅ PodMonitor/ServiceMonitor 支持
    

### 2.2.2 Instrumentation

**檔案**: `apis/v1alpha1/instrumentation_types.go`

**用途**: 配置自動埋點（Auto-instrumentation）

**支持的語言**:

* Java (OpenTelemetry Java Agent)
    
* Node.js (OpenTelemetry Node.js)
    
* Python (OpenTelemetry Python)
    
* .NET (OpenTelemetry .NET)
    
* Go (eBPF-based)
    
* Apache HTTPD
    
* Nginx
    

### 2.2.3 TargetAllocator

**檔案**: `apis/v1beta1/targetallocator_types.go`

**用途**: 分配 Prometheus 抓取目標到多個 Collector 實例

**分配策略**:

* `least-weighted`: 最少加權（基於目標數量）
    
* `consistent-hashing`: 一致性哈希
    
* `per-node`: 每個節點一個 Allocator
    

### 2.2.4 OpAMPBridge

**檔案**: `apis/v1alpha1/opampbridge_types.go`

**用途**: 實現 OpAMP 協議，實現遠程管理 Collector

---

# 三、CRD 完整剖析與實戰

## 3.1 CRD 結構深度解析

### 3.1.1 OpenTelemetryCollector CRD 完整定義

**檔案**: `apis/v1beta1/opentelemetrycollector_types.go:32-145`

```go
// +kubebuilder:object:root=true
// +kubebuilder:resource:shortName=otelcol;otelcols
// +kubebuilder:storageversion
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.scale.replicas,selectorpath=.status.scale.selector
// +kubebuilder:printcolumn:name="Mode",type="string",JSONPath=".spec.mode",description="Deployment Mode"
// +kubebuilder:printcolumn:name="Version",type="string",JSONPath=".status.version",description="OpenTelemetry Version"
// +kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.scale.statusReplicas"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// +kubebuilder:printcolumn:name="Image",type="string",JSONPath=".status.image"
// +kubebuilder:printcolumn:name="Management",type="string",JSONPath=".spec.managementState",description="Management State"

type OpenTelemetryCollector struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   OpenTelemetryCollectorSpec   `json:"spec,omitempty"`
    Status OpenTelemetryCollectorStatus `json:"status,omitempty"`
}
```

### 3.1.2 Kubebuilder 註解完全指南

#### 基礎註解

| 註解 | 說明 | 範例 |
| --- | --- | --- |
| `+kubebuilder:object:root=true` | 標記為 CRD 根物件 | 必須添加 |
| `+kubebuilder:subresource:status` | 啟用 status 子資源 | 允許獨立更新 status |
| `+kubebuilder:subresource:scale` | 啟用 scale 子資源 | 支持 `kubectl scale` |
| `+kubebuilder:resource:shortName` | 定義簡稱 | `otelcol,otelcols` |
| `+kubebuilder:storageversion` | 標記為存儲版本 | 多版本時指定 |

#### 驗證註解

```go
// 數值驗證
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
Replicas int32 `json:"replicas,omitempty"`

// 字符串驗證
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=255
// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
Name string `json:"name,omitempty"`

// 枚舉驗證
// +kubebuilder:validation:Enum=deployment;daemonset;statefulset;sidecar
Mode string `json:"mode,omitempty"`

// 默認值
// +kubebuilder:default=1
Replicas int32 `json:"replicas,omitempty"`

// CEL 驗證 (Kubernetes 1.25+)
// +kubebuilder:validation:XValidation:rule="self.mode != 'sidecar' || !has(self.replicas)",message="sidecar mode does not support replicas"
Spec OpenTelemetryCollectorSpec `json:"spec,omitempty"`
```

#### 顯示註解

```go
// kubectl get 輸出欄位
// +kubebuilder:printcolumn:name="Mode",type="string",JSONPath=".spec.mode",description="Deployment Mode"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
```

### 3.1.3 完整 Spec 結構

```go
type OpenTelemetryCollectorSpec struct {
    // ===== 部署配置 =====

    // 部署模式
    // +optional
    // +kubebuilder:default=deployment
    Mode Mode `json:"mode,omitempty"`

    // 副本數（僅 Deployment/StatefulSet 模式）
    // +optional
    Replicas *int32 `json:"replicas,omitempty"`

    // 容器鏡像
    // +optional
    Image string `json:"image,omitempty"`

    // ===== Collector 配置 =====

    // Collector 配置（YAML 格式）
    // +required
    // +kubebuilder:pruning:PreserveUnknownFields
    Config Config `json:"config"`

    // ConfigMap 版本保留數量（用於回滾）
    // +optional
    // +kubebuilder:default:=3
    // +kubebuilder:validation:Minimum:=1
    ConfigVersions int `json:"configVersions,omitempty"`

    // ===== 升級策略 =====

    // 升級策略：automatic 或 none
    // +optional
    UpgradeStrategy UpgradeStrategy `json:"upgradeStrategy"`

    // Deployment 更新策略
    // +optional
    DeploymentUpdateStrategy appsv1.DeploymentStrategy `json:"deploymentUpdateStrategy,omitempty"`

    // DaemonSet 更新策略
    // +optional
    DaemonSetUpdateStrategy appsv1.DaemonSetUpdateStrategy `json:"daemonSetUpdateStrategy,omitempty"`

    // ===== 資源配置 =====

    // 資源限制
    // +optional
    Resources v1.ResourceRequirements `json:"resources,omitempty"`

    // 環境變數
    // +optional
    Env []v1.EnvVar `json:"env,omitempty"`

    // Volume Mounts
    // +optional
    VolumeMounts []v1.VolumeMount `json:"volumeMounts,omitempty"`

    // Volumes
    // +optional
    Volumes []v1.Volume `json:"volumes,omitempty"`

    // ===== 調度配置 =====

    // Node Selector
    // +optional
    NodeSelector map[string]string `json:"nodeSelector,omitempty"`

    // Tolerations
    // +optional
    Tolerations []v1.Toleration `json:"tolerations,omitempty"`

    // Affinity
    // +optional
    Affinity *v1.Affinity `json:"affinity,omitempty"`

    // ===== 網路配置 =====

    // Ingress 配置
    // +optional
    Ingress Ingress `json:"ingress,omitempty"`

    // Service 端口
    // +optional
    Ports []PortsSpec `json:"ports,omitempty"`

    // ===== 高級功能 =====

    // Target Allocator
    // +optional
    TargetAllocator TargetAllocatorEmbedded `json:"targetAllocator,omitempty"`

    // HPA 配置
    // +optional
    Autoscaler *AutoscalerSpec `json:"autoscaler,omitempty"`

    // 探針配置
    // +optional
    LivenessProbe *Probe `json:"livenessProbe,omitempty"`
    ReadinessProbe *Probe `json:"readinessProbe,omitempty"`

    // ===== 可觀測性 =====

    // Observability 配置
    // +optional
    Observability ObservabilitySpec `json:"observability,omitempty"`
}
```

## 3.2 實戰範例：從簡單到複雜

### 3.2.1 最簡範例

**檔案**: `config/samples/core_v1beta1_opentelemetrycollector.yaml`

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```

**這個 CR 會創建**:

```bash
# 1. Deployment (1 replica)
kubectl get deployment simplest-collector

# 2. Service (暴露 OTLP 端口)
kubectl get service simplest-collector
# Ports: 4317 (gRPC), 4318 (HTTP)

# 3. Service (headless, 用於 StatefulSet)
kubectl get service simplest-collector-headless

# 4. ConfigMap (Collector 配置)
kubectl get configmap simplest-collector

# 5. ServiceAccount
kubectl get serviceaccount simplest-collector
```

### 3.2.2 生產級範例

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: production-collector
  labels:
    env: production
spec:
  # ===== 部署配置 =====
  mode: deployment
  replicas: 3  # 高可用
  image: otel/opentelemetry-collector-k8s:0.88.0

  # ===== 升級策略 =====
  upgradeStrategy: automatic
  deploymentUpdateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

  # ===== ConfigMap 版本控制 =====
  configVersions: 5  # 保留 5 個版本用於回滾

  # ===== Collector 配置 =====
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

      # Prometheus receiver
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 30s
              static_configs:
                - targets: ['0.0.0.0:8888']

    processors:
      # 內存限制器（防止 OOM）
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15

      # 批處理（提升性能）
      batch:
        send_batch_size: 10000
        timeout: 10s

      # 資源屬性
      resource:
        attributes:
          - key: cluster.name
            value: production-cluster
            action: upsert

    exporters:
      # OTLP 導出到後端
      otlp:
        endpoint: backend.example.com:4317
        tls:
          insecure: false
          cert_file: /certs/tls.crt
          key_file: /certs/tls.key

      # Prometheus 導出
      prometheus:
        endpoint: 0.0.0.0:8889

    # Health Check Extension
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133

      pprof:
        endpoint: localhost:1777

    service:
      extensions: [health_check, pprof]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, resource]
          exporters: [otlp]

        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, batch]
          exporters: [otlp, prometheus]

  # ===== 資源限制 =====
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

  # ===== 環境變數 =====
  env:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

  # ===== 調度配置 =====
  nodeSelector:
    workload: telemetry

  tolerations:
    - key: telemetry
      operator: Equal
      value: "true"
      effect: NoSchedule

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - production-collector-collector
            topologyKey: kubernetes.io/hostname

  # ===== Ingress 配置 =====
  ingress:
    type: ingress
    hostname: collector.example.com
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    tls:
      - secretName: collector-tls
        hosts:
          - collector.example.com

  # ===== HPA 自動擴縮容 =====
  autoscaler:
    minReplicas: 3
    maxReplicas: 10
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
          - type: Percent
            value: 50
            periodSeconds: 60
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 100
            periodSeconds: 15
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 75
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80

  # ===== 探針配置 =====
  livenessProbe:
    httpGet:
      path: /
      port: 13133
    initialDelaySeconds: 15
    periodSeconds: 10

  readinessProbe:
    httpGet:
      path: /
      port: 13133
    initialDelaySeconds: 10
    periodSeconds: 5

  # ===== Target Allocator =====
  targetAllocator:
    enabled: true
    replicas: 3
    allocationStrategy: consistent-hashing
    prometheusCR:
      enabled: true
      serviceMonitorSelector: {}
      podMonitorSelector: {}

  # ===== 可觀測性 =====
  observability:
    metrics:
      enableMetrics: true
```

### 3.2.3 DaemonSet 模式範例

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: node-collector
spec:
  mode: daemonset  # 每個節點一個實例

  # DaemonSet 更新策略
  daemonSetUpdateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

  # Host Network（訪問節點級資源）
  hostNetwork: true

  config:
    receivers:
      # 主機指標
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          load: {}
          memory: {}
          disk: {}
          filesystem: {}
          network: {}

      # Kubernetes Events
      k8s_events:
        namespaces: [default, kube-system]

    exporters:
      otlp:
        endpoint: central-collector:4317

    service:
      pipelines:
        metrics:
          receivers: [hostmetrics]
          exporters: [otlp]
```

---

# 四、Controller/Reconciler 深度實現

## 4.1 Reconciler 結構體詳解

**檔案**: `internal/controllers/opentelemetrycollector_controller.go:58-67`

```go
type OpenTelemetryCollectorReconciler struct {
    // K8s 客戶端（帶緩存）
    client.Client

    // 事件記錄器（發送 K8s 事件）
    recorder record.EventRecorder

    // API Scheme（資源類型註冊表）
    scheme *runtime.Scheme

    // 結構化日誌器
    log logr.Logger

    // Operator 配置
    config config.Config

    // RBAC 權限審查器
    reviewer *internalRbac.Reviewer

    // 版本升級處理器
    upgrade *upgrade.VersionUpgrade
}
```

### 4.1.1 依賴組件解析

#### Client.Client

```go
// controller-runtime 提供的智能客戶端
// 特性：
// 1. 自動緩存（減少 API Server 壓力）
// 2. 類型安全
// 3. 支持 FieldIndexer（加速查詢）
// 4. 自動處理 RESTMapper

// 使用範例
err := r.Client.Get(ctx, types.NamespacedName{
    Name:      "my-collector",
    Namespace: "default",
}, &collector)
```

#### EventRecorder

```go
// 記錄 Kubernetes 事件
// kubectl describe 時可以看到

r.recorder.Event(
    &instance,                    // 相關物件
    corev1.EventTypeNormal,       // 事件類型
    "Created",                    // 原因
    "Created OpenTelemetry Collector deployment",  // 消息
)
```

#### Reviewer (RBAC 檢查器)

```go
// 檢查 Collector 配置中的 RBAC 需求
// 例如：如果配置中有 k8s_cluster receiver，需要額外權限

if reviewer.NeedsClusterRole(collectorConfig) {
    // 創建 ClusterRole 和 ClusterRoleBinding
}
```

## 4.2 Reconcile 完整流程追蹤

**檔案**: `internal/controllers/opentelemetrycollector_controller.go:234-314`

讓我們逐步追蹤一個完整的 Reconcile 流程：

```go
func (r *OpenTelemetryCollectorReconciler) Reconcile(
    ctx context.Context,
    req ctrl.Request,
) (ctrl.Result, error) {
    log := r.log.WithValues("opentelemetrycollector", req.NamespacedName)

    // ==================== 階段 1: 獲取 CR ====================
    log.Info("Reconciling OpenTelemetryCollector")

    var instance v1beta1.OpenTelemetryCollector
    if err := r.Get(ctx, req.NamespacedName, &instance); err != nil {
        if !apierrors.IsNotFound(err) {
            log.Error(err, "unable to fetch OpenTelemetryCollector")
        }
        // 資源已刪除，忽略錯誤
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // ==================== 階段 2: 構建參數 ====================
    // 包含所有需要的配置、Target Allocator 等
    params, err := r.GetParams(ctx, instance)
    if err != nil {
        log.Error(err, "Failed to create manifest.Params")
        return ctrl.Result{}, err
    }

    // ==================== 階段 3: 處理刪除（Finalizer 模式）====================
    if deletionTimestamp := instance.GetDeletionTimestamp(); deletionTimestamp != nil {
        log.Info("Resource is being deleted")

        if controllerutil.ContainsFinalizer(&instance, collectorFinalizer) {
            log.Info("Running finalizer logic")

            // 執行清理：刪除 ClusterRole、ClusterRoleBinding 等集群級資源
            if err = r.finalizeCollector(ctx, params); err != nil {
                log.Error(err, "Failed to finalize")
                return ctrl.Result{}, err
            }

            // 移除 Finalizer
            if controllerutil.RemoveFinalizer(&instance, collectorFinalizer) {
                err = r.Update(ctx, &instance)
                if err != nil {
                    return ctrl.Result{}, err
                }
            }
            log.Info("Finalizer removed, resource will be deleted")
        }
        return ctrl.Result{}, nil
    }

    // ==================== 階段 4: 檢查管理狀態 ====================
    // ManagementStateUnmanaged: Operator 不管理此資源
    if instance.Spec.ManagementState == v1beta1.ManagementStateUnmanaged {
        log.Info("Skipping reconciliation for unmanaged resource")
        return ctrl.Result{}, nil
    }

    // ==================== 階段 5: 版本升級 ====================
    // 檢查 CR 配置是否需要升級到新格式
    if r.upgrade.NeedsUpgrade(instance) {
        log.Info("Upgrading OpenTelemetryCollector configuration")

        err = r.upgrade.Upgrade(ctx, instance)
        if err != nil {
            log.Error(err, "Failed to upgrade")
            return ctrl.Result{}, err
        }

        // CR 被修改，重新排隊觸發新的 reconcile
        log.Info("Configuration upgraded, requeuing")
        return ctrl.Result{Requeue: true, RequeueAfter: 1 * time.Second}, nil
    }

    // ==================== 階段 6: 添加 Finalizer ====================
    // Finalizer 確保在刪除時執行清理邏輯
    if !controllerutil.ContainsFinalizer(&instance, collectorFinalizer) {
        log.Info("Adding finalizer")

        if controllerutil.AddFinalizer(&instance, collectorFinalizer) {
            err = r.Update(ctx, &instance)
            if err != nil {
                return ctrl.Result{}, err
            }
        }
    }

    // ==================== 階段 7: 構建期望資源 ====================
    log.Info("Building desired objects")

    desiredObjects, buildErr := BuildCollector(params)
    if buildErr != nil {
        log.Error(buildErr, "Failed to build desired objects")
        return ctrl.Result{}, buildErr
    }

    log.Info("Desired objects built", "count", len(desiredObjects))

    // ==================== 階段 8: 查找現有資源 ====================
    log.Info("Finding owned objects")

    ownedObjects, err := r.findOtelOwnedObjects(ctx, params)
    if err != nil {
        log.Error(err, "Failed to find owned objects")
        return ctrl.Result{}, err
    }

    log.Info("Found owned objects", "count", len(ownedObjects))

    // ==================== 階段 9: 調和資源 ====================
    // 比較期望狀態和當前狀態，執行創建/更新/刪除
    log.Info("Reconciling desired objects")

    err = reconcileDesiredObjects(
        ctx,
        r.Client,
        log,
        &instance,
        params.Scheme,
        desiredObjects,
        ownedObjects,
    )

    // ==================== 階段 10: 更新 Status ====================
    return collectorStatus.HandleReconcileStatus(ctx, log, params, instance, err)
}
```

### 4.2.1 詳細追蹤：BuildCollector 函數

**檔案**: `internal/controllers/opentelemetrycollector_controller.go`

```go
func BuildCollector(params manifests.Params) ([]client.Object, error) {
    var objects []client.Object

    // 1. ServiceAccount
    if sa := collector.ServiceAccount(params); sa != nil {
        objects = append(objects, sa)
    }

    // 2. ConfigMap（Collector 配置）
    configMaps, err := collector.ConfigMaps(params)
    if err != nil {
        return nil, err
    }
    for _, cm := range configMaps {
        objects = append(objects, cm)
    }

    // 3. 根據模式選擇工作負載類型
    switch params.OtelCol.Spec.Mode {
    case v1alpha1.ModeDeployment:
        deployment, err := collector.Deployment(params)
        if err != nil {
            return nil, err
        }
        objects = append(objects, deployment)

    case v1alpha1.ModeDaemonSet:
        daemonset, err := collector.DaemonSet(params)
        if err != nil {
            return nil, err
        }
        objects = append(objects, daemonset)

    case v1alpha1.ModeStatefulSet:
        statefulset, err := collector.StatefulSet(params)
        if err != nil {
            return nil, err
        }
        objects = append(objects, statefulset)
    }

    // 4. Service（暴露端口）
    services := collector.Services(params)
    for _, svc := range services {
        objects = append(objects, svc)
    }

    // 5. Ingress（如果配置）
    if params.OtelCol.Spec.Ingress.Type == v1beta1.IngressTypeIngress {
        if ingress := collector.Ingress(params); ingress != nil {
            objects = append(objects, ingress)
        }
    }

    // 6. HPA（如果配置自動擴縮容）
    if params.OtelCol.Spec.Autoscaler != nil {
        if hpa := collector.HorizontalPodAutoscaler(params); hpa != nil {
            objects = append(objects, hpa)
        }
    }

    // 7. RBAC（如果需要）
    if params.Config.CreateRBACPermissions == rbac.Available {
        if clusterRole := collector.ClusterRole(params); clusterRole != nil {
            objects = append(objects, clusterRole)
        }
        if clusterRoleBinding := collector.ClusterRoleBinding(params); clusterRoleBinding != nil {
            objects = append(objects, clusterRoleBinding)
        }
    }

    // 8. ServiceMonitor/PodMonitor（可觀測性）
    if params.OtelCol.Spec.Observability.Metrics.EnableMetrics {
        if sm := collector.ServiceMonitor(params); sm != nil {
            objects = append(objects, sm)
        }
    }

    // 9. Target Allocator（如果啟用）
    if params.OtelCol.Spec.TargetAllocator.Enabled {
        taObjects, err := BuildTargetAllocator(params)
        if err != nil {
            return nil, err
        }
        objects = append(objects, taObjects...)
    }

    return objects, nil
}
```

### 4.2.2 調和邏輯：reconcileDesiredObjects

```go
func reconcileDesiredObjects(
    ctx context.Context,
    client client.Client,
    log logr.Logger,
    owner client.Object,
    scheme *runtime.Scheme,
    desired []client.Object,
    existing map[types.UID]client.Object,
) error {
    // 為期望的資源設置 OwnerReference
    for _, obj := range desired {
        if err := controllerutil.SetControllerReference(owner, obj, scheme); err != nil {
            return err
        }
    }

    // ========== 步驟 1: 創建或更新期望的資源 ==========
    for _, desiredObj := range desired {
        log := log.WithValues(
            "kind", desiredObj.GetObjectKind().GroupVersionKind().Kind,
            "name", desiredObj.GetName(),
        )

        // 嘗試獲取現有資源
        existingObj := desiredObj.DeepCopyObject().(client.Object)
        err := client.Get(ctx, types.NamespacedName{
            Name:      desiredObj.GetName(),
            Namespace: desiredObj.GetNamespace(),
        }, existingObj)

        if err != nil && apierrors.IsNotFound(err) {
            // 資源不存在，創建
            log.Info("Creating resource")
            if err := client.Create(ctx, desiredObj); err != nil {
                log.Error(err, "Failed to create resource")
                return err
            }
            log.Info("Resource created successfully")

        } else if err != nil {
            // 其他錯誤
            return err

        } else {
            // 資源存在，更新
            log.Info("Updating resource")

            // 保留不應該變更的字段
            desiredObj.SetResourceVersion(existingObj.GetResourceVersion())

            if err := client.Update(ctx, desiredObj); err != nil {
                log.Error(err, "Failed to update resource")
                return err
            }
            log.Info("Resource updated successfully")

            // 從待刪除列表中移除
            delete(existing, existingObj.GetUID())
        }
    }

    // ========== 步驟 2: 刪除多餘的資源 ==========
    for uid, obj := range existing {
        log := log.WithValues(
            "kind", obj.GetObjectKind().GroupVersionKind().Kind,
            "name", obj.GetName(),
            "uid", uid,
        )

        log.Info("Deleting orphaned resource")
        if err := client.Delete(ctx, obj); err != nil {
            log.Error(err, "Failed to delete resource")
            return err
        }
        log.Info("Orphaned resource deleted")
    }

    return nil
}
```

## 4.3 索引與緩存優化

### 4.3.1 設置 Field Indexer

**檔案**: `internal/controllers/opentelemetrycollector_controller.go:334-354`

```go
func (r *OpenTelemetryCollectorReconciler) SetupCaches(cluster cluster.Cluster) error {
    ownedResources := r.GetOwnedResourceTypes()

    // 為每種資源類型建立索引
    for _, resource := range ownedResources {
        if err := cluster.GetCache().IndexField(
            context.Background(),
            resource,
            resourceOwnerKey,  // 索引鍵：".metadata.owner"
            func(rawObj client.Object) []string {
                // 提取 Owner Reference
                owner := metav1.GetControllerOf(rawObj)
                if owner == nil {
                    return nil
                }

                // 只索引 OpenTelemetryCollector 擁有的資源
                if owner.Kind != "OpenTelemetryCollector" {
                    return nil
                }

                // 返回 Owner 名稱作為索引值
                return []string{owner.Name}
            },
        ); err != nil {
            return err
        }
    }
    return nil
}
```

### 4.3.2 使用索引加速查詢

```go
func (r *OpenTelemetryCollectorReconciler) findOtelOwnedObjects(
    ctx context.Context,
    params manifests.Params,
) (map[types.UID]client.Object, error) {
    ownedObjects := map[types.UID]client.Object{}

    // 使用索引快速查詢（而不是掃描所有資源）
    listOpts := []client.ListOption{
        client.InNamespace(params.OtelCol.Namespace),
        client.MatchingFields{resourceOwnerKey: params.OtelCol.Name},
    }

    for _, objectType := range r.GetOwnedResourceTypes() {
        objs, err := getList(ctx, r, objectType, listOpts...)
        if err != nil {
            return nil, err
        }

        for uid, object := range objs {
            ownedObjects[uid] = object
        }
    }

    return ownedObjects, nil
}
```

**性能對比**：

| 方法 | 時間複雜度 | 說明 |
| --- | --- | --- |
| 不使用索引 | O(N) | N = 集群中所有 Deployment 數量 |
| 使用索引 | O(M) | M = 被這個 Collector 擁有的 Deployment 數量 |

在大型集群中，性能提升可達 10-100 倍！

---

# 五、Manifest 構建器詳解

## 5.1 Deployment 構建完整解析

**檔案**: `internal/manifests/collector/deployment.go`

```go
func Deployment(params manifests.Params) (*appsv1.Deployment, error) {
    name := naming.Collector(params.OtelCol.Name)

    // ========== 1. 構建標籤 ==========
    // 標籤用於：
    // - Service 選擇器
    // - HPA 目標選擇
    // - 監控發現
    labels := manifestutils.Labels(
        params.OtelCol.ObjectMeta,
        name,
        params.OtelCol.Spec.Image,
        ComponentOpenTelemetryCollector,
        params.Config.LabelsFilter,
    )

    // 標籤範例：
    // app.kubernetes.io/name: my-collector-collector
    // app.kubernetes.io/instance: my-collector
    // app.kubernetes.io/managed-by: opentelemetry-operator
    // app.kubernetes.io/component: opentelemetry-collector
    // app.kubernetes.io/version: 0.88.0

    // ========== 2. 構建註解 ==========
    annotations, err := manifestutils.Annotations(
        params.OtelCol,
        params.Config.AnnotationsFilter,
    )
    if err != nil {
        return nil, err
    }

    // Pod 註解（額外添加 ConfigMap 哈希）
    podAnnotations, err := manifestutils.PodAnnotations(
        params.OtelCol,
        params.Config.AnnotationsFilter,
    )
    if err != nil {
        return nil, err
    }

    // ========== 3. 構建 Deployment ==========
    return &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:        name,
            Namespace:   params.OtelCol.Namespace,
            Labels:      labels,
            Annotations: annotations,
        },
        Spec: appsv1.DeploymentSpec{
            // 副本數
            Replicas: manifestutils.GetInitialReplicas(params.OtelCol),

            // 選擇器（必須與 Pod 標籤匹配）
            Selector: &metav1.LabelSelector{
                MatchLabels: manifestutils.SelectorLabels(
                    params.OtelCol.ObjectMeta,
                    ComponentOpenTelemetryCollector,
                ),
            },

            // 更新策略
            Strategy: params.OtelCol.Spec.DeploymentUpdateStrategy,

            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels:      labels,
                    Annotations: podAnnotations,
                },
                Spec: corev1.PodSpec{
                    // ServiceAccount
                    ServiceAccountName: ServiceAccountName(params.OtelCol),

                    // Init Containers
                    InitContainers: params.OtelCol.Spec.InitContainers,

                    // 主容器 + 額外容器
                    Containers: append(
                        params.OtelCol.Spec.AdditionalContainers,
                        Container(params.Config, params.Log, params.OtelCol, true),
                    ),

                    // Volumes
                    Volumes: Volumes(params.Config, params.OtelCol),

                    // 網路配置
                    DNSPolicy:   manifestutils.GetDNSPolicy(...),
                    DNSConfig:   &params.OtelCol.Spec.PodDNSConfig,
                    HostNetwork: params.OtelCol.Spec.HostNetwork,

                    // 進程命名空間共享
                    ShareProcessNamespace: &params.OtelCol.Spec.ShareProcessNamespace,

                    // 調度配置
                    Tolerations:               params.OtelCol.Spec.Tolerations,
                    NodeSelector:              params.OtelCol.Spec.NodeSelector,
                    Affinity:                  params.OtelCol.Spec.Affinity,
                    TopologySpreadConstraints: params.OtelCol.Spec.TopologySpreadConstraints,

                    // 安全配置
                    SecurityContext:   params.OtelCol.Spec.PodSecurityContext,
                    PriorityClassName: params.OtelCol.Spec.PriorityClassName,

                    // 優雅終止
                    TerminationGracePeriodSeconds: params.OtelCol.Spec.TerminationGracePeriodSeconds,
                },
            },
        },
    }, nil
}
```

## 5.2 Container 構建深度解析

**檔案**: `internal/manifests/collector/container.go:28-150`

```go
func Container(
    cfg config.Config,
    logger logr.Logger,
    otelcol v1beta1.OpenTelemetryCollector,
    addConfig bool,
) corev1.Container {
    // ========== 1. 決定鏡像 ==========
    image := otelcol.Spec.Image
    if len(image) == 0 {
        image = cfg.CollectorImage  // 使用預設鏡像
    }

    // ========== 2. 構建端口列表 ==========
    ports := getContainerPorts(logger, otelcol)
    // 從 Collector 配置中解析端口
    // 例如：otlp.protocols.grpc.endpoint: 0.0.0.0:4317
    //      → ContainerPort{Name: "otlp-grpc", ContainerPort: 4317}

    // ========== 3. 構建啟動參數 ==========
    var args []string
    var volumeMounts []corev1.VolumeMount

    // 配置文件始終是第一個參數
    if addConfig {
        args = append(args, fmt.Sprintf(
            "--config=/conf/%s",
            cfg.CollectorConfigMapEntry,
        ))

        volumeMounts = append(volumeMounts, corev1.VolumeMount{
            Name:      naming.ConfigMapVolume(),
            MountPath: "/conf",
        })
    }

    // 如果啟用 Target Allocator mTLS
    if otelcol.Spec.TargetAllocator.Enabled &&
       cfg.CertManagerAvailability == certmanager.Available &&
       featuregate.EnableTargetAllocatorMTLS.IsEnabled() {
        volumeMounts = append(volumeMounts, corev1.VolumeMount{
            Name:      naming.TAClientCertificate(otelcol.Name),
            MountPath: constants.TACollectorTLSDirPath,
        })
    }

    // 額外的用戶自定義參數（排序保證一致性）
    argsMap := otelcol.Spec.Args
    if argsMap == nil {
        argsMap = map[string]string{}
    }

    var sortedArgs []string
    for k, v := range argsMap {
        sortedArgs = append(sortedArgs, fmt.Sprintf("--%s=%s", k, v))
    }
    sort.Strings(sortedArgs)
    args = append(args, sortedArgs...)

    // ========== 4. Volume Mounts ==========
    if len(otelcol.Spec.VolumeMounts) > 0 {
        volumeMounts = append(volumeMounts, otelcol.Spec.VolumeMounts...)
    }

    // 額外 ConfigMap 掛載
    for _, cm := range otelcol.Spec.ConfigMaps {
        volumeMounts = append(volumeMounts, corev1.VolumeMount{
            Name:      naming.ConfigMapExtra(cm.Name),
            MountPath: path.Join("/var/conf", cm.MountPath, naming.ConfigMapExtra(cm.Name)),
        })
    }

    // ========== 5. 健康檢查探針 ==========
    // Liveness Probe
    livenessProbe, err := otelcol.Spec.Config.GetLivenessProbe(logger)
    if err != nil {
        logger.Error(err, "cannot create liveness probe")
    } else {
        defaultProbeSettings(livenessProbe, otelcol.Spec.LivenessProbe)
    }

    // Readiness Probe
    readinessProbe, err := otelcol.Spec.Config.GetReadinessProbe(logger)
    if err != nil {
        logger.Error(err, "cannot create readiness probe")
    } else {
        defaultProbeSettings(readinessProbe, otelcol.Spec.ReadinessProbe)
    }

    // Startup Probe
    startupProbe, err := otelcol.Spec.Config.GetStartupProbe(logger)
    if err != nil {
        logger.Error(err, "cannot create startup probe")
    }

    // ========== 6. 環境變數 ==========
    envVars := otelcol.Spec.Env

    // 添加預設環境變數
    envVars = append(envVars, corev1.EnvVar{
        Name: "POD_NAME",
        ValueFrom: &corev1.EnvVarSource{
            FieldRef: &corev1.ObjectFieldSelector{
                FieldPath: "metadata.name",
            },
        },
    })

    // ========== 7. 返回完整 Container ==========
    return corev1.Container{
        Name:            naming.Container(),
        Image:           image,
        ImagePullPolicy: otelcol.Spec.ImagePullPolicy,
        Args:            args,
        Ports:           ports,
        VolumeMounts:    volumeMounts,
        Env:             envVars,
        EnvFrom:         otelcol.Spec.EnvFrom,
        Resources:       otelcol.Spec.Resources,
        LivenessProbe:   livenessProbe,
        ReadinessProbe:  readinessProbe,
        StartupProbe:    startupProbe,
        SecurityContext: otelcol.Spec.SecurityContext,
    }
}
```

### 5.2.1 端口解析邏輯

```go
func getContainerPorts(logger logr.Logger, otelcol v1beta1.OpenTelemetryCollector) []corev1.ContainerPort {
    var ports []corev1.ContainerPort

    // 從 Collector 配置中解析端口
    cfg := otelcol.Spec.Config

    // 解析 receivers 中的端口
    if receivers, ok := cfg.Receivers.Object["otlp"].(map[string]interface{}); ok {
        if protocols, ok := receivers["protocols"].(map[string]interface{}); ok {
            // gRPC 端口
            if grpc, ok := protocols["grpc"].(map[string]interface{}); ok {
                if endpoint, ok := grpc["endpoint"].(string); ok {
                    port := extractPort(endpoint)  // "0.0.0.0:4317" → 4317
                    ports = append(ports, corev1.ContainerPort{
                        Name:          "otlp-grpc",
                        ContainerPort: port,
                        Protocol:      corev1.ProtocolTCP,
                    })
                }
            }

            // HTTP 端口
            if http, ok := protocols["http"].(map[string]interface{}); ok {
                if endpoint, ok := http["endpoint"].(string); ok {
                    port := extractPort(endpoint)  // "0.0.0.0:4318" → 4318
                    ports = append(ports, corev1.ContainerPort{
                        Name:          "otlp-http",
                        ContainerPort: port,
                        Protocol:      corev1.ProtocolTCP,
                    })
                }
            }
        }
    }

    // 用戶自定義端口
    for _, portSpec := range otelcol.Spec.Ports {
        ports = append(ports, corev1.ContainerPort{
            Name:          portSpec.Name,
            ContainerPort: portSpec.Port,
            Protocol:      portSpec.Protocol,
        })
    }

    return ports
}
```

## 5.3 ConfigMap 構建與版本控制

**檔案**: `internal/manifests/collector/configmap.go`

```go
func ConfigMaps(params manifests.Params) ([]*corev1.ConfigMap, error) {
    var configMaps []*corev1.ConfigMap

    // ========== 1. 構建主 ConfigMap ==========
    name := naming.ConfigMap(params.OtelCol.Name)

    // 序列化 Collector 配置為 YAML
    configYAML, err := params.OtelCol.Spec.Config.Yaml()
    if err != nil {
        return nil, err
    }

    // 計算配置哈希（用於觸發 Pod 重啟）
    configHash := hash(configYAML)

    configMap := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      fmt.Sprintf("%s-%s", name, configHash[:8]),
            Namespace: params.OtelCol.Namespace,
            Labels: manifestutils.Labels(
                params.OtelCol.ObjectMeta,
                name,
                params.OtelCol.Spec.Image,
                ComponentOpenTelemetryCollector,
                params.Config.LabelsFilter,
            ),
            Annotations: map[string]string{
                "opentelemetry.io/config-hash": configHash,
                "opentelemetry.io/created-at":  time.Now().Format(time.RFC3339),
            },
        },
        Data: map[string]string{
            cfg.CollectorConfigMapEntry: configYAML,
        },
    }

    configMaps = append(configMaps, configMap)

    // ========== 2. Target Allocator ConfigMap ==========
    if params.OtelCol.Spec.TargetAllocator.Enabled {
        taConfigMap, err := BuildTargetAllocatorConfigMap(params)
        if err != nil {
            return nil, err
        }
        configMaps = append(configMaps, taConfigMap)
    }

    return configMaps, nil
}
```

### 5.3.1 ConfigMap 版本控制實現

**檔案**: `internal/controllers/opentelemetrycollector_controller.go:146-158`

```go
func getCollectorConfigMapsToKeep(
    configVersionsToKeep int,
    configMaps []*corev1.ConfigMap,
) []*corev1.ConfigMap {
    configVersionsToKeep = max(1, configVersionsToKeep)

    // 按創建時間排序（最新到最舊）
    sort.Slice(configMaps, func(i, j int) bool {
        iTime := configMaps[i].GetCreationTimestamp().Time
        jTime := configMaps[j].GetCreationTimestamp().Time
        return iTime.After(jTime)
    })

    // 保留最新的 N 個
    configMapsToKeep := min(configVersionsToKeep, len(configMaps))
    return configMaps[:configMapsToKeep]
}
```

**工作流程**:

```bash
配置更新 → 創建新 ConfigMap → 舊 ConfigMap 保留（用於回滾）

範例：
1. my-collector-abcd1234  (當前使用)
2. my-collector-efgh5678  (保留)
3. my-collector-ijkl9012  (保留)
4. my-collector-mnop3456  (將被刪除)
```

---

# 六、Webhook 機制深度解析

## 6.1 Webhook 概述

Kubernetes Admission Webhooks 允許在資源創建/更新前進行攔截和修改。

```bash
User: kubectl apply -f pod.yaml
    │
    ▼
API Server
    │
    ├──► Mutating Webhook  (修改資源)
    │    - Sidecar 注入
    │    - 添加標籤/註解
    │    - 修改容器配置
    │
    ├──► Validating Webhook (驗證資源)
    │    - 檢查配置合法性
    │    - 執行業務規則
    │
    └──► 存儲到 etcd
```

## 6.2 Pod Mutation Webhook 實現

**檔案**: `internal/webhook/podmutation/webhookhandler.go`

### 6.2.1 Webhook Handler 結構

```go
type podMutationWebhook struct {
    client      client.Client     // K8s 客戶端
    decoder     admission.Decoder // 請求解碼器
    logger      logr.Logger       // 日誌
    podMutators []PodMutator      // Pod 變更器鏈
    config      config.Config     // 配置
}

// PodMutator 介面
type PodMutator interface {
    Mutate(ctx context.Context, ns corev1.Namespace, pod corev1.Pod) (corev1.Pod, error)
}
```

### 6.2.2 Webhook 註冊

```go
// +kubebuilder:webhook:path=/mutate-v1-pod,mutating=true,failurePolicy=ignore,groups="",resources=pods,verbs=create,versions=v1,name=mpod.kb.io,sideEffects=none,admissionReviewVersions=v1
```

**參數解析**:

* `path`: Webhook 路徑
    
* `mutating=true`: 變更型 Webhook
    
* `failurePolicy=ignore`: 失敗時允許請求繼續（高可用）
    
* `resources=pods`: 只攔截 Pod
    
* `verbs=create`: 只攔截創建操作
    

### 6.2.3 Handle 方法完整實現

```go
func (p *podMutationWebhook) Handle(
    ctx context.Context,
    req admission.Request,
) admission.Response {
    log := p.logger.WithValues("namespace", req.Namespace, "name", req.Name)
    log.Info("Webhook called")

    // ========== 1. 解碼 Pod ==========
    pod := corev1.Pod{}
    err := p.decoder.Decode(req, &pod)
    if err != nil {
        log.Error(err, "Failed to decode pod")
        return admission.Errored(http.StatusBadRequest, err)
    }

    log.Info("Pod decoded", "podName", pod.Name)

    // ========== 2. 獲取 Namespace ==========
    ns := corev1.Namespace{}
    err = p.client.Get(ctx, types.NamespacedName{
        Name:      req.Namespace,
        Namespace: "",
    }, &ns)
    if err != nil {
        log.Error(err, "Failed to get namespace")
        res := admission.Errored(http.StatusInternalServerError, err)
        // 設置 Allowed = true，即使錯誤也不阻止 Pod 創建
        res.Allowed = true
        return res
    }

    // ========== 3. 執行所有 Mutator ==========
    originalPod := pod.DeepCopy()

    for i, mutator := range p.podMutators {
        log := log.WithValues("mutatorIndex", i)
        log.Info("Applying mutator")

        pod, err = mutator.Mutate(ctx, ns, pod)
        if err != nil {
            log.Error(err, "Mutator failed")
            res := admission.Errored(http.StatusInternalServerError, err)
            res.Allowed = true  // 錯誤不阻止創建
            return res
        }
    }

    // ========== 4. 檢查是否有修改 ==========
    if reflect.DeepEqual(originalPod, &pod) {
        log.Info("No changes made, allowing")
        return admission.Allowed("no changes")
    }

    // ========== 5. 生成 JSONPatch ==========
    marshaledPod, err := json.Marshal(pod)
    if err != nil {
        log.Error(err, "Failed to marshal pod")
        res := admission.Errored(http.StatusInternalServerError, err)
        res.Allowed = true
        return res
    }

    log.Info("Returning patch response")
    return admission.PatchResponseFromRaw(req.Object.Raw, marshaledPod)
}
```

## 6.3 Sidecar 注入實現

### 6.3.1 Sidecar Mutator

**檔案**: `internal/webhook/podmutation/sidecar_mutator.go`

```go
type sidecarMutator struct {
    client client.Client
    logger logr.Logger
    config config.Config
}

func (s *sidecarMutator) Mutate(
    ctx context.Context,
    ns corev1.Namespace,
    pod corev1.Pod,
) (corev1.Pod, error) {
    log := s.logger.WithValues("pod", pod.Name, "namespace", ns.Name)

    // ========== 1. 檢查是否需要注入 ==========
    // 註解：sidecar.opentelemetry.io/inject
    injectAnnotation, exists := pod.Annotations["sidecar.opentelemetry.io/inject"]
    if !exists || injectAnnotation == "false" {
        log.V(1).Info("Sidecar injection not requested")
        return pod, nil
    }

    log.Info("Sidecar injection requested", "value", injectAnnotation)

    // ========== 2. 查找 OpenTelemetryCollector ==========
    var collectorName string
    if injectAnnotation == "true" {
        // 自動查找（尋找同 namespace 下 mode=sidecar 的 Collector）
        collectorName, err := s.findSidecarCollector(ctx, ns.Name)
        if err != nil {
            return pod, err
        }
        log.Info("Auto-detected collector", "name", collectorName)
    } else {
        // 使用指定的 Collector
        collectorName = injectAnnotation
        log.Info("Using specified collector", "name", collectorName)
    }

    // 獲取 Collector CR
    collector := &v1beta1.OpenTelemetryCollector{}
    err := s.client.Get(ctx, types.NamespacedName{
        Name:      collectorName,
        Namespace: ns.Name,
    }, collector)
    if err != nil {
        log.Error(err, "Failed to get collector")
        return pod, err
    }

    // 驗證模式
    if collector.Spec.Mode != v1alpha1.ModeSidecar {
        return pod, fmt.Errorf("collector %s is not in sidecar mode", collectorName)
    }

    // ========== 3. 注入 Sidecar 容器 ==========
    sidecarContainer := s.buildSidecarContainer(collector)
    pod.Spec.Containers = append(pod.Spec.Containers, sidecarContainer)

    // ========== 4. 添加 Volumes ==========
    volumes := s.buildSidecarVolumes(collector)
    pod.Spec.Volumes = append(pod.Spec.Volumes, volumes...)

    // ========== 5. 修改應用容器環境變數 ==========
    // 將 OTLP endpoint 設置為 localhost
    for i := range pod.Spec.Containers {
        if pod.Spec.Containers[i].Name == sidecarContainer.Name {
            continue  // 跳過 sidecar 容器本身
        }

        pod.Spec.Containers[i].Env = append(
            pod.Spec.Containers[i].Env,
            corev1.EnvVar{
                Name:  "OTEL_EXPORTER_OTLP_ENDPOINT",
                Value: "http://localhost:4317",
            },
        )
    }

    log.Info("Sidecar injected successfully")
    return pod, nil
}

func (s *sidecarMutator) buildSidecarContainer(
    collector *v1beta1.OpenTelemetryCollector,
) corev1.Container {
    return corev1.Container{
        Name:  "otc-sidecar",
        Image: collector.Spec.Image,
        Args: []string{
            "--config=/conf/collector.yaml",
        },
        Ports: []corev1.ContainerPort{
            {Name: "otlp-grpc", ContainerPort: 4317},
            {Name: "otlp-http", ContainerPort: 4318},
        },
        VolumeMounts: []corev1.VolumeMount{
            {
                Name:      "otc-config",
                MountPath: "/conf",
            },
        },
        Resources: collector.Spec.Resources,
    }
}
```

### 6.3.2 Sidecar 注入效果

**原始 Pod**:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
    - name: app
      image: myapp:v1.0
```

**注入後的 Pod**:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
    sidecar.opentelemetry.io/injected: "true"
spec:
  containers:
    # 原始應用容器（已修改）
    - name: app
      image: myapp:v1.0
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317

    # 注入的 Sidecar 容器
    - name: otc-sidecar
      image: otel/opentelemetry-collector:0.88.0
      args:
        - --config=/conf/collector.yaml
      ports:
        - name: otlp-grpc
          containerPort: 4317
        - name: otlp-http
          containerPort: 4318
      volumeMounts:
        - name: otc-config
          mountPath: /conf

  volumes:
    - name: otc-config
      configMap:
        name: my-sidecar-collector
```

---

# 七、開發環境完整設置

## 7.1 前置工具安裝

### 7.1.1 Go 語言環境

```bash
# ========== macOS ==========
brew install go@1.24

# 設置環境變數
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

# 驗證
go version  # 應該顯示 go1.24 或更高

# ========== Linux ==========
# 下載並安裝
wget https://go.dev/dl/go1.24.0.linux-amd64.tar.gz
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.24.0.linux-amd64.tar.gz

# 添加到 PATH（添加到 ~/.bashrc 或 ~/.zshrc）
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

# 重新加載配置
source ~/.bashrc  # 或 source ~/.zshrc
```

### 7.1.2 Docker

```bash
# ========== macOS ==========
brew install --cask docker
# 或下載 Docker Desktop: https://www.docker.com/products/docker-desktop

# ========== Linux (Ubuntu/Debian) ==========
# 安裝依賴
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release

# 添加 Docker GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# 設置倉庫
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# 安裝 Docker Engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# 允許非 root 用戶使用 Docker
sudo usermod -aG docker $USER
newgrp docker

# 驗證
docker --version
docker run hello-world
```

### 7.1.3 kubectl

```bash
# ========== macOS ==========
brew install kubectl

# ========== Linux ==========
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# 驗證
kubectl version --client
```

### 7.1.4 Kind (Kubernetes in Docker)

```bash
# ========== macOS ==========
brew install kind

# ========== Linux ==========
go install sigs.k8s.io/kind@v0.20.0

# 或使用二進制
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# 驗證
kind version
```

### 7.1.5 Kustomize

```bash
# ========== macOS ==========
brew install kustomize

# ========== Linux ==========
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/

# 驗證
kustomize version
```

### 7.1.6 controller-gen (Kubebuilder 工具)

```bash
# 安裝 controller-gen（生成 CRD 和 deepcopy 代碼）
go install sigs.k8s.io/controller-tools/cmd/controller-gen@latest

# 驗證
controller-gen --version
```

### 7.1.7 Operator SDK (可選)

```bash
# ========== macOS ==========
brew install operator-sdk

# ========== Linux ==========
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.29.0

curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}
chmod +x operator-sdk_${OS}_${ARCH}
sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk

# 驗證
operator-sdk version
```

## 7.2 專案設置

### 7.2.1 Clone 專案

```bash
# Clone OpenTelemetry Operator
git clone https://github.com/open-telemetry/opentelemetry-operator.git
cd opentelemetry-operator

# 查看分支
git branch -a

# 切換到最新的穩定分支（如果需要）
git checkout main
```

### 7.2.2 了解 Makefile

OpenTelemetry Operator 的 Makefile 提供了豐富的命令：

**檔案**: `Makefile`

```bash
# ========== 查看所有可用命令 ==========
make help

# 主要命令：
make manifests         # 生成 CRD YAML
make generate          # 生成 Go 代碼（deepcopy 等）
make fmt               # 格式化代碼
make vet               # 靜態分析
make test              # 運行單元測試
make docker-build      # 構建 Docker 鏡像
make install           # 安裝 CRD 到集群
make deploy            # 部署 Operator 到集群
make run               # 本地運行 Operator
make kind-cluster      # 創建 Kind 測試集群
make e2e               # 運行 E2E 測試
```

### 7.2.3 創建 Kind 集群

**檔案**: `Makefile:101`

```bash
# ========== 使用 Makefile 創建（推薦）==========
make kind-cluster

# 這個命令會：
# 1. 創建一個名為 "otel-operator" 的 Kind 集群
# 2. 使用配置文件 kind-1.33.yaml（或其他版本）
# 3. 自動安裝 cert-manager

# ========== 手動創建 ==========
# 創建集群配置文件
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.33.0
  - role: worker
    image: kindest/node:v1.33.0
  - role: worker
    image: kindest/node:v1.33.0
EOF

# 創建集群
kind create cluster \
  --name otel-operator \
  --config kind-config.yaml

# 驗證集群
kubectl cluster-info --context kind-otel-operator
kubectl get nodes
```

### 7.2.4 安裝 cert-manager

**檔案**: `Makefile:106` (CERTMANAGER\_VERSION)

```bash
# ========== 方式 1: 使用 kubectl ==========
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# 等待 cert-manager 就緒
kubectl wait --for=condition=Available --timeout=300s \
  deployment/cert-manager -n cert-manager

kubectl wait --for=condition=Available --timeout=300s \
  deployment/cert-manager-webhook -n cert-manager

kubectl wait --for=condition=Available --timeout=300s \
  deployment/cert-manager-cainjector -n cert-manager

# 驗證
kubectl get pods -n cert-manager

# ========== 方式 2: 使用 Helm ==========
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.0 \
  --set installCRDs=true

# 驗證
kubectl get pods -n cert-manager
```

## 7.3 本地開發工作流

### 7.3.1 生成 CRD 和代碼

```bash
# ========== 1. 生成 CRD YAML ==========
make manifests

# 這會：
# - 從 apis/**/types.go 的註解生成 CRD YAML
# - 輸出到 config/crd/bases/
# - 使用 controller-gen 工具

# 查看生成的 CRD
ls -lh config/crd/bases/
# opentelemetry.io_instrumentations.yaml
# opentelemetry.io_opentelemetrycollectors.yaml
# opentelemetry.io_targetallocators.yaml
# opentelemetry.io_opampbridges.yaml

# ========== 2. 生成 Go 代碼 ==========
make generate

# 這會：
# - 生成 DeepCopy 方法（zz_generated.deepcopy.go）
# - 使用 controller-gen 工具

# 查看生成的代碼
find apis -name "zz_generated.*.go"
```

**背後的命令**（來自 Makefile）：

```bash
# manifests 目標
controller-gen \
  crd:generateEmbeddedObjectMeta=true,maxDescLen=0 \
  rbac:roleName=manager-role \
  webhook \
  paths="./..." \
  output:crd:artifacts:config=config/crd/bases

# generate 目標
controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
```

### 7.3.2 安裝 CRD 到集群

```bash
# ========== 安裝 CRD ==========
make install

# 等價於：
# kustomize build config/crd | kubectl apply -f -

# 驗證 CRD 已安裝
kubectl get crd | grep opentelemetry

# 應該看到：
# instrumentations.opentelemetry.io
# opampbridges.opentelemetry.io
# opentelemetrycollectors.opentelemetry.io
# targetallocators.opentelemetry.io

# 查看 CRD 詳情
kubectl explain opentelemetrycollector
kubectl explain opentelemetrycollector.spec
kubectl explain opentelemetrycollector.spec.config
```

### 7.3.3 本地運行 Operator

**方式 1: 使用 Makefile（推薦）**

**檔案**: `Makefile:202-204`

```bash
# 本地運行（不部署到集群）
make run

# 這會：
# 1. 執行 make generate fmt vet manifests
# 2. 運行 go run ./main.go --zap-devel
# 3. Operator 運行在本地，連接到 Kind 集群

# 默認禁用 Webhooks（本地運行時）
# 如果要啟用 Webhooks：
make run ENABLE_WEBHOOKS=true
```

**方式 2: 直接使用 go run**

```bash
# 設置環境變數
export WATCH_NAMESPACE=default  # 只監控 default namespace
# 或不設置，監控所有 namespace

# 禁用 Webhooks（本地運行通常禁用）
export ENABLE_WEBHOOKS=false

# 運行
go run ./main.go \
  --zap-devel \
  --metrics-addr=:8080 \
  --enable-leader-election=false \
  --health-probe-addr=:8081

# 參數說明：
# --zap-devel: 開發模式日誌（更詳細）
# --metrics-addr: Prometheus metrics 地址
# --enable-leader-election: 禁用 leader election（本地單實例）
# --health-probe-addr: 健康檢查地址
```

**查看日誌輸出**：

```bash
2025-01-15T10:00:00.000Z	INFO	setup	Starting the OpenTelemetry Operator
2025-01-15T10:00:00.001Z	INFO	setup	opentelemetry-operator version	{"version": "0.92.0"}
2025-01-15T10:00:00.002Z	INFO	setup	build-date	{"date": "2025-01-15"}
2025-01-15T10:00:00.003Z	INFO	setup	go-version	{"version": "go1.24.0"}
2025-01-15T10:00:00.010Z	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
2025-01-15T10:00:00.011Z	INFO	setup	starting manager
2025-01-15T10:00:00.011Z	INFO	controller	Starting EventSource	{"controller": "opentelemetrycollector", "source": "kind source: *v1beta1.OpenTelemetryCollector"}
2025-01-15T10:00:00.112Z	INFO	controller	Starting Controller	{"controller": "opentelemetrycollector"}
2025-01-15T10:00:00.112Z	INFO	controller	Starting workers	{"controller": "opentelemetrycollector", "worker count": 1}
```

### 7.3.4 測試 Operator

**在本地 Operator 運行時，打開另一個終端**：

```bash
# ========== 創建測試 CR ==========
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: test-local
spec:
  mode: deployment
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch: {}
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
EOF

# ========== 觀察 Operator 日誌 ==========
# 在運行 make run 的終端，你會看到：
# INFO	Reconciling OpenTelemetryCollector	{"opentelemetrycollector": "default/test-local"}
# INFO	Building desired objects
# INFO	Creating resource	{"kind": "ServiceAccount", "name": "test-local-collector"}
# INFO	Creating resource	{"kind": "ConfigMap", "name": "test-local-collector-abcd1234"}
# INFO	Creating resource	{"kind": "Deployment", "name": "test-local-collector"}
# INFO	Creating resource	{"kind": "Service", "name": "test-local-collector"}

# ========== 驗證資源 ==========
kubectl get otelcol test-local
kubectl get deployment test-local-collector
kubectl get service test-local-collector
kubectl get configmap -l app.kubernetes.io/instance=test-local

# ========== 查看 Collector 日誌 ==========
kubectl logs -f deployment/test-local-collector

# ========== 測試配置更新 ==========
kubectl patch otelcol test-local --type=merge -p '
{
  "spec": {
    "config": {
      "exporters": {
        "debug": {
          "verbosity": "normal"
        }
      }
    }
  }
}'

# 觀察 Operator 日誌，會看到：
# - 新 ConfigMap 被創建
# - Deployment 被更新（觸發滾動更新）

# ========== 清理 ==========
kubectl delete otelcol test-local
```

### 7.3.5 調試技巧

#### 使用 Delve 調試器

```bash
# 安裝 Delve
go install github.com/go-delve/delve/cmd/dlv@latest

# 使用 Delve 運行
dlv debug ./main.go -- \
  --zap-devel \
  --enable-leader-election=false

# 設置斷點
(dlv) break internal/controllers/opentelemetrycollector_controller.go:234
(dlv) continue

# 創建 CR 後，斷點會被觸發
```

#### 使用 VS Code 調試

創建 `.vscode/launch.json`：

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Operator",
      "type": "go",
      "request": "launch",
      "mode": "debug",
      "program": "${workspaceFolder}/main.go",
      "env": {
        "ENABLE_WEBHOOKS": "false"
      },
      "args": [
        "--zap-devel",
        "--enable-leader-election=false"
      ]
    }
  ]
}
```

按 F5 開始調試，可以設置斷點、查看變數等。

### 7.3.6 部署到集群（完整流程）

```bash
# ========== 1. 構建鏡像 ==========
# 設置鏡像名稱
export IMG=localhost:5000/opentelemetry-operator:dev

make docker-build IMG=$IMG

# ========== 2. 推送到本地 registry（Kind 使用）==========
# 創建本地 registry（如果沒有）
docker run -d -p 5000:5000 --name kind-registry registry:2

# 推送鏡像
docker push $IMG

# 或直接加載到 Kind
kind load docker-image $IMG --name otel-operator

# ========== 3. 部署 Operator ==========
make deploy IMG=$IMG

# 這會：
# 1. 安裝 CRD
# 2. 創建 Namespace (opentelemetry-operator-system)
# 3. 部署 Operator
# 4. 創建 RBAC 資源
# 5. 設置 Webhooks

# ========== 4. 驗證部署 ==========
kubectl get pods -n opentelemetry-operator-system

# 應該看到：
# NAME                                                        READY   STATUS    RESTARTS   AGE
# opentelemetry-operator-controller-manager-xxxxxxxxx-xxxxx   2/2     Running   0          1m

# 查看日誌
kubectl logs -f -n opentelemetry-operator-system \
  deployment/opentelemetry-operator-controller-manager \
  -c manager

# ========== 5. 測試 ==========
kubectl apply -f config/samples/core_v1beta1_opentelemetrycollector.yaml

kubectl get otelcol
kubectl get all -l app.kubernetes.io/managed-by=opentelemetry-operator

# ========== 6. 清理 ==========
make undeploy
```

## 7.4 常用開發命令速查

```bash
# ========== 代碼生成 ==========
make manifests              # 生成 CRD
make generate               # 生成 deepcopy 代碼
make bundle                 # 生成 Operator bundle（OLM）

# ========== 代碼質量 ==========
make fmt                    # 格式化代碼
make vet                    # 靜態分析
make lint                   # Linting（需要 golangci-lint）
make test                   # 運行單元測試
make test-coverage          # 測試覆蓋率

# ========== 構建 ==========
make manager                # 構建 Operator 二進制
make targetallocator        # 構建 Target Allocator
make operator-opamp-bridge  # 構建 OpAMP Bridge
make docker-build           # 構建 Docker 鏡像

# ========== 部署 ==========
make install                # 安裝 CRD
make uninstall              # 卸載 CRD
make deploy                 # 部署 Operator
make undeploy               # 卸載 Operator
make run                    # 本地運行

# ========== 測試 ==========
make kind-cluster           # 創建 Kind 集群
make kind-delete-cluster    # 刪除 Kind 集群
make e2e                    # E2E 測試
make e2e-targetallocator    # Target Allocator E2E
make e2e-instrumentation    # Instrumentation E2E

# ========== 清理 ==========
make clean                  # 清理生成的文件
```

## 7.5 開發環境故障排查

### 7.5.1 常見問題

**問題 1: controller-gen 未找到**

```bash
# 錯誤信息
controller-gen: command not found

# 解決方案
go install sigs.k8s.io/controller-tools/cmd/controller-gen@latest
```

**問題 2: Kind 集群無法訪問**

```bash
# 檢查集群
kind get clusters

# 檢查 kubeconfig
kubectl config get-contexts

# 切換 context
kubectl config use-context kind-otel-operator
```

**問題 3: cert-manager 未就緒**

```bash
# 檢查 cert-manager pods
kubectl get pods -n cert-manager

# 查看日誌
kubectl logs -n cert-manager deployment/cert-manager

# 重新安裝
kubectl delete namespace cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
```

**問題 4: Webhook 錯誤**

```bash
# 錯誤信息
Error from server (InternalError): error when creating "test.yaml": Internal error occurred: failed calling webhook

# 解決方案（本地運行時）
export ENABLE_WEBHOOKS=false
make run
```

---

# 八、測試策略與實踐

## 8.1 測試金字塔

```bash
           /\
          /  \
         / E2E \ (少量，關鍵場景)
        /------\
       /  集成  \ (中等，API 交互)
      /----------\
     /  單元測試   \ (大量，快速反饋)
    /--------------\
```

OpenTelemetry Operator 的測試覆蓋：

* **單元測試**: `internal/**/*_test.go`
    
* **集成測試**: `tests/e2e*/**`
    
* **性能測試**: 負載測試腳本
    

## 8.2 單元測試深度實踐

### 8.2.1 測試結構

**檔案**: `internal/controllers/reconcile_test.go`

這個文件有 **56KB**，包含大量測試用例！

```bash
# 查看測試統計
wc -l internal/controllers/reconcile_test.go
# 約 1500+ 行

# 運行單元測試
make test

# 運行特定包
go test ./internal/controllers -v

# 運行特定測試
go test ./internal/controllers -v -run TestReconcile_Deployment
```

### 8.2.2 編寫單元測試範例

**創建測試文件**: `internal/controllers/my_feature_test.go`

```go
package controllers_test

import (
    "context"
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client/fake"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    "github.com/open-telemetry/opentelemetry-operator/apis/v1beta1"
    "github.com/open-telemetry/opentelemetry-operator/internal/controllers"
)

func TestReconcile_CustomFeature(t *testing.T) {
    // ========== 準備階段 ==========
    ctx := context.Background()

    // 創建測試用的 OpenTelemetryCollector
    otelCol := &v1beta1.OpenTelemetryCollector{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-collector",
            Namespace: "default",
        },
        Spec: v1beta1.OpenTelemetryCollectorSpec{
            Mode: v1alpha1.ModeDeployment,
            Config: v1beta1.Config{
                Receivers: v1beta1.AnyConfig{
                    Object: map[string]interface{}{
                        "otlp": map[string]interface{}{
                            "protocols": map[string]interface{}{
                                "grpc": map[string]interface{}{
                                    "endpoint": "0.0.0.0:4317",
                                },
                            },
                        },
                    },
                },
                Exporters: v1beta1.AnyConfig{
                    Object: map[string]interface{}{
                        "debug": map[string]interface{}{},
                    },
                },
                Service: v1beta1.Service{
                    Pipelines: map[string]*v1beta1.Pipeline{
                        "traces": {
                            Receivers: []string{"otlp"},
                            Exporters: []string{"debug"},
                        },
                    },
                },
            },
        },
    }

    // 創建 fake client
    scheme := runtime.NewScheme()
    _ = v1beta1.AddToScheme(scheme)
    _ = appsv1.AddToScheme(scheme)
    _ = corev1.AddToScheme(scheme)

    k8sClient := fake.NewClientBuilder().
        WithScheme(scheme).
        WithObjects(otelCol).
        WithStatusSubresource(&v1beta1.OpenTelemetryCollector{}).
        Build()

    // 創建 Reconciler
    reconciler := &controllers.OpenTelemetryCollectorReconciler{
        Client:   k8sClient,
        Scheme:   scheme,
        log:      logr.Discard(),
        recorder: record.NewFakeRecorder(10),
        config:   config.New(),
    }

    // ========== 執行階段 ==========
    req := reconcile.Request{
        NamespacedName: types.NamespacedName{
            Name:      "test-collector",
            Namespace: "default",
        },
    }

    result, err := reconciler.Reconcile(ctx, req)

    // ========== 驗證階段 ==========
    require.NoError(t, err)
    assert.False(t, result.Requeue)

    // 驗證 Deployment
    deployment := &appsv1.Deployment{}
    err = k8sClient.Get(ctx, types.NamespacedName{
        Name:      "test-collector-collector",
        Namespace: "default",
    }, deployment)
    require.NoError(t, err)

    // 驗證副本數
    assert.Equal(t, int32(1), *deployment.Spec.Replicas)

    // 驗證容器
    assert.Len(t, deployment.Spec.Template.Spec.Containers, 1)
    container := deployment.Spec.Template.Spec.Containers[0]
    assert.Equal(t, "otc-container", container.Name)

    // 驗證端口
    require.Len(t, container.Ports, 1)
    assert.Equal(t, "otlp-grpc", container.Ports[0].Name)
    assert.Equal(t, int32(4317), container.Ports[0].ContainerPort)

    // 驗證 Service
    service := &corev1.Service{}
    err = k8sClient.Get(ctx, types.NamespacedName{
        Name:      "test-collector-collector",
        Namespace: "default",
    }, service)
    require.NoError(t, err)
    assert.Len(t, service.Spec.Ports, 1)

    // 驗證 ConfigMap
    configMap := &corev1.ConfigMap{}
    configMaps := &corev1.ConfigMapList{}
    err = k8sClient.List(ctx, configMaps,
        client.InNamespace("default"),
        client.MatchingLabels{"app.kubernetes.io/instance": "test-collector"},
    )
    require.NoError(t, err)
    assert.Len(t, configMaps.Items, 1)

    // 驗證 Status
    err = k8sClient.Get(ctx, types.NamespacedName{
        Name:      "test-collector",
        Namespace: "default",
    }, otelCol)
    require.NoError(t, err)
    assert.NotEmpty(t, otelCol.Status.Version)
}

// Table-driven 測試範例
func TestReconcile_DifferentModes(t *testing.T) {
    tests := []struct {
        name     string
        mode     v1alpha1.Mode
        replicas *int32
        wantType string
    }{
        {
            name:     "Deployment mode",
            mode:     v1alpha1.ModeDeployment,
            replicas: ptr.To(int32(3)),
            wantType: "Deployment",
        },
        {
            name:     "DaemonSet mode",
            mode:     v1alpha1.ModeDaemonSet,
            replicas: nil,
            wantType: "DaemonSet",
        },
        {
            name:     "StatefulSet mode",
            mode:     v1alpha1.ModeStatefulSet,
            replicas: ptr.To(int32(2)),
            wantType: "StatefulSet",
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // 測試邏輯...
        })
    }
}
```

### 8.2.3 運行測試

```bash
# ========== 運行所有測試 ==========
make test

# ========== 運行特定包 ==========
go test ./internal/controllers -v

# ========== 運行特定測試 ==========
go test ./internal/controllers -v -run TestReconcile_Deployment

# ========== 運行測試並查看覆蓋率 ==========
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out

# ========== 並行運行測試 ==========
go test ./... -parallel=4

# ========== 運行測試並顯示詳細輸出 ==========
go test ./internal/controllers -v -count=1

# ========== 測試特定功能 ==========
go test ./internal/manifests/collector -v -run TestDeployment
```

## 8.3 E2E 測試深度實踐

### 8.3.1 E2E 測試架構

OpenTelemetry Operator 使用 **Chainsaw** 進行 E2E 測試。

**測試目錄結構**：

```bash
tests/
├── e2e/                       # 基礎功能
│   ├── smoke/                 # 煙霧測試
│   ├── smoke-targetallocator/ # TA 煙霧測試
│   └── smoke-sidecar/         # Sidecar 測試
├── e2e-instrumentation/       # 自動埋點測試
├── e2e-targetallocator/       # Target Allocator
├── e2e-opampbridge/          # OpAMP Bridge
└── e2e-upgrade/              # 升級測試
```

### 8.3.2 Chainsaw 測試範例

**檔案**: `tests/e2e/smoke/00-install.yaml`

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```

**檔案**: `tests/e2e/smoke/01-assert.yaml`

```yaml
# 斷言 Deployment 被創建並就緒
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simplest-collector
spec:
  replicas: 1
status:
  readyReplicas: 1
  availableReplicas: 1
---
# 斷言 Service 被創建
apiVersion: v1
kind: Service
metadata:
  name: simplest-collector
spec:
  ports:
    - name: otlp-grpc
      port: 4317
      protocol: TCP
      targetPort: 4317
```

**檔案**: `tests/e2e/smoke/chainsaw-test.yaml`

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: smoke
spec:
  steps:
    - name: Install collector
      try:
        - apply:
            file: 00-install.yaml
        - assert:
            file: 01-assert.yaml
      catch:
        - describe:
            apiVersion: opentelemetry.io/v1beta1
            kind: OpenTelemetryCollector
        - describe:
            apiVersion: apps/v1
            kind: Deployment
        - podLogs:
            selector: app.kubernetes.io/name=simplest-collector
```

### 8.3.3 運行 E2E 測試

```bash
# ========== 1. 創建測試集群 ==========
make kind-cluster

# ========== 2. 構建並載入鏡像 ==========
make docker-build
make docker-build-targetallocator
make docker-build-operator-opamp-bridge

# 載入鏡像到 Kind
kind load docker-image \
  ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:latest \
  --name otel-operator

# ========== 3. 部署 Operator ==========
make deploy IMG=ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:latest

# ========== 4. 運行 E2E 測試 ==========
make e2e

# 運行特定測試
make e2e-instrumentation
make e2e-targetallocator
make e2e-upgrade

# ========== 5. 查看測試結果 ==========
# Chainsaw 會輸出詳細的測試報告

# ========== 6. 清理 ==========
make kind-delete-cluster
```

### 8.3.4 編寫自定義 E2E 測試

**創建測試目錄**：

```bash
mkdir -p tests/e2e-custom-feature
cd tests/e2e-custom-feature
```

**創建測試清單** (`00-install.yaml`):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: custom-test
spec:
  mode: deployment
  replicas: 2
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
    processors:
      batch:
        send_batch_size: 1000
        timeout: 10s
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
```

**創建斷言** (`01-assert.yaml`):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-test-collector
spec:
  replicas: 2
status:
  readyReplicas: 2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: (custom-test-collector-*)
data:
  collector.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
```

**創建 Chainsaw 測試** (`chainsaw-test.yaml`):

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: custom-feature
spec:
  description: Test custom feature
  timeouts:
    apply: 30s
    assert: 60s
  steps:
    - name: Install collector
      try:
        - apply:
            file: 00-install.yaml
        - assert:
            file: 01-assert.yaml
            timeout: 2m
      catch:
        - describe:
            apiVersion: opentelemetry.io/v1beta1
            kind: OpenTelemetryCollector
        - podLogs:
            selector: app.kubernetes.io/name=custom-test-collector

    - name: Test configuration update
      try:
        - patch:
            resource:
              apiVersion: opentelemetry.io/v1beta1
              kind: OpenTelemetryCollector
              metadata:
                name: custom-test
            merge:
              spec:
                replicas: 3
        - sleep:
            duration: 10s
        - assert:
            resource:
              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: custom-test-collector
              spec:
                replicas: 3

    - name: Cleanup
      try:
        - delete:
            ref:
              apiVersion: opentelemetry.io/v1beta1
              kind: OpenTelemetryCollector
              name: custom-test
```

**運行自定義測試**：

```bash
# 安裝 Chainsaw
go install github.com/kyverno/chainsaw@latest

# 運行測試
chainsaw test --test-dir ./tests/e2e-custom-feature
```

## 8.4 測試覆蓋率

```bash
# ========== 生成覆蓋率報告 ==========
go test ./... -coverprofile=coverage.out

# ========== 查看覆蓋率摘要 ==========
go tool cover -func=coverage.out

# 輸出範例：
# github.com/open-telemetry/opentelemetry-operator/internal/controllers/opentelemetrycollector_controller.go:234:	Reconcile		85.7%
# github.com/open-telemetry/opentelemetry-operator/internal/manifests/collector/deployment.go:17:		Deployment		92.3%

# ========== 生成 HTML 報告 ==========
go tool cover -html=coverage.out -o coverage.html

# 在瀏覽器中打開
open coverage.html  # macOS
xdg-open coverage.html  # Linux

# ========== 按包查看覆蓋率 ==========
go test ./internal/controllers -coverprofile=controllers.out
go tool cover -func=controllers.out
```

## 8.5 性能測試與基準測試

### 8.5.1 Benchmark 測試

**創建**: `internal/controllers/reconcile_bench_test.go`

```go
package controllers_test

import (
    "context"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client/fake"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    "github.com/open-telemetry/opentelemetry-operator/apis/v1beta1"
)

func BenchmarkReconcile(b *testing.B) {
    // 準備測試數據
    otelCol := &v1beta1.OpenTelemetryCollector{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "benchmark-test",
            Namespace: "default",
        },
        Spec: v1beta1.OpenTelemetryCollectorSpec{
            Mode: v1alpha1.ModeDeployment,
            Config: v1beta1.Config{
                // ... 配置
            },
        },
    }

    k8sClient := fake.NewClientBuilder().
        WithObjects(otelCol).
        Build()

    reconciler := &controllers.OpenTelemetryCollectorReconciler{
        Client: k8sClient,
        // ... 其他字段
    }

    req := reconcile.Request{
        NamespacedName: types.NamespacedName{
            Name:      "benchmark-test",
            Namespace: "default",
        },
    }

    ctx := context.Background()

    // 運行 benchmark
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := reconciler.Reconcile(ctx, req)
        if err != nil {
            b.Fatal(err)
        }
    }
}
```

**運行 Benchmark**：

```bash
# 運行 benchmark
go test -bench=. ./internal/controllers

# 輸出範例：
# BenchmarkReconcile-8   	    1000	   1234567 ns/op

# 帶內存分析
go test -bench=. -benchmem ./internal/controllers

# 輸出範例：
# BenchmarkReconcile-8   	    1000	   1234567 ns/op	  123456 B/op	    1234 allocs/op

# 生成 CPU profile
go test -bench=. -cpuprofile=cpu.prof ./internal/controllers

# 分析 profile
go tool pprof cpu.prof
```

---

# 九、實戰專案：Nginx Operator

在本章中，我們將從零開始實作一個完整的 Nginx Operator，應用前面學到的所有概念。

## 9.1 專案目標與架構

### 9.1.1 功能需求

我們的 Nginx Operator 需要支援：

1. **自動部署 Nginx**：根據 CR 創建 Deployment
    
2. **配置管理**：支援自定義 nginx.conf
    
3. **服務暴露**：自動創建 Service
    
4. **配置熱更新**：ConfigMap 變更後自動觸發滾動更新
    
5. **版本管理**：支援 Nginx 版本升級
    
6. **健康檢查**：配置 liveness 和 readiness probe
    

### 9.1.2 CRD 設計

**API 定義** (`apis/v1alpha1/nginx_types.go`)：

```go
package v1alpha1

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NginxSpec defines the desired state of Nginx
type NginxSpec struct {
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10
    // +kubebuilder:default=1
    Replicas *int32 `json:"replicas,omitempty"`

    // +kubebuilder:validation:Pattern=`^nginx:.*$`
    // +kubebuilder:default="nginx:1.25"
    Image string `json:"image,omitempty"`

    // +optional
    Resources corev1.ResourceRequirements `json:"resources,omitempty"`

    // +optional
    Config string `json:"config,omitempty"`

    // +optional
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=65535
    // +kubebuilder:default=80
    Port int32 `json:"port,omitempty"`

    // +optional
    // +kubebuilder:validation:Enum=ClusterIP;NodePort;LoadBalancer
    // +kubebuilder:default=ClusterIP
    ServiceType corev1.ServiceType `json:"serviceType,omitempty"`
}

// NginxStatus defines the observed state of Nginx
type NginxStatus struct {
    // Conditions represent the latest available observations of the Nginx state
    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // ReadyReplicas is the number of ready replicas
    // +optional
    ReadyReplicas int32 `json:"readyReplicas,omitempty"`

    // ConfigVersion tracks the version of the ConfigMap
    // +optional
    ConfigVersion string `json:"configVersion,omitempty"`

    // LastUpdateTime is the last time the status was updated
    // +optional
    LastUpdateTime *metav1.Time `json:"lastUpdateTime,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.readyReplicas
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="Image",type=string,JSONPath=`.spec.image`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// Nginx is the Schema for the nginxes API
type Nginx struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   NginxSpec   `json:"spec,omitempty"`
    Status NginxStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// NginxList contains a list of Nginx
type NginxList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Nginx `json:"items"`
}

func init() {
    SchemeBuilder.Register(&Nginx{}, &NginxList{})
}
```

**關鍵註解說明**：

* `+kubebuilder:subresource:status`：啟用 status subresource
    
* `+kubebuilder:subresource:scale`：支援 `kubectl scale` 命令
    
* `+kubebuilder:printcolumn`：自定義 `kubectl get` 輸出列
    
* `+kubebuilder:validation`：欄位驗證規則
    

## 9.2 Controller 實作

### 9.2.1 Reconciler 主邏輯

**Controller** (`internal/controller/nginx_controller.go`)：

```go
package controller

import (
    "context"
    "crypto/sha256"
    "fmt"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    "sigs.k8s.io/controller-runtime/pkg/log"

    webappv1alpha1 "example.com/nginx-operator/apis/v1alpha1"
)

const (
    nginxFinalizer = "nginx.webapp.example.com/finalizer"

    // Condition types
    TypeAvailable   = "Available"
    TypeProgressing = "Progressing"
    TypeDegraded    = "Degraded"
)

// NginxReconciler reconciles a Nginx object
type NginxReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=webapp.example.com,resources=nginxes,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=webapp.example.com,resources=nginxes/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=webapp.example.com,resources=nginxes/finalizers,verbs=update
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services;configmaps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch

func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)
    logger.Info("Reconciling Nginx", "namespace", req.Namespace, "name", req.Name)

    // 1. 獲取 Nginx 資源
    nginx := &webappv1alpha1.Nginx{}
    if err := r.Get(ctx, req.NamespacedName, nginx); err != nil {
        if apierrors.IsNotFound(err) {
            logger.Info("Nginx resource not found, likely deleted")
            return ctrl.Result{}, nil
        }
        logger.Error(err, "Failed to get Nginx")
        return ctrl.Result{}, err
    }

    // 2. 處理刪除邏輯 (Finalizer)
    if !nginx.ObjectMeta.DeletionTimestamp.IsZero() {
        return r.reconcileDelete(ctx, nginx)
    }

    // 3. 添加 Finalizer
    if !controllerutil.ContainsFinalizer(nginx, nginxFinalizer) {
        controllerutil.AddFinalizer(nginx, nginxFinalizer)
        if err := r.Update(ctx, nginx); err != nil {
            logger.Error(err, "Failed to add finalizer")
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    }

    // 4. Reconcile ConfigMap
    configMap, configVersion, err := r.reconcileConfigMap(ctx, nginx)
    if err != nil {
        r.setCondition(nginx, TypeDegraded, metav1.ConditionTrue, "ConfigMapFailed", err.Error())
        _ = r.Status().Update(ctx, nginx)
        return ctrl.Result{}, err
    }

    // 5. Reconcile Deployment
    if err := r.reconcileDeployment(ctx, nginx, configVersion); err != nil {
        r.setCondition(nginx, TypeDegraded, metav1.ConditionTrue, "DeploymentFailed", err.Error())
        _ = r.Status().Update(ctx, nginx)
        return ctrl.Result{}, err
    }

    // 6. Reconcile Service
    if err := r.reconcileService(ctx, nginx); err != nil {
        r.setCondition(nginx, TypeDegraded, metav1.ConditionTrue, "ServiceFailed", err.Error())
        _ = r.Status().Update(ctx, nginx)
        return ctrl.Result{}, err
    }

    // 7. 更新 Status
    if err := r.updateStatus(ctx, nginx, configVersion); err != nil {
        logger.Error(err, "Failed to update status")
        return ctrl.Result{}, err
    }

    // 8. 設置成功 Condition
    r.setCondition(nginx, TypeAvailable, metav1.ConditionTrue, "ReconcileSuccess", "Nginx is available")
    r.setCondition(nginx, TypeDegraded, metav1.ConditionFalse, "ReconcileSuccess", "")
    if err := r.Status().Update(ctx, nginx); err != nil {
        return ctrl.Result{}, err
    }

    logger.Info("Successfully reconciled Nginx")
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

// reconcileConfigMap 管理 ConfigMap
func (r *NginxReconciler) reconcileConfigMap(ctx context.Context, nginx *webappv1alpha1.Nginx) (*corev1.ConfigMap, string, error) {
    logger := log.FromContext(ctx)

    // 默認 nginx 配置
    defaultConfig := `
events {
    worker_connections 1024;
}

http {
    server {
        listen 80;
        location / {
            root /usr/share/nginx/html;
            index index.html;
        }
    }
}`

    config := nginx.Spec.Config
    if config == "" {
        config = defaultConfig
    }

    // 計算配置的版本（hash）
    hash := sha256.Sum256([]byte(config))
    configVersion := fmt.Sprintf("%x", hash[:8])

    configMap := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      nginx.Name + "-config",
            Namespace: nginx.Namespace,
            Labels: map[string]string{
                "app":     "nginx",
                "nginx":   nginx.Name,
                "version": configVersion,
            },
        },
        Data: map[string]string{
            "nginx.conf": config,
        },
    }

    // 設置 Owner Reference
    if err := controllerutil.SetControllerReference(nginx, configMap, r.Scheme); err != nil {
        return nil, "", err
    }

    // 創建或更新 ConfigMap
    existing := &corev1.ConfigMap{}
    err := r.Get(ctx, types.NamespacedName{
        Name:      configMap.Name,
        Namespace: configMap.Namespace,
    }, existing)

    if err != nil {
        if apierrors.IsNotFound(err) {
            logger.Info("Creating ConfigMap", "name", configMap.Name)
            if err := r.Create(ctx, configMap); err != nil {
                return nil, "", err
            }
            return configMap, configVersion, nil
        }
        return nil, "", err
    }

    // 更新 ConfigMap
    existing.Data = configMap.Data
    existing.Labels = configMap.Labels
    logger.Info("Updating ConfigMap", "name", configMap.Name)
    if err := r.Update(ctx, existing); err != nil {
        return nil, "", err
    }

    return existing, configVersion, nil
}

// reconcileDeployment 管理 Deployment
func (r *NginxReconciler) reconcileDeployment(ctx context.Context, nginx *webappv1alpha1.Nginx, configVersion string) error {
    logger := log.FromContext(ctx)

    replicas := int32(1)
    if nginx.Spec.Replicas != nil {
        replicas = *nginx.Spec.Replicas
    }

    image := "nginx:1.25"
    if nginx.Spec.Image != "" {
        image = nginx.Spec.Image
    }

    port := int32(80)
    if nginx.Spec.Port != 0 {
        port = nginx.Spec.Port
    }

    deployment := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      nginx.Name,
            Namespace: nginx.Namespace,
            Labels: map[string]string{
                "app":   "nginx",
                "nginx": nginx.Name,
            },
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app":   "nginx",
                    "nginx": nginx.Name,
                },
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{
                        "app":           "nginx",
                        "nginx":         nginx.Name,
                        "config-version": configVersion,  // 配置版本變更時觸發滾動更新
                    },
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "nginx",
                            Image: image,
                            Ports: []corev1.ContainerPort{
                                {
                                    Name:          "http",
                                    ContainerPort: port,
                                    Protocol:      corev1.ProtocolTCP,
                                },
                            },
                            VolumeMounts: []corev1.VolumeMount{
                                {
                                    Name:      "config",
                                    MountPath: "/etc/nginx/nginx.conf",
                                    SubPath:   "nginx.conf",
                                },
                            },
                            Resources: nginx.Spec.Resources,
                            LivenessProbe: &corev1.Probe{
                                ProbeHandler: corev1.ProbeHandler{
                                    HTTPGet: &corev1.HTTPGetAction{
                                        Path: "/",
                                        Port: intstr.FromInt(int(port)),
                                    },
                                },
                                InitialDelaySeconds: 10,
                                PeriodSeconds:       10,
                            },
                            ReadinessProbe: &corev1.Probe{
                                ProbeHandler: corev1.ProbeHandler{
                                    HTTPGet: &corev1.HTTPGetAction{
                                        Path: "/",
                                        Port: intstr.FromInt(int(port)),
                                    },
                                },
                                InitialDelaySeconds: 5,
                                PeriodSeconds:       5,
                            },
                        },
                    },
                    Volumes: []corev1.Volume{
                        {
                            Name: "config",
                            VolumeSource: corev1.VolumeSource{
                                ConfigMap: &corev1.ConfigMapVolumeSource{
                                    LocalObjectReference: corev1.LocalObjectReference{
                                        Name: nginx.Name + "-config",
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    // 設置 Owner Reference
    if err := controllerutil.SetControllerReference(nginx, deployment, r.Scheme); err != nil {
        return err
    }

    // 創建或更新 Deployment
    existing := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{
        Name:      deployment.Name,
        Namespace: deployment.Namespace,
    }, existing)

    if err != nil {
        if apierrors.IsNotFound(err) {
            logger.Info("Creating Deployment", "name", deployment.Name)
            return r.Create(ctx, deployment)
        }
        return err
    }

    // 更新 Deployment
    existing.Spec = deployment.Spec
    logger.Info("Updating Deployment", "name", deployment.Name)
    return r.Update(ctx, existing)
}

// reconcileService 管理 Service
func (r *NginxReconciler) reconcileService(ctx context.Context, nginx *webappv1alpha1.Nginx) error {
    logger := log.FromContext(ctx)

    port := int32(80)
    if nginx.Spec.Port != 0 {
        port = nginx.Spec.Port
    }

    serviceType := corev1.ServiceTypeClusterIP
    if nginx.Spec.ServiceType != "" {
        serviceType = nginx.Spec.ServiceType
    }

    service := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      nginx.Name,
            Namespace: nginx.Namespace,
            Labels: map[string]string{
                "app":   "nginx",
                "nginx": nginx.Name,
            },
        },
        Spec: corev1.ServiceSpec{
            Type: serviceType,
            Selector: map[string]string{
                "app":   "nginx",
                "nginx": nginx.Name,
            },
            Ports: []corev1.ServicePort{
                {
                    Name:       "http",
                    Protocol:   corev1.ProtocolTCP,
                    Port:       port,
                    TargetPort: intstr.FromInt(int(port)),
                },
            },
        },
    }

    // 設置 Owner Reference
    if err := controllerutil.SetControllerReference(nginx, service, r.Scheme); err != nil {
        return err
    }

    // 創建或更新 Service
    existing := &corev1.Service{}
    err := r.Get(ctx, types.NamespacedName{
        Name:      service.Name,
        Namespace: service.Namespace,
    }, existing)

    if err != nil {
        if apierrors.IsNotFound(err) {
            logger.Info("Creating Service", "name", service.Name)
            return r.Create(ctx, service)
        }
        return err
    }

    // 更新 Service（保留 ClusterIP）
    existing.Spec.Ports = service.Spec.Ports
    existing.Spec.Selector = service.Spec.Selector
    existing.Spec.Type = service.Spec.Type
    logger.Info("Updating Service", "name", service.Name)
    return r.Update(ctx, existing)
}

// updateStatus 更新 Nginx 狀態
func (r *NginxReconciler) updateStatus(ctx context.Context, nginx *webappv1alpha1.Nginx, configVersion string) error {
    // 獲取 Deployment 狀態
    deployment := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{
        Name:      nginx.Name,
        Namespace: nginx.Namespace,
    }, deployment)
    if err != nil {
        return err
    }

    // 更新狀態
    nginx.Status.ReadyReplicas = deployment.Status.ReadyReplicas
    nginx.Status.ConfigVersion = configVersion
    now := metav1.Now()
    nginx.Status.LastUpdateTime = &now

    // 設置 Progressing condition
    if deployment.Status.UpdatedReplicas < *deployment.Spec.Replicas {
        r.setCondition(nginx, TypeProgressing, metav1.ConditionTrue, "Updating", "Deployment is being updated")
    } else {
        r.setCondition(nginx, TypeProgressing, metav1.ConditionFalse, "Updated", "All replicas are updated")
    }

    return r.Status().Update(ctx, nginx)
}

// reconcileDelete 處理刪除邏輯
func (r *NginxReconciler) reconcileDelete(ctx context.Context, nginx *webappv1alpha1.Nginx) (ctrl.Result, error) {
    logger := log.FromContext(ctx)
    logger.Info("Deleting Nginx", "name", nginx.Name)

    if controllerutil.ContainsFinalizer(nginx, nginxFinalizer) {
        // 執行清理邏輯（例如清理外部資源）
        logger.Info("Performing cleanup for Nginx", "name", nginx.Name)

        // 移除 finalizer
        controllerutil.RemoveFinalizer(nginx, nginxFinalizer)
        if err := r.Update(ctx, nginx); err != nil {
            return ctrl.Result{}, err
        }
    }

    return ctrl.Result{}, nil
}

// setCondition 設置 Condition
func (r *NginxReconciler) setCondition(nginx *webappv1alpha1.Nginx, condType string, status metav1.ConditionStatus, reason, message string) {
    condition := metav1.Condition{
        Type:               condType,
        Status:             status,
        Reason:             reason,
        Message:            message,
        LastTransitionTime: metav1.Now(),
        ObservedGeneration: nginx.Generation,
    }
    meta.SetStatusCondition(&nginx.Status.Conditions, condition)
}

// SetupWithManager sets up the controller with the Manager.
func (r *NginxReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webappv1alpha1.Nginx{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Owns(&corev1.ConfigMap{}).
        Complete(r)
}
```

### 9.2.2 核心概念詳解

**1\. ConfigMap 版本管理**

```go
// 計算配置的 SHA256 hash 作為版本號
hash := sha256.Sum256([]byte(config))
configVersion := fmt.Sprintf("%x", hash[:8])

// 將版本號添加到 Pod labels
Labels: map[string]string{
    "config-version": configVersion,  // 版本變更時自動觸發 rolling update
}
```

**2\. Owner Reference 自動清理**

```go
// 設置 Owner Reference 後，刪除 Nginx 時會自動刪除子資源
if err := controllerutil.SetControllerReference(nginx, deployment, r.Scheme); err != nil {
    return err
}
```

**3\. Finalizer 清理邏輯**

```go
// 添加 finalizer 防止資源被立即刪除
if !controllerutil.ContainsFinalizer(nginx, nginxFinalizer) {
    controllerutil.AddFinalizer(nginx, nginxFinalizer)
    if err := r.Update(ctx, nginx); err != nil {
        return ctrl.Result{}, err
    }
}

// 刪除時執行清理
if !nginx.ObjectMeta.DeletionTimestamp.IsZero() {
    // 執行清理邏輯
    // ...

    // 移除 finalizer 允許刪除
    controllerutil.RemoveFinalizer(nginx, nginxFinalizer)
    if err := r.Update(ctx, nginx); err != nil {
        return ctrl.Result{}, err
    }
}
```

## 9.3 使用範例

### 9.3.1 部署 Operator

```bash
# 1. 生成 CRD 和 RBAC manifests
make manifests

# 2. 安裝 CRD
make install

# 3. 運行 operator（開發模式）
make run

# 或者部署到集群
make docker-build docker-push IMG=your-registry/nginx-operator:v1.0.0
make deploy IMG=your-registry/nginx-operator:v1.0.0
```

### 9.3.2 創建 Nginx 實例

**基本範例** (`config/samples/nginx_basic.yaml`)：

```yaml
apiVersion: webapp.example.com/v1alpha1
kind: Nginx
metadata:
  name: nginx-sample
  namespace: default
spec:
  replicas: 3
  image: nginx:1.25
  port: 80
  serviceType: ClusterIP
```

**自定義配置範例** (`config/samples/nginx_custom.yaml`)：

```yaml
apiVersion: webapp.example.com/v1alpha1
kind: Nginx
metadata:
  name: nginx-custom
  namespace: default
spec:
  replicas: 2
  image: nginx:1.25
  port: 8080
  serviceType: LoadBalancer
  resources:
    requests:
      memory: "128Mi"
      cpu: "100m"
    limits:
      memory: "256Mi"
      cpu: "200m"
  config: |
    events {
        worker_connections 2048;
    }

    http {
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';

        access_log /var/log/nginx/access.log main;

        upstream backend {
            server backend-1.example.com;
            server backend-2.example.com;
        }

        server {
            listen 8080;

            location / {
                proxy_pass http://backend;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
            }

            location /health {
                access_log off;
                return 200 "healthy\n";
            }
        }
    }
```

### 9.3.3 測試和驗證

```bash
# 1. 創建 Nginx 實例
kubectl apply -f config/samples/nginx_basic.yaml

# 2. 查看 Nginx 狀態
kubectl get nginx nginx-sample

# 輸出：
# NAME           REPLICAS   READY   IMAGE         AGE
# nginx-sample   3          3       nginx:1.25    2m

# 3. 查看詳細狀態
kubectl describe nginx nginx-sample

# 4. 查看生成的資源
kubectl get deployment,svc,cm -l nginx=nginx-sample

# 5. 測試服務
kubectl port-forward svc/nginx-sample 8080:80
curl http://localhost:8080

# 6. 更新配置（觸發滾動更新）
kubectl edit nginx nginx-sample
# 修改 spec.config 或 spec.replicas

# 7. 觀察滾動更新
kubectl rollout status deployment/nginx-sample

# 8. 查看 Conditions
kubectl get nginx nginx-sample -o jsonpath='{.status.conditions}' | jq

# 9. Scale 測試
kubectl scale nginx nginx-sample --replicas=5
kubectl get nginx nginx-sample

# 10. 刪除測試
kubectl delete nginx nginx-sample
# 驗證子資源被自動清理
kubectl get deployment,svc,cm -l nginx=nginx-sample
```

## 9.4 進階功能實作

### 9.4.1 支援 Ingress 自動創建

**擴展 CRD**：

```go
type NginxSpec struct {
    // ... 現有字段

    // +optional
    Ingress *IngressSpec `json:"ingress,omitempty"`
}

type IngressSpec struct {
    // +kubebuilder:validation:Required
    Host string `json:"host"`

    // +optional
    TLS bool `json:"tls,omitempty"`

    // +optional
    Annotations map[string]string `json:"annotations,omitempty"`
}
```

**Controller 添加 Ingress reconcile**：

```go
func (r *NginxReconciler) reconcileIngress(ctx context.Context, nginx *webappv1alpha1.Nginx) error {
    if nginx.Spec.Ingress == nil {
        return nil
    }

    pathType := networkingv1.PathTypePrefix
    ingress := &networkingv1.Ingress{
        ObjectMeta: metav1.ObjectMeta{
            Name:        nginx.Name,
            Namespace:   nginx.Namespace,
            Annotations: nginx.Spec.Ingress.Annotations,
        },
        Spec: networkingv1.IngressSpec{
            Rules: []networkingv1.IngressRule{
                {
                    Host: nginx.Spec.Ingress.Host,
                    IngressRuleValue: networkingv1.IngressRuleValue{
                        HTTP: &networkingv1.HTTPIngressRuleValue{
                            Paths: []networkingv1.HTTPIngressPath{
                                {
                                    Path:     "/",
                                    PathType: &pathType,
                                    Backend: networkingv1.IngressBackend{
                                        Service: &networkingv1.IngressServiceBackend{
                                            Name: nginx.Name,
                                            Port: networkingv1.ServiceBackendPort{
                                                Number: nginx.Spec.Port,
                                            },
                                        },
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    if nginx.Spec.Ingress.TLS {
        ingress.Spec.TLS = []networkingv1.IngressTLS{
            {
                Hosts:      []string{nginx.Spec.Ingress.Host},
                SecretName: nginx.Name + "-tls",
            },
        }
    }

    if err := controllerutil.SetControllerReference(nginx, ingress, r.Scheme); err != nil {
        return err
    }

    // 創建或更新邏輯...
    return nil
}
```

### 9.4.2 支援多端口暴露

**擴展 CRD**：

```go
type PortSpec struct {
    // +kubebuilder:validation:Required
    Name string `json:"name"`

    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=65535
    Port int32 `json:"port"`

    // +optional
    // +kubebuilder:default="TCP"
    Protocol corev1.Protocol `json:"protocol,omitempty"`
}

type NginxSpec struct {
    // ... 現有字段

    // +optional
    AdditionalPorts []PortSpec `json:"additionalPorts,omitempty"`
}
```

**Controller 處理多端口**：

```go
func (r *NginxReconciler) buildServicePorts(nginx *webappv1alpha1.Nginx) []corev1.ServicePort {
    ports := []corev1.ServicePort{
        {
            Name:       "http",
            Port:       nginx.Spec.Port,
            TargetPort: intstr.FromInt(int(nginx.Spec.Port)),
            Protocol:   corev1.ProtocolTCP,
        },
    }

    for _, p := range nginx.Spec.AdditionalPorts {
        ports = append(ports, corev1.ServicePort{
            Name:       p.Name,
            Port:       p.Port,
            TargetPort: intstr.FromInt(int(p.Port)),
            Protocol:   p.Protocol,
        })
    }

    return ports
}
```

## 9.5 完整測試範例

### 9.5.1 單元測試

**Controller 測試** (`internal/controller/nginx_controller_test.go`)：

```go
package controller

import (
    "context"
    "time"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"

    webappv1alpha1 "example.com/nginx-operator/apis/v1alpha1"
)

var _ = Describe("Nginx Controller", func() {
    const (
        NginxName      = "test-nginx"
        NginxNamespace = "default"
        timeout        = time.Second * 10
        interval       = time.Millisecond * 250
    )

    Context("When creating a Nginx resource", func() {
        It("Should create Deployment, Service, and ConfigMap", func() {
            ctx := context.Background()

            nginx := &webappv1alpha1.Nginx{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      NginxName,
                    Namespace: NginxNamespace,
                },
                Spec: webappv1alpha1.NginxSpec{
                    Replicas: pointer.Int32(2),
                    Image:    "nginx:1.25",
                    Port:     80,
                },
            }

            Expect(k8sClient.Create(ctx, nginx)).Should(Succeed())

            // 驗證 Deployment 被創建
            deployment := &appsv1.Deployment{}
            Eventually(func() bool {
                err := k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName,
                    Namespace: NginxNamespace,
                }, deployment)
                return err == nil
            }, timeout, interval).Should(BeTrue())

            Expect(*deployment.Spec.Replicas).Should(Equal(int32(2)))
            Expect(deployment.Spec.Template.Spec.Containers[0].Image).Should(Equal("nginx:1.25"))

            // 驗證 Service 被創建
            service := &corev1.Service{}
            Eventually(func() bool {
                err := k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName,
                    Namespace: NginxNamespace,
                }, service)
                return err == nil
            }, timeout, interval).Should(BeTrue())

            Expect(service.Spec.Ports[0].Port).Should(Equal(int32(80)))

            // 驗證 ConfigMap 被創建
            configMap := &corev1.ConfigMap{}
            Eventually(func() bool {
                err := k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName + "-config",
                    Namespace: NginxNamespace,
                }, configMap)
                return err == nil
            }, timeout, interval).Should(BeTrue())

            Expect(configMap.Data).Should(HaveKey("nginx.conf"))
        })
    })

    Context("When updating Nginx config", func() {
        It("Should trigger rolling update", func() {
            ctx := context.Background()

            // 獲取當前 Nginx
            nginx := &webappv1alpha1.Nginx{}
            Expect(k8sClient.Get(ctx, types.NamespacedName{
                Name:      NginxName,
                Namespace: NginxNamespace,
            }, nginx)).Should(Succeed())

            // 更新配置
            nginx.Spec.Config = "# updated config"
            Expect(k8sClient.Update(ctx, nginx)).Should(Succeed())

            // 驗證 ConfigMap 被更新
            configMap := &corev1.ConfigMap{}
            Eventually(func() string {
                k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName + "-config",
                    Namespace: NginxNamespace,
                }, configMap)
                return configMap.Data["nginx.conf"]
            }, timeout, interval).Should(ContainSubstring("updated config"))

            // 驗證 Deployment Pod template 的 config-version label 改變
            deployment := &appsv1.Deployment{}
            oldVersion := ""
            Eventually(func() bool {
                k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName,
                    Namespace: NginxNamespace,
                }, deployment)
                newVersion := deployment.Spec.Template.Labels["config-version"]
                if oldVersion == "" {
                    oldVersion = newVersion
                    return false
                }
                return newVersion != oldVersion
            }, timeout, interval).Should(BeTrue())
        })
    })

    Context("When deleting Nginx", func() {
        It("Should cleanup all resources", func() {
            ctx := context.Background()

            nginx := &webappv1alpha1.Nginx{}
            Expect(k8sClient.Get(ctx, types.NamespacedName{
                Name:      NginxName,
                Namespace: NginxNamespace,
            }, nginx)).Should(Succeed())

            Expect(k8sClient.Delete(ctx, nginx)).Should(Succeed())

            // 驗證所有子資源被刪除
            Eventually(func() bool {
                deployment := &appsv1.Deployment{}
                err := k8sClient.Get(ctx, types.NamespacedName{
                    Name:      NginxName,
                    Namespace: NginxNamespace,
                }, deployment)
                return errors.IsNotFound(err)
            }, timeout, interval).Should(BeTrue())
        })
    })
})
```

### 9.5.2 E2E 測試（Chainsaw）

**測試套件** (`tests/e2e/nginx/chainsaw-test.yaml`)：

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: nginx-e2e
spec:
  timeouts:
    apply: 30s
    assert: 1m
    cleanup: 30s
  steps:
    - name: Create Nginx instance
      try:
        - apply:
            file: 00-nginx-sample.yaml
        - assert:
            file: 01-assert-nginx-ready.yaml

    - name: Test service connectivity
      try:
        - script:
            content: |
              kubectl run curl-test --image=curlimages/curl:latest --rm -i --restart=Never -- \
                curl -s http://nginx-sample.default.svc.cluster.local
            check:
              ($error == null): true

    - name: Update configuration
      try:
        - apply:
            file: 02-update-config.yaml
        - assert:
            file: 03-assert-rolling-update.yaml

    - name: Scale up
      try:
        - script:
            content: |
              kubectl scale nginx nginx-sample --replicas=5
        - assert:
            file: 04-assert-scaled.yaml

    - name: Cleanup
      try:
        - delete:
            ref:
              apiVersion: webapp.example.com/v1alpha1
              kind: Nginx
              name: nginx-sample
        - assert:
            file: 05-assert-deleted.yaml
```

**00-nginx-sample.yaml**：

```yaml
apiVersion: webapp.example.com/v1alpha1
kind: Nginx
metadata:
  name: nginx-sample
  namespace: default
spec:
  replicas: 3
  image: nginx:1.25
  port: 80
  serviceType: ClusterIP
```

**01-assert-nginx-ready.yaml**：

```yaml
apiVersion: webapp.example.com/v1alpha1
kind: Nginx
metadata:
  name: nginx-sample
  namespace: default
status:
  readyReplicas: 3
  conditions:
    - type: Available
      status: "True"
      reason: ReconcileSuccess
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-sample
  namespace: default
spec:
  replicas: 3
status:
  readyReplicas: 3
  updatedReplicas: 3
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-sample
  namespace: default
spec:
  type: ClusterIP
  ports:
    - port: 80
```

## 9.6 部署到生產環境

### 9.6.1 Kustomize 配置

**Base** (`config/default/kustomization.yaml`)：

```yaml
namePrefix: nginx-operator-
namespace: nginx-operator-system

resources:
  - ../crd
  - ../rbac
  - ../manager

images:
  - name: controller
    newName: your-registry/nginx-operator
    newTag: v1.0.0
```

**Overlay for Production** (`config/overlays/production/kustomization.yaml`)：

```yaml
bases:
  - ../../default

patchesStrategicMerge:
  - manager_resources.yaml

configMapGenerator:
  - name: manager-config
    literals:
      - LOG_LEVEL=info
      - ENABLE_WEBHOOKS=true
```

**manager\_resources.yaml**：

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: manager
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```

### 9.6.2 部署命令

```bash
# 構建鏡像
make docker-build IMG=your-registry/nginx-operator:v1.0.0

# 推送鏡像
make docker-push IMG=your-registry/nginx-operator:v1.0.0

# 部署到生產環境
kubectl apply -k config/overlays/production

# 驗證部署
kubectl get deployment -n nginx-operator-system
kubectl get pods -n nginx-operator-system

# 查看日誌
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager -f
```

---

# 十、進階主題與最佳實踐

## 10.1 性能優化

### 10.1.1 使用 Field Indexer 加速查詢

當需要頻繁根據特定字段查詢資源時，Field Indexer 可以大幅提升性能。

**範例：根據 Owner 快速查詢 Pod**

參考 OpenTelemetry Operator 的實作 (`main.go:126-139`)：

```go
// 為 Pod 添加 owner.name 索引
if err := mgr.GetFieldIndexer().IndexField(
    context.Background(),
    &corev1.Pod{},
    "metadata.ownerReferences.name",
    func(rawObj client.Object) []string {
        pod := rawObj.(*corev1.Pod)
        var owners []string
        for _, ref := range pod.GetOwnerReferences() {
            owners = append(owners, ref.Name)
        }
        return owners
    },
); err != nil {
    setupLog.Error(err, "failed to create pod index")
    os.Exit(1)
}

// 使用索引快速查詢
pods := &corev1.PodList{}
err := r.List(ctx, pods, client.MatchingFields{
    "metadata.ownerReferences.name": collectorName,
})
```

**更多索引範例**：

```go
// 1. 索引 ConfigMap 的標籤
mgr.GetFieldIndexer().IndexField(
    context.Background(),
    &corev1.ConfigMap{},
    "metadata.labels.app",
    func(obj client.Object) []string {
        cm := obj.(*corev1.ConfigMap)
        if app, ok := cm.Labels["app"]; ok {
            return []string{app}
        }
        return nil
    },
)

// 查詢
configMaps := &corev1.ConfigMapList{}
r.List(ctx, configMaps, client.MatchingFields{
    "metadata.labels.app": "nginx",
})

// 2. 索引 Pod 的 Node
mgr.GetFieldIndexer().IndexField(
    context.Background(),
    &corev1.Pod{},
    "spec.nodeName",
    func(obj client.Object) []string {
        pod := obj.(*corev1.Pod)
        if pod.Spec.NodeName != "" {
            return []string{pod.Spec.NodeName}
        }
        return nil
    },
)

// 查詢特定 Node 上的 Pod
pods := &corev1.PodList{}
r.List(ctx, pods, client.MatchingFields{
    "spec.nodeName": "node-1",
})
```

### 10.1.2 使用 Cache 減少 API 調用

Controller Runtime 自動提供 cache，但需要正確配置：

```go
// main.go - 配置 Cache
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
    Scheme: scheme,
    Cache: cache.Options{
        // 限制 cache 的 namespace（多租戶場景）
        Namespaces: []string{"namespace1", "namespace2"},

        // 配置同步週期
        SyncPeriod: pointer.Duration(10 * time.Minute),
    },
    // 配置 metrics 和 health 端點
    Metrics: server.Options{
        BindAddress: ":8080",
    },
    HealthProbeBindAddress: ":8081",
})

// Controller 中使用 cache（自動）
func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // r.Get 自動從 cache 讀取
    nginx := &v1alpha1.Nginx{}
    if err := r.Get(ctx, req.NamespacedName, nginx); err != nil {
        return ctrl.Result{}, err
    }

    // r.List 也從 cache 讀取
    deployments := &appsv1.DeploymentList{}
    if err := r.List(ctx, deployments, client.InNamespace(req.Namespace)); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}
```

### 10.1.3 優化 Reconcile 邏輯

**1\. 使用 Predicate 過濾不必要的事件**

```go
import (
    "sigs.k8s.io/controller-runtime/pkg/predicate"
)

func (r *NginxReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webappv1alpha1.Nginx{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        WithEventFilter(predicate.Funcs{
            // 忽略 status-only 更新
            UpdateFunc: func(e event.UpdateEvent) bool {
                oldObj := e.ObjectOld.(*webappv1alpha1.Nginx)
                newObj := e.ObjectNew.(*webappv1alpha1.Nginx)

                // 只在 spec 或 metadata 改變時 reconcile
                return oldObj.Generation != newObj.Generation ||
                       !reflect.DeepEqual(oldObj.Labels, newObj.Labels) ||
                       !reflect.DeepEqual(oldObj.Annotations, newObj.Annotations)
            },
            // 忽略刪除事件（由 finalizer 處理）
            DeleteFunc: func(e event.DeleteEvent) bool {
                return false
            },
        }).
        Complete(r)
}
```

**2\. 批量處理更新**

```go
func (r *NginxReconciler) reconcileMultipleResources(ctx context.Context, nginx *webappv1alpha1.Nginx) error {
    // 並行處理多個資源
    var wg sync.WaitGroup
    errCh := make(chan error, 3)

    wg.Add(3)

    // ConfigMap
    go func() {
        defer wg.Done()
        if _, _, err := r.reconcileConfigMap(ctx, nginx); err != nil {
            errCh <- fmt.Errorf("configmap: %w", err)
        }
    }()

    // Deployment
    go func() {
        defer wg.Done()
        if err := r.reconcileDeployment(ctx, nginx, ""); err != nil {
            errCh <- fmt.Errorf("deployment: %w", err)
        }
    }()

    // Service
    go func() {
        defer wg.Done()
        if err := r.reconcileService(ctx, nginx); err != nil {
            errCh <- fmt.Errorf("service: %w", err)
        }
    }()

    wg.Wait()
    close(errCh)

    // 收集錯誤
    for err := range errCh {
        if err != nil {
            return err
        }
    }

    return nil
}
```

**3\. 使用 Generation 判斷資源變更**

```go
func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    nginx := &webappv1alpha1.Nginx{}
    if err := r.Get(ctx, req.NamespacedName, nginx); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 如果 ObservedGeneration 和 Generation 相同，說明 spec 沒變
    if nginx.Status.ObservedGeneration == nginx.Generation {
        // 只檢查子資源狀態，不重新創建
        return r.reconcileStatus(ctx, nginx)
    }

    // spec 改變了，執行完整 reconcile
    return r.fullReconcile(ctx, nginx)
}
```

### 10.1.4 限制並發和 Rate Limiting

```go
// main.go - 配置 Controller 選項
func main() {
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        // ...
    })

    if err = (&controller.NginxReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller")
        os.Exit(1)
    }
}

// SetupWithManager - 配置並發和限流
func (r *NginxReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webappv1alpha1.Nginx{}).
        WithOptions(controller.Options{
            // 最大並發 reconcile 數
            MaxConcurrentReconciles: 3,

            // Rate Limiter
            RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(
                5*time.Millisecond,  // base delay
                1000*time.Second,    // max delay
            ),
        }).
        Complete(r)
}
```

## 10.2 安全最佳實踐

### 10.2.1 RBAC 最小權限原則

**僅授予必要權限**：

```go
// +kubebuilder:rbac:groups=webapp.example.com,resources=nginxes,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=webapp.example.com,resources=nginxes/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services;configmaps,verbs=get;list;watch;create;update;patch;delete

// 不要使用通配符
// 錯誤示範：
// +kubebuilder:rbac:groups=*,resources=*,verbs=*
```

**使用 ServiceAccount 隔離**：

```yaml
# config/rbac/service_account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-operator
  namespace: nginx-operator-system
---
# config/rbac/role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nginx-operator-manager-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: nginx-operator
    namespace: nginx-operator-system
```

### 10.2.2 Webhook 安全

**配置 TLS 和證書管理**：

```yaml
# config/certmanager/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: nginx-operator-serving-cert
  namespace: nginx-operator-system
spec:
  dnsNames:
    - nginx-operator-webhook-service.nginx-operator-system.svc
    - nginx-operator-webhook-service.nginx-operator-system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: nginx-operator-selfsigned-issuer
  secretName: webhook-server-cert
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: nginx-operator-selfsigned-issuer
  namespace: nginx-operator-system
spec:
  selfSigned: {}
```

**Validating Webhook 範例**：

```go
// +kubebuilder:webhook:path=/validate-webapp-example-com-v1alpha1-nginx,mutating=false,failurePolicy=fail,groups=webapp.example.com,resources=nginxes,verbs=create;update,versions=v1alpha1,name=vnginx.kb.io,sideEffects=None,admissionReviewVersions=v1

func (r *Nginx) ValidateCreate() (admission.Warnings, error) {
    // 驗證 replicas 範圍
    if r.Spec.Replicas != nil && (*r.Spec.Replicas < 1 || *r.Spec.Replicas > 10) {
        return nil, fmt.Errorf("replicas must be between 1 and 10")
    }

    // 驗證 image 格式
    if !strings.HasPrefix(r.Spec.Image, "nginx:") {
        return nil, fmt.Errorf("image must start with 'nginx:'")
    }

    // 驗證 config 語法（可選）
    if r.Spec.Config != "" {
        if err := validateNginxConfig(r.Spec.Config); err != nil {
            return nil, fmt.Errorf("invalid nginx config: %w", err)
        }
    }

    return nil, nil
}

func (r *Nginx) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
    oldNginx := old.(*Nginx)

    // 防止降級（可選規則）
    if r.Spec.Image != "" && oldNginx.Spec.Image != "" {
        oldVersion := extractVersion(oldNginx.Spec.Image)
        newVersion := extractVersion(r.Spec.Image)
        if newVersion < oldVersion {
            return admission.Warnings{"Downgrading nginx version may cause issues"}, nil
        }
    }

    return r.ValidateCreate()
}
```

### 10.2.3 Secret 管理

**從 Secret 讀取敏感配置**：

```go
func (r *NginxReconciler) reconcileDeployment(ctx context.Context, nginx *webappv1alpha1.Nginx) error {
    // 從 Secret 掛載 TLS 證書
    volumes := []corev1.Volume{
        {
            Name: "config",
            VolumeSource: corev1.VolumeSource{
                ConfigMap: &corev1.ConfigMapVolumeSource{
                    LocalObjectReference: corev1.LocalObjectReference{
                        Name: nginx.Name + "-config",
                    },
                },
            },
        },
    }

    volumeMounts := []corev1.VolumeMount{
        {
            Name:      "config",
            MountPath: "/etc/nginx/nginx.conf",
            SubPath:   "nginx.conf",
        },
    }

    // 如果配置了 TLS
    if nginx.Spec.TLS != nil {
        volumes = append(volumes, corev1.Volume{
            Name: "tls",
            VolumeSource: corev1.VolumeSource{
                Secret: &corev1.SecretVolumeSource{
                    SecretName: nginx.Spec.TLS.SecretName,
                },
            },
        })

        volumeMounts = append(volumeMounts, corev1.VolumeMount{
            Name:      "tls",
            MountPath: "/etc/nginx/tls",
            ReadOnly:  true,
        })
    }

    // 使用在 Deployment spec 中...
}
```

**加密 etcd 中的 Secret**（集群級配置）：

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
      volumeMounts:
        - name: encryption-config
          mountPath: /etc/kubernetes/encryption-config.yaml
          readOnly: true
  volumes:
    - name: encryption-config
      hostPath:
        path: /etc/kubernetes/encryption-config.yaml
```

## 10.3 可觀測性

### 10.3.1 結構化日誌

**使用 logr 進行結構化日誌記錄**：

```go
import (
    "sigs.k8s.io/controller-runtime/pkg/log"
)

func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // 基本日誌
    logger.Info("Reconciling Nginx",
        "namespace", req.Namespace,
        "name", req.Name,
    )

    // 帶錯誤的日誌
    if err := r.Get(ctx, req.NamespacedName, &nginx); err != nil {
        logger.Error(err, "Failed to get Nginx",
            "namespace", req.Namespace,
            "name", req.Name,
        )
        return ctrl.Result{}, err
    }

    // Debug 級別日誌
    logger.V(1).Info("Detailed debug info",
        "spec", nginx.Spec,
        "status", nginx.Status,
    )

    // 使用 WithValues 添加上下文
    logger = logger.WithValues(
        "nginx-version", nginx.Spec.Image,
        "replicas", nginx.Spec.Replicas,
    )

    logger.Info("Starting reconciliation")

    return ctrl.Result{}, nil
}
```

**配置日誌級別** (`main.go`)：

```go
import (
    "flag"
    "sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
    var logLevel int
    flag.IntVar(&logLevel, "log-level", 0, "Log level (0=info, 1=debug, 2=trace)")

    opts := zap.Options{
        Development: true,
        Level:       zapcore.Level(-logLevel),
    }

    ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

    // ...
}
```

### 10.3.2 Metrics 暴露

**使用 Prometheus metrics**：

```go
import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    nginxReconcileTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "nginx_operator_reconcile_total",
            Help: "Total number of reconciliations",
        },
        []string{"namespace", "name", "result"},
    )

    nginxReconcileDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "nginx_operator_reconcile_duration_seconds",
            Help:    "Duration of reconciliations",
            Buckets: prometheus.DefBuckets,
        },
        []string{"namespace", "name"},
    )

    nginxCount = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "nginx_operator_nginx_count",
            Help: "Number of Nginx instances",
        },
        []string{"namespace"},
    )
)

func init() {
    // 註冊 metrics
    metrics.Registry.MustRegister(
        nginxReconcileTotal,
        nginxReconcileDuration,
        nginxCount,
    )
}

func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    start := time.Now()
    defer func() {
        duration := time.Since(start).Seconds()
        nginxReconcileDuration.WithLabelValues(
            req.Namespace,
            req.Name,
        ).Observe(duration)
    }()

    nginx := &webappv1alpha1.Nginx{}
    if err := r.Get(ctx, req.NamespacedName, nginx); err != nil {
        nginxReconcileTotal.WithLabelValues(
            req.Namespace,
            req.Name,
            "error",
        ).Inc()
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // ... reconcile 邏輯

    nginxReconcileTotal.WithLabelValues(
        req.Namespace,
        req.Name,
        "success",
    ).Inc()

    nginxCount.WithLabelValues(req.Namespace).Set(1)

    return ctrl.Result{}, nil
}
```

**配置 ServiceMonitor** (Prometheus Operator)：

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-operator-metrics
  namespace: nginx-operator-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```

### 10.3.3 健康檢查

**配置 Health 和 Readiness Probes** (`main.go`)：

```go
import (
    "sigs.k8s.io/controller-runtime/pkg/healthz"
)

func main() {
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        HealthProbeBindAddress: ":8081",
        // ...
    })

    // 添加 health check
    if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
        setupLog.Error(err, "unable to set up health check")
        os.Exit(1)
    }

    // 添加 readiness check
    if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
        setupLog.Error(err, "unable to set up ready check")
        os.Exit(1)
    }

    // ...
}
```

**Deployment 配置**：

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-operator-controller-manager
spec:
  template:
    spec:
      containers:
        - name: manager
          image: nginx-operator:latest
          ports:
            - containerPort: 8080
              name: metrics
            - containerPort: 8081
              name: health
          livenessProbe:
            httpGet:
              path: /healthz
              port: health
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /readyz
              port: health
            initialDelaySeconds: 5
            periodSeconds: 10
```

## 10.4 生產環境部署

### 10.4.1 高可用性配置

**多副本 + Leader Election**：

```go
// main.go
func main() {
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        // 啟用 Leader Election
        LeaderElection:          true,
        LeaderElectionID:        "nginx-operator-lock",
        LeaderElectionNamespace: "nginx-operator-system",

        // 配置 lease 參數
        LeaseDuration: pointer.Duration(15 * time.Second),
        RenewDeadline: pointer.Duration(10 * time.Second),
        RetryPeriod:   pointer.Duration(2 * time.Second),
    })

    // ...
}
```

**Deployment 高可用配置**：

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-operator-controller-manager
spec:
  replicas: 3
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      # Pod 反親和性 - 分散到不同節點
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    control-plane: controller-manager
                topologyKey: kubernetes.io/hostname

      # 優先級
      priorityClassName: system-cluster-critical

      containers:
        - name: manager
          image: nginx-operator:latest
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          # 安全上下文
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            runAsNonRoot: true
            runAsUser: 65532
            seccompProfile:
              type: RuntimeDefault
```

### 10.4.2 資源限制

**配置 Resource Quotas**：

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: nginx-operator-quota
  namespace: nginx-operator-system
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "2Gi"
    limits.cpu: "4"
    limits.memory: "4Gi"
    persistentvolumeclaims: "10"
```

**配置 LimitRange**：

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: nginx-operator-limitrange
  namespace: nginx-operator-system
spec:
  limits:
    - max:
        cpu: "1"
        memory: "1Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
      default:
        cpu: "200m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
```

### 10.4.3 監控告警

**Prometheus 告警規則**：

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-operator-alerts
  namespace: nginx-operator-system
spec:
  groups:
    - name: nginx-operator
      interval: 30s
      rules:
        - alert: NginxOperatorDown
          expr: up{job="nginx-operator-metrics"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Nginx Operator is down"
            description: "Nginx Operator has been down for more than 5 minutes"

        - alert: NginxOperatorHighReconcileErrors
          expr: rate(nginx_operator_reconcile_total{result="error"}[5m]) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High reconcile error rate"
            description: "Nginx Operator has high reconcile error rate: {{ $value }}"

        - alert: NginxInstanceNotReady
          expr: nginx_operator_nginx_count{} - nginx_operator_nginx_ready{} > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Nginx instance not ready"
            description: "Nginx instance in {{ $labels.namespace }} is not ready for 15 minutes"
```

### 10.4.4 備份和恢復

**備份 CRD 和 CR**：

```bash
#!/bin/bash
# backup-nginx-operator.sh

BACKUP_DIR="/backup/nginx-operator/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# 備份 CRD
kubectl get crd nginxes.webapp.example.com -o yaml > "$BACKUP_DIR/crd.yaml"

# 備份所有 Nginx CR
kubectl get nginx --all-namespaces -o yaml > "$BACKUP_DIR/nginx-crs.yaml"

# 備份 Operator 配置
kubectl get deployment,service,configmap,secret -n nginx-operator-system -o yaml > "$BACKUP_DIR/operator-resources.yaml"

echo "Backup completed: $BACKUP_DIR"
```

**恢復流程**：

```bash
#!/bin/bash
# restore-nginx-operator.sh

BACKUP_DIR=$1

if [ -z "$BACKUP_DIR" ]; then
    echo "Usage: $0 <backup-directory>"
    exit 1
fi

# 1. 恢復 CRD
kubectl apply -f "$BACKUP_DIR/crd.yaml"

# 2. 恢復 Operator
kubectl apply -f "$BACKUP_DIR/operator-resources.yaml"

# 3. 等待 Operator ready
kubectl wait --for=condition=available --timeout=300s \
    deployment/nginx-operator-controller-manager -n nginx-operator-system

# 4. 恢復 Nginx CR
kubectl apply -f "$BACKUP_DIR/nginx-crs.yaml"

echo "Restore completed from: $BACKUP_DIR"
```

## 10.5 多租戶支援

### 10.5.1 Namespace 隔離

**限制 Operator 監聽特定 Namespace**：

```go
// main.go
func main() {
    // 從環境變數讀取允許的 namespaces
    watchNamespaces := os.Getenv("WATCH_NAMESPACES")
    var namespaces []string
    if watchNamespaces != "" {
        namespaces = strings.Split(watchNamespaces, ",")
    }

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme: scheme,
        Cache: cache.Options{
            Namespaces: namespaces,  // 只 watch 這些 namespaces
        },
    })

    // ...
}
```

**Deployment 配置**：

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-operator-controller-manager
spec:
  template:
    spec:
      containers:
        - name: manager
          env:
            - name: WATCH_NAMESPACES
              value: "tenant-a,tenant-b,tenant-c"
```

### 10.5.2 資源配額

**為每個租戶配置 ResourceQuota**：

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-nginx-quota
  namespace: tenant-a
spec:
  hard:
    count/nginxes.webapp.example.com: "10"  # 最多 10 個 Nginx 實例
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["tenant-a"]
```

### 10.5.3 驗證租戶權限

**Validating Webhook 檢查權限**：

```go
func (r *Nginx) ValidateCreate() (admission.Warnings, error) {
    // 檢查租戶配額
    quota, err := getTenantQuota(r.Namespace)
    if err != nil {
        return nil, err
    }

    currentCount, err := getNginxCount(r.Namespace)
    if err != nil {
        return nil, err
    }

    if currentCount >= quota {
        return nil, fmt.Errorf(
            "tenant %s has reached quota limit: %d/%d",
            r.Namespace, currentCount, quota,
        )
    }

    // 檢查資源限制
    if r.Spec.Replicas != nil && *r.Spec.Replicas > quota.MaxReplicas {
        return nil, fmt.Errorf(
            "replicas %d exceeds tenant limit %d",
            *r.Spec.Replicas, quota.MaxReplicas,
        )
    }

    return nil, nil
}
```

---

# 十一、常見問題與調試技巧

## 11.1 常見錯誤及解決方案

### 11.1.1 CRD 相關錯誤

**錯誤 1：CRD 未安裝**

```bash
Error: the server could not find the requested resource (get nginxes.webapp.example.com)
```

**解決方案**：

```bash
# 檢查 CRD 是否存在
kubectl get crd | grep nginx

# 安裝 CRD
make install

# 或手動安裝
kubectl apply -f config/crd/bases/webapp.example.com_nginxes.yaml

# 驗證
kubectl get crd nginxes.webapp.example.com -o yaml
```

**錯誤 2：CRD 版本不匹配**

```bash
error: error validating "nginx.yaml": error validating data: ValidationError(Nginx.spec): unknown field "newField"
```

**解決方案**：

```bash
# 1. 更新 CRD 定義
make manifests

# 2. 重新安裝 CRD
kubectl replace -f config/crd/bases/webapp.example.com_nginxes.yaml

# 3. 如果 replace 失敗，需要先刪除（危險操作！會刪除所有 CR）
kubectl delete crd nginxes.webapp.example.com
make install
```

**錯誤 3：Validation 規則不生效**

```yaml
# CRD 中定義了 validation，但創建時沒有被驗證
spec:
  replicas: 100  # 超過 maximum: 10 但沒報錯
```

**解決方案**：

```bash
# 1. 確認 CRD 中包含 validation rules
kubectl get crd nginxes.webapp.example.com -o yaml | grep -A 10 validation

# 2. 重新生成並安裝 CRD
make manifests
kubectl apply -f config/crd/bases/webapp.example.com_nginxes.yaml

# 3. 驗證 validation 生效
kubectl apply -f - <<EOF
apiVersion: webapp.example.com/v1alpha1
kind: Nginx
metadata:
  name: test
spec:
  replicas: 100  # 應該報錯
EOF
```

### 11.1.2 Controller 錯誤

**錯誤 1：Reconcile 死循環**

```bash
# 日誌顯示不停 reconcile
INFO    Reconciling Nginx
INFO    Reconciling Nginx
INFO    Reconciling Nginx
...
```

**原因分析**：

1. Status 更新觸發了 reconcile
    
2. 沒有正確設置 Predicate 過濾
    
3. Requeue 邏輯錯誤
    

**解決方案**：

```go
// 1. 使用 Predicate 過濾 status-only 更新
func (r *NginxReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webappv1alpha1.Nginx{}).
        WithEventFilter(predicate.Funcs{
            UpdateFunc: func(e event.UpdateEvent) bool {
                oldObj := e.ObjectOld.(*webappv1alpha1.Nginx)
                newObj := e.ObjectNew.(*webappv1alpha1.Nginx)

                // 只在 spec 改變時 reconcile
                return oldObj.Generation != newObj.Generation
            },
        }).
        Complete(r)
}

// 2. 正確使用 Status().Update()
func (r *NginxReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... reconcile logic

    // 使用 Status() subresource 更新，不會觸發新的 reconcile
    if err := r.Status().Update(ctx, nginx); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil  // 不要返回 Requeue: true
}
```

**錯誤 2：無法更新 Status**

```bash
ERROR   Failed to update status    error="the server could not find the requested resource"
```

**解決方案**：

```go
// 確保 CRD 定義包含 status subresource
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status  // <-- 這一行很重要！

type Nginx struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   NginxSpec   `json:"spec,omitempty"`
    Status NginxStatus `json:"status,omitempty"`
}
```

```bash
# 重新生成並安裝 CRD
make manifests
make install
```

**錯誤 3：Owner Reference 錯誤導致資源無法刪除**

```bash
ERROR   Failed to delete Deployment    error="cannot delete resource, owner references are set"
```

**解決方案**：

```go
// 正確設置 Owner Reference
if err := controllerutil.SetControllerReference(nginx, deployment, r.Scheme); err != nil {
    return err
}

// 刪除時會自動清理子資源（Garbage Collection）
// 如果需要手動清理：
func (r *NginxReconciler) cleanupResources(ctx context.Context, nginx *webappv1alpha1.Nginx) error {
    // 列出所有關聯資源
    deployments := &appsv1.DeploymentList{}
    if err := r.List(ctx, deployments, client.InNamespace(nginx.Namespace), client.MatchingLabels{
        "nginx": nginx.Name,
    }); err != nil {
        return err
    }

    // 刪除資源
    for _, deployment := range deployments.Items {
        if err := r.Delete(ctx, &deployment); err != nil {
            return err
        }
    }

    return nil
}
```

### 11.1.3 RBAC 權限錯誤

**錯誤：403 Forbidden**

```bash
ERROR   Failed to list Deployments    error="deployments.apps is forbidden: User \"system:serviceaccount:nginx-operator-system:default\" cannot list resource \"deployments\" in API group \"apps\" in the namespace \"default\""
```

**解決方案**：

```bash
# 1. 檢查 RBAC 註解是否正確
# Controller 代碼中：
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete

# 2. 重新生成 RBAC manifests
make manifests

# 3. 檢查生成的 Role/ClusterRole
cat config/rbac/role.yaml

# 4. 重新部署 Operator
kubectl apply -k config/default

# 5. 驗證 ServiceAccount 權限
kubectl auth can-i list deployments \
    --as=system:serviceaccount:nginx-operator-system:nginx-operator-controller-manager \
    -n default
```

**跨 Namespace 權限問題**：

```bash
# 如果 Operator 需要操作多個 namespace，需要使用 ClusterRole
# 修改 config/default/kustomization.yaml：

# 註釋掉這一行（使用 Role）
# - ../rbac/role.yaml
# - ../rbac/role_binding.yaml

# 使用 ClusterRole
- ../rbac/cluster_role.yaml
- ../rbac/cluster_role_binding.yaml
```

### 11.1.4 Webhook 錯誤

**錯誤：Webhook 連接超時**

```bash
Error from server (InternalError): Internal error occurred: failed calling webhook "vnginx.kb.io": Post "https://nginx-operator-webhook-service.nginx-operator-system.svc:443/validate-webapp-example-com-v1alpha1-nginx?timeout=10s": context deadline exceeded
```

**解決方案**：

```bash
# 1. 檢查 webhook service 是否存在
kubectl get svc -n nginx-operator-system

# 2. 檢查 webhook pod 是否運行
kubectl get pods -n nginx-operator-system

# 3. 檢查證書是否正確配置
kubectl get secret -n nginx-operator-system | grep webhook

# 4. 檢查 ValidatingWebhookConfiguration
kubectl get validatingwebhookconfiguration

# 5. 查看 webhook 配置詳情
kubectl get validatingwebhookconfiguration nginx-operator-validating-webhook-configuration -o yaml

# 6. 測試 webhook service 連接
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never -- \
    curl -k https://nginx-operator-webhook-service.nginx-operator-system.svc:443

# 7. 如果開發環境不需要 webhook，可以禁用
export ENABLE_WEBHOOKS=false
make run
```

**錯誤：證書驗證失敗**

```bash
Error: x509: certificate signed by unknown authority
```

**解決方案**：

```bash
# 1. 確保安裝了 cert-manager
kubectl get pods -n cert-manager

# 如果沒有安裝：
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# 2. 檢查 Certificate 資源
kubectl get certificate -n nginx-operator-system

# 3. 檢查證書狀態
kubectl describe certificate nginx-operator-serving-cert -n nginx-operator-system

# 4. 手動觸發證書重新生成
kubectl delete certificate nginx-operator-serving-cert -n nginx-operator-system
kubectl apply -f config/certmanager/certificate.yaml

# 5. 等待證書 ready
kubectl wait --for=condition=ready certificate/nginx-operator-serving-cert \
    -n nginx-operator-system --timeout=300s
```

## 11.2 調試技巧

### 11.2.1 本地調試

**使用 Delve 調試**：

```bash
# 1. 安裝 Delve
go install github.com/go-delve/delve/cmd/dlv@latest

# 2. 啟動調試模式
dlv debug ./main.go -- --zap-devel

# 3. 設置斷點
(dlv) break internal/controller/nginx_controller.go:60
(dlv) continue

# 4. 查看變量
(dlv) print nginx
(dlv) print err

# 5. 查看調用棧
(dlv) stack
```

**VS Code 調試配置** (`.vscode/launch.json`)：

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug Operator",
            "type": "go",
            "request": "launch",
            "mode": "debug",
            "program": "${workspaceFolder}/main.go",
            "args": ["--zap-devel"],
            "env": {
                "ENABLE_WEBHOOKS": "false",
                "KUBECONFIG": "${env:HOME}/.kube/config"
            }
        },
        {
            "name": "Attach to Process",
            "type": "go",
            "request": "attach",
            "mode": "local",
            "processId": "${command:pickProcess}"
        }
    ]
}
```

### 11.2.2 日誌分析

**增加日誌詳細度**：

```bash
# 運行時指定 log level
make run -- --zap-log-level=debug

# 或在代碼中：
logger.V(1).Info("Debug message", "key", value)  # debug
logger.V(2).Info("Trace message", "key", value)  # trace
```

**結構化日誌查詢**：

```bash
# 使用 jq 過濾日誌
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager | jq 'select(.level == "error")'

# 查找特定資源的日誌
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager | \
    jq 'select(.nginx == "nginx-sample")'

# 統計錯誤類型
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager | \
    jq -r 'select(.level == "error") | .msg' | sort | uniq -c
```

### 11.2.3 性能分析

**CPU Profiling**：

```go
// main.go
import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // 啟動 pprof server
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // ... 其他代碼
}
```

```bash
# 收集 30 秒的 CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# 交互式分析
(pprof) top10
(pprof) list reconcileDeployment
(pprof) web  # 生成可視化圖表（需要 graphviz）
```

**Memory Profiling**：

```bash
# 收集 heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# 分析內存分配
(pprof) top10
(pprof) list NginxReconciler.Reconcile

# 查看內存洩漏
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap
```

**Goroutine 分析**：

```bash
# 查看所有 goroutine
curl http://localhost:6060/debug/pprof/goroutine?debug=2

# 分析 goroutine 洩漏
go tool pprof http://localhost:6060/debug/pprof/goroutine
```

### 11.2.4 資源狀態檢查

**完整的資源檢查腳本**：

```bash
#!/bin/bash
# debug-nginx.sh - Nginx Operator 診斷腳本

NAMESPACE=${1:-default}
NGINX_NAME=${2:-nginx-sample}

echo "=== Nginx Resource ==="
kubectl get nginx $NGINX_NAME -n $NAMESPACE -o yaml

echo -e "\n=== Nginx Status ==="
kubectl get nginx $NGINX_NAME -n $NAMESPACE -o jsonpath='{.status}' | jq

echo -e "\n=== Nginx Events ==="
kubectl get events -n $NAMESPACE --field-selector involvedObject.name=$NGINX_NAME

echo -e "\n=== Deployment ==="
kubectl get deployment $NGINX_NAME -n $NAMESPACE -o yaml

echo -e "\n=== Deployment Events ==="
kubectl get events -n $NAMESPACE --field-selector involvedObject.name=$NGINX_NAME,involvedObject.kind=Deployment

echo -e "\n=== Pods ==="
kubectl get pods -n $NAMESPACE -l nginx=$NGINX_NAME

echo -e "\n=== Pod Events ==="
for pod in $(kubectl get pods -n $NAMESPACE -l nginx=$NGINX_NAME -o name); do
    echo "Events for $pod:"
    kubectl get events -n $NAMESPACE --field-selector involvedObject.name=$(basename $pod)
done

echo -e "\n=== Service ==="
kubectl get svc $NGINX_NAME -n $NAMESPACE -o yaml

echo -e "\n=== ConfigMap ==="
kubectl get cm ${NGINX_NAME}-config -n $NAMESPACE -o yaml

echo -e "\n=== Operator Logs (last 50 lines) ==="
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager --tail=50
```

使用：

```bash
chmod +x debug-nginx.sh
./debug-nginx.sh default nginx-sample
```

## 11.3 故障排除流程

### 11.3.1 問題診斷流程圖

```bash
1. CR 能否創建成功？
   ├─ No  → 檢查 CRD 安裝、Validation Webhook
   └─ Yes → 繼續

2. Operator 能否收到事件？
   ├─ No  → 檢查 Operator 運行狀態、RBAC 權限、Namespace 配置
   └─ Yes → 繼續

3. Reconcile 是否執行？
   ├─ No  → 檢查 Predicate 過濾、事件觸發條件
   └─ Yes → 繼續

4. 子資源是否創建？
   ├─ No  → 檢查 RBAC、資源定義、錯誤日誌
   └─ Yes → 繼續

5. 子資源狀態是否正常？
   ├─ No  → 檢查資源配置、鏡像、探針、資源限制
   └─ Yes → 問題解決
```

### 11.3.2 常見場景排查

**場景 1：Pod 無法啟動**

```bash
# 1. 查看 Pod 狀態
kubectl get pods -l nginx=nginx-sample

# 2. 查看詳細錯誤
kubectl describe pod <pod-name>

# 3. 查看日誌
kubectl logs <pod-name>

# 常見原因：
# - ImagePullBackOff → 鏡像不存在或無權限
# - CrashLoopBackOff → 容器啟動失敗，檢查配置和日誌
# - Pending → 資源不足或調度失敗
```

**場景 2：配置更新不生效**

```bash
# 1. 檢查 ConfigMap 是否更新
kubectl get cm nginx-sample-config -o yaml

# 2. 檢查 Deployment 的 config-version label
kubectl get deployment nginx-sample -o jsonpath='{.spec.template.metadata.labels.config-version}'

# 3. 如果 label 沒變，檢查 reconcile 日誌
kubectl logs -n nginx-operator-system deployment/nginx-operator-controller-manager | grep nginx-sample

# 4. 手動觸發 reconcile
kubectl annotate nginx nginx-sample force-reconcile="$(date +%s)"
```

**場景 3：資源洩漏**

```bash
# 1. 列出所有孤立資源（沒有 owner reference）
kubectl get deployment,svc,cm --all-namespaces -o json | \
    jq -r '.items[] | select(.metadata.ownerReferences == null) | "\(.metadata.namespace)/\(.kind)/\(.metadata.name)"'

# 2. 清理孤立資源
kubectl delete deployment <name> -n <namespace>

# 3. 確保 Controller 設置了 Owner Reference
# 代碼檢查：
if err := controllerutil.SetControllerReference(nginx, deployment, r.Scheme); err != nil {
    return err
}
```

## 11.4 最佳實踐總結

### 11.4.1 開發階段

1. **使用 Kind 本地測試**
    
    ```bash
    make kind-cluster
    make install
    make run
    ```
    
2. **頻繁運行測試**
    
    ```bash
    make test
    make test-e2e
    ```
    
3. **使用 Linter**
    
    ```bash
    golangci-lint run ./...
    ```
    

### 11.4.2 部署階段

1. **使用 Kustomize 管理環境**
    
    ```bash
    config/
    ├── base/
    └── overlays/
        ├── development/
        ├── staging/
        └── production/
    ```
    
2. **配置資源限制**
    
    ```yaml
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi
    ```
    
3. **啟用 Leader Election**
    
    ```go
    LeaderElection: true
    ```
    

### 11.4.3 運維階段

1. **監控關鍵指標**
    
    * Reconcile 成功率
        
    * Reconcile 延遲
        
    * 資源使用率
        
2. **配置告警**
    
    * Operator Down
        
    * 高錯誤率
        
    * 資源洩漏
        
3. **定期備份**
    
    ```bash
    kubectl get nginx --all-namespaces -o yaml > backup.yaml
    ```
    

---

## 總結

本教學從基礎概念到高級實踐，完整覆蓋了 Kubernetes Operator 開發的各個方面：

1. **第一~三章**：Operator 基礎概念、架構設計
    
2. **第四~六章**：深入 OpenTelemetry Operator 代碼分析
    
3. **第七~八章**：開發環境、測試實踐
    
4. **第九章**：完整的 Nginx Operator 實戰
    
5. **第十章**：性能、安全、可觀測性等進階主題
    
6. **第十一章**：故障排查和最佳實踐
    

**下一步建議**：

1. 閱讀 [Operator SDK 文檔](https://sdk.operatorframework.io/)
    
2. 研究優秀的開源 Operator 項目
    
3. 在實際項目中實踐所學知識
    

成為 Kubernetes Operator 開發專家！ Go 🚀
