[Machine Learning] Serving a TensorFlow Model on Kubernetes with Triton Server (saved_model)

2024. 9. 23. 20:44

Running the Triton image as a pod on Kubernetes

kubectl create -f triton-pvc.yaml
kubectl create -f triton-deployment.yaml

```triton-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: triton-pvc
  namespace: ${NAMESPACE}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block
```

```triton-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deployment
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-server
  template:
    metadata:
      labels:
        app: triton-server
    spec:
      containers:
      - name: triton-server
        image: nvcr.io/nvidia/tritonserver:24.07-py3
        ports:
        - containerPort: 8000  # HTTP/REST API port
        - containerPort: 8001  # gRPC API port
        - containerPort: 8002  # Metrics port
        volumeMounts:
        - name: model-repository
          mountPath: /models
        env:
        - name: MODEL_PATH
          value: /models
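        # Keep the container alive instead of starting tritonserver, so the model can be copied in and the server started by hand later
        command: ["/bin/sh", "-c"]
        args: ["sleep 36000000000000000"]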
      volumes:
      - name: model-repository
        persistentVolumeClaim:
          claimName: triton-pvc
```

Creating the model

Run tensorflow_model.py to generate the model files in the saved_model directory format.

Tree structure of the saved_model directory

/saved_model
  /assets
  /variables
    variables.index
    variables.data-00000-of-00001
  fingerprint.pb
  keras_metadata.pb
  saved_model.pb
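Before copying the export anywhere, it can help to confirm that the directory actually has the files shown in the tree. The sketch below is only an illustration: `missing_saved_model_entries` and `REQUIRED_ENTRIES` are hypothetical names, and it assumes that `saved_model.pb` and `variables/variables.index` are the minimum worth checking.

```python
from pathlib import Path

# Entries assumed to be required for this check (assets/ may be empty or absent).
REQUIRED_ENTRIES = ("saved_model.pb", "variables/variables.index")


def missing_saved_model_entries(export_dir):
    """Return the required entries that are absent from export_dir."""
    root = Path(export_dir)
    return [entry for entry in REQUIRED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_saved_model_entries("saved_model")` returns an empty list when both entries are present.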

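Copying the model into the pod

# Syntax (kubectl cp accepts a single source path, so the wildcard must expand to exactly one entry)
kubectl cp /${LOCAL_PATH}/${MODEL_NAME}/* ${NAMESPACE}/${POD_NAME}:/models

# Example
kubectl cp /home/user/workspace/triton/model_repository/* triton/triton-deployment-85dd868c54-fvnh9:/models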

Creating the config.pbtxt file

```config.pbtxt
name: "saved_model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "dense_input"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "dense"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

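# GPU settings (this block can be omitted when no GPU is used)
instance_group [
  {
    kind: KIND_GPU      # run on GPU
    # count is the number of instances on EACH listed GPU: here 4 on each of GPUs 0, 1, and 2 (12 in total).
    # To place 2 instances on GPU 0 and 1 each on GPUs 1 and 2, split them into separate groups.
    count: 4
    gpus: [0,1,2]       # GPU IDs to use
  },
  {
    kind: KIND_CPU  # run on CPU
    count: 1
  }
]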

```
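In Triton's `instance_group`, `count` applies to every GPU listed in `gpus` rather than being split across them. The arithmetic can be sketched as follows; `instances_per_gpu` and the dictionary layout are hypothetical helpers for illustration, not part of any Triton API.

```python
from collections import Counter


def instances_per_gpu(groups):
    """Sum model instances per GPU ID for a list of KIND_GPU groups."""
    totals = Counter()
    for group in groups:
        for gpu_id in group["gpus"]:
            # count applies to every GPU listed in the group
            totals[gpu_id] += group["count"]
    return dict(totals)


# The GPU group from config.pbtxt above: 4 instances on each of GPUs 0, 1, 2.
print(instances_per_gpu([{"count": 4, "gpus": [0, 1, 2]}]))
# {0: 4, 1: 4, 2: 4}

# Two instances on GPU 0 and one each on GPUs 1 and 2 needs two groups.
print(instances_per_gpu([
    {"count": 2, "gpus": [0]},
    {"count": 1, "gpus": [1, 2]},
]))
# {0: 2, 1: 1, 2: 1}
```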

Looking up the GPU ID

nvidia-smi

- Run the command above inside the Triton container to list the available GPUs and their IDs.

- In this environment, the GPU ID is 0.

Laying out the model directory tree on the Triton server

/models
  /saved_model
    config.pbtxt
    /1
      /model.savedmodel
        saved_model.pb
        fingerprint.pb
        /variables
          variables.index
          variables.data-00000-of-00001
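The layout above can also be assembled with a short script before copying it into the pod. This is a minimal sketch using only the standard library; `build_model_repository` and its parameters are hypothetical names, and the defaults (`saved_model`, version `1`) simply mirror the tree in this post.

```python
import shutil
from pathlib import Path


def build_model_repository(saved_model_dir, config_file, repository_root,
                           model_name="saved_model", version="1"):
    """Copy a SavedModel export and its config.pbtxt into Triton's layout."""
    model_dir = Path(repository_root) / model_name
    target = model_dir / version / "model.savedmodel"
    target.parent.mkdir(parents=True, exist_ok=True)
    # copytree creates model.savedmodel/ itself and raises if it already exists,
    # which avoids silently mixing two exports.
    shutil.copytree(saved_model_dir, target)
    shutil.copy(config_file, model_dir / "config.pbtxt")
    return model_dir
```

For example, `build_model_repository("saved_model", "config.pbtxt", "model_repository")` produces `model_repository/saved_model/config.pbtxt` and `model_repository/saved_model/1/model.savedmodel/`.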

Starting the Triton server

nohup tritonserver --model-repository=/models --log-verbose=1 > /${LOG_DIR}/triton_output.log 2>&1 &
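Because the server is started in the background, it may take a moment before it accepts requests. One way to wait for it is to poll the readiness endpoint `/v2/health/ready` on the HTTP port. The sketch below assumes the server listens on port 8000 as configured in the deployment; `is_ready` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/v2/health/ready answers with HTTP 200."""
    try:
        with opener(f"{base_url}/v2/health/ready", timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        return False


def wait_until_ready(base_url, attempts=30, delay=1.0, opener=urllib.request.urlopen):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if is_ready(base_url, opener):
            return True
        time.sleep(delay)
    return False
```

Inside the pod, `wait_until_ready("http://localhost:8000")` returns `True` once the server reports ready, or `False` after the attempts are used up.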

Testing model inference

curl -X POST "http://localhost:8000/v2/models/saved_model/infer" \
     -H "Content-Type: application/json" \
     -d '{
           "inputs": [
             {
               "name": "dense_input",
               "shape": [1, 4],
               "datatype": "FP32",
               "data": [[750, 3.70, 3, 0]]
             }
           ]
         }'

# Same request, pinned to model version 1
curl -X POST "http://localhost:8000/v2/models/saved_model/versions/1/infer" \
     -H "Content-Type: application/json" \
     -d '{
           "inputs": [
             {
               "name": "dense_input",
               "shape": [1, 4],
               "datatype": "FP32",
               "data": [[750, 3.70, 3, 0]]
             }
           ]
         }'
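The same request can be sent from Python without extra dependencies. This is a sketch under a few assumptions: the input tensor is named `dense_input` as in config.pbtxt, the data is sent flattened in row-major order, and `build_infer_request` and `infer` are hypothetical helper names.

```python
import json
import urllib.request


def build_infer_request(rows, input_name="dense_input"):
    """Build a v2 inference request body for a batch of equally sized rows."""
    flat = [float(value) for row in rows for value in row]
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }


def infer(base_url, rows, model="saved_model"):
    """POST the request to Triton and return the decoded JSON response."""
    body = json.dumps(build_infer_request(rows)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `infer("http://localhost:8000", [[750, 3.70, 3, 0]])` sends the same payload as the first curl command.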

Once the model responds correctly from inside the pod, create a Service and an Ingress in Kubernetes so that it can also be called from outside the cluster.

kubectl create -f triton-ingress.yaml
kubectl create -f triton-svc.yaml

```triton-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: triton-svc
  namespace: ${NAMESPACE}
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: port-triton-server
    port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: triton-server
  sessionAffinity: None
  type: ClusterIP
```

```triton-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-triton
  namespace: ${NAMESPACE}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - backend:
          service:
            name: triton-svc
            port:
              number: 8000
        path: /100(/|$)(.*)
        pathType: ImplementationSpecific
# Address assigned by the ingress controller in this environment (from status.loadBalancer.ingress): 10.10.10.123
```

Calling the API from outside through the Ingress

GET http://10.10.10.123/100/v2/models/saved_model

POST http://10.10.10.123/100/v2/models/saved_model/infer

HEADERS
Content-Type : application/json

BODY
{
  "inputs": [
    {
      "name": "dense_input",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [1.2, -11, 0.9, 0]
    }
  ]
}
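The `/100` prefix in these URLs is removed by the Ingress before the request reaches Triton: the path pattern `/100(/|$)(.*)` captures the remainder in its second group, and `rewrite-target: /$2` forwards only that part. The mapping can be approximated with Python's `re` module; this only illustrates the pattern and is not how the NGINX controller is implemented, and `rewrite` is a hypothetical helper name.

```python
import re

# Path pattern and rewrite target taken from triton-ingress.yaml.
INGRESS_PATH = re.compile(r"^/100(/|$)(.*)")


def rewrite(path):
    """Return the path the backend would receive, or None if the rule does not match."""
    match = INGRESS_PATH.match(path)
    if match is None:
        return None
    return "/" + match.group(2)  # rewrite-target: /$2


print(rewrite("/100/v2/models/saved_model"))        # /v2/models/saved_model
print(rewrite("/100/v2/models/saved_model/infer"))  # /v2/models/saved_model/infer
print(rewrite("/v2/models/saved_model"))            # None
```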

Errors when the model is built with a TensorFlow version that is not compatible with Triton

- The Triton server itself starts normally, but the API call fails.

```Jupyter environment (Python 3.8.19 / tensorflow 2.13.1 / keras 2.13.1) with triton 24.07: compatible
# TensorFlow versions supported by triton 24.07

TensorFlow 2.12.0
TensorFlow 2.13.x
TensorFlow 2.14.x
```

```Windows local environment (python 3.11.9 / tensorflow 2.17.0 / keras 3.5.0): error when calling the API after creating the saved_model and starting the Triton server
{"error":"2 root error(s) found.\n (0) FAILED_PRECONDITION: Could not find variable sequential/dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/sequential/dense/bias/N10tensorflow3VarE does not exist.\n\t [[{{function_node __inference_serving_default_389}}{{node sequential_1/dense_1/Add/ReadVariableOp}}]]\n\t [[StatefulPartitionedCall/_25]]\n (1) FAILED_PRECONDITION: Could not find variable sequential/dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/sequential/dense/bias/N10tensorflow3VarE does not exist.\n\t [[{{function_node __inference_serving_default_389}}{{node sequential_1/dense_1/Add/ReadVariableOp}}]]\n0 successful operations.\n0 derived errors ignored."}
```
