Custom Model Support

In some cases you may want to use a custom model, e.g. your own fine-tuned model. We provide the capability to generate a MAR file from custom model files and start an inference server using Kubeflow serving.

Generate Model Archive File for Custom Models

The model files should be placed in an NFS share accessible by the Nutanix package. This directory is passed to the --model_path argument. You'll also need to provide the --output path, the NFS location where the generated model archive will be stored.
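For example, assuming the NFS export is mounted locally at /mnt/nfs and the fine-tuned model was saved in Hugging Face format (the mount point and layout below are hypothetical):

ls /mnt/nfs/models/my-custom-model
config.json  generation_config.json  model.safetensors  tokenizer.json  tokenizer_config.json

Here /mnt/nfs/models/my-custom-model would be passed as --model_path and /mnt/nfs as --output.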

To generate the MAR file, run the following:

python3 $WORK_DIR/llm/ --skip_download [--repo_version <REPO_COMMIT_ID> --handler <CUSTOM_HANDLER_PATH>] --model_name <MODEL_NAME> --model_path <MODEL_PATH> --output <NFS_LOCAL_MOUNT_LOCATION>

  • skip_download: Flag to skip downloading the model files; must be set for custom models
  • model_name: Name of the custom model
  • repo_version: Any model version, defaults to "1.0" (optional)
  • model_path: Absolute path of the custom model files (should be non-empty)
  • output: Mount path of your NFS server, used in the kube PV, where the model archive file will be stored
  • handler: Path to a custom handler, defaults to llm/ (optional)
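
For example, with the model files from the layout above at /mnt/nfs/models/my-custom-model and the NFS export mounted at /mnt/nfs (both paths hypothetical), and assuming the generation script referenced above is $WORK_DIR/llm/generate.py (script name assumed):

python3 $WORK_DIR/llm/generate.py --skip_download --model_name my-custom-model --model_path /mnt/nfs/models/my-custom-model --output /mnt/nfs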

Start Inference Server with Custom Model Archive File

Run the following command to start Kubeflow serving and run inference on the given input with a custom MAR file:

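Assuming the launch script is $WORK_DIR/llm/run.sh (the script name is an assumption; the flags are those described below), the invocation takes this shape:

bash $WORK_DIR/llm/run.sh -n <MODEL_NAME> -d <INPUT_DATA_ABSOLUTE_PATH> -g <NUM_GPUS> -f <NFS_ADDRESS_WITH_SHARE_PATH> -m <NFS_LOCAL_MOUNT_LOCATION> -e <DEPLOYMENT_NAME>
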
  • n: Name of the custom model; this name must not be present in model_config
  • d: Absolute path of the input data folder (optional)
  • g: Number of GPUs to be used for execution (set to 0 to use CPU)
  • f: NFS server address with share path information
  • m: Mount path of your NFS server, used in the kube PV, where the model files and model archive file are stored
  • e: Name of the deployment metadata
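
Continuing the example above (the names, paths, and NFS address are hypothetical, and run.sh is the script name assumed earlier):

bash $WORK_DIR/llm/run.sh -n my-custom-model -d /home/user/input_data -g 2 -f 10.0.0.5:/exports/llm -m /mnt/nfs -e my-custom-deployment

This deploys the MAR file generated earlier on two GPUs under the deployment name my-custom-deployment.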