Skip to content

Generate PyTorch Model Archive File

We will download the model files and generate a Model Archive file for the desired LLM, which will be used by TorchServe to load the model. Find out more about Torch Model Archiver here.

Run the following command for downloading model files and generating MAR file:

python3 $WORK_DIR/llm/generate.py [--hf_token <HUGGINGFACE_HUB_TOKEN> --repo_version <REPO_COMMIT_ID>] --model_name <MODEL_NAME> --output <NFS_LOCAL_MOUNT_LOCATION>

  • model_name: Name of a validated model
  • output: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored
  • repo_version: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used)
  • hf_token: Your HuggingFace token. Needed to download LLAMA(2) models. (It can alternatively be set using the environment variable 'HF_TOKEN')

Examples

The following are example commands to generate the model archive file.

Download MPT-7B model files and generate model archive for it:

python3 $WORK_DIR/llm/generate.py --model_name mpt_7b --output /mnt/llm
Download Falcon-7B model files and generate model archive for it:
python3 $WORK_DIR/llm/generate.py --model_name falcon_7b --output /mnt/llm
Download Llama2-7B model files and generate model archive for it:
python3 $WORK_DIR/llm/generate.py --model_name llama2_7b --output /mnt/llm --hf_token <HUGGINGFACE_HUB_TOKEN>