Inference Requests
Inference requests are sent to the Inference Server through the TorchServe Inference API. Find out more in the official TorchServe Inference API documentation.
Server Configuration
| Variable | Value |
|---|---|
| inference_server_endpoint | localhost |
| inference_port | 8080 |
The following are example cURL commands for sending requests to the Inference Server.
Ping Request
To check the status of a TorchServe server, use the ping API that TorchServe provides:
curl http://{inference_server_endpoint}:{inference_port}/ping
Example
curl http://localhost:8080/ping
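The same health check can be performed programmatically. The following is a minimal Python sketch using the requests library and the endpoint and port values from the configuration table above; TorchServe normally answers the ping API with a small JSON body containing a status field.

import requests

# Endpoint and port taken from the server configuration table above.
INFERENCE_SERVER_ENDPOINT = "localhost"
INFERENCE_PORT = 8080

def ping() -> bool:
    """Return True if the TorchServe server answers the ping API."""
    url = f"http://{INFERENCE_SERVER_ENDPOINT}:{INFERENCE_PORT}/ping"
    try:
        # TorchServe normally replies with JSON such as {"status": "Healthy"}.
        response = requests.get(url, timeout=5)
    except requests.ConnectionError:
        return False
    return response.ok

if __name__ == "__main__":
    print("Server reachable:", ping())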
Note
The ping API only reports whether the TorchServe server is running. To check whether a model has been registered successfully, you can list all models or describe a registered model through the TorchServe Management API.
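As a sketch of what such a check might look like, the snippet below queries the TorchServe Management API to list all registered models and describe one of them. The management port 8081 is TorchServe's default and is an assumption here, since it is not part of the configuration table above; the model name is illustrative.

import requests

# Assumption: the TorchServe Management API listens on its default port 8081.
MANAGEMENT_URL = "http://localhost:8081"

# List all models currently registered on the server.
models = requests.get(f"{MANAGEMENT_URL}/models", timeout=5).json()
print(models)

# Describe a single registered model (model name is illustrative).
detail = requests.get(f"{MANAGEMENT_URL}/models/mpt_7b", timeout=5).json()
print(detail)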
Inference Requests
The following is the template command for sending an inference request with a text file:
curl -v -H "Content-Type: application/text" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.txt
The following is the template command for sending an inference request with a JSON file:
curl -v -H "Content-Type: application/json" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.json
Input data files can be found in the $WORK_DIR/data folder.
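The same requests can also be sent from Python. The following is a minimal sketch using the requests library with the endpoint, port, and sample data paths from this page; the model name is a placeholder for any registered model.

import os
import requests

INFERENCE_SERVER_ENDPOINT = "localhost"
INFERENCE_PORT = 8080
MODEL_NAME = "mpt_7b"  # placeholder: use the name of a registered model
WORK_DIR = os.environ["WORK_DIR"]

url = f"http://{INFERENCE_SERVER_ENDPOINT}:{INFERENCE_PORT}/predictions/{MODEL_NAME}"

# Inference with a text file (mirrors the application/text cURL template).
with open(f"{WORK_DIR}/data/qa/sample_text1.txt", "rb") as f:
    text_response = requests.post(url, data=f.read(),
                                  headers={"Content-Type": "application/text"})
print(text_response.text)

# Inference with a JSON file (mirrors the application/json cURL template).
with open(f"{WORK_DIR}/data/qa/sample_text4.json", "rb") as f:
    json_response = requests.post(url, data=f.read(),
                                  headers={"Content-Type": "application/json"})
print(json_response.text)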
Examples
For the MPT-7B model:
curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text1.txt
curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text4.json
For the Falcon-7B model:
curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text1.txt
curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text3.json
For the Llama2-7B model:
curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text1.txt
curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text3.json
Input data format
Input data can be in either text or JSON format.
- For text format, the input should be a '.txt' file containing the prompt.
- For JSON format, the input should be a '.json' file containing the prompt in the format below:
{ "id": "42", "inputs": [ { "name": "input0", "shape": [-1], "datatype": "BYTES", "data": ["Capital of India?"] } ] }