Inference Requests
Models deployed with Kubeflow serving can be queried and managed through its Inference APIs. For details, see the official Inference API documentation.
Set HOST and PORT
The first step is to determine the ingress IP and port and store them in INGRESS_HOST and INGRESS_PORT.
The first command below assigns the IP address of the host where the Istio Ingress Gateway pod is running to INGRESS_HOST. The second assigns the node port of the gateway service's http2 port to INGRESS_PORT:
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
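If either kubectl query matches nothing, the corresponding variable ends up empty and later requests fail with a confusing URL error. As a minimal sketch, a small check like the following can catch that early. The function name check_ingress is hypothetical and not part of the original tooling:

```shell
# Hypothetical helper (not part of the original guide):
# returns non-zero if INGRESS_HOST or INGRESS_PORT is empty.
check_ingress() {
  if [ -z "${INGRESS_HOST:-}" ]; then
    echo "INGRESS_HOST is empty: is the Istio ingress gateway pod running?" >&2
    return 1
  fi
  if [ -z "${INGRESS_PORT:-}" ]; then
    echo "INGRESS_PORT is empty: does the service expose a port named http2?" >&2
    return 1
  fi
  # Both values are present; print them for a quick visual check.
  echo "Ingress endpoint: ${INGRESS_HOST}:${INGRESS_PORT}"
}
```

Running check_ingress after the two export commands prints the endpoint when both values were found, or an error on stderr otherwise.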
Set Service Host Name
The next step is to determine the service hostname. The following command reads the status.url field of an InferenceService, extracts the hostname portion, and assigns it to the SERVICE_HOSTNAME variable:
SERVICE_HOSTNAME=$(kubectl get inferenceservice <DEPLOYMENT_NAME> -o jsonpath='{.status.url}' | cut -d "/" -f 3)
Example:
SERVICE_HOSTNAME=$(kubectl get inferenceservice llm-deploy -o jsonpath='{.status.url}' | cut -d "/" -f 3)
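To see why the third "/"-separated field is the hostname, the cut step can be tried on its own. This is only an illustration: the URL below is a made-up stand-in for the value of status.url, and the EXAMPLE_ variable names are hypothetical:

```shell
# Hypothetical URL standing in for the value of .status.url
EXAMPLE_URL="http://llm-deploy.default.example.com"

# Splitting on "/" yields: "http:", "" (between the two slashes), then the host.
EXAMPLE_HOSTNAME=$(echo "$EXAMPLE_URL" | cut -d "/" -f 3)

echo "$EXAMPLE_HOSTNAME"   # prints llm-deploy.default.example.com
```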
Curl request to get inference
Once the variables are set, inference requests can be sent to the deployed model. The following template command sends the contents of a JSON file as the request body:
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/{model_name}/infer -d @{input_file_path}
Examples:
Curl request for MPT-7B model
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mpt_7b/infer -d @$WORK_DIR/data/qa/sample_text1.json
Curl request for Falcon-7B model
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/falcon_7b/infer -d @$WORK_DIR/data/summarize/sample_text1.json
Curl request for Llama2-7B model
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/llama2_7b/infer -d @$WORK_DIR/data/translate/sample_text1.json
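Because the commands above differ only in model name and input file, they can be wrapped in a small shell function. The following is a sketch under the assumption that INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME are already set. The function names infer_url and infer are hypothetical and not part of the original tooling:

```shell
# Hypothetical helpers (not part of the original guide).

# Build the inference URL for a given model name.
infer_url() {
  echo "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$1/infer"
}

# Usage: infer <model_name> <input_file_path>
# Sends the JSON file as the request body, as in the template command.
infer() {
  curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
    "$(infer_url "$1")" -d @"$2"
}
```

With these defined, the first example becomes infer mpt_7b "$WORK_DIR/data/qa/sample_text1.json".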
Input data format
Input data must be provided as a '.json' file containing the prompt in the format below:
{
  "id": "42",
  "inputs": [
    {
      "name": "input0",
      "shape": [-1],
      "datatype": "BYTES",
      "data": ["Capital of India?"]
    }
  ]
}
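An input file in this format can be created directly from the shell with a here-document. This is a minimal sketch: the file location and the INPUT_FILE variable name are assumptions chosen for illustration, and any writable path would work:

```shell
# Hypothetical output path; adjust to wherever the input files should live.
INPUT_FILE="${TMPDIR:-/tmp}/sample_text1.json"

# The quoted 'EOF' delimiter prevents the shell from expanding anything in the body.
cat > "$INPUT_FILE" <<'EOF'
{
  "id": "42",
  "inputs": [
    {
      "name": "input0",
      "shape": [-1],
      "datatype": "BYTES",
      "data": ["Capital of India?"]
    }
  ]
}
EOF

echo "Wrote $INPUT_FILE"
```

The resulting file can then be passed to curl with -d @"$INPUT_FILE".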