Sampling
Use ask to request LLM completions from the connected client. Learn when ask is null, how to guard it, and how to use SamplingOptions for full control.
Sampling lets your MCP server ask the client's LLM for a completion mid-execution. The MCP client already has a language model running — your tool can hand it a prompt and get a response back without you managing any AI infrastructure.
The ask parameter
ask is the second argument to every handler function:
handler(args, ask?, ctx?) => result
When the client advertises sampling support, ask is a function you can call. When the client does not support sampling, or when you are running locally with mctx-dev, ask is null.
Always check for null before calling it:
import { T } from "@mctx-ai/mcp";
const summarize = async (args, ask) => {
  // ask is null when the client does not support sampling.
  if (!ask) {
    return "Summary not available — this client does not support sampling.";
  }
  const summary = await ask("Summarize this in one sentence: " + args.text);
  return summary;
};
summarize.description = "Summarize a block of text";
summarize.input = {
  text: T.string({ required: true, description: "Text to summarize" }),
};
Skipping the null guard causes a runtime error on clients that do not support sampling. This is the most common sampling mistake.
When ask is null
ask is null in two situations:
- The client does not advertise sampling capability in its MCP handshake.
- You are running the local dev server (mctx-dev). The dev server stubs sampling out because there is no connected LLM session.
Your handler must work in both cases. Always return something useful when ask is null — either a fallback result or a helpful explanation of what the user needs to do to enable sampling.
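Because a handler is a plain function of (args, ask), both branches can be exercised without a connected client. The sketch below shows one way to do that. `fakeAsk` is a hypothetical stand-in for the client's LLM and is not part of the framework, the truncation fallback is only an illustration, and the handler's description and input declarations are omitted for brevity.

```javascript
// Handler with a null guard, modeled on the summarize example above.
const summarize = async (args, ask) => {
  if (!ask) {
    // Fallback: a crude truncation, so the tool still returns something useful.
    return args.text.slice(0, 80);
  }
  return await ask("Summarize this in one sentence: " + args.text);
};

// Hypothetical stub standing in for the client's LLM (an assumption, not a framework API).
const fakeAsk = async (prompt) => `stub reply to ${prompt.length} chars`;

// Call the handler once with ask = null and once with the stub.
const checkBothBranches = async () => {
  const withoutSampling = await summarize({ text: "hello world" }, null);
  const withSampling = await summarize({ text: "hello world" }, fakeAsk);
  return { withoutSampling, withSampling };
};

checkBothBranches().then((result) => console.log(result));
```

Running the null branch locally matters because it is the only branch mctx-dev will ever take.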
Simple usage
Pass a string prompt and receive the LLM's response as a string:
import { T } from "@mctx-ai/mcp";
const translate = async (args, ask) => {
  // Without sampling, return the text untranslated.
  if (!ask) return args.text;
  return await ask(`Translate to ${args.language}: ${args.text}`);
};
translate.description = "Translate text to a target language";
translate.input = {
  text: T.string({ required: true, description: "Text to translate" }),
  language: T.string({ required: true, description: "Target language" }),
};
SamplingOptions
For more control, pass a SamplingOptions object instead of a plain string. This lets you structure the conversation, set a system prompt, and constrain the response:
import { createServer, T } from "@mctx-ai/mcp";
const server = createServer();
const smartAnswer = async (args, ask) => {
  if (!ask) {
    return `Question: ${args.question}\n\nAnswer: LLM sampling is not available in this transport mode.`;
  }
  const answer = await ask({
    messages: [
      {
        role: "user",
        content: { type: "text", text: args.question },
      },
    ],
    systemPrompt: "You are a concise assistant. Answer in two sentences or fewer.",
    maxTokens: 256,
  });
  return `Question: ${args.question}\n\nAnswer: ${answer}`;
};
smartAnswer.description =
  "Answer a question using LLM sampling with a fallback for non-sampling clients";
smartAnswer.input = {
  question: T.string({ required: true, description: "Question to answer" }),
};
smartAnswer.annotations = {
  readOnlyHint: true,
  destructiveHint: false,
  openWorldHint: true,
};
server.tool("smart-answer", smartAnswer);
export default { fetch: server.fetch };
SamplingOptions fields
| Field | Type | Description |
|---|---|---|
| messages | Message[] | Conversation messages to send. Required. |
| systemPrompt | string | System-level instructions for the LLM. Optional. |
| maxTokens | number | Upper bound on tokens in the response. Optional. |
| temperature | number | Sampling temperature between 0.0 and 1.0. Optional. |
| topP | number | Top-p sampling parameter between 0.0 and 1.0. Optional. |
| stopSequences | string[] | Sequences that terminate generation. Optional. |
| modelPreferences | object | Model selection hints. Optional. See below. |
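The fields in the table can be assembled by a small helper. This is a sketch only: `buildSamplingOptions` is a hypothetical name, not something exported by @mctx-ai/mcp, and its range checks simply mirror the bounds listed in the table for temperature and topP.

```javascript
// Hypothetical helper (not a framework API): wraps a prompt string in a
// SamplingOptions-shaped object and checks the ranges listed in the table.
const buildSamplingOptions = (prompt, overrides = {}) => {
  for (const field of ["temperature", "topP"]) {
    const value = overrides[field];
    const inRange = typeof value === "number" && value >= 0 && value <= 1;
    if (value !== undefined && !inRange) {
      throw new RangeError(`${field} must be a number between 0.0 and 1.0`);
    }
  }
  return {
    // messages is the only required field.
    messages: [{ role: "user", content: { type: "text", text: prompt } }],
    ...overrides,
  };
};

const options = buildSamplingOptions("List three prime numbers.", {
  systemPrompt: "Reply with a comma-separated list only.",
  maxTokens: 64,
  temperature: 0.2,
  stopSequences: ["\n\n"],
});
console.log(JSON.stringify(options, null, 2));
```

Inside a handler, the resulting object would be passed as `await ask(options)` after the null guard.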
Model preferences
modelPreferences.hints lets you suggest a model by name. The client is not required to honor the hint — it uses whatever model it has available:
const answer = await ask({
  messages: [{ role: "user", content: { type: "text", text: prompt } }],
  modelPreferences: {
    hints: [{ name: "claude-3-5-sonnet" }],
  },
  maxTokens: 1024,
});
Use cases
Sampling is useful anywhere your MCP server benefits from language understanding that you would otherwise need to implement yourself:
- Summarization — condense long documents, logs, or data into a short response.
- Translation — convert text between languages.
- Rephrasing — rewrite content in a different tone or reading level.
- Classification — categorize unstructured text (sentiment, topic, intent).
- Question answering — answer natural-language questions against content your MCP server has already retrieved.
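As an illustration of the classification case, the sketch below constrains the model to a fixed label set and falls back when sampling is unavailable or the reply is not one of the labels. The label set, the `classifySentiment` name, and the "unknown" fallback are assumptions made for this example. The description and input declarations are omitted for brevity.

```javascript
// Assumed label set for this example.
const LABELS = ["positive", "negative", "neutral"];

const classifySentiment = async (args, ask) => {
  // Sampling unavailable: return a sentinel instead of throwing.
  if (!ask) return "unknown";

  const reply = await ask({
    messages: [{ role: "user", content: { type: "text", text: args.text } }],
    systemPrompt:
      "Classify the sentiment of the user's text. " +
      `Reply with exactly one word: ${LABELS.join(", ")}.`,
    maxTokens: 8,
    temperature: 0,
  });

  // Normalize the reply: models may add whitespace, capitals, or punctuation.
  const label = String(reply).trim().toLowerCase().replace(/[^a-z]/g, "");
  return LABELS.includes(label) ? label : "unknown";
};
```

Validating the reply against the label set keeps the tool's output predictable even when the model ignores the instruction and answers in a full sentence.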
Next steps
- Logging — capture debug output from inside your handlers
- Framework API Reference — full type definitions for AskFunction and SamplingOptions