Introducing LangServe, the best way to deploy your LangChains


LangServe is a powerful tool that bridges the gap between your language model prototype and a fully functional application. It streamlines the deployment process, making it easy to share your innovative ideas with the world.

Key Features

  • Seamless support for streaming and asynchronous execution
  • Optimized parallel execution for improved performance
  • Automatic input and output schema inference and enforcement
  • Built-in API documentation and tracing capabilities
  • Support for multiple chains and hosting platforms
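To make the parallel-execution point concrete, here is a toy sketch (not LangServe's actual internals) of how several inputs can be handled concurrently with asyncio; `fake_chain` is a hypothetical stand-in for a real async chain call:

```python
import asyncio

async def fake_chain(question: str) -> str:
    """Hypothetical stand-in for an async chain/LLM call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {question}"

async def run_batch(inputs: list[str]) -> list[str]:
    """Run all inputs concurrently, as a /batch-style endpoint might."""
    return await asyncio.gather(*(fake_chain(q) for q in inputs))

results = asyncio.run(run_batch(["what do cats like?", "what do dogs like?"]))
print(results)
```

Because the chain is awaited rather than blocked on, many requests can share one server process, which is what makes the async support above pay off.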

How it works

  • First, we create our chain. Here we use a conversational retrieval chain, but any other chain would work. This is the my_package/chain.py file.
"""A conversational retrieval chain."""

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_texts(
    ["cats like fish", "dogs like sticks"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

model = ChatOpenAI()

chain = ConversationalRetrievalChain.from_llm(model, retriever)
  • Then, we pass that chain to add_routes. This is the my_package/server.py file.
#!/usr/bin/env python
"""A server for the chain above."""

from fastapi import FastAPI
from langserve import add_routes

from my_package.chain import chain

app = FastAPI(title="Retrieval App")

add_routes(app, chain)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
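Once this server is running, the /invoke endpoint accepts the chain's input wrapped under a top-level `"input"` key. A minimal sketch of building such a payload with only the standard library; the `question` and `chat_history` keys below are the conversational retrieval chain's expected inputs, shown here for illustration:

```python
import json

def build_invoke_payload(chain_input: dict) -> str:
    # LangServe's /invoke and /batch endpoints wrap the chain's own
    # input under a top-level "input" key.
    return json.dumps({"input": chain_input})

payload = build_invoke_payload(
    {"question": "what do cats like?", "chat_history": []}
)
print(payload)
```

POST this body to the server's /invoke path (for the example above, http://localhost:8000/invoke) with a `Content-Type: application/json` header.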

This gets you a scalable Python web server with:

  • Input and Output schemas automatically inferred from your chain, and enforced on every API call, with rich error messages
  • /docs endpoint serves API docs with JSONSchema and Swagger
  • /invoke endpoint that accepts JSON input and returns JSON output from your chain, with support for many concurrent requests in the same server
  • /batch endpoint that produces output for several inputs in parallel, batching calls to LLMs where possible
  • /stream endpoint that sends output chunks as they become available, using SSE (same as OpenAI Streaming API)
  • /stream_log endpoint for streaming all (or some) intermediate steps from your chain/agent
  • Built-in (optional) tracing to LangSmith, just add your API key as an environment variable
  • Support for hosting multiple chains in the same server under separate paths
  • All built with battle-tested open-source Python libraries like FastAPI, Pydantic, uvloop and asyncio.
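The /stream endpoint speaks standard Server-Sent Events, so any SSE-aware client can consume it. Here is a stdlib-only sketch of parsing an SSE response body into events; the sample chunks are illustrative of a token-by-token stream, not verbatim LangServe output:

```python
def parse_sse(body: str) -> list[tuple[str, str]]:
    """Split an SSE response body into (event, data) pairs."""
    events = []
    for block in body.strip().split("\n\n"):
        event, data_lines = "message", []  # "message" is SSE's default event type
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data_lines)))
    return events

# Illustrative chunks in the style of a token-by-token stream.
sample = (
    'event: data\ndata: "Bears"\n\n'
    'event: data\ndata: " are great"\n\n'
    'event: end\ndata: \n\n'
)
for event, data in parse_sse(sample):
    print(event, data)
```

Each chunk arrives as soon as it is available, which is what lets a UI render partial model output instead of waiting for the full response.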


After an API is created, the next step is to deploy it on a hosting platform. We’re launching with two examples:


We have deployed this repo as an example. The API docs are available here.

LangServe server

We can stream a response:

curl -X POST -H "Content-Type: application/json" --data '{"input": {"topic": "bears"}}'
LangServe Response



LangServe eliminates the complexity of deploying language model applications, making it easy to bring your ideas to life. Whether you’re a developer or a non-technical innovator, LangServe can help you share your language model with the world.
