LiteLLM o3-deep-research Example Documentation

This site is a Korean-language guide to the three implemented examples.

What you can do with this repository

First-time users are encouraged to follow this order.

Currently implemented: Python direct, Java direct, and relay examples
Current advanced features: --web-search, --auto-tool-call, relay /api/v1/chat, system_prompt, text_format
Current verification status: Python/Java/relay tests, docs build, GitHub Pages deployment, and live verification results are all documented

web_search_preview: enabled in the Python / Java direct clients with --api responses --web-search
system_prompt: passed by the relay deep_research wrapper as the Responses API instructions
text_format: lets the relay deep_research wrapper force JSON output
Automatic tool calling: implemented both client-side (--auto-tool-call) and relay-side (POST /api/v1/chat)
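
As a rough illustration of how these options could map onto a Responses API request body, here is a minimal sketch. The helper name `build_responses_payload`, its parameters, and the default model string are hypothetical and not taken from the repository. Mapping text_format to `text.format` with type `json_object` is likewise an assumption about how JSON output might be forced.

```python
from typing import Any, Optional


def build_responses_payload(
    prompt: str,
    model: str = "o3-deep-research",  # assumed default, taken from the site title
    system_prompt: Optional[str] = None,
    web_search: bool = False,
    json_output: bool = False,
) -> dict[str, Any]:
    """Assemble a Responses API request body (hypothetical helper)."""
    payload: dict[str, Any] = {"model": model, "input": prompt}
    if system_prompt is not None:
        # system_prompt is carried in the Responses API `instructions` field.
        payload["instructions"] = system_prompt
    if web_search:
        # Mirrors what --web-search enables on the direct clients.
        payload["tools"] = [{"type": "web_search_preview"}]
    if json_output:
        # Assumed mapping for text_format: request JSON-mode output.
        payload["text"] = {"format": {"type": "json_object"}}
    return payload
```

Options that are not set are left out of the body entirely, so a plain call yields only `model` and `input`.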

This repository's relay provides a POST /api/v1/chat endpoint that accepts an ordinary chat request and lets the model decide on its own whether to call deep_research.

Request fields:

Response fields:

For details, see the Automatic Tool Calling and Relay Example pages.
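
To make the endpoint concrete, the sketch below builds (but does not send) a POST request to /api/v1/chat using only the Python standard library. It assumes the relay speaks plain HTTP with a JSON body and that the client can reuse RELAY_HOST and RELAY_PORT to find the server. The function name `build_chat_request` is hypothetical, and the body is left entirely to the caller because the request fields are not reproduced here.

```python
import json
import os
import urllib.request


def build_chat_request(body: dict) -> urllib.request.Request:
    """Build a POST request for the relay chat endpoint (hypothetical helper)."""
    # Assumption: the client targets the same host/port the relay binds to.
    host = os.environ.get("RELAY_HOST", "127.0.0.1")
    port = os.environ.get("RELAY_PORT", "8080")
    url = f"http://{host}:{port}/api/v1/chat"  # assumed scheme: plain HTTP
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # assumed encoding: JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request, timeout=...)`. Since a deep_research call may run for a long time, a generous client timeout is probably sensible.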

RELAY_HOST: default 127.0.0.1
RELAY_PORT: default 8080
RELAY_TIMEOUT_SECONDS: Chat Completions orchestration timeout in seconds (default 30)
RELAY_RESEARCH_TIMEOUT_SECONDS: deep_research execution timeout in seconds (default 300)
LITELLM_CHAT_MODEL: orchestration model for relay auto tool calling (default gpt-4o)
RELAY_MAX_INVOCATIONS: maximum number of invocations kept in memory (default 1024)
RELAY_MAX_STREAM_BYTES: maximum number of UTF-8 bytes a single stream invocation keeps in memory (default 1000000)
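
The variables and defaults above can be gathered into one typed settings object. The following is a minimal sketch of that idea, not the relay's actual configuration code: the class name `RelaySettings` and its field names are hypothetical, and parsing the timeouts as floats is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RelaySettings:
    """Relay configuration with the documented defaults (hypothetical class)."""

    host: str = "127.0.0.1"
    port: int = 8080
    timeout_seconds: float = 30.0
    research_timeout_seconds: float = 300.0
    chat_model: str = "gpt-4o"
    max_invocations: int = 1024
    max_stream_bytes: int = 1_000_000

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RelaySettings":
        """Read each variable from env, falling back to the documented default."""
        d = cls()  # instance holding the defaults
        return cls(
            host=env.get("RELAY_HOST", d.host),
            port=int(env.get("RELAY_PORT", d.port)),
            timeout_seconds=float(env.get("RELAY_TIMEOUT_SECONDS", d.timeout_seconds)),
            research_timeout_seconds=float(
                env.get("RELAY_RESEARCH_TIMEOUT_SECONDS", d.research_timeout_seconds)
            ),
            chat_model=env.get("LITELLM_CHAT_MODEL", d.chat_model),
            max_invocations=int(env.get("RELAY_MAX_INVOCATIONS", d.max_invocations)),
            max_stream_bytes=int(env.get("RELAY_MAX_STREAM_BYTES", d.max_stream_bytes)),
        )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise in isolation, for example `RelaySettings.from_env({"RELAY_PORT": "9000"})`.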

Actual run examples and their results are available in the Integrated Manual.