Synthesizer Quickstart

Welcome to the Synthesizer quickstart guide! Synthesizer, or ΨΦ, is your portal to combining Retrieval-Augmented Generation (RAG) with large language models (LLMs) from providers such as OpenAI, Anthropic, HuggingFace, and vLLM.

This guide will introduce you to:

Using the RAG provider interface.
Evaluating your RAG pipeline.

Let’s get started!

Setting Up Your Environment

Before you start, ensure you’ve installed Synthesizer:

pip install sciphi-synthesizer

For additional details, refer to the installation guide.
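To confirm the install succeeded before moving on, a small check like the following can help. It is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the same name used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the pip command above.
found = installed_version("sciphi-synthesizer")
print(f"sciphi-synthesizer: {found or 'not installed'}")
```

If this prints `not installed`, re-run the `pip install` command in the same Python environment you plan to use.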

Using Synthesizer

Generate synthetic question-answer pairs

export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
python -m synthesizer.scripts.data_augmenter run --dataset="wiki_qa"

tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl
{ "formatted_prompt": "... ### Question:\nwhat country did wine originate in\n\n### Input:\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\nTitle:History of wine....",
{ "completion": Wine originated in the South Caucasus, which is now part of modern-day Armenia ...
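The output file can also be inspected from Python rather than with `tail`. The sketch below assumes each line of the file is one JSON object and that the `formatted_prompt` and `completion` keys seen in the sample output are present; the helper name `iter_jsonl` is hypothetical and not part of Synthesizer.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Path taken from the command output above.
output_path = Path(
    "augmented_output/"
    "config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl"
)

if output_path.exists():
    for record in iter_jsonl(output_path):
        # Assumed keys; .get avoids a KeyError if a record lacks one.
        prompt = record.get("formatted_prompt", "")
        completion = record.get("completion", "")
        print(prompt[:80].replace("\n", " "), "->", completion[:80])
```

Reading line by line keeps memory use flat even when the augmented dataset is large.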

Evaluate RAG pipeline performance

export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
python -m synthesizer.scripts.rag_harness --rag_provider="agent-search" --llm_provider_name="sciphi" --n_samples=25

...
INFO:__main__:Now generating completions...
100%|██████████| 100/100 [00:29<00:00,  3.40it/s]
INFO:__main__:Final Accuracy=0.42
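When comparing several harness runs, it can be handy to pull the final score out of the captured log text. The following is a minimal sketch that assumes the log ends with a line in the `Final Accuracy=<number>` form shown above; the function name `parse_final_accuracy` is hypothetical and not part of Synthesizer.

```python
import re
from typing import Optional

# Matches the number that follows "Final Accuracy=" in a log line.
ACCURACY_PATTERN = re.compile(r"Final Accuracy=([0-9]*\.?[0-9]+)")


def parse_final_accuracy(log_text: str) -> Optional[float]:
    """Return the last accuracy reported in the log text, or None if absent."""
    matches = ACCURACY_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


sample_log = (
    "INFO:__main__:Now generating completions...\n"
    "INFO:__main__:Final Accuracy=0.42\n"
)
print(parse_final_accuracy(sample_log))  # prints 0.42
```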

Note

This is a basic introduction to Synthesizer. Check back later for more detailed and intricate documentation that delves deeper into advanced features and customization options.

Developing with Synthesizer

Here’s how you can use Synthesizer to quickly set up RAG-augmented generation without diving deep into intricate configurations:

# Requires a valid SCIPHI_API_KEY in env ...

# Imports
from synthesizer.core import LLMProviderName, RAGProviderName
from synthesizer.interface import (
    LLMInterfaceManager,
    RAGInterfaceManager,
)
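from synthesizer.llm import GenerationConfig

# Placeholder inputs: illustrative assumptions, not library defaults
query = "What country did wine originate in?"
raw_prompt = "### Input:\n{rag_context}\n\n### Question:\n" + query
rag_limit_hierarchical_url_results = 50
rag_limit_final_pagerank_results = 20
llm_model_name = "gpt-3.5-turbo"
llm_max_tokens_to_sample = 256
llm_temperature = 0.1
llm_top_p = 1.0
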
# RAG Provider Settings
rag_interface = RAGInterfaceManager.get_interface_from_args(
    RAGProviderName("agent-search"),
    limit_hierarchical_url_results=rag_limit_hierarchical_url_results,
    limit_final_pagerank_results=rag_limit_final_pagerank_results,
)
rag_context = rag_interface.get_rag_context(query)

# LLM Provider Settings
llm_interface = LLMInterfaceManager.get_interface_from_args(
    LLMProviderName("openai"),
)

generation_config = GenerationConfig(
    model_name=llm_model_name,
    max_tokens_to_sample=llm_max_tokens_to_sample,
    temperature=llm_temperature,
    top_p=llm_top_p,
    # other generation params here ...
)

formatted_prompt = raw_prompt.format(rag_context=rag_context)
completion = llm_interface.get_completion(
    formatted_prompt, generation_config
)
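The `raw_prompt.format(rag_context=rag_context)` step relies on a template string that contains a `{rag_context}` placeholder. The sketch below shows one way such a template could look. The template text, the `### Response:` section, and the helper name `build_prompt` are all assumptions, loosely modeled on the `### Question:` / `### Input:` layout in the sample output earlier; it is not the prompt Synthesizer ships with.

```python
# Hypothetical template; only the {rag_context} placeholder is implied by the guide.
RAW_PROMPT_TEMPLATE = (
    "### Question:\n{query}\n\n"
    "### Input:\n{rag_context}\n\n"
    "### Response:\n"
)


def build_prompt(query: str, rag_context: str) -> str:
    """Fill the template with the user query and the retrieved context."""
    return RAW_PROMPT_TEMPLATE.format(query=query, rag_context=rag_context)


example = build_prompt(
    query="what country did wine originate in",
    rag_context=(
        "1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)"
    ),
)
print(example)
```

Keeping the template separate from the retrieval and completion calls makes it easy to try different prompt layouts without touching the provider setup.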
