The Quest for Enterprise RAG: A Deep Dive into Langfuse Integration

    January 14, 2026

    You know that feeling when your code almost works?
    Everything’s green, the server’s running, the LLM is responding
    beautifully, and then you look at that one field in your API response
    that’s supposed to showcase your observability chops for your dream job
    interview and it’s just… null.

Welcome to my 90-minute debugging odyssey with Langfuse tracing.
Spoiler: I won. But not before trying five different approaches,
diving into import hell, and learning the single most valuable
debugging lesson that somehow never makes it into the YouTube
tutorials.

    Let’s talk about what happens when the docs don’t tell you what you
    need to know.


    TL;DR: What You’ll Learn

    • 🔍 The Problem: Langfuse credentials loaded, traces
      sent to dashboard, but trace_url returned null
      in API responses
    • 🛠️ The Solution: Use
      langfuse_handler.last_trace_id (found by inspecting the
      object with dir()) to build trace URLs manually
    • ⚠️ Environment Variable Hell: .env
      formatting is strict (no spaces around =, no quotes
      needed)
    • 🧪 The Debug-Driven Discovery: When APIs don’t
      expose what you need, inspect attributes at runtime and see what’s
      actually there
    • 💡 Import Gotcha: It’s
      from langfuse.langchain import CallbackHandler, not
      from langfuse.callback
    • 🎯 Key Insight: Trace IDs are only available
      after chain invocation via
      handler.last_trace_id
    • 🚀 Result: Full RAG pipeline observability with
      direct clickable trace URLs for every query

Act I: The Setup (Everything’s Perfect… Right?)

    I’m building a Bank of America-compliant RAG demo for a GenAI
    Engineering interview. The stack is clean:

    • Python 3.11.9 with FastAPI/Uvicorn
    • AWS Bedrock (Claude 3 Sonnet) for generation
    • LangChain for the RAG orchestration
    • Chroma vector DB with BofA’s Responsible AI
      documents
    • LLM Guard for guardrails (input: PromptInjection,
      Toxicity, BanTopics; output: Sensitive, NoRefusal, Relevance)
    • Langfuse for observability (the star of today’s
      show)
    • DeepEval with OpenAI for evaluation

    The RAG pipeline is chef’s kiss. Quality answers, proper
    source citations, guardrails blocking malicious queries like a bouncer
    at an exclusive club. I can taste that job offer.

    But there’s one problem.

    {
      "query": "What are BofA's five core principles for responsible AI?",
      "answer": "According to [BofA_Responsible_AI_Framework_2024.md]...",
      "sources": ["BofA_Responsible_AI_Framework_2024.md", "..."],
      "trace_url": null,
      "guardrails_applied": true
    }

    That trace_url: null is haunting me. I know
    traces are going to Langfuse because I can see them in the dashboard.
    But I can’t link to them programmatically. For a demo about
    observability, that’s… suboptimal.

    Time to fix it.
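For context, that JSON comes out of a FastAPI endpoint shaped roughly
like the sketch below. The request/response models and the route name
are hypothetical (mine, not the repo’s), but the fields match the
response above:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

class QueryResponse(BaseModel):
    query: str
    answer: str
    sources: list[str]
    trace_url: str | None = None   # the field that kept coming back null
    guardrails_applied: bool

@app.post("/query", response_model=QueryResponse)
def query_rag(req: QueryRequest) -> QueryResponse:
    # Placeholder body: the real endpoint runs guardrails, the RAG chain,
    # and (eventually) captures the Langfuse trace URL.
    return QueryResponse(
        query=req.query,
        answer="...",
        sources=[],
        trace_url=None,   # the bug this whole post is about
        guardrails_applied=True,
    )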


Act II: The Descent (Five Failed Attempts)

Attempt 1: “It Must Be an Attribute”

    First instinct: the CallbackHandler probably has a
    .trace or .trace_id attribute I can access
    after running the chain.

    if trace and callbacks:
        if hasattr(langfuse_handler, 'trace'):
            trace_url = langfuse_handler.trace.get_trace_url()
        elif hasattr(langfuse_handler, 'trace_id'):
            trace_id = langfuse_handler.trace_id
            trace_url = f"{os.getenv('LANGFUSE_HOST')}/trace/{trace_id}"

    Result: Both hasattr() checks return
    False. No .trace, no
    .trace_id.

    Lesson: Don’t assume the obvious attributes
    exist.


Attempt 2: “Maybe I Need to Flush?”

    I’ve seen callback handlers that need explicit flushing to send data.
    Maybe the trace isn’t finalized until I flush it?

    langfuse_handler.flush()
    if hasattr(langfuse_handler, 'trace_id'):
        trace_id = langfuse_handler.trace_id
        trace_url = f"{os.getenv('LANGFUSE_HOST')}/trace/{trace_id}"

    Terminal output:

    ⚠️  Trace error: 'LangchainCallbackHandler' object has no attribute 'flush'

    Result: Nope. No .flush() method.

    Lesson: Not all handlers follow the same patterns.
    Read the actual API.


Attempt 3: “Let’s Create the Trace Explicitly”

    Okay, what if I’m doing this backwards? What if I need to use the
    Langfuse client to create a trace first, then pass it
    to the handler?

    from langfuse import Langfuse
    from langfuse.callback import CallbackHandler
    
    langfuse_client = Langfuse()
    
    # In the query function
    langfuse_trace = langfuse_client.trace(
        name="rag-query",
        metadata={"query": query, "guardrails_enabled": True}
    )
    langfuse_handler = CallbackHandler(trace=langfuse_trace)
    trace_url = langfuse_trace.get_trace_url()

    Terminal output:

    ModuleNotFoundError: No module named 'langfuse.callback'

    Wait, what?

    'Langfuse' object has no attribute 'trace'

    Result: Double fail. Wrong import path and
    wrong approach.

    Lesson: Verify your imports before debugging logic.
    Also, the Langfuse SDK doesn’t work the way I thought it did.


Attempt 4: “Fine, Let Me Read the Import Docs”

    Turns out the correct import is:

    from langfuse.langchain import CallbackHandler  # NOT langfuse.callback

    But I’m still stuck on how to create a trace. Maybe I can pass
    parameters to CallbackHandler to identify the trace?

    langfuse_handler = CallbackHandler(
        session_id=str(uuid.uuid4()),
        user_id="rag-demo-user"
    )

    Terminal output:

    LangchainCallbackHandler.__init__() got an unexpected keyword argument 'session_id'

    Tried trace_id too. Same error.

    Result: CallbackHandler() doesn’t
    accept these parameters.

    Lesson: The LangChain integration for Langfuse is
    opinionated. It wants you to use it a specific way.
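That specific way isn’t a secret, though. Instead of guessing keyword
arguments one at a time, you can ask Python for the real constructor
signature (a sketch):

import inspect
from langfuse.langchain import CallbackHandler

# Print what CallbackHandler.__init__ actually accepts, straight from the source.
print(inspect.signature(CallbackHandler.__init__))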


Attempt 5: “Screw It, Let’s Inspect the Object”

    I’m out of ideas from the documentation. Time for the nuclear option:
    see what’s actually on the object at runtime.

    # After chain invocation
    if trace and langfuse_handler:
        attrs = [a for a in dir(langfuse_handler) if not a.startswith('_')]
        print(f"🔍 Available handler attributes: {attrs[:20]}")

    Terminal output:

    🔍 Available handler attributes: ['client', 'context_tokens', 'get_langchain_run_name', 
    'ignore_agent', 'ignore_chain', 'ignore_chat_model', 'ignore_custom_event', 'ignore_llm', 
    'ignore_retriever', 'ignore_retry', 'last_trace_id', 'on_agent_action', 'on_agent_finish', 
    'on_chain_end', 'on_chain_error', 'on_chain_start', 'on_chat_model_start', 
    'on_custom_event', 'on_llm_end', 'on_llm_error']

    Wait.

    last_trace_id.

    THERE IT IS.


    Act III: The Breakthrough

    That moment when you find the thing. The actual thing. Not
    what the docs say should be there. Not what makes logical sense based on
    other APIs. The thing that’s actually there.

import os

from langfuse.langchain import CallbackHandler
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# retriever, format_docs, prompt_template, llm, trace, and query
# are all set up earlier in the app.

# Create callback handler for this request
trace_url = None
callbacks = []
langfuse_handler = None
if trace:  # request-level flag that toggles observability
    langfuse_handler = CallbackHandler()  # No parameters needed!
    callbacks = [langfuse_handler]

# Build the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

# Invoke with tracing
answer = rag_chain.invoke(query, config={"callbacks": callbacks})

# NOW extract the trace URL
if trace and langfuse_handler:
    if hasattr(langfuse_handler, 'last_trace_id') and langfuse_handler.last_trace_id:
        langfuse_host = os.getenv('LANGFUSE_HOST', 'https://cloud.langfuse.com')
        trace_url = f"{langfuse_host}/trace/{langfuse_handler.last_trace_id}"

    POST request result:

    {
      "query": "What are BofA's responsible AI principles?",
      "answer": "According to [BofA_Responsible_AI_Framework_2024.md]...",
      "sources": ["BofA_Responsible_AI_Framework_2024.md"],
      "trace_url": "https://us.cloud.langfuse.com/trace/550e8400-e29b-41d4-a716-446655440000",
      "guardrails_applied": true
    }

    IT WORKS.

The trace URL is a real, clickable link to the Langfuse dashboard
showing:

• Full chain execution
• Token usage
• Latency per step
• Input/output for each LangChain component
• Error tracking

    Perfect observability. Perfect for a demo. Perfect for an
    interview.


Act IV: The Cleanup (Don’t Forget Environment Variables)

    Oh, and before we celebrate too much, let me tell you about the
    .env formatting saga that preceded all this.

    Wrong:

    LANGFUSE_PUBLIC_KEY = "pk-lf-..."
    LANGFUSE_SECRET_KEY = "sk-lf-..."
    LANGFUSE_BASE_URL=https://us.cloud.langfuse.com

Problems:

1. Spaces around = signs (Python’s dotenv doesn’t like that)
2. Quotes around values (treated as part of the string)
3. Wrong variable name (LANGFUSE_BASE_URL instead of LANGFUSE_HOST)

    Correct:

    LANGFUSE_PUBLIC_KEY=pk-lf-...
    LANGFUSE_SECRET_KEY=sk-lf-...
    LANGFUSE_HOST=https://us.cloud.langfuse.com
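One thing worth making explicit: none of these values reach
os.getenv unless something actually loads the file. With python-dotenv,
that’s a single call at startup, before Langfuse initializes:

from dotenv import load_dotenv

# Read .env from the project root into os.environ at startup.
load_dotenv()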

    Debug logging confirmed the fix:

    print(f"Debug: LANGFUSE_PUBLIC_KEY loaded? {os.getenv('LANGFUSE_PUBLIC_KEY') is not None}")
    print(f"Debug: LANGFUSE_SECRET_KEY loaded? {os.getenv('LANGFUSE_SECRET_KEY') is not None}")

    Output:

    Debug: LANGFUSE_PUBLIC_KEY loaded? True
    Debug: LANGFUSE_SECRET_KEY loaded? True

    Lesson: Environment variables are finicky. Debug
    them early. Assume nothing.


Lessons Learned (The Good Stuff)

🔍 When Docs Fail, Inspect the Object

This is the meta-lesson. The thing that saved me after five failed
attempts.

    attrs = [a for a in dir(obj) if not a.startswith('_')]
    print(f"Available attributes: {attrs}")

Don’t guess. Don’t assume the API works like similar APIs you’ve
seen. Look at what’s actually there. This is the Python cousin of
console.log(Object.keys(obj)) in JavaScript; vars(obj) gets you
something similar when an object has a __dict__, while dir()
works on anything. It’s unglamorous, but it works.

    ⏱️ Timing Matters

    last_trace_id is only available after the chain
    invocation, not before. This makes sense—the trace is created during
    execution—but it’s not intuitive if you’re used to passing trace IDs
    upfront.
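A tiny sketch of the constraint, reusing the handler and chain from
Act III:

handler = CallbackHandler()

# Before invocation no chain has run, so there is no trace to point at yet.
print(getattr(handler, "last_trace_id", None))

answer = rag_chain.invoke(query, config={"callbacks": [handler]})

# After invocation the handler remembers the trace it just created.
print(handler.last_trace_id)   # ready to be turned into a dashboard URL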

🧩 LangChain Integrations Are Opinionated

The CallbackHandler() doesn’t want you to create traces
manually. It handles everything internally if you just:

1. Initialize it with no parameters (reads from env vars)
2. Pass it to the chain via config={"callbacks": [handler]}
3. Access last_trace_id after invocation

Fighting this pattern wastes time.

📦 Import Paths Aren’t Always Obvious

from langfuse.callback import CallbackHandler   # wrong: ModuleNotFoundError
from langfuse.langchain import CallbackHandler  # correct

    The LangChain-specific integration lives in a separate module. Check
    the package structure.
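pkgutil makes that check concrete (a small sketch):

import pkgutil
import langfuse

# List the importable submodules the installed langfuse package actually ships.
for module in pkgutil.iter_modules(langfuse.__path__):
    print(module.name)   # 'langchain' shows up here; 'callback' does not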

🔧 Environment Variables Need Love

    • No spaces around =
    • No quotes (unless you want quotes in the value)
    • Use the exact variable names the library expects
      (LANGFUSE_HOST not LANGFUSE_BASE_URL)
    • Add debug logging to verify they’re loading

🎯 UUIDs Are Your Friend

    I imported uuid for this project even though I ended up
    not needing it for trace IDs (Langfuse handles that). But having it
    available let me experiment quickly:

    import uuid
    
    trace_id = str(uuid.uuid4())  # Quick unique ID for testing

🚀 Don’t Give Up on the Right Solution

    I could have accepted trace_url: null and just told
    interviewers “check the Langfuse dashboard manually.” But that would
    have been a mediocre demo. Persistence pays off.


    The Final Stack

    Here’s what’s working now:

Core RAG:

• AWS Bedrock Claude 3 Sonnet (anthropic.claude-3-sonnet-20240229-v1:0)
• Chroma vector DB with 8 BofA Responsible AI documents
• HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
• LangChain LCEL for pipeline composition

Guardrails (LLM Guard):

• Input: PromptInjection, Toxicity, BanTopics
• Output: Sensitive (PII redaction), NoRefusal, Relevance
• All scanners working perfectly after parameter fixes

Observability (Langfuse):

• ✅ Traces sent to dashboard
• ✅ Direct trace URLs in API responses
• ✅ Token usage tracking
• ✅ Latency metrics per chain step

Evaluation (DeepEval + OpenAI):

• OpenAI API key configured (180 chars, sk-proj-...)
• Ready for Faithfulness, AnswerRelevancy, ContextualRelevancy metrics
• LLM-as-Judge for adversarial query testing

Deployment:

• FastAPI server on 127.0.0.1:5000
• Auto-reload enabled for development
• Git repo: github.com/mtnjxynt6p-ai/brownbi_com
• Professional commit history with semantic prefixes


Why This Matters for Your Next Interview

    Debugging stories like this demonstrate:

    1. Persistence: You don’t give up when the docs don’t
      have the answer
    2. Systematic thinking: You try multiple approaches
      methodically
    3. Debugging skills: You know how to inspect objects,
      read error messages, and isolate problems
    4. Tool knowledge: You understand how integrations
      work (callbacks, handlers, environment variables)
    5. Engineering judgment: You know when to dig deeper
      vs. when to move on

When I show this demo to Bank of America (or any GenAI role), I won’t
just show a working RAG system. I’ll show:

• Observability: Clickable trace URLs for every query
• Guardrails: Live blocking of prompt injection and PII leakage
• Evaluation: Metrics proving answer quality and faithfulness
• Professional engineering: Clean code, git history, documentation

    And if they ask “how did you get the Langfuse integration
    working?”

    I’ll tell them this story.


The Git Commit That Ended It All

    git commit -m "feat: Add Langfuse trace URL to API response using last_trace_id
    
    - Import uuid for trace generation
    - Use CallbackHandler() to capture Langfuse traces
    - Extract trace URL using handler.last_trace_id
    - Build direct link to Langfuse dashboard for each query
    - Provides full observability of RAG pipeline execution"

    Commit: a1a7ce7


    Closing Thoughts

    If you’re building production GenAI systems, you’ll hit walls like
    this. The ecosystem is moving fast. Documentation lags. APIs change.
    Integrations are janky.

The difference between a junior engineer and a senior engineer isn’t
that the senior knows all the answers. It’s that the senior knows how
to find the answers when the docs don’t have them.

    Use dir(). Use hasattr(). Print the damn
    attributes. Read error messages carefully. Try multiple approaches.
    Don’t assume the API works like you think it should.

    And when you finally get that trace_url field to
    populate with a real URL?

    Commit it. Document it. And add it to your interview portfolio.

    Because that’s the stuff that gets you hired.


    P.S. If you’re preparing for GenAI/ML engineering
    interviews and want to see the full code for this RAG system with
    guardrails, tracing, and evaluation, it’s all on GitHub:
    github.com/mtnjxynt6p-ai/brownbi_com. PRs welcome. Bugs
    expected. Observability guaranteed.

    P.P.S. Special shoutout to GitHub Copilot for
    helping me debug this in real-time. Even AI agents can’t magic away the
    need to inspect objects at runtime. But they sure make the journey
    faster.


    Tags: #GenAI #RAG #Langfuse #Observability
    #LangChain #Debugging #AWSBedrock #Python #FastAPI #InterviewPrep

    Reading time: ~10 minutes
    Debugging time saved: ~2 hours (if you learn from my
    mistakes)