Hi, I'm having trouble with document analysis. I followed the basic workflow from the documentation to verify everything, but I'm getting an error. The problem isn't with the document extractor but with the LLM step. I do have a balance in my account, but the request isn't being processed. Does anyone know what's happening? Image attached.
Hi everyone,
I am new to RAG systems and have a small problem. I am building a Q&A RAG system, and my dataset is mostly YouTube podcast transcripts. Despite adding more data and a more advanced pipeline, the system cannot retrieve specific information (e.g., analyses of specific companies or products mentioned in the podcasts). Mostly it says there is nothing about the topic in the context, or it gives very shallow answers.
My current stack is:
Workflow: Dify.
Data Prep: Raw YouTube transcripts. I used GPT-4o-mini to generate summaries and extract metadata tags for each file, and I add that metadata to Dify.
Chunking: 1500 chunk size with 250 overlap.
Embedding: OpenAI text-embedding-3-large.
Retrieval Strategy: 2-pass retrieval. One search directly with the user's prompt, and another search where an LLM transforms/expands the prompt. I combine the results.
Generator LLM: DeepSeek R1.
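
For reference, here is a rough sketch of what the chunking and the "combine the results" step amount to. It is only an illustration under assumptions: chunk size and overlap are treated as character counts (they could just as well be tokens), the two passes are merged by de-duplicating chunk IDs and keeping the best score, and the names chunk_text and merge_results are hypothetical helpers, not Dify APIs or Dify's internal logic.

```python
from typing import Dict, List, Tuple


def chunk_text(text: str, size: int = 1500, overlap: int = 250) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`.

    Assumption: sizes are measured in characters, not tokens.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # each new window starts `step` characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def merge_results(
    *passes: List[Tuple[str, float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Union (chunk_id, score) results from several retrieval passes.

    Assumption: scores from the passes are comparable, so a chunk found
    by more than one pass simply keeps its highest score.
    """
    best: Dict[str, float] = {}
    for results in passes:
        for chunk_id, score in results:
            if chunk_id not in best or score > best[chunk_id]:
                best[chunk_id] = score
    # Highest-scoring chunks first, truncated to top_k
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Example: one pass with the raw prompt, one with the LLM-expanded prompt
direct = [("c1", 0.9), ("c2", 0.5)]
expanded = [("c2", 0.7), ("c3", 0.6)]
combined = merge_results(direct, expanded, top_k=3)
```

With a merge like this, a chunk that only the expanded query finds still has to outscore the direct hits to reach the generator, so the choice of top_k and of the merge rule can matter as much as the embedding model.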
Has anyone tackled retrieval from conversational/podcast data? Are there any recommendations? Thanks!
