Model track: Start with gpt-3.5-turbo.
Governance: Add contract-level access controls and audit logging for document uploads.
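A minimal sketch of contract-level access control with audit logging. It assumes an in-memory ACL keyed by a hypothetical contract_id and an append-only list of JSON lines. ContractStore, grant and upload are illustrative names and are not part of the repo. Denied uploads are logged as well. The SHA-256 digest records exactly which file was submitted without putting its contents in the log.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ContractStore:
    """Hypothetical in-memory store: per-contract ACLs plus an append-only audit log."""
    acl: dict[str, set[str]] = field(default_factory=dict)  # contract_id -> allowed user ids
    audit_log: list[str] = field(default_factory=list)      # JSON lines, only ever appended

    def _audit(self, user: str, action: str, contract_id: str, allowed: bool, **extra) -> None:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "contract_id": contract_id,
            "allowed": allowed,
            **extra,
        }
        self.audit_log.append(json.dumps(record, sort_keys=True))

    def grant(self, contract_id: str, user: str) -> None:
        self.acl.setdefault(contract_id, set()).add(user)

    def upload(self, user: str, contract_id: str, pdf_bytes: bytes) -> bool:
        allowed = user in self.acl.get(contract_id, set())
        # Log every attempt, allowed or not, with a content hash instead of the content.
        digest = hashlib.sha256(pdf_bytes).hexdigest()
        self._audit(user, "upload", contract_id, allowed, sha256=digest, size=len(pdf_bytes))
        return allowed
```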
Source defaults observed in repo: chunk size 1024, chunk overlap 200, retriever k = 20, model gpt-3.5-turbo.
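The observed defaults can be collected into one validated config object. This is a minimal sketch, and LCQAConfig and its field names are hypothetical. Assume chunk size is measured in characters, which is the case when LangChain's text splitters use their default length function, len. Then retrieving the top 20 chunks of up to 1024 characters each can send about 20,480 characters of context per query. That figure is worth checking against the chosen model's context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LCQAConfig:
    """Defaults observed in the repo; field names are illustrative, not the repo's own."""
    chunk_size: int = 1024
    chunk_overlap: int = 200
    retriever_k: int = 20
    model: str = "gpt-3.5-turbo"

    def __post_init__(self) -> None:
        # An overlap as large as the chunk would leave no room for new text.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.retriever_k < 1:
            raise ValueError("retriever_k must be >= 1")

    @property
    def max_context_chars(self) -> int:
        # Upper bound on retrieved text handed to the model for one query.
        return self.retriever_k * self.chunk_size
```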
LCQA component: Chroma vector retrieval
Embeds contract chunks and retrieves top-k context with persistent Chroma storage.
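In the real stack, an embedding model and Chroma perform this step. The sketch below only illustrates the top-k cosine retrieval idea. It uses a toy hashed bag-of-words embedding and an in-memory store so that it runs without chromadb or an embedding API. ToyVectorStore and embed are hypothetical stand-ins. Persistence would come from Chroma itself, for example chromadb.PersistentClient(path=...) in the 0.4.x line.

```python
import hashlib
import math
import re

DIM = 512  # hypothetical embedding width for this toy example


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, standing in for a real embedding model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash; Python's built-in hash() is salted per process.
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorStore:
    """In-memory stand-in for a vector collection: add chunks, query top-k by cosine."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.docs: list[str] = []
        self.vecs: list[list[float]] = []

    def add(self, ids: list[str], documents: list[str]) -> None:
        for doc_id, doc in zip(ids, documents):
            self.ids.append(doc_id)
            self.docs.append(doc)
            self.vecs.append(embed(doc))

    def query(self, text: str, k: int = 20) -> list[tuple[str, float]]:
        q = embed(text)
        # Vectors are unit-normalised, so the dot product equals cosine similarity.
        scored = [(doc, sum(a * b for a, b in zip(q, v))) for doc, v in zip(self.docs, self.vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```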
LCQA component: legal answer chain
Prompt + LLM chain that returns concise legal-contract answers and abstains when context is insufficient.
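One way to get both concise answers and abstention is to do two things: instruct the model to abstain, and skip the model call entirely when retrieval returns nothing above a relevance threshold. This is a minimal sketch that assumes retrieval returns (chunk, score) pairs where a higher score means more similar. If the store returns distances instead, invert the comparison. The ABSTAIN wording, the 0.2 threshold, and the llm callable (a stand-in for a gpt-3.5-turbo call) are all hypothetical and are not the repo's prompt.

```python
from typing import Callable

ABSTAIN = "I cannot answer this from the provided contract."  # hypothetical wording

PROMPT_TEMPLATE = (
    "You are a legal-contract assistant. Answer concisely using ONLY the context below.\n"
    "If the context does not contain the answer, reply exactly: {abstain}\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n---\n".join(chunks)
    return PROMPT_TEMPLATE.format(abstain=ABSTAIN, context=context, question=question)


def answer(
    question: str,
    retrieved: list[tuple[str, float]],
    llm: Callable[[str], str],
    min_score: float = 0.2,
) -> str:
    """Abstain without calling the model when no retrieved chunk is relevant enough."""
    relevant = [chunk for chunk, score in retrieved if score >= min_score]
    if not relevant:
        return ABSTAIN
    return llm(build_prompt(question, relevant)).strip()
```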
LCQA component: PDF text extraction
Document ingestion layer that uses PyPDF2 to extract text from every page of each uploaded PDF.
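In recent PyPDF2 versions, PdfReader(stream).pages yields page objects that expose extract_text(). The sketch below accepts any iterable of such pages, so it can be exercised without a real PDF. extract_pdf_text and the [page N] markers are illustrative additions, not confirmed repo behaviour. Page markers make it possible to cite where an answer came from. Skipping empty pages avoids indexing blank chunks from scanned, image-only pages, which would need OCR instead.

```python
from typing import Iterable, Protocol


class PageLike(Protocol):
    def extract_text(self) -> str: ...


def extract_pdf_text(pages: Iterable[PageLike]) -> str:
    """Concatenate text from every page, tagging each with its 1-based page number."""
    parts = []
    for number, page in enumerate(pages, start=1):
        text = page.extract_text() or ""  # guard against pages that yield no text
        if text.strip():
            parts.append(f"[page {number}]\n{text.strip()}")
    return "\n\n".join(parts)
```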
Phase 2: Retrieval and QA
LCQA component: recursive text chunking
Chunking layer built with RecursiveCharacterTextSplitter for retrieval-ready segments.
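RecursiveCharacterTextSplitter tries coarse separators first (its defaults are "\n\n", "\n", " " and ""). It falls back to finer separators only when a piece is still too long, and then packs the pieces into overlapping chunks. The sketch below is a simplified re-implementation of that idea in plain Python, not LangChain's own code. split_text and its helpers are hypothetical, and length is measured in characters.

```python
def _atomize(text: str, chunk_size: int, separators: tuple[str, ...]) -> list[str]:
    """Break text into pieces of at most chunk_size chars; "".join(pieces) == text."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return _atomize(text, chunk_size, rest)
    parts = text.split(sep)
    pieces: list[str] = []
    for i, part in enumerate(parts):
        # Re-attach the separator so no characters are lost.
        piece = part + sep if i < len(parts) - 1 else part
        pieces.extend(_atomize(piece, chunk_size, rest))
    return pieces


def _merge(pieces: list[str], chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily pack pieces into chunks, carrying trailing pieces forward as overlap."""
    chunks: list[str] = []
    window: list[str] = []
    length = 0
    for piece in pieces:
        if window and length + len(piece) > chunk_size:
            chunks.append("".join(window))
            # Drop leading pieces until what remains fits the overlap budget and the new piece.
            while window and (length > chunk_overlap or length + len(piece) > chunk_size):
                length -= len(window.pop(0))
        window.append(piece)
        length += len(piece)
    if window:
        chunks.append("".join(window))
    return chunks


def split_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200,
               separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    return _merge(_atomize(text, chunk_size, separators), chunk_size, chunk_overlap)
```

For example, split_text("aaaa bbbb cccc dddd", chunk_size=10, chunk_overlap=5) returns ["aaaa bbbb ", "bbbb cccc ", "cccc dddd"]: each chunk begins by repeating the last word of the one before it.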
LCQA component: Streamlit interface
Interactive UI for uploading contracts and querying the assistant.
Phase 2: Stack hardening
LCQA dependency: ChromaDB
Core dependency used in the Legal Contract Q&A Bot stack (chromadb==0.4.22).
LCQA dependency: LangChain
Core dependency used in the Legal Contract Q&A Bot stack (langchain==0.1.20).
LCQA dependency: LangChain OpenAI
Core dependency used in the Legal Contract Q&A Bot stack (langchain-openai==0.1.5).
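The stack pins exact versions (chromadb==0.4.22, langchain==0.1.20, langchain-openai==0.1.5). A startup check can therefore fail fast on version drift instead of letting it show up as a confusing runtime error. This is a minimal sketch using the standard library's importlib.metadata. check_pins is a hypothetical helper, and its lookup parameter exists only so the check can be tested without the packages installed.

```python
from importlib import metadata
from typing import Callable

# Pins listed for the Legal Contract Q&A Bot stack.
PINS = {
    "chromadb": "0.4.22",
    "langchain": "0.1.20",
    "langchain-openai": "0.1.5",
}


def check_pins(pins: dict[str, str], lookup: Callable[[str], str] = metadata.version) -> list[str]:
    """Return human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: {installed} installed, want {wanted}")
    return problems
```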
Phase 3: Eval and optimization
LCQA roadmap task 1
Research current state-of-the-art approaches in contract analysis and legal AI.
LCQA roadmap task 2
Implement a basic Q&A pipeline that uses retrieval-augmented generation (RAG) to answer contract-related queries.
LCQA roadmap task 3
Build and refine a specialized evaluation framework to assess the bot's performance on legal-specific tasks.
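A legal-QA evaluation needs to score both answer quality and abstention. This is a minimal sketch that assumes a hypothetical test set where gold=None marks questions the contract does not answer. Answerable items are scored with a simplified SQuAD-style token F1 that only lowercases and strips punctuation. Unanswerable items count as correct only if the bot abstained. EvalItem, evaluate and the ABSTAIN marker are illustrative names.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

ABSTAIN = "I cannot answer this from the provided contract."  # hypothetical marker


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a prediction and a reference answer."""
    p, g = tokens(pred), tokens(gold)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


@dataclass
class EvalItem:
    question: str
    gold: Optional[str]  # None marks a question the contract does not answer
    prediction: str


def evaluate(items: list[EvalItem]) -> dict[str, float]:
    answerable = [it for it in items if it.gold is not None]
    unanswerable = [it for it in items if it.gold is None]
    # Abstaining on an answerable question scores zero.
    f1 = [0.0 if it.prediction == ABSTAIN else token_f1(it.prediction, it.gold) for it in answerable]
    correct_abstain = [it.prediction == ABSTAIN for it in unanswerable]
    return {
        "answer_f1": sum(f1) / len(f1) if f1 else 0.0,
        "abstain_accuracy": sum(correct_abstain) / len(correct_abstain) if correct_abstain else 0.0,
    }
```

For a prediction of "the State of New York" against the gold answer "New York", precision is 2/5 and recall is 2/2, so F1 = 4/7, about 0.571.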
LCQA roadmap task 4
Explore optimization techniques to enhance the accuracy and reliability of the Q&A responses.
Phase 4: Reporting and rollout
LCQA roadmap task 5
Deploy enhancements to the pipeline, focusing on context understanding and response precision.
LCQA roadmap task 6
Interpret the bot's outputs and compile detailed performance reports to guide further improvements.