Switching to one-call per skill

James Ketr 2025-05-12 16:57:20 -07:00
parent 4e772ab8ea
commit a1798b58ac
3 changed files with 78 additions and 125 deletions


@ -4,13 +4,12 @@ The system follows a carefully designed pipeline with isolated stages to prevent
The system uses a pipeline of isolated analysis and generation steps: The system uses a pipeline of isolated analysis and generation steps:
1. **Stage 1: Isolated Analysis** (three sub-stages) 1. **Stage 1: Isolated Analysis**
- **1A: Job Analysis** - Extracts requirements from job description only - **1A: Job Analysis** - Extracts requirements from job description only
- **1B: Candidate Analysis** - Catalogs qualifications from resume/context only - **1B: Skill-Based Assessment** - For each required skill, produces an Individual Skill Assessment and adds it to a Skill Assessments Collection.
- **1C: Mapping Analysis** - Identifies legitimate matches between requirements and qualifications
2. **Stage 2: Resume Generation** 2. **Stage 2: Resume Generation**
- Uses mapping output to create a tailored resume with evidence-based content - Uses the Skill Assessments Collection to generate a tailored resume.
3. **Stage 3: Verification** 3. **Stage 3: Verification**
- Performs fact-checking to catch any remaining fabrications - Performs fact-checking to catch any remaining fabrications
@ -23,63 +22,62 @@ flowchart TD
A2 --> A3[Job Requirements JSON] A2 --> A3[Job Requirements JSON]
end end
subgraph "Stage 1B: Candidate Analysis" subgraph "Stage 1B: Skill-Based Assessment"
B1[Resume Input] --> B5[Candidate Analysis LLM] B1[Resume Input] --> B2[Candidate Info]
B5 --> B4[Candidate Qualifications JSON] B2 --> B3[RAG System]
B2[Candidate Info] --> B3[RAG] A3 --> B4[Skill Assessment Generator]
B3[RAG] --> B2[Candidate Info] B3 --> B4
A3[Job Requirements JSON] --> B3[RAG] B4 --> B5{For Each Required Skill}
B3[RAG] --> B5 B5 --> B6[Skill-Focused LLM Query]
end B6 --> B7[Individual Skill Assessment]
B7 --> B8[Skill Assessments Collection]
subgraph "Stage 1C: Mapping Analysis"
C1[Job Requirements JSON] --> C3[Mapping Analysis LLM]
C2[Candidate Qualifications JSON] --> C3
C3 --> C4[Skills Mapping JSON]
end end
end end
subgraph "Stage 2: Resume Generation" subgraph "Stage 2: Resume Generation"
D1[Skills Mapping JSON] --> D3[Resume Generation LLM] C1[Skill Assessments Collection] --> C2[Resume Generator]
D2[Original Resume Reference] --> D3 C3[Original Resume Reference] --> C2
D3 --> D4[Tailored Resume Draft] C4[Candidate Information] --> C2
C2 --> C5[Resume Generation Prompt]
C5 --> C6[Resume Generation LLM]
C6 --> C7[Tailored Resume Draft]
end end
subgraph "Stage 3: Verification" subgraph "Stage 3: Statistics & Verification"
E1[Skills Mapping JSON] --> E2[Original Materials] D1[Job Requirements JSON] --> D2[Match Statistics Calculator]
E2 --> E3[Tailored Resume Draft] D3[Skill Assessments Collection] --> D2
E3 --> E4[Verification LLM] D2 --> D4[Match Statistics]
E4 --> E5{Verification Check} D4 --> D5[Verification LLM]
E5 -->|PASS| E6[Approved Resume] C7 --> D5
E5 -->|FAIL| E7[Correction Instructions] D5 --> D6{Verification Check}
E7 --> D3 D6 -->|PASS| D7[Approved Resume]
D6 -->|FAIL| D8[Correction Instructions]
D8 --> C2
end end
A3 --> C1 A3 --> B4
B4 --> C2 B8 --> C1
C4 --> D1 B8 --> D3
C4 --> E1 B1 --> C3
D4 --> E3
style A2 fill:#f9d77e,stroke:#333,stroke-width:2px style A2 fill:#f9d77e,stroke:#333,stroke-width:2px
style B5 fill:#f9d77e,stroke:#333,stroke-width:2px style B6 fill:#f9d77e,stroke:#333,stroke-width:2px
style C3 fill:#f9d77e,stroke:#333,stroke-width:2px style C6 fill:#f9d77e,stroke:#333,stroke-width:2px
style D3 fill:#f9d77e,stroke:#333,stroke-width:2px style D5 fill:#f9d77e,stroke:#333,stroke-width:2px
style E4 fill:#f9d77e,stroke:#333,stroke-width:2px style B5 fill:#a3e4d7,stroke:#333,stroke-width:2px
style E5 fill:#a3e4d7,stroke:#333,stroke-width:2px style D6 fill:#a3e4d7,stroke:#333,stroke-width:2px
style E6 fill:#aed6f1,stroke:#333,stroke-width:2px style D7 fill:#aed6f1,stroke:#333,stroke-width:2px
style E7 fill:#f5b7b1,stroke:#333,stroke-width:2px style D8 fill:#f5b7b1,stroke:#333,stroke-width:2px
``` ```
## Stage 1: Isolated Analysis (three separate sub-stages) ## Stage 1: Isolated Analysis
1. **Job Analysis**: Extracts requirements from just the job description 1. **Job Analysis**: Extracts requirements from just the job description
2. **Candidate Analysis**: Catalogs qualifications from just the resume/context 2. **Candidate Analysis**: Catalogs qualifications for each job requirement from just the resume/context
3. **Mapping Analysis**: Identifies legitimate matches between requirements and qualifications
## Stage 2: Resume Generation ## Stage 2: Resume Generation
Creates a tailored resume using only verified information from the mapping Creates a tailored resume using the skills collection and candidate information.
## Stage 3: Verification ## Stage 3: Verification
@ -90,7 +88,7 @@ Creates a tailored resume using only verified information from the mapping
The system uses several techniques to prevent fabrication: The system uses several techniques to prevent fabrication:
* **Isolation of Analysis Stages**: By analyzing the job and candidate separately, the system prevents the LLM from prematurely creating connections that might lead to fabrication. * **Isolation of Analysis Stages**: By analyzing the job and candidate separately, and by having the LLM provide evidence for only a single skill per pass, the system prevents the LLM from prematurely creating connections that might lead to fabrication.
* **Evidence Requirements**: Each qualification included must have explicit evidence from the original materials. * **Evidence Requirements**: Each qualification included must have explicit evidence from the original materials.
* **Conservative Transferability**: The system is instructed to be conservative when claiming skills are transferable. * **Conservative Transferability**: The system is instructed to be conservative when claiming skills are transferable.
* **Verification Layer**: A dedicated verification step acts as a safety check to catch any remaining fabrications. * **Verification Layer**: A dedicated verification step acts as a safety check to catch any remaining fabrications.
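
The one-call-per-skill idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `assess_skill`, `assess_all_skills`, `SKILL_SYSTEM_PROMPT`, and the `LLMCall` callable type are all hypothetical names, and the prompt wording is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical type: any callable taking (system_prompt, prompt) and returning text.
LLMCall = Callable[[str, str], str]

# Assumed wording; the real system prompt is not shown in this diff.
SKILL_SYSTEM_PROMPT = (
    "You are an objective skill assessor. Using ONLY the resume and context "
    "provided, report evidence for the single skill named in the prompt. "
    "If there is no explicit evidence, say so."
)


def assess_skill(skill: str, resume: str, context: str, llm: LLMCall) -> Dict[str, str]:
    """Run one isolated LLM call that looks at exactly one required skill."""
    prompt = f"Skill: {skill}\n\nResume:\n{resume}\n\nAdditional Context:\n{context}"
    return {"skill": skill, "assessment": llm(SKILL_SYSTEM_PROMPT, prompt)}


def assess_all_skills(
    skills: List[str], resume: str, context: str, llm: LLMCall
) -> List[Dict[str, str]]:
    """Build the Skill Assessments Collection: one entry, and one call, per skill."""
    return [assess_skill(skill, resume, context, llm) for skill in skills]
```

Because each call sees only one skill name, the model has no other requirements in view to blend into its answer, which is the isolation property the list above relies on.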

View File

@ -436,11 +436,11 @@ class JobDescription(Agent):
# Group results by category and subcategory # Group results by category and subcategory
grouped_context = defaultdict(list) grouped_context = defaultdict(list)
for result in rag_results: for result in rag_results:
key = f"{result['category']}/{result['subcategory']}".strip("/") key = f"{result['category']}/{result['subcategory']}".strip("/")
grouped_context[key].append({ grouped_context[key].append({
"query": result["context"], "query": result["context"],
"content": result["content"][:100] + "..." if len(result["content"]) > 100 else result["content"] "content": result["content"][:100] + "..." if len(result["content"]) > 100 else result["content"]
}) })
# Format as a structured string # Format as a structured string
context_lines = ["Additional Context from Document Retrieval:"] context_lines = ["Additional Context from Document Retrieval:"]
@ -454,120 +454,70 @@ class JobDescription(Agent):
# Stage 1B: Candidate Analysis Implementation # Stage 1B: Candidate Analysis Implementation
def create_candidate_analysis_prompt(self, resume: str, rag_results: List[Dict[str, Any]]) -> tuple[str, str]: def create_candidate_analysis_prompt(self, resume: str, rag_results: List[Dict[str, Any]]) -> tuple[str, str]:
"""Create the prompt for candidate qualifications analysis.""" """Create the prompt for candidate qualifications analysis."""
system_prompt = """\
# system_prompt = """ You are an objective resume analyzer. Create a concise inventory of the candidate's key skills, experiences, and qualifications based on their resume.
# You are an objective resume analyzer. Create a comprehensive inventory of all skills, experiences, and qualifications present in the candidate's materials.
# CORE PRINCIPLES:
# - Analyze ONLY the candidate's resume and provided context
# - Focus ONLY on the candidate's actual qualifications
# - Do not reference any job requirements
# - Include only explicitly mentioned information
# OUTPUT FORMAT:
# ```json
# {
# "candidate_qualifications": {
# "technical_skills": [
# {
# "skill": "skill name",
# "evidence_location": "where in resume this appears",
# "expertise_level": "stated level or 'unspecified'"
# }
# ],
# "work_experience": [
# {
# "role": "job title",
# "company": "company name",
# "duration": "time period",
# "responsibilities": ["resp1", "resp2"],
# "technologies_used": ["tech1", "tech2"],
# "achievements": ["achievement1", "achievement2"]
# }
# ],
# "education": [
# {
# "degree": "degree name",
# "institution": "institution name",
# "completed": true/false,
# "graduation_date": "date or 'ongoing'"
# }
# ],
# "projects": [
# {
# "name": "project name",
# "description": "brief description",
# "technologies_used": ["tech1", "tech2"]
# }
# ],
# "soft_skills": [
# {
# "skill": "skill name",
# "context": "brief mention of where this appears"
# }
# ]
# }
# }
# """
system_prompt = """\
You are an objective resume analyzer. Create a comprehensive inventory of all skills, experiences, and qualifications present in the candidate's materials.
CORE PRINCIPLES: CORE PRINCIPLES:
- Analyze ONLY the candidate's resume and provided context. - Analyze ONLY the candidate's resume and provided context.
- Focus ONLY on the candidate's actual qualifications explicitly mentioned in the resume. - Focus on the most significant and relevant qualifications explicitly mentioned.
- Use the additional context to clarify or provide background for terms, skills, or experiences mentioned in the resume (e.g., to understand the scope of a skill like 'Python' or a role's responsibilities). - Limit your analysis to the most important items in each category.
- Do NOT treat the context as job requirements or infer qualifications not explicitly stated in the resume. - Prioritize brevity and completeness over exhaustiveness.
- Include only explicitly mentioned information from the resume, supplemented by context where relevant. - Complete the entire analysis in one response without getting stuck on any section.
OUTPUT FORMAT: OUTPUT FORMAT:
```json
{ {
"candidate_qualifications": { "candidate_qualifications": {
"technical_skills": [ "technical_skills": [
// Include MAX 10 most important technical skills
{ {
"skill": "skill name", "skill": "skill name",
"evidence_location": "where in resume this appears", "evidence_location": "brief reference",
"expertise_level": "stated level or 'unspecified'" "expertise_level": "stated level or 'unspecified'"
} }
], ],
"work_experience": [ "work_experience": [
// Include MAX 5 most recent or relevant positions
{ {
"role": "job title", "role": "job title",
"company": "company name", "company": "company name",
"duration": "time period", "duration": "time period",
"responsibilities": ["resp1", "resp2"], "responsibilities": ["resp1", "resp2"], // MAX 3 key responsibilities
"technologies_used": ["tech1", "tech2"], "technologies_used": ["tech1", "tech2"], // MAX 5 technologies
"achievements": ["achievement1", "achievement2"] "achievements": ["achievement1"] // MAX 2 achievements
} }
], ],
"education": [ "education": [
// Include ALL education entries (typically 1-3)
{ {
"degree": "degree name", "degree": "degree name",
"institution": "institution name", "institution": "institution name",
"completed": true/false, "completed": true/false
"graduation_date": "date or 'ongoing'"
} }
], ],
"projects": [ "projects": [
// Include MAX 3 most significant projects
{ {
"name": "project name", "name": "project name",
"description": "brief description", "description": "one sentence description",
"technologies_used": ["tech1", "tech2"] "technologies_used": ["tech1", "tech2"] // MAX 3 technologies
} }
], ],
"soft_skills": [ "soft_skills": [
// Include MAX 5 most prominent soft skills
{ {
"skill": "skill name", "skill": "skill name",
"context": "brief mention of where this appears" "context": "brief mention"
} }
] ]
} }
} }
IMPORTANT: If at any point you find yourself repeating items or getting stuck, STOP that section and move to the next. It's better to provide a partial analysis than to get stuck in a loop.
""" """
context = self.format_rag_context(rag_results) context = self.format_rag_context(rag_results)
prompt = f"Resume:\n{resume}\n\nAdditional Context:\n{context}" prompt = f"Resume:\n{resume}\n\nAdditional Context:\n{context}"
return system_prompt, prompt return system_prompt, prompt
async def call_llm(self, message: Message, system_prompt, prompt, temperature=0.7): async def call_llm(self, message: Message, system_prompt, prompt, temperature=0.7):
logger.info(f"{self.agent_type} - {inspect.stack()[0].function}") logger.info(f"{self.agent_type} - {inspect.stack()[0].function}")
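
For readers who want to try the grouping loop from the first hunk of this file in isolation, here is a standalone sketch. The function name `group_rag_results` and the `max_len` parameter are assumptions made for illustration; the key construction and the 100-character truncation follow the lines shown in the diff.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_rag_results(
    rag_results: List[Dict[str, Any]], max_len: int = 100
) -> Dict[str, List[Dict[str, str]]]:
    """Group RAG hits by 'category/subcategory' and truncate long content."""
    grouped = defaultdict(list)
    for result in rag_results:
        # An empty subcategory leaves a trailing slash, which strip("/") removes.
        key = f"{result['category']}/{result['subcategory']}".strip("/")
        content = result["content"]
        if len(content) > max_len:
            content = content[:max_len] + "..."
        grouped[key].append({"query": result["context"], "content": content})
    return dict(grouped)
```

Pulling the truncation into an explicit `if` avoids relying on the precedence of the conditional expression, while keeping the same behavior as the one-line form in the diff.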

View File

@ -167,6 +167,7 @@ class ChromaDBFileWatcher(FileSystemEventHandler):
if os.path.isfile(file_path): if os.path.isfile(file_path):
# Do not put the Resume in RAG as it is provided with all queries. # Do not put the Resume in RAG as it is provided with all queries.
if file_path == defines.resume_doc: if file_path == defines.resume_doc:
logging.info(f"Not adding {file_path} to RAG -- primary resume")
continue continue
files_checked += 1 files_checked += 1
current_hash = self._get_file_hash(file_path) current_hash = self._get_file_hash(file_path)
@ -218,6 +219,10 @@ class ChromaDBFileWatcher(FileSystemEventHandler):
logging.info(f"{file_path} already in queue. Not adding.") logging.info(f"{file_path} already in queue. Not adding.")
return return
if file_path == defines.resume_doc:
logging.info(f"Not adding {file_path} to RAG -- primary resume")
return
try: try:
logging.info(f"{file_path} not in queue. Adding.") logging.info(f"{file_path} not in queue. Adding.")
self.processing_files.add(file_path) self.processing_files.add(file_path)
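
The primary-resume check now appears in two places in the watcher. One possible way to keep the two in sync is a small shared helper, sketched below. `is_primary_resume` and `should_index` are hypothetical names, and comparing absolute paths is an added assumption: the code in this diff compares the two strings directly.

```python
import logging
import os


def is_primary_resume(file_path: str, resume_doc: str) -> bool:
    """Return True when file_path refers to the primary resume document."""
    # Assumption: normalize both paths so 'docs/./resume.md' matches 'docs/resume.md'.
    return os.path.abspath(file_path) == os.path.abspath(resume_doc)


def should_index(file_path: str, resume_doc: str) -> bool:
    """Decide whether a file belongs in the RAG store."""
    if is_primary_resume(file_path, resume_doc):
        # Same message the diff logs at both call sites.
        logging.info(f"Not adding {file_path} to RAG -- primary resume")
        return False
    return True
```

Both the directory scan and the queueing path could then call `should_index(file_path, defines.resume_doc)` instead of repeating the comparison and the log line.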