




**Job Description: AI Task Evaluation & Statistical Analysis Specialist**
==========================================================================

**Role Overview**
-----------------

We're seeking a data-driven analyst to conduct comprehensive failure analysis on AI agent performance across finance-sector tasks. You'll identify patterns, root causes, and systemic issues in our evaluation framework by analyzing task performance across multiple dimensions (task types, file types, criteria, etc.).

**Key Responsibilities**
------------------------

* **Statistical Failure Analysis**: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
* **Root Cause Analysis**: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
* **Dimension Analysis**: Analyze performance variations across finance sub-domains, file types, and task categories
* **Reporting & Visualization**: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
* **Quality Framework**: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
* **Stakeholder Communication**: Present insights to data labeling experts and technical teams

**Required Qualifications**
---------------------------

* **Statistical Expertise**: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
* **Programming**: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
* **Data Analysis**: Experience with exploratory data analysis and deriving actionable insights from complex datasets
* **AI/ML Familiarity**: Understanding of LLM evaluation methods and quality metrics
* **Tools**: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL

**Preferred Qualifications**
----------------------------

* Experience with AI/ML model evaluation or quality assurance
* Background in finance or willingness to learn finance domain concepts
* Experience with multi-dimensional failure analysis
* Familiarity with benchmark datasets and evaluation frameworks
* 2-4 years of relevant experience

We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.

**Contract and Payment Terms**
------------------------------

* You will be engaged as an independent contractor.
* This is a fully remote role that can be completed on your own schedule.
* Projects can be extended, shortened, or concluded early depending on needs and performance.
* Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
* Payments are made weekly via Stripe or Wise, based on services rendered.


