PhD Rater - Remote
Indeed
Full-time
Onsite
No experience limit
No degree limit
Mexico
Description

Summary: Seeking experienced researchers and technical experts to design and validate challenging benchmark tasks in various STEM fields for evaluating frontier models.

Highlights:

1. Design challenging, real-world STEM problems
2. Fully remote role with flexible scheduling
3. Implement tasks using Python

Seeking **experienced researchers and technical experts** to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in **data science, machine learning, finance, and coding** to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.

**Key Responsibilities**
------------------------

* Design challenging, real-world STEM problems
* Implement each task within an agentic development environment using **Python**

**Contract and Payment Terms**
------------------------------

+ You will be engaged as an independent contractor.
+ This is a fully remote role that can be completed on your own schedule.
+ Projects can be extended, shortened, or concluded early depending on needs and performance.
+ Payments are weekly on Stripe or Wise based on services rendered.

Source: Indeed
Juan García
Indeed · HR
