ML Engineer - Language Model (LLM) Training

Computrabajo

Full-time

Onsite

No experience limit

No degree limit

Merida, Yucatan, 97000, Mexico

Description

Job Summary: We are seeking an engineer passionate about optimizing algorithms and debugging code to build a cutting-edge AI solution in Latin America, designing and executing the full model development pipeline.

Key Highlights:
1. Complete AI pipeline design and implementation
2. High autonomy and responsibility
3. Advanced technical challenge

We are assembling a select team to build an innovative technological solution in Latin America. We develop frontier AI focused on Spanish-language text processing, addressing a real-world problem with customers ready for implementation. If you are passionate about algorithm optimization, code debugging, and watching a loss curve finally converge, you are the person we are looking for.

You will design and execute the complete pipeline: corpus preparation, Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), RLHF, quantization, and deployment on proprietary hardware. We offer an environment characterized by high autonomy, responsibility, and advanced technical challenge.

Key Responsibilities:
Prepare and tokenize large-scale Spanish text datasets.
Perform Continual Pre-Training on open-source base models using dedicated GPU infrastructure.
Conduct Supervised Fine-Tuning (SFT) with LoRA and QLoRA using the HuggingFace and TRL ecosystems.
Design and operate RLHF and DPO pipelines with domain annotators.
Quantize the final model for on-premise deployment using GGUF and MLX on specific hardware.
Build retrieval-augmented generation (RAG) systems on pgvector.
Design rigorous evaluation metrics for model validation.

Essential Requirements:
Advanced proficiency in Python.
Proven hands-on experience with PyTorch and HuggingFace Transformers.
Experience fine-tuning LLMs in production environments (SFT, LoRA, QLoRA).
Fluency with the Linux command line.
Experience managing large-scale data (ETL processes, tokenization, and pipelines).
Native or advanced operational proficiency (C2 level) in Spanish for text evaluation.

Desirable Requirements:
Knowledge of MLX for Apple Silicon.
Experience with RLHF, DPO, and Reward Modeling.
Familiarity with tools such as Unsloth, DeepSpeed, or FSDP.
Knowledge of quantization techniques: GGUF, GPTQ, AWQ.
Experience with pgvector or vector databases.
Familiarity with llama.cpp and Ollama.

We Offer:
Competitive salary commensurate with demonstrated technical expertise.
Benefits exceeding statutory requirements.
100% remote work arrangement.
Dedicated infrastructure and hardware for model training.
Foundational role with direct impact on architectural decisions.
Flexible working hours based on objective delivery.

Selection Process: Per platform policies, please apply directly via the button on this portal, ensuring your profile is up to date and includes your portfolio or links to relevant code repositories (e.g., fine-tuning or training projects) in your attached information. Demonstrable code will be evaluated in the early stages of the process.

Requirements:
Minimum Education: Higher education – Specialization
Experience: 6 years
Languages: Spanish, English
Age: 30 years or older
Knowledge: Self-supervision, Database, Spanish, Hardware, Artificial Intelligence, Technological Solutions

Source: Computrabajo

Mateo García

Computrabajo

Company

Computrabajo
