ML Engineer - Large Language Model (LLM) Training

Computrabajo

Full-time

Onsite

No experience limit

No degree limit

Merida, Yucatan, Mexico

Description

Job Summary: We are seeking an AI specialist who is passionate about algorithm optimization to develop Spanish-language text processing solutions.

Key Highlights:
1. Foundational role with direct impact on architectural decisions
2. High autonomy and advanced technical challenge
3. Dedicated infrastructure and hardware for model training

We are assembling a select team to build an innovative technology solution in Latin America. We develop frontier AI focused on Spanish-language text processing, addressing a real-world problem for customers who are ready to implement it. If you are passionate about algorithm optimization, debugging code, and watching a loss curve finally converge, you are the person we are looking for.

You will design and execute the full pipeline: corpus preparation, Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), RLHF, quantization, and deployment on proprietary hardware. We offer an environment with high autonomy, responsibility, and an advanced technical challenge.

Key Responsibilities:
Prepare and tokenize large-scale Spanish-language text datasets.
Perform Continual Pre-Training on open-source base models using dedicated GPU infrastructure.
Conduct supervised fine-tuning (SFT) using LoRA and QLoRA within the HuggingFace and TRL ecosystems.
Design and operate RLHF and DPO pipelines with domain annotators.
Quantize the final model for on-premise deployment using GGUF and MLX on specific hardware.
Build retrieval-augmented generation (RAG) systems on pgvector.
Design rigorous evaluation metrics for model validation.

Essential Requirements:
Advanced proficiency in Python.
Proven hands-on experience with PyTorch and HuggingFace Transformers.
Experience fine-tuning LLMs in production environments (SFT, LoRA, QLoRA).
Fluency with the Linux command line.
Experience managing large volumes of data (ETL processes, tokenization, and pipelines).
Native or C2-level proficiency in Spanish for text evaluation.

Desirable Requirements:
Knowledge of MLX for Apple Silicon.
Experience with RLHF, DPO, and Reward Modeling.
Familiarity with tools such as Unsloth, DeepSpeed, or FSDP.
Knowledge of quantization techniques: GGUF, GPTQ, AWQ.
Experience with pgvector or other vector databases.
Familiarity with llama.cpp and Ollama.

We Offer:
Competitive salary commensurate with demonstrated technical expertise.
Benefits exceeding statutory requirements.
100% remote work arrangement.
Dedicated infrastructure and hardware for model training.
Foundational role with direct impact on architectural decisions.
Flexible working hours based on objective achievement.

Selection Process: Per platform policy, please apply directly via the button on this portal. Make sure your profile is up to date and that your attached information includes your portfolio or links to relevant code repositories (e.g., fine-tuning or training projects). Demonstrable code will be evaluated in the early stages of the process.

Requirements:
Minimum Education: Higher Education – Specialization
6 years of experience
Languages: Spanish, English
Age: 30 years or older
Knowledge Areas: Self-supervision, Databases, Spanish, Hardware, Artificial Intelligence, Technology Solutions

Source: Computrabajo

Mateo García

Computrabajo

Company

Computrabajo
