AI Engineer – Prompt Evaluation

Team Red Dog - Redmond, WA, United States

Team Red Dog is partnering with a global productivity and collaboration leader to hire an AI Engineer – Prompt Evaluation to support Copilot experiences across Word, Excel, and PowerPoint. This onsite role in Redmond offers hands-on work in LLM prompt evaluation, synthetic data generation, and automation at Office scale, directly influencing how millions of users interact with AI-powered productivity tools.

Top Required Skills (Must Haves):

  1. Experience setting up synthetic tenant data and data ingestion, including test accounts, grounding data generation, and configuration-as-code for evaluation environments.
  2. Hands-on experience maintaining, validating, and automating test datasets for an LLM evaluation system, with a focus on quality and repeatability.
  3. Ability to integrate evaluation quality checks into build and deployment pipelines to ensure performance, efficiency, and scalability.
  4. Strong coding or scripting skills (Python highly preferred; C# acceptable) used to support testing, automation, and evaluation workflows.

Opportunity Overview:
This role sits at the intersection of AI engineering, experimentation, and large-scale product delivery. You will help build net-new evaluation capabilities for Copilot within Office applications, contributing to a rapidly evolving area of LLM prompt evaluation. The work directly supports shipping Copilot features at massive scale, offering rare exposure to real-world AI systems used daily by millions of users.

How you will make an impact:
• Support evaluation of Suggested User Actions across Word, Excel, PowerPoint, and other host applications
• Set up synthetic tenant data, ingestion pipelines, and grounding datasets for evaluation scenarios
• Create and maintain evaluation test sets and configurations as code
• Run evaluations, inspect results, and iterate with partner engineering and product teams
• Automate the creation and validation of test datasets for LLM evaluation systems
• Integrate evaluation quality checks into build and deployment pipelines
• Perform both manual (hands-on) and automated (hands-off) validation using internal evaluation toolsets
• Build reporting pipelines to surface evaluation results and automate portions of the evaluation process

The expertise you bring:
• Bachelor’s degree in computer science, computer engineering, or a related technical field
• 2–4 years of professional experience in software engineering, data science, or experimentation-focused roles
• Strong foundation in computer science fundamentals, including data structures, algorithms, and software design
• Experience developing and testing software for large-scale systems
• Proven ability to troubleshoot, unit test, and validate both new and legacy systems
• Programming experience, with demonstrated skill in diagnosing and resolving problems

What makes a candidate highly successful in this role:
Successful candidates bring prior experience with LLM prompt engineering and evaluation, synthetic data generation, and experimentation workflows. They are comfortable working across testing, automation, and coding tasks, can reason about evaluation quality at scale, and collaborate effectively with partner teams to iterate quickly based on results.

Why Work with Team Red Dog?
At Team Red Dog, people are at the heart of everything we do. Our commitment to personalized service and our deep experience in matching talented professionals with meaningful roles at some of the world’s most inspiring companies are what set us apart. We take the time to understand your unique skills, strengths, and passions, because we believe your career should reflect who you are.

Whether you're looking to grow, pivot, or simply find a place where your work truly matters, we offer opportunities that empower you to make a positive impact. With excellent benefits, a supportive team, and a role where you can thrive while doing what you love, we’re here to help you take the next step with confidence. Join us—and discover what it means to be genuinely valued in your career.

Generous benefits package for qualified employees includes:
• Health insurance (medical, dental, vision, and life)
• Employer-matched 401(k) plan
• Paid time off
• Paid holidays
• Profit sharing

Estimated Start Date: January 1, 2026
Location: Onsite – Redmond, WA
Job #: 2427
Job Type and Estimated Duration: W2/Contract, 40 hours per week, through June 30, 2026

Rate: $72 – $79/hour

 

Team Red Dog is committed to providing equal opportunities to everyone, regardless of race, ethnicity, gender, age, religion, sexual orientation, disability, or any other characteristic. If you need accommodation during the recruitment process, reach out to hr@teamreddog.com, and we will work to ensure an accessible experience. We strictly adhere to federal, state, and local laws to maintain a workplace free from discrimination and harassment.
We offer competitive compensation aligned with U.S. industry standards, and our final offer will reflect the candidate’s location, job-specific skills, experience, and knowledge.

• All applicants must be authorized to work in the U.S. without the need for sponsorship.
• Team Red Dog is an E-Verify employer.
• Employment is contingent upon the successful completion of a reference and background check.
• No solicitations from C2C or recruiting firms, please.

 



Posted On: Monday, December 22, 2025


