Senior Software Engineer – LLM Evaluation
Contract Type: Contractor
Experience Required: 5+ Years
Location: United States & Western Europe (France, Germany, Switzerland, Denmark, Finland, Netherlands, Sweden, Iceland, Italy, Austria, Ireland, Norway)
Role Overview
As a Senior Software Engineer – LLM Evaluation, you will play a critical role in advancing large language models by creating high-quality datasets, evaluating AI-generated code, and collaborating closely with AI researchers. You will help ensure that AI-driven coding solutions meet enterprise-grade standards for efficiency, scalability, and reliability.
Key Responsibilities
- Evaluate, refine, and improve AI-generated code to ensure performance, scalability, and production readiness.
- Collaborate with cross-functional teams to benchmark AI-driven coding solutions against industry standards.
- Design and build automated agents to verify code quality and identify recurring error patterns.
- Assess AI model capabilities across the full software engineering lifecycle, including prototyping, system architecture, API design, production implementation, testing, deployment, monitoring, and maintenance.
- Develop verification mechanisms that automatically validate software engineering solutions.
Required Skills & Qualifications
- 5+ years of professional software engineering experience.
- At least 2 years of continuous full-time experience at a top-tier product or technology company (e.g., Google, Amazon, Apple, Meta, Microsoft, Netflix, Stripe, Datadog, Shopify, PayPal, IBM Research).
- Strong expertise in full-stack development and building scalable, production-grade applications.
- Deep understanding of software architecture, system design, debugging, and code quality evaluation.
- Excellent written and verbal communication skills with the ability to provide clear, structured evaluation feedback.
Vetting Process
- Complete the Initial Candidate Form (ICF).
- Participate in a 20-minute AI technical interview via Turing’s AI interview platform (QODE).
- Complete the Vetsmith automated coding challenge (30–45 minutes).
Preferred Background
Candidates with experience at leading technology and product organizations such as Google, Amazon, Apple, Meta, Microsoft, Netflix, Stripe, Databricks, Snowflake, Cloudflare, Uber, Airbnb, Shopify, PayPal, ServiceNow, Hugging Face, Atlassian, and similar high-scale engineering environments are highly preferred.