You know what good code looks like — and what a confidently-wrong implementation looks like. AI companies need that judgement to teach models to build software well.
This isn’t advisory work and it isn’t labelling. You’re reviewing real engineering work and explaining your judgement clearly enough that an AI can learn from it — paid for your expertise, by the hour.
Applied Clinical Judgement connects qualified people to vetted platforms, and Sean Key personally vouches for those he refers. We’re paid a referral fee by the platform on a successful placement — never by you. The roles below are live today.
11 live Software Engineer roles · updated daily
Open Source Applied Engineer Talent Network
Mercor is hiring an open-source engineer at $100/hour to design coding evaluations, develop test cases, and analyse system performance across Python, Java, C, JavaScript, and TypeScript. This suits experienced contributors with a strong GitHub presence and demonstrated expertise in core programming fundamentals. You'll work asynchronously with a research team, identifying improvements and executing contributions independently using Git and CI/CD workflows.
Software Engineer
$66–$129 per hour. micro1 seeks software engineers with 3+ years' experience in Python, Rust, Go, Java, Node.js, or full-stack development to help train AI systems. You'll design scalable backend and full-stack applications, write clean code, optimise existing systems, and collaborate across distributed teams. Remote contract work; no prior AI experience required.
AI Evaluation Engineer (Python / Java / Web)
Turing seeks software engineers with 3–5 years' experience to design AI evaluation tasks for advanced language models. You'll create realistic Java and web development challenges, write reference solutions, and develop verification criteria that measure AI system capabilities. The role requires strong technical writing skills and deep understanding of software engineering best practices. This is a 2-month contractor position requiring 40 hours weekly.
AI Evaluation Engineer (Python / Java / Web)
Turing seeks experienced software engineers to design and validate AI evaluation benchmarks across Python, Java, and web technologies. You'll create realistic coding tasks, reference solutions, and verification criteria that test advanced AI system capabilities. The role requires five years' development experience, strong technical writing, and deep understanding of software engineering workflows. This is a two-month freelance contract with flexible remote work.
Senior Software Engineer – LLM Evaluation (US/Canada/WEU based)
Turing seeks senior software engineers to evaluate and improve large language models through code curation, review, and refinement across multiple languages. You'll assess AI-generated code for production readiness, design verification systems, and collaborate with research teams on frontier AI projects. Requires 3+ years' engineering experience and expertise in full-stack development.
Senior Software Engineer – LLM Evaluation
Turing seeks experienced software engineers to evaluate and refine AI-generated code across multiple languages for LLM training datasets. You'll curate code examples, assess model outputs for efficiency and reliability, and design verification mechanisms for software engineering tasks. Requires 2+ years' full-time experience at top-tier product companies and deep expertise in full-stack development, architecture, and code quality assessment. Flexible contractor role, 10–40 hours weekly with partial PST overlap.
Software Engineer – AI Code Evaluation & Benchmarking (SWE-Bench)
Turing seeks experienced software engineers to evaluate and benchmark AI-generated code for large language models. You'll assess coding solutions, identify correctness issues, debug implementations, and build evaluation datasets. The role suits engineers with strong code review experience and deep software engineering expertise. Minimum 20 hours weekly with 4-hour PST overlap; one-month contractor assignment.
Software Engineer
micro1 seeks experienced backend and full-stack software engineers on a contractor basis at $20–$75 per hour for 10–15 hours weekly. The role involves building and evaluating reinforcement learning environments to test AI systems' ability to identify and patch security vulnerabilities in code. Suited to developers with 3+ years' production experience, strong debugging skills, and familiarity with codebases across Python, JavaScript, Java, Go, or Rust. Cybersecurity and SecOps backgrounds are preferred.
Software Engineer
micro1 offers $20–$75 per hour for experienced software engineers to create reinforcement learning environments that test AI systems' ability to identify and patch security vulnerabilities. The role suits developers with 3+ years' backend or full-stack experience and preferably cybersecurity exposure. You'll inject known CVEs into codebases and build reproducible testing environments. Output-based compensation with minimum weekly task submissions required.
Competitive Coder
This remote contractor role on micro1 pays $40–$80 per hour and suits experienced competitive programmers. You'll design and implement checkers for programming problems, validate submissions against complex constraints, and develop robust C++ solutions. The work involves collaborating with platform teams, documenting logic clearly, and maintaining high code quality under tight deadlines.
Game Developer (Java / libGDX)
Contractor role paying $50–$120 per hour on micro1. Game developers with Java and libGDX experience are needed to build 2D game features and help train AI systems through high-quality interactive data. Portfolio or demo projects preferred. Fully remote; no prior AI experience required.
What the work looks like
Writing reference solutions
Build clean, correct implementations the model learns from.
Reviewing AI code
Judge which of two AI-written solutions is better, and debug where the model went wrong.
Writing the tests
Write the tests that catch the failure the model didn't see.
Common questions
How much does it pay?
Pay is contractor-based and usually hourly, varying with seniority and role. Role cards show a pay band where the platform publishes one. You invoice as an independent contractor; many roles let you choose your hours, though some set a weekly minimum.
Can I do this alongside my current job?
Yes — the work is flexible and part-time by design. Check your employer's policy on outside work first; ACJ can't advise on that.
Who is ACJ, and what's your part in this?
Applied Clinical Judgement is run by Sean Key. We connect qualified people to vetted AI-training platforms (Mercor, micro1, Turing), and Sean personally vouches for the people he refers. We're paid a referral fee by the platform on a successful placement — never by you.
How do I get started?
Find a role below that fits, and apply through the link — it carries Sean's referral. If you'd like him to vouch for you or talk it through first, book a short call.
Sean Key vouches for the people he refers
I’m Sean Key, editor of Applied Clinical Judgement. After 29 years in the NHS I help qualified professionals find legitimate, well-paid AI-training work — and I’ll personally vouch for you when you apply.
Applied Clinical Judgement is a referral intermediary, not an employer or recruiter. We refer candidates to third-party platforms (Mercor, micro1, Turing) and may earn a referral fee on a successful placement. We never charge candidates. Pay rates are set by the platforms and may change. PRAG-DEL-SOL-ONE LTD · Co. 07204925 · VAT 987-3626-64 · ICO ZC086000.
