
Software Engineers

You know what good code looks like — and what a confidently-wrong implementation looks like. AI companies need that judgement to teach models to build software well.

This isn’t advisory work and it isn’t labelling. You’re reviewing real engineering work and explaining your judgement clearly enough that an AI can learn from it — paid for your expertise, by the hour.

Applied Clinical Judgement connects qualified people to vetted platforms, and Sean Key personally vouches for those he refers. We’re paid a referral fee by the platform on a successful placement — never by you. The roles below are live today.


11 live Software Engineer roles · updated daily

Mercor · $100-$150 / hr

Open Source Applied Engineer Talent Network

Global · remote

Mercor is hiring an open-source engineer at $100/hour to design coding evaluations, develop test cases, and analyse system performance across Python, Java, C, JavaScript, and TypeScript. This suits experienced contributors with a strong GitHub presence and demonstrated expertise in core programming fundamentals. You'll work asynchronously with a research team, identifying improvements and executing contributions independently using Git and CI/CD workflows.

Python · Java · C · JavaScript · TypeScript · +7
View role & apply
Micro1 · $66-$129 / hr

Software Engineer

USA · UK · Canada · Australia · +1 more

$66–$129 per hour. micro1 seeks software engineers with 3+ years' experience in Python, Rust, GoLang, Java, Node.js, or full-stack development to help train AI systems. You'll design scalable backend and full-stack applications, write clean code, optimise existing systems, and collaborate across distributed teams. Remote contract work; no prior AI experience required.

Python · Rust · GoLang · Java · Node.js · +7
View role & apply
Turing · From 40 hrs/week

AI Evaluation Engineer (Python / Java / Web)

Bachelor's
Global · remote

Turing seeks software engineers with 3–5 years' experience to design AI evaluation tasks for advanced language models. You'll create realistic Java and web development challenges, write reference solutions, and develop verification criteria that measure AI system capabilities. The role requires strong technical writing skills and deep understanding of software engineering best practices. This is a 2-month contractor position requiring 40 hours weekly.

Java · Web Development · Python · Object-oriented programming · APIs · +5
View role & apply
Turing · From 40 hrs/week

AI Evaluation Engineer (Python / Java / Web)

Bachelor's
Global · remote

Turing seeks experienced software engineers to design and validate AI evaluation benchmarks across Python, Java, and web technologies. You'll create realistic coding tasks, reference solutions, and verification criteria that test advanced AI system capabilities. The role requires five years' development experience, strong technical writing, and deep understanding of software engineering workflows. This is a two-month freelance contract with flexible remote work.

Python · Java · JVM · Web development · software engineering · +5
View role & apply
Turing · From 10 hrs/week

Senior Software Engineer – LLM Evaluation (US/Canada/WEU based)

Bachelor's
USA · Canada · AT · BE · +8 more

Turing seeks senior software engineers to evaluate and improve large language models through code curation, review, and refinement across multiple languages. You'll assess AI-generated code for production readiness, design verification systems, and collaborate with research teams on frontier AI projects. Requires 3+ years' engineering experience and expertise in full-stack development.

Python · JavaScript · ReactJS · C/C++ · Java · +7
View role & apply
Turing · From 10 hrs/week

Senior Software Engineer – LLM Evaluation

Bachelor's
Global · remote

Turing seeks experienced software engineers to evaluate and refine AI-generated code across multiple languages for LLM training datasets. You'll curate code examples, assess model outputs for efficiency and reliability, and design verification mechanisms for software engineering tasks. Requires 2+ years' full-time experience at top-tier product companies and deep expertise in full-stack development, architecture, and code quality assessment. Flexible contractor role, 10–40 hours weekly with partial PST overlap.

Python · JavaScript · ReactJS · C/C++ · Java · +7
View role & apply
Turing · From 20 hrs/week

Software Engineer – AI Code Evaluation & Benchmarking (SWE-Bench)

Bachelor's
Global · remote

Turing seeks experienced software engineers to evaluate and benchmark AI-generated code for large language models. You'll assess coding solutions, identify correctness issues, debug implementations, and build evaluation datasets. The role suits engineers with strong code review experience and deep software engineering expertise. Minimum 20 hours weekly with 4-hour PST overlap; one-month contractor assignment.

Python · Java · C/C++ · Go · Swift · +10
View role & apply
Micro1 · From 10 hrs/week · $20-$75 / hr

Software Engineer

Global · remote

Micro1 seeks experienced backend and full-stack software engineers on a contractor basis at $20–$75 per hour for 10–15 hours weekly. The role involves building and evaluating reinforcement learning environments to test AI systems' ability to identify and patch security vulnerabilities in code. Suited to developers with 3+ years' production experience, strong debugging skills, and familiarity with codebases across Python, JavaScript, Java, Go or Rust. Cybersecurity and SecOps backgrounds are preferred.

Python · JavaScript · TypeScript · Node.js · Java · +22
View role & apply
Micro1 · From 10 hrs/week · $20-$75 / hr

Software Engineer

Global · remote

Micro1 offers $20–$75 per hour for experienced software engineers to create reinforcement learning environments that test AI systems' ability to identify and patch security vulnerabilities. The role suits developers with 3+ years' backend or full-stack experience and preferably cybersecurity exposure. You'll inject known CVEs into codebases and build reproducible testing environments. Output-based compensation with minimum weekly task submissions required.

Python · JavaScript · TypeScript · Java · C++ · +12
View role & apply
Micro1 · From 15 hrs/week · $40-$80 / hr

Competitive Coder

Global · remote

Earning $40–$80 per hour, this remote contractor role suits experienced competitive programmers. You'll design and implement checkers for programming problems, validate submissions against complex constraints, and develop robust C++ solutions. The work involves collaborating with platform teams, documenting logic clearly, and maintaining high code quality under tight deadlines on micro1.

C++ · Competitive Programming · Problem analysis · Code validation · Checker implementation
View role & apply
Micro1 · $50-$120 / hr

Game Developer (Java / libGDX)

Global · remote

Contractor role paying $50–$120 per hour on micro1. Game developers with Java and libGDX experience are needed to build 2D game features and help train AI systems through high-quality interactive data. Portfolio or demo projects preferred. Fully remote; no prior AI experience required.

Java · libGDX · game development · object-oriented programming · sprite management · +4
View role & apply

What the work looks like

Writing reference solutions

Build clean, correct implementations the model learns from.

Reviewing AI code

Judge which of two AI-written solutions is better, and debug where the model went wrong.

Writing the tests

Write the tests that catch the failure the model didn't see.

Common questions

How much does it pay?

Hourly and contractor-based, varying with seniority and role. Every role card shows its pay band. You invoice as an independent contractor and choose your hours.

Can I do this alongside my current job?

Yes — the work is flexible and part-time by design. Check your employer's policy on outside work first; ACJ can't advise on that.

Who is ACJ, and what's your part in this?

Applied Clinical Judgement is run by Sean Key. We connect qualified people to vetted AI-training platforms (Mercor, micro1, Turing), and Sean personally vouches for the people he refers. We're paid a referral fee by the platform on a successful placement — never by you.

How do I get started?

Find a role above that fits and apply through the link, which carries Sean's referral. If you'd like him to vouch for you or talk it through first, book a short call.

Sean Key vouches for the people he refers

I’m Sean Key, editor of Applied Clinical Judgement. After 29 years in the NHS I help qualified professionals find legitimate, well-paid AI-training work — and I’ll personally vouch for you when you apply.

Book a vouch call · Sean on LinkedIn

Applied Clinical Judgement is a referral intermediary, not an employer or recruiter. We refer candidates to third-party platforms (Mercor, micro1, Turing) and may earn a referral fee on a successful placement. We never charge candidates. Pay rates are set by the platforms and may change. PRAG-DEL-SOL-ONE LTD · Co. 07204925 · VAT 987-3626-64 · ICO ZC086000.
