You already know what good Python looks like — and what a confidently-wrong implementation looks like. AI companies need that judgement.
This isn’t advisory work and it isn’t labelling. You’re reviewing real engineering work and explaining your judgement clearly enough that an AI can learn from it — paid for your expertise, by the hour.
Applied Clinical Judgement connects qualified people to vetted platforms, and Sean Key personally vouches for those he refers. We’re paid a referral fee by the platform on a successful placement — never by you. The roles below are live today.
11 live Python Developer roles · updated daily
Software Engineer (Human Data Platforms)
$50–$70 per hour for a full-stack software engineer to build data infrastructure on micro1's platform. You'll ship APIs, UIs, and data pipelines supporting model evaluation and secure document workflows. The role demands Python/FastAPI and React expertise, with experience across cloud platforms, SQL, and async systems. Requires 6–8 hours daily PST overlap. Suits senior builders comfortable owning problems end-to-end in small teams.
AI Evaluation Engineer (Python)
Turing seeks Python developers with 3–5 years' experience to design AI evaluation tasks measuring advanced model performance on real-world software engineering challenges. You'll author structured tasks, reference solutions, and verification criteria that directly shape the benchmarking of next-generation AI systems. The role suits detail-oriented engineers comfortable translating technical requirements into clear, unambiguous specifications for AI evaluation.
Senior Software Engineer – Python (LLM Evaluation & Repository Validation)
Turing seeks experienced software engineers to evaluate how large language models perform on real coding tasks. Based on public repositories and GitHub issues, you'll set up environments, triage problems, assess test coverage and debug code to inform LLM training datasets. The role suits engineers with 3+ years' experience, Python proficiency, and familiarity with Git and Docker who can work flexibly across distributed teams.
Python + Full-Stack (JS) Developer
Turing seeks Python and full-stack JavaScript developers to build AI training solutions for US-based companies. The contractor role involves designing code for AI model optimisation, conducting model evaluations, creating datasets for supervised fine-tuning, and collaborating on RLHF processes. Minimum 20 hours weekly with 4-hour PST overlap required. Bachelor's degree in engineering or computer science (or equivalent) and Docker proficiency mandatory.
Senior Python Developer
Turing seeks experienced Python developers to support foundational LLM companies in advancing their models. You'll generate high-quality training data, conduct model evaluations, and design SFT/RLHF datasets. The work involves writing production-grade Python code, benchmarking AI outputs, and collaborating with researchers. Minimum 3 years' Python experience required, plus strong testing and debugging expertise. Fully remote contractor roles with flexible 20–40 hour weekly commitments.
Python Developer
This remote contract role on micro1 pays $50–$100 per hour and invites experienced backend developers to evaluate AI coding tools by testing models in real-world workflows. You'll design and maintain REST and GraphQL APIs, optimise databases, and provide detailed feedback through incident reports and surveys during intensive testing cycles. Requires 3+ years' Python experience and familiarity with Cursor.
Python Developer
This remote contract role on micro1 pays $50–$100 per hour and invites experienced backend developers to evaluate AI coding tools by testing models in real-world workflows. You'll design and maintain REST and GraphQL APIs, optimise databases, and provide detailed feedback through incident reports and surveys during intensive testing cycles. Requires 3+ years' Python experience and familiarity with Cursor.
Senior Software Engineer Expert
Mercor is seeking Senior Software Engineers at $80–$130 per hour for urgent closed-source application development. You'll design and maintain full-stack systems, build APIs and MCP-based infrastructure, and collaborate with engineering teams. This suits experienced full-stack developers proficient in Python who can work independently, write production-ready code, and contribute to technical architecture discussions.
Software Engineer Expert
Mercor is recruiting Software Engineers at $40–$50/hour to develop MCP servers and integrate applications into its RL Studio platform. You'll build backend systems using Python and FastMCP, manage Docker and Linux environments, and ensure apps meet production standards. The role suits engineers with solid Python skills, API experience, and familiarity with containerisation and debugging workflows.
Junior Python Game Developer (Panda3D)
This remote contractor role pays $50–$120 per hour and suits junior Python developers with hands-on experience in Panda3D game engine development. You'll analyse game code, engine implementations, and development workflows, providing detailed feedback and domain expertise to train AI systems. The work involves code review, technical documentation, and iterative collaboration with a distributed team on micro1.
Python Developer
This remote contract role on micro1 pays $50–$100 per hour and invites experienced backend developers to evaluate AI coding tools by testing models in real-world workflows. You'll design and maintain REST and GraphQL APIs, optimise databases, and provide detailed feedback through incident reports and surveys during intensive testing cycles. Requires 3+ years' Python experience and familiarity with Cursor.
What the work looks like
Reviewing the model's work
Read what an AI produced in your field and judge whether the reasoning holds — mark where it went wrong.
Setting hard problems
Write the realistic, demanding tasks that separate competent work from confident-but-wrong.
Judging AI answers
Compare two AI outputs and say which is stronger, and why — your written reasoning is what the model learns from.
Common questions
How much does it pay?
Hourly and contractor-based, varying with seniority and role. Every role card shows its pay band. You invoice as an independent contractor and choose your hours.
Can I do this alongside my current job?
Yes — the work is flexible and part-time by design. Check your employer's policy on outside work first; ACJ can't advise on that.
Who is ACJ, and what's your part in this?
Applied Clinical Judgement is run by Sean Key. We connect qualified people to vetted AI-training platforms (Mercor, micro1, Turing), and Sean personally vouches for the people he refers. We're paid a referral fee by the platform on a successful placement — never by you.
How do I get started?
Find a role that fits and apply through its link, which carries Sean's referral. If you'd like him to vouch for you or talk it through first, book a short call.
Sean Key vouches for the people he refers
I’m Sean Key, editor of Applied Clinical Judgement. After 29 years in the NHS I help qualified professionals find legitimate, well-paid AI-training work — and I’ll personally vouch for you when you apply.
Applied Clinical Judgement is a referral intermediary, not an employer or recruiter. We refer candidates to third-party platforms (Mercor, micro1, Turing) and may earn a referral fee on a successful placement. We never charge candidates. Pay rates are set by the platforms and may change. PRAG-DEL-SOL-ONE LTD · Co. 07204925 · VAT 987-3626-64 · ICO ZC086000.
