Join Project Spearmint, a multilingual AI evaluation initiative that reviews large language model (LLM) outputs across languages, with an emphasis on either Tone or Fluency. To participate, you must have native-level fluency in a target language and strong English comprehension.
As an evaluator, you will analyze short, pre-divided datasets and assess AI-generated responses based on defined quality criteria. Your feedback will support the validation of evaluation methods and help set baseline quality standards for future model improvements.
Key Responsibilities:
– Evaluate AI responses in your native language, focusing on either Tone or Fluency.
– Judge the overall quality, accuracy, and naturalness of each response.
– Read a user prompt and two model-generated responses, then rate each on a five-point scale.
– Provide short explanations for any extreme ratings.
Project Details:
Batch 1 – Tone: Evaluate whether responses are helpful, engaging, fair, and insightful. Identify issues like inappropriate formality, condescension, bias, or other tonal concerns.
Batch 2 – Fluency: Review responses for grammar, clarity, coherence, and natural flow.
This is a project-based role with CrowdGen, which you’ll join as an Independent Contractor. If selected, you’ll receive an email from CrowdGen inviting you to create an account with your application email. You’ll then need to log in, reset your password, complete the setup steps, and continue with your application.
Help shape the future of AI. Apply now and contribute from home.
Pay: $2 – $20.22 per hour.
To apply for this job, please visit jobs.lever.co