Cantina Labs Jobs

Machine Learning Engineer, TTS

Cantina Labs

Reposted 4 Days Ago

Remote

Hiring Remotely in Ireland, IRL

Senior level

Build and productionize large-scale TTS and adjacent speech models end-to-end: model architecture, training, alignment, evaluation, data curation, distributed GPU scaling, latency/cost profiling, tooling, and safety/mitigation for speech systems.

About Cantina

Cantina is a new social platform founded by Sean Parker with the most advanced AI character creator. Our bots are lifelike, social creatures that can interact wherever people are online—across voice, video, and text. Create yourself, imagine someone new, or choose from thousands of characters to share infinitely scalable, personalized content and seamless group chat.

If you’re excited about how AI can shape creativity and social interaction, come help us build what’s next.

About the Role:

We’re looking for a Research / ML Engineer to join our Speech Team to build state-of-the-art speech systems end-to-end—from data specs through production inference. You’ll drive the model ↔ data ↔ eval flywheel for TTS and adjacent tasks (voice cloning, controllable TTS, voice conversion and more), partnering closely with research, data, and infra to ship fast, reliable, and cost-aware models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.

What You’ll Do:

Model Building: Architect, implement, pre-train, fine-tune, and post-train/align (e.g., GRPO/DPO) large-scale speech models.
Project Leadership: Independently lead small research projects while collaborating on larger team initiatives.
Experimental Design: Design, run, and analyze scientific experiments to advance our understanding of the models.
Tool Development: Develop and improve dev tooling to enhance team productivity.
Full-Stack Contribution: Contribute to the entire stack, from low-level optimizations to high-level model design.
Data Ownership: Define data requirements and collaborate on acquisition, curation, augmentation, labeling quality, and synthetic data strategies.
Rigorous Evaluation: Design automated objective/subjective evaluations—listening tests, SV/WER/ASR-based metrics, robustness & bias checks, and red-team studies.
Pipeline Delivery: Harden the training → evaluation → inference pipeline; profile latency, memory, and cost; and meet production SLAs with robust monitoring and rollback.
GPU Scaling: Partner with infrastructure to run distributed training/inference on cloud fleets and productionize models with reliability and observability.
Safety & Responsibility: Contribute to safety/consent guardrails and to misuse/abuse mitigation for responsible speech technology.

What You’ll Bring:

Exceptional research/development experience with large-scale audio models (>3B parameters and >500k hours of data).
Exceptional understanding of and hands-on experience with transformer architectures and/or diffusion models (including distillation and streaming) and/or audio language modelling.
Strong experience with multi-node and multi-GPU distributed model training.
Strong software engineering skills with a proven track record of building complex systems.
Strong PyTorch skills, experience with performance work (profiling, CUDA/Triton/C++ as needed), and a habit of writing reliable, production-quality code.
Shipped large-scale speech/audio models to production.
Background in working with large-scale ML data.
Ability to iterate on data and triangulate quality using subjective and objective signals.
Notable publications and/or open source contributions in speech/audio/ML.
Experience with voice-cloning, speech-control, voice-generation.

Preferred Experience:

Shipped large-scale speech/audio models (TTS/VC/ASR) to production.
Work on large-scale ML systems.
Experience with audio language modelling, transformer architectures.
Experience with voice-cloning, speech-control, voice-generation.
Background in processing large-scale ML data.
Publications or notable open-source contributions in speech/audio/ML.

Compensation:

The anticipated annual base salary range for this role is $200,000 to $220,000 (€170,000 to €190,000). When determining compensation, a number of factors will be considered, including skills, experience, job scope, location, and competitive compensation market data.

Benefits for U.S.-based roles:

Competitive salary and generous company equity
Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
42 days of paid time off, including:
- 15 PTO days
- 10 sick days
- 15 company holidays
- 2 floating holidays
Generous parental leave & fertility support
401(k) retirement savings plan
Lifestyle spending account – $500/month to use however you’d like
Complimentary lunch and snacks for in-office employees
One Medical membership, and more!
