Description:
Design, implement, and maintain distributed systems to build world-class ML platforms/products at scale
Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance
Design and extend services to improve functionality and reliability of the platform
Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise
Build relationships with stakeholders across the organization to better understand internal c
Mar 3, 2025;
from:
dice.com