Building the Next Generation of Distributed Systems: A Deep Dive into Our Engineering Ideology
We’re building something ambitious: a platform to fundamentally change how high-performance computing (HPC) and other demanding workloads are orchestrated and scaled. This isn’t about incremental improvements; it’s about tackling challenges that haven’t been solved before. If you’re a systems engineer who thrives on complexity and wants to build the future, you’ve come to the right place. This article details the kind of expertise we value and the challenges you’ll face alongside us.
What We’re Looking For: The Core Traits
We don’t just need skilled coders; we need problem solvers. Here’s what defines success on our team:
A Creative Problem-Solver: You’ve stared down difficult technical hurdles (perhaps in compiler design, distributed systems, embedded environments, or building highly available platforms) and emerged victorious.
A Proven Collaborator: You excel at working with brilliant engineers, contributing to a shared vision, and achieving ambitious goals together. We believe the best solutions are born from collaborative effort.
Intellectually Fearless: You aren’t intimidated by the unknown. In fact, the prospect of building something entirely new is what drives you.
The Technical Foundation: Required Expertise
This role demands a deep understanding of systems-level programming and distributed systems principles. We’re looking for engineers who can reason from first principles and translate theory into robust, scalable solutions.
1. Concurrency & Distributed Systems Mastery:
You’ll need a strong theoretical and practical grasp of the challenges inherent in distributed systems. This includes:
Concurrency control mechanisms.
Multi-threading and pre-emption.
Resource contention and its impact.
A deep understanding of race conditions, deadlocks, and consistency models.
2. Systems Programming Prowess:
We rely heavily on a specific tech stack, and expert-level proficiency is crucial:
C: Essential for kernel-level work and low-level optimization.
Go or Rust: For building high-performance, concurrent services. We need more than just users of these languages; we need engineers who understand their memory models and concurrency primitives.
Python: For integration with existing orchestration frameworks and tooling.
3. Linux & Container Internals: The Building Blocks
A solid foundation in Linux/UNIX is non-negotiable. You should be comfortable with:
System libraries and services.
Networking fundamentals.
Kernel/user-space interaction.
Containerization technologies like containerd/cri-o, runc, cgroups, namespaces, and seccomp.
4. Orchestrator Internals: Beyond the Basics
We need someone who understands the why behind orchestration, not just the how. Specifically:
Fairshare principles.
Multifactor priority scheduling.
Fairshare decay mechanisms.
Quality of Service (QOS) management.
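To give a flavor of how fairshare decay and multifactor priority fit together, here is a minimal Python sketch. It assumes a simplified model in the style of Slurm's classic fair-share algorithm: recorded usage decays with a half-life, a fairshare factor compares usage to entitlement, and the final priority is a weighted sum of factors. The function names, factor names, and weights are hypothetical and chosen for illustration only; this is not a description of our scheduler.

```python
from typing import Mapping


def decayed_usage(usage: float, elapsed_s: float, half_life_s: float) -> float:
    """Exponentially decay recorded usage; it halves every half_life_s seconds."""
    return usage * 0.5 ** (elapsed_s / half_life_s)


def fairshare_factor(norm_usage: float, norm_shares: float) -> float:
    """Map usage relative to entitlement into [0, 1].

    Returns 1.0 for no usage, 0.5 when usage equals the allotted share,
    and approaches 0 as usage grows past the share.
    """
    if norm_shares <= 0:
        return 0.0  # no entitlement: lowest possible factor (an assumption)
    return 2.0 ** (-norm_usage / norm_shares)


def job_priority(factors: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Multifactor priority: weighted sum of factors, each expected in [0, 1]."""
    return sum(w * factors.get(name, 0.0) for name, w in weights.items())


if __name__ == "__main__":
    # Hypothetical account: 25% of the shares, 40% of recent normalized usage.
    usage_now = decayed_usage(0.40, elapsed_s=7 * 86400, half_life_s=7 * 86400)
    fs = fairshare_factor(usage_now, norm_shares=0.25)
    prio = job_priority(
        {"age": 0.5, "fairshare": fs, "qos": 1.0},
        {"age": 1000, "fairshare": 10000, "qos": 2000},  # illustrative weights
    )
    print(f"decayed usage={usage_now:.3f} fairshare={fs:.3f} priority={prio:.0f}")
```

The decay step is what keeps the scheme forgiving: heavy usage last month counts less than heavy usage yesterday, so an account's fairshare factor recovers over time without any manual reset.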
5. HPC & GPU Workload Expertise:
Experience deploying and managing GPU workloads under SLURM is highly valuable. You should understand:
Workload isolation techniques.
Accelerator resource accounting.
6. Networking in Kubernetes: Understanding the Flow
You should be able to trace how packets flow within a Kubernetes environment. Experience with CNI plugins, Cilium, and/or Istio is a significant plus.
7. Production Readiness: Scaling and Reliability
This isn’t a research role. You’ll be responsible for building and maintaining production-level systems. This means:
Hands-on experience scaling infrastructure.
Managing Kubernetes clusters.
Using infrastructure-as-code tools (Helm and Terraform).
A commitment to reliability and a willingness to participate in an on-call rotation (we’re building a sustainable rotation, prioritizing engineer well-being).
Going the Extra Mile: Bonus Skills
While the above are requirements, these skills will make you stand out:
Open-Source Contributions: Contributions to projects like Kubernetes, containerd, or the Linux kernel demonstrate a commitment to the community and a deep understanding of these technologies.
Virtualization in Kubernetes: Experience with KubeVirt.