Development

AI/ML Engineer (LLM & Agents)

Remote, Pakistan | Full-time

The Role

BearPlex builds AI systems that have to survive contact with regulated enterprises: agents that act, retrieval systems with audit trails, and models that have been measured rather than assumed. This role is for an engineer who treats large language models as components in a real system, not as a magic box.

You will take LLM-powered features from a working idea to a production deployment that is observable, evaluated, and safe to put in front of clients who demand proof. The difference between a clever demo and a system an enterprise can trust is exactly the work you will own.

What You Will Do

Engineer LLM applications. Build features on frontier models including Claude, designing prompts, tool definitions, and orchestration that hold up under real workloads.
Build autonomous agents. Design agent loops with tool use, memory, and guardrails that fail safely and stay within defined boundaries.
Ship RAG and knowledge systems. Build retrieval pipelines with chunking, embeddings, reranking, and audit trails so answers can be traced to their sources.
Make evaluation the backbone. Build eval sets and offline and online measurement so model changes are judged on evidence, not vibes.
Fine-tune open-weight models. Adapt models from Hugging Face and the open-weight ecosystem, and run RLHF and alignment work where the task demands it.
Run models in production. Own latency, cost, caching, fallbacks, and monitoring for LLM features serving live traffic.
Support sovereign and on-prem AI. Help clients run models in their own environments when data residency and control are non-negotiable.

What We Are Looking For

Shipped LLM features in production. You have put model-powered systems in front of real users and kept them running, not just built notebooks.
Strong Python. You write clean, maintainable Python and are comfortable in the ML and data tooling ecosystem.
Agent and RAG fluency. You understand tool use, retrieval, context management, and the failure modes of each.
An evaluation mindset. You instinctively ask how a change will be measured before you make it.
Open-weight model experience. Hands-on work with Hugging Face, fine-tuning, and serving open models.
Production discipline. You care about cost, latency, observability, and auditability as first-class concerns.

Nice to Have

RLHF and alignment. Direct experience with preference data, reward modeling, or alignment techniques.
Vector and data infrastructure. Familiarity with vector stores, PostgreSQL, and embedding pipelines at scale.
Cloud and serving stack. Comfort with AWS, GCP, Docker, and Kubernetes for model deployment.
Application engineering. Ability to wire models into real products with Node.js or TypeScript services.

Why BearPlex

Senior peers. Work with engineers who treat AI as a discipline with measurement and proof, not a trend to chase.
Real production AI. Build agents and knowledge systems that regulated enterprises actually run, with audit trails and accountability.
Fully remote in Pakistan. Work from anywhere in Pakistan with a team that communicates clearly across distance.
Learning budget. Support for research, courses, and the compute and tools you need to stay at the frontier.
Clear growth. A path into deeper model engineering and AI architecture as our AI practice scales.

What You Bring

Python, LLMs, Claude API, RAG, Autonomous Agents, Hugging Face, Fine-tuning, Model Evaluation, RLHF, Kubernetes

Apply

Send us your work.

Start by submitting your resume as a PDF (up to 10MB). We read it and pre-fill the application form for you, so you only complete what we could not. Our team reviews every application personally; you will hear back either way.