STACK REVIEWS

The tools we use, rated honestly.

LLM frameworks, vector databases, and agent platforms, reviewed by the engineers who deploy them in production. We name what works and what we'd avoid.

3.5/5 · 11+ projects

LangChain review

LangChain remains the most widely known LLM framework, but it is no longer our default for new production projects. We use LangGraph (the same company's stateful graph library) for production agent systems and reach for LangChain mostly for legacy maintenance, prototyping, or specific integrations. The framework's breadth is both its strength and its weakness: there's an abstraction for everything, but production reliability often requires bypassing the abstractions.

4/5 · 9+ projects

Pinecone review

Pinecone is the lowest-friction managed vector database we've shipped to production, and it remains our default for clients who want a vector layer that just works without ops overhead. Performance is consistently strong at the millions-of-vectors scale, and the serverless tier removed the previous pricing pain at low volume. The trade-off is real lock-in (no self-hosted option) and a price ceiling that arrives sooner than with self-hosted Qdrant or pgvector on high-volume workloads. For the right client, that trade-off is worth it.

4.5/5 · 8+ projects

LangGraph review

LangGraph is our default choice for production agent systems and has been since mid-2024. It does what LangChain's original AgentExecutor never quite delivered: explicit state management, human-in-the-loop checkpoints, and the kind of debugging visibility that production agents need. The learning curve is steeper than chain-based abstractions, but the payoff is real: we've shipped agents on LangGraph that would have been much harder to build with alternative frameworks.
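
What "explicit state management" and "human-in-the-loop checkpoints" mean in practice can be sketched in plain Python. This is an illustration of the pattern only, not LangGraph's API: `TinyGraph`, its fields, and the `draft` / `send` nodes are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, Any]
Node = Callable[[State], State]  # a node reads the state and returns updates


@dataclass
class TinyGraph:
    """Hypothetical stand-in for a stateful agent graph (not LangGraph's API)."""
    nodes: Dict[str, Node]
    edges: Dict[str, Optional[str]]            # node -> next node; None ends the run
    pause_before: Set[str] = field(default_factory=set)
    history: List[Tuple[str, State]] = field(default_factory=list)

    def run(self, state: State, start: str, approved: bool = False):
        current: Optional[str] = start
        while current is not None:
            # Human-in-the-loop: stop before a guarded node until someone approves.
            if current in self.pause_before and not approved:
                return "paused", current, state
            approved = False  # an approval covers only the node we resumed at
            state = {**state, **self.nodes[current](state)}
            # Snapshot the state after every step so a run can be inspected later.
            self.history.append((current, dict(state)))
            current = self.edges[current]
        return "done", None, state


graph = TinyGraph(
    nodes={
        "draft": lambda s: {"draft": f"Refund approved for order {s['order_id']}"},
        "send": lambda s: {"sent": True},
    },
    edges={"draft": "send", "send": None},
    pause_before={"send"},
)

status, waiting_at, state = graph.run({"order_id": "A-17"}, start="draft")
# A reviewer can now inspect or edit `state` before resuming at `waiting_at`.
final_status, _, final_state = graph.run(state, start=waiting_at, approved=True)
```

The point of the design is that the state is an ordinary value passed between steps, so pausing, editing, and resuming need no hidden framework memory.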

4.5/5 · 6+ projects

Claude Agent SDK review

Claude Agent SDK is our default choice for production agent systems committed to the Anthropic platform. It's purpose-built for Claude's tool use behavior, supports parallel tool calls and human-in-the-loop natively, and the resulting agents tend to be cleaner than equivalent provider-agnostic implementations. The trade-off is real provider lock-in: you're committing to Claude. For Claude-only production work, this is often the right trade-off; for multi-provider production systems, LangGraph is the better choice.

4.5/5 · 7+ projects

Qdrant review

Qdrant is our top choice for self-hosted production vector databases and a strong managed alternative to Pinecone. The open-source core is feature-complete and production-ready, the operational ergonomics are better than alternative open-source vector DBs we've worked with (Milvus, Weaviate), and the managed Qdrant Cloud offering is competitive with Pinecone on cost and quality. For clients who need self-hosted deployment or want to avoid Pinecone's vendor lock-in, Qdrant is our default recommendation.

4/5 · 4+ projects

Weaviate review

Weaviate is a strong open-source vector database with a unique combination of built-in vectorization modules, a GraphQL API, and growing AI-native features. We've shipped several production deployments on Weaviate and consider it competitive with Qdrant for self-hosted production work. The built-in vectorization (auto-embed via OpenAI / Cohere modules) is genuinely useful when it matches your needs. Where Weaviate falls slightly behind Qdrant in our production benchmarks: raw performance at large scale and operational ergonomics. Where it wins: integrated AI features, GraphQL UX, and a stronger story for AI-native applications that go beyond simple vector retrieval.

4/5 · 12+ projects

LlamaIndex review

LlamaIndex is the strongest RAG-focused framework we've worked with and our default choice for document-heavy retrieval engagements. Where LangChain spreads across many LLM application patterns, LlamaIndex goes deep on the document ingestion / indexing / retrieval pipeline, and the depth shows. We use LlamaIndex for production RAG over diverse document types (PDFs, HTML, Office documents, code) where the document parsing and indexing complexity matters. For pure agent work or non-RAG use cases, LangGraph or other frameworks are usually better choices. LlamaIndex earns its place in our stack as the document-RAG specialist.

4.5/5 · 8+ projects

Vercel AI SDK review

Vercel AI SDK is our default choice for TypeScript-first LLM applications, especially front-end-heavy work. It's the cleanest API for streaming UX, multi-provider abstraction, and modern React patterns. We've shipped many production applications on it. Where it falls short: complex agent systems benefit from LangGraph (Python) and complex RAG benefits from LlamaIndex (Python); Vercel AI SDK is great for the typical TypeScript application layer but isn't trying to compete with Python-ecosystem heavyweights for those specific deep capabilities.

4/5 · 14+ projects

pgvector review

pgvector is our default vector database recommendation for teams already running Postgres at production scale. The simplicity of one database (no separate vector store to operate, transactional consistency between vectors and metadata, easy backup) wins for most workloads under 5-10M vectors. Above that scale, dedicated vector databases (Qdrant, Pinecone) start to win on latency and metadata filtering performance. For the very common pattern of 'add semantic search to an existing Postgres-backed application,' pgvector is an underrated choice that often beats more sophisticated alternatives on total cost of ownership.
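
The "add semantic search to an existing Postgres-backed application" pattern can be sketched as follows. This is a minimal sketch under stated assumptions, not production code: the `documents` table, its `tenant_id` and `title` columns, and the 3-dimensional embedding are hypothetical, and the helpers only build SQL text and parameters for a driver that uses `%s` placeholders. The `vector` column type and the `<=>` cosine-distance operator are pgvector's.

```python
from typing import Sequence, Tuple

# Hypothetical migration: one extra column on a table the application already owns.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding vector(3);
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search_query(
    query_embedding: Sequence[float], tenant_id: int, limit: int = 5
) -> Tuple[str, tuple]:
    """Return (sql, params) for a nearest-neighbour search filtered by ordinary columns."""
    literal = to_vector_literal(query_embedding)
    sql = (
        "SELECT id, title, embedding <=> %s::vector AS distance "
        "FROM documents "
        "WHERE tenant_id = %s "                # metadata filter in the same query
        "ORDER BY embedding <=> %s::vector "   # smaller cosine distance = more similar
        "LIMIT %s"
    )
    return sql, (literal, tenant_id, literal, limit)


sql, params = build_search_query([0.1, 0.2, 0.3], tenant_id=42, limit=3)
```

Because the vectors live next to the rows they describe, the metadata filter and the similarity ordering run in one statement and one transaction, which is the simplicity argument made above.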

4/5 · 9+ projects

MLflow review

MLflow is our default choice for production ML lifecycle management: model registry, deployment, lineage tracking, experiment tracking. It is open source (Apache 2.0) and enterprise-friendly, and it integrates with Databricks but works with any ML stack. Where it shines: production model registry with version tracking and deployment integration. Where it falls short vs Weights & Biases: experiment tracking UX is less polished, collaboration features are less developed. For teams prioritizing production ops over experimentation polish, MLflow is the right answer; for research-heavy teams, W&B often wins.

4.5/5 · 18+ projects

Cohere review

Cohere is our default choice for production reranking and a strong choice for multilingual embeddings. Where Cohere wins: Cohere Rerank is best-in-class for second-stage retrieval scoring, and Cohere Embed v3 is excellent for multilingual workloads. Where it falls short: the Command LLMs are competitive but typically not our first choice over GPT/Claude/Gemini on general tasks. We use Cohere extensively for the rerank component of production RAG pipelines and for multilingual retrieval.
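
For readers unfamiliar with second-stage retrieval scoring, the shape of a retrieve-then-rerank pipeline can be sketched in plain Python. Nothing here calls Cohere: `keyword_overlap` is a deliberately cheap first stage, and `toy_reranker` is a hypothetical stand-in for whatever hosted reranker the pipeline would actually use.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def keyword_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: fraction of query words present in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def toy_reranker(query: str, doc: str) -> float:
    """Hypothetical stand-in for a hosted reranker: rewards an exact phrase match."""
    return 1.0 if query.lower() in doc.lower() else 0.0


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    rerank_score: Scorer,
    first_stage_k: int = 10,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    # Stage 1: recall-oriented shortlist using the cheap scorer over every document.
    shortlist = sorted(docs, key=lambda d: keyword_overlap(query, d), reverse=True)
    shortlist = shortlist[:first_stage_k]
    # Stage 2: the expensive scorer runs only on the shortlist.
    scored = [(doc, rerank_score(query, doc)) for doc in shortlist]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


docs = [
    "refund policy for annual plans",
    "annual report on policy changes",
    "how to request a refund",
    "office holiday schedule",
]
results = retrieve_then_rerank("refund policy", docs, toy_reranker, first_stage_k=3, top_n=2)
```

The cost argument for this split is that the second-stage scorer sees `first_stage_k` documents per query rather than the whole corpus.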

4/5 · 7+ projects

Mistral review

Mistral is a strong choice for open-source LLM deployment and has an underrated managed API. Mistral 7B and Mixtral 8x7B / 8x22B are widely used open-source models in production. The managed Mistral API is competitive with GPT-4o-mini for cost-sensitive workloads. Where Mistral excels: open-source deployment, European customers wanting an EU-headquartered AI provider, cost-optimized production. Where it falls short: frontier quality (GPT-5 / Claude Opus / Gemini 2.5 Pro typically win on top-tier tasks). For self-hosted production or cost-sensitive workloads, Mistral is a strong default; for frontier-quality work, choose American frontier providers.

4.5/5 · 11+ projects

Modal review

Modal is our default choice for serverless GPU compute and AI workloads that don't fit cleanly into standard cloud patterns. The Python-native developer experience is best-in-class; the serverless GPU pricing is excellent for sporadic workloads; the operational simplicity is dramatic. Where Modal wins: ML / AI workloads, batch processing on GPUs, fine-tuning jobs, custom inference deployments, anything Python-heavy. Where it falls short: it is not a replacement for a full general-purpose cloud (use AWS / GCP / Azure for typical web infrastructure). For ML / AI engineering specifically, Modal is hard to beat.

4/5 · 9+ projects

Together AI review

Together AI is our default choice for managed open-source LLM inference: Llama, Qwen, Mistral, DeepSeek, and others available via API without operating self-hosted infrastructure. Pricing is excellent (typically 3-10× cheaper than equivalent frontier API usage); inference quality matches what you'd get self-hosted; the operational simplicity is dramatic. Where Together AI wins: managed open-source LLM inference at competitive prices. Where it falls short: not as polished as frontier APIs (OpenAI, Anthropic) on developer experience; less mature for some advanced features. For teams wanting open-source LLM economics without operating self-hosted infrastructure, Together AI is the right answer.

3.5/5 · 3+ projects

DSPy review

DSPy is an interesting framework that takes a different approach to LLM application development: programming rather than prompting. Define your task as a Python program with declarative LLM modules; let DSPy optimize the prompts and demonstrations automatically. The approach is intellectually elegant; the production track record is still emerging. Where DSPy wins: tasks where prompt engineering is the bottleneck and you want to automate prompt optimization. Where it falls short: less production-tested than LangChain / LlamaIndex; smaller ecosystem; learning curve is non-trivial. For research-heavy work or specific use cases where prompt optimization matters, DSPy is worth considering. For typical production AI work, more established frameworks usually win.
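
The "programming rather than prompting" idea can be illustrated without DSPy itself. The sketch below is not DSPy's API: `stub_lm` is a hypothetical, deterministic stand-in for a model call, and the brute-force search stands in for a real optimizer. It only shows the core loop, in which a metric on a small dev set, rather than a person, chooses the demonstrations that go into the prompt.

```python
from itertools import combinations
from typing import Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def stub_lm(demos: Sequence[Example], question: str) -> str:
    """Hypothetical stand-in for an LLM: copies the label of the most similar demo."""
    words = set(question.lower().split())
    best_label, best_overlap = "unknown", 0
    for text, label in demos:
        overlap = len(words & set(text.lower().split()))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


def accuracy(demos: Sequence[Example], devset: Sequence[Example]) -> float:
    """Metric the search maximizes: share of dev questions answered correctly."""
    hits = sum(stub_lm(demos, question) == label for question, label in devset)
    return hits / len(devset)


def optimize_demos(
    pool: Sequence[Example], devset: Sequence[Example], k: int = 2
) -> Tuple[Example, ...]:
    """Try every k-demo subset of the pool and keep the best-scoring one."""
    return max(combinations(pool, k), key=lambda demos: accuracy(demos, devset))


pool = [
    ("the invoice total is wrong", "billing"),
    ("app crashes on login", "bug"),
    ("please add dark mode", "feature"),
    ("charged twice this month", "billing"),
]
devset = [("login page crashes", "bug"), ("wrong invoice amount", "billing")]

best_demos = optimize_demos(pool, devset, k=2)
```

Exhaustive search is only viable for a toy pool like this one; the sketch is meant to show why a labelled dev set and a metric are prerequisites for this style of development.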
