Can we use Llama 4 in a commercial product?

Yes, for almost everyone. The Llama 4 Community License grants royalty-free commercial use unless your products or services had more than 700 million monthly active users in the calendar month before the Llama 4 release date, a clause aimed at Meta's direct competitors. The obligations that actually affect most companies are the attribution requirement (prominently displaying 'Built with Llama'), shipping the license and notice file with any redistribution, and the naming rule for distributed fine-tuned derivatives.

Do we really have to display 'Built with Llama' in our product?

If you distribute a product or service containing Llama Materials, yes: the license requires prominently displaying 'Built with Llama' on a related website, user interface, blog post, about page, or product documentation. You have flexibility on placement (documentation counts), but not on existence. For internal-only tools that are never distributed, the requirement has no practical bite. For white-label products, resolve this with your client before committing the architecture.

Can we fine-tune Llama 4 and release the model under our own name?

Not under a name of your choosing. The license states that if you use Llama Materials or their outputs to create and distribute an improved AI model, you must include 'Llama' at the beginning of the model name. Internal fine-tunes you never distribute are unaffected. If distributing a branded proprietary model is core to your product, this clause is usually the reason evaluations shift to Apache 2.0 (Qwen 3) or MIT (DeepSeek R1) alternatives.

Is Llama 4 open source?

No, not in the OSI sense, and Meta's own license title says 'Community License' rather than claiming otherwise. The weights are downloadable and free for most commercial use, but the license carries field-of-use conditions (Acceptable Use Policy), attribution obligations, naming requirements on derivatives, and a user-threshold gate. 'Open weights under a conditional license' is the accurate description, and the difference matters mostly when you distribute models or ship white-label products.

What hardware does Llama 4 actually need?

Per Meta's published claims: Scout (109B total, 17B active) fits a single H100 with Int4 quantization, and Maverick's FP8 weights (400B total) fit a single 8x H100 DGX host. Working from parameter counts, Scout at BF16 is roughly 218GB of weights and Maverick at FP8 roughly 400GB, before KV cache, which grows with the context you actually use. Because only 17B parameters are active per token, throughput per GPU-hour is strong once memory is provisioned; the cost cliff is provisioning, not serving.

Is the 10 million token context window real?

It is Meta's stated architectural ceiling for Scout, and it is genuinely one of the largest published context lengths for open weights. Whether quality holds for your use case at extreme lengths is a separate empirical question, and KV-cache memory and latency scale with the tokens you actually load regardless of what the ceiling allows. Our advice is unchanged from every long-context model we evaluate: design for the context you validated on your own documents, not the number on the launch slide.

Does BearPlex deploy Llama 4 in client work?

We evaluate it as a standard candidate wherever multimodal document intake meets data-residency requirements, the pattern common across our compliance-heavy platform work. Two things are always in the engagement: a license-compliance checklist (attribution placement, notice files, derivative naming) treated as a deliverable, and a model-swappable interface so the client is never architecturally locked to the license terms if their product strategy changes.

Llama 4: License Terms and the Engineering Decision

Every open-weights model brief has a license section. This one *is* the license section, with a model attached. Llama 4 is a genuinely capable multimodal family, but in our client evaluations the technical comparison is rarely what decides the engagement. The Llama 4 Community License is. It is short, readable, and full of obligations that product teams routinely discover after the architecture is committed. Read it first; the model specs will still be there afterward.

What it actually is

Llama 4 (Meta, April 5, 2025) is Meta's first natively multimodal, Mixture-of-Experts open-weights generation. Text and image tokens are fused early into one backbone rather than bolted together with an adapter. Two models shipped with downloadable weights:

Llama 4 Scout: 17B active parameters, 16 experts, 109B total, with a claimed 10 million token context window. Meta states it fits on a single H100 with Int4 quantization.
Llama 4 Maverick: 17B active parameters, 128 experts, 400B total, with a 1M token context window per the model card, released in BF16 and FP8 (Meta: FP8 "fits on a single H100 DGX host").

Both are distilled from Llama 4 Behemoth (288B active, roughly 2T total), which was still training at announcement time. The instruct models cover 12 languages, with a knowledge cutoff of August 2024.

The MoE math is the part to internalize: per-token compute resembles a 17B dense model, while memory requirements track the total parameter count. Llama 4 is therefore cheap to *run* per token and expensive to *hold* in memory, which breaks the dense-model assumption, that compute and memory grow together, that most capacity planning is built on.
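A quick sketch of that asymmetry, comparing Scout with a hypothetical dense model of the same total size. It assumes the common rule of thumb of about 2 FLOPs per parameter used in a token's forward pass (ignoring attention and router overhead) and 2 bytes per BF16 parameter; only the 17B/109B figures come from Meta.

```python
# MoE vs dense arithmetic (sketch).
# Assumption: ~2 FLOPs per parameter that participates in a token's forward
# pass, a rule of thumb that ignores attention and router overhead.


def flops_per_token(params_used_per_token: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2 * params_used_per_token


def bf16_weight_gb(total_params: float) -> float:
    """Weights-only memory at BF16 (2 bytes/param), GB = 10**9 bytes."""
    return total_params * 2 / 1e9


SCOUT_ACTIVE, SCOUT_TOTAL = 17e9, 109e9  # Meta's published figures

# (FLOPs per token, GB of weights)
moe = (flops_per_token(SCOUT_ACTIVE), bf16_weight_gb(SCOUT_TOTAL))
# Hypothetical dense model with the same total parameter count:
dense = (flops_per_token(SCOUT_TOTAL), bf16_weight_gb(SCOUT_TOTAL))

print(f"Scout (MoE): {moe[0] / 1e9:.0f} GFLOPs/token, {moe[1]:.0f} GB weights")
print(f"Dense 109B:  {dense[0] / 1e9:.0f} GFLOPs/token, {dense[1]:.0f} GB weights")
print(f"MoE needs {dense[0] / moe[0]:.1f}x fewer FLOPs per token for the same weight memory")
```

The memory column is identical; only the compute column moves, which is the whole capacity-planning surprise.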

The license, and what commercial use really permits

The Llama 4 Community License (effective April 5, 2025) grants a royalty-free, worldwide license to use, modify, and redistribute. It is not open source in the OSI sense, and four clauses matter for client products:

1. The 700M-MAU gate. Quoting the license: "If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta." For nearly every company on earth this clause is irrelevant; it exists to exclude Meta's direct rivals. But note the measurement date: your MAU *on the Llama 4 release date*. Crossing 700M later does not retroactively strip the license.

2. "Built with Llama" is mandatory and visible. If you distribute Llama Materials, or a product or service (including another AI model) that contains them, you must "prominently display 'Built with Llama' on a related website, user interface, blogpost, about page, or product documentation." For an internal tool nobody outside the company sees, this is trivial. For a white-label product your client resells under their own brand, it is a real conversation: the attribution requirement travels with the model, and "prominently" is Meta's word, not ours.

3. Derivative models must carry the name. If you use Llama 4, or its outputs, to build a fine-tuned or improved model that you distribute, the license requires you to "include Llama at the beginning of any such AI model name." Your fine-tune cannot ship as "AcmeMed-8B"; it ships as "Llama-AcmeMed" or it stays in-house. Purely internal models you never distribute are not caught by this.

4. Redistribution paperwork. Distributing the weights or a derivative means shipping a copy of the license agreement and the notice file text: "Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved." Usage must also comply with the Acceptable Use Policy.
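The four clauses can be turned into a pre-release check that runs against every artifact you ship. This is a minimal sketch under stated assumptions: the `Release` record and `compliance_issues` helper are hypothetical, the naming rule is read literally as "the name starts with Llama", and Acceptable Use Policy compliance is left out because it cannot be checked from metadata. It supports the legal review; it does not replace it.

```python
from dataclasses import dataclass

# Notice file text as quoted from the license above.
NOTICE_TEXT = (
    "Llama 4 is licensed under the Llama 4 Community License, "
    "Copyright © Meta Platforms, Inc. All Rights Reserved."
)
MAU_GATE = 700_000_000


@dataclass
class Release:
    """Hypothetical description of one artifact or product you ship."""
    distributed: bool             # does any Llama Material leave the company?
    is_derivative_model: bool     # fine-tuned/improved model built with Llama 4 or its outputs
    model_name: str               # public name of that model, if any
    shows_built_with_llama: bool  # prominent attribution on site, UI, blog, about page, or docs
    ships_license: bool           # copy of the license agreement bundled
    notice_text: str              # contents of the bundled notice file
    mau_on_release_date: int      # MAU for the month preceding the Llama 4 release date


def compliance_issues(r: Release) -> list[str]:
    """Flag gaps against clauses 1-4 using a literal, simplified reading."""
    issues = []
    # Clause 1 applies to all use, distributed or not.
    if r.mau_on_release_date > MAU_GATE:
        issues.append("Over 700M MAU on the release date: request a license from Meta")
    if not r.distributed:
        return issues  # internal-only: the distribution clauses do not bite
    # Clause 2: visible attribution.
    if not r.shows_built_with_llama:
        issues.append("Missing prominent 'Built with Llama' attribution")
    # Clause 3: derivative naming.
    if r.is_derivative_model and not r.model_name.startswith("Llama"):
        issues.append(f"Derivative name {r.model_name!r} must begin with 'Llama'")
    # Clause 4: paperwork.
    if not r.ships_license:
        issues.append("License agreement not bundled")
    if NOTICE_TEXT not in r.notice_text:
        issues.append("Notice file text missing or altered")
    return issues


acme = Release(
    distributed=True,
    is_derivative_model=True,
    model_name="AcmeMed-8B",
    shows_built_with_llama=True,
    ships_license=True,
    notice_text=NOTICE_TEXT,
    mau_on_release_date=2_000_000,
)
print(compliance_issues(acme))
# ["Derivative name 'AcmeMed-8B' must begin with 'Llama'"]
```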

The practical read for client work: for internal deployments, the license costs you a line in the docs. For customer-facing products, budget a short legal review and a product decision about the attribution badge. For products whose whole value is a proprietary fine-tuned model you distribute to customers, the naming clause can be disqualifying on brand grounds alone, and Apache 2.0 or MIT alternatives start winning the evaluation before benchmarks are even discussed.

Real deployment cost

Meta's own hardware claims set the floor: Scout on a single H100 at Int4; Maverick FP8 on a single 8x H100 DGX host. Working from the published parameter counts (weights only, before KV cache):

Scout at BF16: 109B parameters is roughly 218GB, a multi-GPU node.
Scout at Int4: roughly 55GB, which is how Meta's single-H100 claim works. Quantization to 4-bit is a quality tradeoff you have to evaluate, not a free lunch.
Maverick at FP8: roughly 400GB, hence the 8x H100 host sizing.
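The same weights-only arithmetic as a small sketch, extended to a floor on H100 count. It assumes the 80 GB H100 SKU, treats GB as 10**9 bytes, and uses nominal bytes per parameter, ignoring quantization scales and layers left unquantized. The count is the minimum just to hold the weights: Maverick's weights alone need five GPUs, and the rest of an 8x host goes to KV cache and headroom.

```python
import math

# Nominal bytes per parameter; real checkpoints add some overhead
# (quantization scales, layers left unquantized) that this ignores.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
H100_GB = 80  # assumes the 80 GB H100 SKU; GB here means 10**9 bytes


def weights_gb(total_params: float, precision: str) -> float:
    """Weights-only footprint: no KV cache, activations, or runtime overhead."""
    return total_params * BYTES_PER_PARAM[precision] / 1e9


def min_h100s_for_weights(total_params: float, precision: str) -> int:
    """Floor on GPU count just to hold the weights; real sizing needs headroom."""
    return math.ceil(weights_gb(total_params, precision) / H100_GB)


for name, params, precision in [
    ("Scout", 109e9, "bf16"),
    ("Scout", 109e9, "int4"),
    ("Maverick", 400e9, "fp8"),
]:
    gb = weights_gb(params, precision)
    gpus = min_h100s_for_weights(params, precision)
    print(f"{name} @ {precision}: ~{gb:.1f} GB of weights, at least {gpus} x H100")
```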

Two cost notes from practice. First, the 17B-active MoE design means throughput per GPU-hour is strong once the memory is provisioned; the cost cliff is the provisioning, not the serving. Second, that giant context window is not free: KV cache grows with tokens actually in context, so "10M context" workloads carry memory and latency costs that have nothing to do with the weights. Treat the 10M figure as Meta's stated architectural ceiling, and evaluate long-context quality on your own documents before designing a product around it.
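To see why the context you actually load dominates, here is the standard full-attention KV-cache estimate: one key and one value vector per layer, per KV head, per token. The layer count, KV-head count, and head dimension below are illustrative, hypothetical values, not taken from Meta's model card; read the real ones from the checkpoint's config. Architectures that use chunked or local attention on some layers cache less than this formula predicts, so treat the output as an upper-bound style estimate.

```python
def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Full-attention KV cache size.

    Factor 2 = one key and one value vector; bytes_per_elem=2 assumes BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens * batch


# Hypothetical, illustrative architecture values (not from the model card).
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128

for ctx in (128_000, 1_000_000, 10_000_000):
    gb = kv_cache_bytes(ctx, LAYERS, KV_HEADS, HEAD_DIM) / 1e9
    print(f"{ctx:>10,} tokens: ~{gb:,.0f} GB of KV cache per sequence")
```

With these illustrative numbers a single 10M-token sequence needs roughly 2 TB of cache, several times Maverick's FP8 weights, which is why the validated context length, not the ceiling, should drive sizing.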

Latency and eval behavior that matters

Interactive-grade latency. Unlike reasoning models, Llama 4 does not spend thousands of thinking tokens before answering. For chat, extraction, and summarization, per-token latency is roughly that of a 17B dense model, provided the full weights are already resident in GPU memory.
Multimodality is native, not bolted on. Early fusion means image-plus-text prompts (screenshots, scanned forms, photos with captions) go through one model, simplifying pipelines that previously chained a vision encoder into a text LLM.
Language coverage is thinner than rivals'. Twelve supported languages, versus 119 claimed by Qwen 3. For multilingual products, check your language list first.
Benchmark claims deserve skepticism. We are deliberately quoting no launch benchmark numbers in this brief. Vendor comparisons, every vendor's, are marketing until reproduced on your own tasks; run task-level evals before committing.
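A task-level eval does not need a framework to get started. The sketch below scores any model callable on your own (prompt, expected) pairs with normalized exact match; `exact_match_accuracy` and the stub model are hypothetical names, and exact match only suits short extraction-style answers. Summarization needs a different scorer, and long-context claims need the same cases rerun at the context lengths you plan to ship.

```python
from typing import Callable, Sequence


def normalize(text: str) -> str:
    """Whitespace- and case-insensitive comparison; adjust per task."""
    return " ".join(text.split()).lower()


def exact_match_accuracy(model: Callable[[str], str],
                         cases: Sequence[tuple[str, str]]) -> float:
    """Score a model callable on your own (prompt, expected) pairs."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(normalize(model(prompt)) == normalize(expected)
               for prompt, expected in cases)
    return hits / len(cases)


# Usage with a stub standing in for a real deployment.
canned = {"Invoice total?": " 120.00 ", "Due date?": "unknown"}


def stub_model(prompt: str) -> str:
    return canned.get(prompt, "")


cases = [("Invoice total?", "120.00"), ("Due date?", "2025-05-01")]
print(exact_match_accuracy(stub_model, cases))  # 0.5
```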

When to use it, and when not

Use Llama 4 when:

The workload is multimodal document intake (forms, IDs, invoices, screenshots) and you want one open model, self-hosted, instead of an OCR-plus-LLM chain.
You need long-context review over large document sets inside your own VPC and have validated quality at your actual context lengths.
Attribution is a non-issue (internal tools) and the Meta ecosystem tooling matters to your team.

Do not use it when:

Your product distributes a fine-tuned model under your own brand; the naming clause forces "Llama" into the name.
White-label constraints make "Built with Llama" undisplayable.
You need small edge deployments; the family has no small dense models, which is exactly where the Qwen 3 ladder is strong.
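The "do not use it when" list works as a pre-screen that runs before any benchmarking. This is a hypothetical sketch: the `Workload` fields mirror the bullets above, and the language set is the twelve languages listed on Meta's Llama 4 model card at release, written as ISO 639-1 codes; re-check the current card before relying on it.

```python
from dataclasses import dataclass, field

# Llama 4 instruct languages per Meta's model card at release (ISO 639-1).
# Re-check the current card before relying on this list.
LLAMA4_LANGUAGES = {"ar", "de", "en", "es", "fr", "hi", "id", "it", "pt", "th", "tl", "vi"}


@dataclass
class Workload:
    """Hypothetical summary of a product's constraints."""
    distributes_branded_finetune: bool = False
    attribution_displayable: bool = True
    needs_small_edge_model: bool = False
    languages: set[str] = field(default_factory=lambda: {"en"})


def red_flags(w: Workload) -> list[str]:
    """Reasons to drop Llama 4 from the shortlist before benchmarking."""
    flags = []
    if w.distributes_branded_finetune:
        flags.append("Naming clause forces 'Llama' into the distributed model's name")
    if not w.attribution_displayable:
        flags.append("'Built with Llama' cannot be displayed")
    if w.needs_small_edge_model:
        flags.append("No small dense Llama 4 models for edge targets")
    unsupported = sorted(w.languages - LLAMA4_LANGUAGES)
    if unsupported:
        flags.append("Outside the supported language list: " + ", ".join(unsupported))
    return flags


print(red_flags(Workload(languages={"en", "ja"})))
```

An empty list means the license and footprint do not rule Llama 4 out; it still has to win on your own evals.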

How we would architect it for a client

The fit we see most often is document-heavy regulated platforms, the same shape as the NDIS provider-management platform we build as a long-term development partner (Vertex360): high volumes of scanned forms, compliance documents, and mixed-media evidence that cannot leave the client's environment.

The pattern:

Scout as the multimodal intake layer in the client's sovereign cloud: single-GPU Int4 deployment handling classification, extraction, and structured summarization of image-plus-text documents, with outputs validated against schemas in code.
License compliance as a deliverable, not an afterthought. The engagement checklist includes the attribution placement decision, the notice file in every distributed artifact, and a naming review for any fine-tuned derivative before it gets a product name.
A fallback lane to a permissively licensed model. Because the license terms are product-shaping, we architect the model interface so Llama 4 is swappable; if the client's product strategy later collides with the attribution or naming clauses, the migration is a config change, not a rebuild. This is standard practice in our model engineering work regardless of vendor.
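A minimal sketch of the swappable interface and the in-code schema check, under stated assumptions: the application depends only on a `generate(prompt) -> str` protocol, the concrete adapters are hypothetical placeholders for whatever serving stack you run, the backend is chosen by a config key, and the extraction schema is invented for illustration. Image inputs are left out for brevity.

```python
import json
from typing import Protocol


class ModelBackend(Protocol):
    """The only model surface the application is allowed to depend on."""
    def generate(self, prompt: str) -> str: ...


class Llama4ScoutBackend:
    """Hypothetical adapter around a self-hosted Scout endpoint."""
    def generate(self, prompt: str) -> str:
        raise NotImplementedError("call your serving stack here")


class PermissiveFallbackBackend:
    """Hypothetical adapter around an Apache 2.0 or MIT licensed model."""
    def generate(self, prompt: str) -> str:
        raise NotImplementedError("call your serving stack here")


# Swapping models means editing config, not application code.
BACKENDS = {"llama4-scout": Llama4ScoutBackend, "fallback": PermissiveFallbackBackend}


def build_backend(config: dict) -> ModelBackend:
    return BACKENDS[config["model_backend"]]()


# Hypothetical extraction schema: field name -> expected Python type.
SCHEMA = {"document_type": str, "provider_id": str, "page_count": int}


def extract(backend: ModelBackend, document_text: str) -> dict:
    """Ask the model for JSON, then validate it in code before anyone trusts it."""
    prompt = f"Return only JSON with keys {sorted(SCHEMA)} for this document:\n{document_text}"
    data = json.loads(backend.generate(prompt))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field_name, expected_type in SCHEMA.items():
        value = data.get(field_name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise ValueError(f"field {field_name!r} missing or not {expected_type.__name__}")
    return data


# Usage with a stub backend in place of a real model.
class StubBackend:
    def generate(self, prompt: str) -> str:
        return '{"document_type": "invoice", "provider_id": "P-17", "page_count": 2}'


print(extract(StubBackend(), "scanned invoice text"))
```

Because `extract` only sees the protocol, moving from Scout to the fallback lane is a change to `model_backend` in config; the schema check stays the same whichever model produced the JSON.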

Llama 4 is a good model wrapped in a license that is fine for most and fatal for some. Which one you are is a twenty-minute legal read. Do it first.
