Self-hosted AI means running a language model on hardware you control instead of sending your data to a third party's API. For a biotech sitting on unpublished results, patient records, and patents in progress, keeping the model in house sounds like the obvious choice. It is not always the right one, and a commercial API under the right contract often protects the same data for far less cost and effort.
We build custom software for biotech and life sciences labs, including systems that put AI on internal data, so we have made this call with teams on both sides of it. Fewer teams need to self-host than assume they do, and the ones that do need to know it early, before they have wired a confidential pipeline through a service they cannot audit.
What you are protecting and from whom
"Is AI safe to use" is the wrong question. The useful one is what specifically leaves your environment and who can read it. A biotech holds a few distinct kinds of sensitive data, and they do not carry the same risk:
- Intellectual property like assay designs, screening results, and sequences is the company's value, so a leak into a public model or a logged prompt is a real loss.
- Patient data carries legal weight on top of the commercial kind, because a workflow that creates, receives, or transmits protected health information for a covered entity or business associate brings HIPAA obligations with it.
- Unpublished results carry less risk of theft than of a competitor or a reviewer seeing the work before you have staked your claim.
- Data that belongs to someone else, covered by an NDA with a pharma partner or a CRO client, comes with rules about where it can and cannot go.
Map which of these a given workflow touches before you choose any tool. A model that summarizes public literature carries almost no risk. A model that reads patient records or reasons over your screening data is where self-hosting earns its place.
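One way to make that mapping explicit is to tag each workflow with the kinds of data it touches and derive the minimum contract terms from those tags. The sketch below is a hypothetical illustration: the `DataClass` labels and the controls in `REQUIRED_CONTROLS` are assumptions drawn from the categories above, not a compliance rule set. Your counsel and your partner contracts decide the real mapping.

```python
from enum import Enum, auto


class DataClass(Enum):
    """Hypothetical labels for the kinds of data a workflow can touch."""
    PUBLIC = auto()        # e.g. published literature
    IP = auto()            # assay designs, screening results, sequences
    PHI = auto()           # protected health information
    UNPUBLISHED = auto()   # results you have not yet published
    PARTNER_NDA = auto()   # data covered by a pharma partner's or CRO client's NDA


# Assumed minimum controls per data class, for illustration only.
REQUIRED_CONTROLS: dict[DataClass, set[str]] = {
    DataClass.PUBLIC: set(),
    DataClass.IP: {"no-training agreement"},
    DataClass.UNPUBLISHED: {"no-training agreement"},
    DataClass.PHI: {"no-training agreement", "business associate agreement"},
    DataClass.PARTNER_NDA: {"no-training agreement", "partner contract review"},
}


def controls_for(workflow_data: set[DataClass]) -> set[str]:
    """Return the union of controls required by every data class the workflow touches."""
    required: set[str] = set()
    for data_class in workflow_data:
        required |= REQUIRED_CONTROLS[data_class]
    return required
```

A literature-summary workflow tagged only `PUBLIC` picks up no extra controls, while one that also touches `PHI` picks up the business associate agreement requirement.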
Self-hosted AI versus a commercial API
Self-hosting is not automatically more private. A frontier model called through a commercial API under a signed agreement can be more defensible than an open model running on a server that goes unpatched. The major providers separated their consumer products from their commercial APIs years ago, and their eligible commercial and enterprise tiers do not train on the data you send, so a confidential prompt your scientist sends through such a tier is not feeding a future public model. Anthropic, for example, states in its commercial terms that it does not train its models on your inputs and outputs, and it offers a HIPAA-ready tier for teams handling protected health information. Before sending anything sensitive, confirm the provider's current terms and whether it will sign a business associate agreement (BAA), the contract a vendor signs to handle protected health information on your behalf.
What you give up with the API path is control and auditability, not baseline security. You are trusting a contract and a provider's controls instead of a network boundary you own. For teams that can accept that, the API route gives you the strongest models, no infrastructure to run, and a bill that scales with use. For teams that cannot, because of a regulator, a partner contract, or a board that will not approve any data leaving the building, self-hosting an open model is the path.
| Dimension | Commercial API under agreement | Self-hosted open model |
|---|---|---|
| Model quality | Frontier, updated by the provider | Trails the frontier, you manage upgrades |
| Where data goes | To the provider under contract, not used for training on eligible tiers | Stays inside your environment |
| Compliance | BAA and audited controls available | You own every control and the evidence |
| Cost shape | Per token, scales with use | Fixed GPU and staff cost, paid whether the model is busy or idle |
| Effort to run | Almost none | Deployment, monitoring, and patching are on you |
Putting AI to work on your own data
There are two ways to make a model useful on your own data, and the choice affects both privacy and effort.
The first is retrieval, often called RAG. You keep your documents in a searchable store, and when someone asks a question the system finds the relevant passages and hands them to the model to read before it answers. The model never keeps the data, you can add or remove a document and a well-built setup picks up the change on its next sync, and the system can be designed to show the source behind each answer, which matters when a scientist needs to trust it. For almost every biotech use case this is the better place to start.
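A minimal sketch of the retrieval step follows. It assumes a toy in-memory store and simple keyword scoring in place of the embedding search or search index a production system would use. `DocumentStore`, `build_prompt`, and the document IDs are hypothetical names, and the call to the model itself is left out. What it shows is the shape described above: documents live in a store you control, removing one takes effect on the next search, and each passage carries its source ID so the answer can cite it.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical document identifier, used for citations
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentStore:
    """Toy in-memory store; a real system would use a search index or vector database."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add(self, source_id: str, text: str) -> None:
        self._docs[source_id] = text

    def remove(self, source_id: str) -> None:
        self._docs.pop(source_id, None)

    def search(self, query: str, k: int = 2) -> list[Passage]:
        """Rank documents by how often query terms appear in them; drop non-matches."""
        query_terms = set(tokenize(query))
        scored = []
        for source_id, text in self._docs.items():
            counts = Counter(tokenize(text))
            score = sum(counts[term] for term in query_terms)
            if score > 0:
                scored.append((score, source_id, text))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [Passage(sid, text) for _, sid, text in scored[:k]]


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Hand the retrieved passages to the model, each tagged with its source ID."""
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return (
        "Answer using only the passages below and cite the source ID for each claim.\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    store = DocumentStore()
    store.add("ELN-0142", "Compound A reduced kinase activity in the primary screen.")
    store.add("SOP-007", "Plate readers are calibrated weekly.")
    hits = store.search("kinase screen results")
    print(build_prompt("What did the screen show for compound A?", hits))
```

Because the model only sees the passages handed to it for each question, removing `ELN-0142` from the store means later prompts no longer contain it, and nothing has to be retrained.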
The second is fine-tuning. You retrain the model on your own data so the information is built into it, which is harder to update, harder to audit, and not as private as it sounds, because language models have been shown to memorize and regurgitate verbatim pieces of their training data. It is worth the trouble for narrow jobs, like teaching the model a fixed output format or sorting documents into known categories, but not for giving it general knowledge of your research. Start with retrieval and only fine-tune if it falls short. The instinct is the same as keeping ownership of your data instead of handing it to a vendor, which we covered in our piece on lab software data lock-in.
Where the data lives and why residency matters
When the data is regulated, its physical location is part of what an auditor checks. Under HIPAA, a service that handles protected health information for you needs a business associate agreement and appropriate safeguards, and your risk analysis, contracts, and vendor due diligence are where you establish where that data is stored and processed. Our guide to HIPAA compliance for lab software covers this in more depth. If an AI system creates or touches electronic records that an FDA predicate rule requires, or that you submit to the FDA, those records fall under 21 CFR Part 11, which brings audit trails, access controls, and validation to that workflow the same as to any other in-scope system.
This is where a sloppy self-hosted setup loses to a well-chosen API. Residency and an audit trail are things you design for, and a managed provider that offers a BAA, regional hosting, and access logs may hand you more of that than a GPU server stood up in a hurry. Either path can work, as long as you decide data location and audit evidence on purpose.
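Audit evidence is easier to produce when the system writes it as it runs. As one illustration, assuming you log each AI action against a record yourself, the sketch below chains every log entry to the hash of the entry before it, so an edited entry, or one removed from the middle, no longer verifies. `AuditTrail` and its fields are hypothetical, and a tamper-evident log on its own does not make a system Part 11 compliant; access controls and validation still apply.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(entry: dict) -> str:
    """Hash a canonical JSON form of the entry so key order does not change the digest."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    """Append-only, hash-chained log of who did what to which record."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, user: str, action: str, record_id: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "prev_hash": prev_hash,
        }
        # The hash covers every field above, including the link to the previous entry.
        entry["hash"] = _entry_hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute each hash and check each link; an edit or mid-chain removal fails."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {key: value for key, value in entry.items() if key != "hash"}
            if body["prev_hash"] != prev_hash or _entry_hash(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Dropping the newest entries off the end is not caught by the chain alone, so that case would need a separate control, such as copying entries to storage the application cannot rewrite.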
What it costs to stand up and run
The cost gap is wider than the privacy gap. The API path is billed per token, and the rate depends on the model and the call: frontier models run a few dollars per million tokens of input and several times that for output, while smaller models, cached input, and batched jobs cost far less. There is no infrastructure behind it, so a team can run real workloads for a modest monthly bill and stop paying when they stop using it. Self-hosting inverts that. An always-on cloud GPU instance large enough to serve a mid-sized open model, billed by the hour and kept warm so it answers without a cold start, lands in the low thousands of dollars a month and climbs fast for larger models or higher throughput, paid whether the model is busy or idle.
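The point where fixed GPU cost beats per-token pricing is simple arithmetic. The sketch below uses illustrative prices, not quotes: $3 per million input tokens, $15 per million output tokens, output volume at 20% of input volume, and a GPU instance at $2.50 an hour kept on for the roughly 730 hours in a month. The function names are hypothetical; substitute your provider's current rates and your measured volumes.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def api_monthly_cost(input_mtok: float, output_mtok: float,
                     input_price: float, output_price: float) -> float:
    """Per-token billing: millions of tokens times the price per million."""
    return input_mtok * input_price + output_mtok * output_price


def gpu_monthly_cost(hourly_rate: float) -> float:
    """An always-on instance is billed every hour, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


def breakeven_input_mtok(hourly_rate: float, input_price: float,
                         output_price: float, output_ratio: float) -> float:
    """Monthly input volume (millions of tokens) at which the two costs match,
    assuming output tokens are a fixed fraction of input tokens."""
    cost_per_input_mtok = input_price + output_ratio * output_price
    return gpu_monthly_cost(hourly_rate) / cost_per_input_mtok
```

Under those assumptions the instance costs $1,825 a month, and the API path stays cheaper until input volume passes roughly 304 million tokens a month. The comparison leaves out engineering time, which tilts it further toward the API path.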
The larger cost is people. Someone has to deploy the model, keep it monitored, patch it, manage upgrades, and answer for it when it breaks, and that is an engineer's time, ongoing, which for a small biotech competes directly with the work that moves the science. Self-hosting is worth it when a rule or a contract requires it, or when your volume is high enough that fixed infrastructure beats per-token pricing. It is overkill when the real driver is a vague sense that in house feels safer.
A decision path for biotech teams
Start with the API path under a no-training agreement, and a BAA where patient data is involved, and move off it only when something specific pushes you. That order keeps the strongest models and the lowest operational burden, and treats self-hosting as the exception it usually is.
Self-host when one of these holds:
- A regulator or a partner contract states the data cannot leave your environment, full stop, even under an agreement.
- You handle data so sensitive that no third-party contract is an acceptable risk to the business.
- You already run the engineering function to deploy and maintain models, so the operational cost is one you can carry.
- Your usage is high and steady enough that fixed GPU cost comes out below per-token pricing.
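The same default-then-exception logic can be written down so the decision is recorded together with the reasons behind it. The `Situation` fields and the two functions below are hypothetical and simply mirror the four conditions in the list: any one of them is enough to self-host, and with none of them the answer is the API path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Hypothetical yes/no answers to the four conditions listed above."""
    data_cannot_leave_environment: bool   # a regulator or partner contract forbids it
    no_third_party_risk_acceptable: bool  # no contract is an acceptable business risk
    runs_model_engineering_team: bool     # you already deploy and maintain models
    fixed_gpu_cheaper_than_tokens: bool   # steady volume beats per-token pricing


def self_hosting_reasons(s: Situation) -> list[str]:
    """Return every listed condition that holds; an empty list means none do."""
    checks = [
        (s.data_cannot_leave_environment, "data cannot leave your environment"),
        (s.no_third_party_risk_acceptable, "no third-party contract is an acceptable risk"),
        (s.runs_model_engineering_team, "you already run model engineering"),
        (s.fixed_gpu_cheaper_than_tokens, "fixed GPU cost is below per-token pricing"),
    ]
    return [reason for holds, reason in checks if holds]


def recommend(s: Situation) -> str:
    """Default to the API path; self-host only when a listed condition holds."""
    if self_hosting_reasons(s):
        return "self-hosted open model"
    return "commercial API under agreement"
```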
If none of those is true, the API path under a proper agreement is the faster, cheaper, and often more compliant choice. For the model itself, default to the most capable model that fits, which for an API path today means a frontier model such as Claude, and reserve a self-hosted open model for the cases where the data simply cannot move. Our overview of the AI models landscape for life sciences goes deeper on choosing between them.
Conclusion
Wanting to keep AI close to home is reasonable. What you need is for confidential data to stay confidential and to prove where it lives, and a commercial API under a signed agreement clears that bar for many biotech workflows at a fraction of the cost and effort. Map which data each workflow touches, default to the API path with the right contract, prefer retrieval over fine-tuning, and self-host only where a rule, a contract, or your scale makes it the better call.
Putting AI on your own research data? We design these systems for biotech and life sciences teams, from the data boundary to the model choice. Talk to us about an architecture review before you wire confidential data through anything.
Frequently asked questions
What is self-hosted AI?
Self-hosted AI is a language model run on infrastructure you control, such as your own servers or a private cloud environment, instead of being accessed through a third party's API. The data and the model both stay inside a boundary you own, which some regulated or contractual situations require. It costs more to run and trails the frontier models in capability, so it is worth it only when the data cannot leave your environment at all.
Is a cloud AI API safe for confidential biotech data?
It can be, under the right agreement. Eligible commercial tiers from major providers do not train on the data sent through them, and several offer a HIPAA-ready option with a business associate agreement, so a confidential prompt is covered by contract and is not feeding a public model. The trade-off is that you are trusting a provider's controls instead of a boundary you own, which is fine for many workflows and unacceptable for a few. Check the provider's data handling terms and confirm a business associate agreement is available before sending protected health information.
Should we use RAG or fine-tuning for our own data?
For giving a model knowledge of your research, retrieval-augmented generation (RAG) is almost always the better choice. It keeps your documents in a store you control, picks up changes when the store is re-synced, and can be built to cite the source behind each answer. Fine-tuning bakes information into the model, which is harder to update and audit, and it earns its place mainly for matching a format or a classification task with many labeled examples rather than for teaching a model your data.
Does HIPAA require self-hosting AI?
No. HIPAA does not require self-hosting. It requires a business associate agreement and appropriate safeguards with a service that handles protected health information for you, and your own risk analysis and vendor due diligence cover where the data is stored and processed. A cloud provider that signs a business associate agreement and offers regional hosting and access logs can meet that, often with less effort than a self-hosted setup you have to secure and document yourself.
How much does it cost to run a self-hosted AI model?
Plan for a few thousand dollars a month at the low end for a cloud GPU instance able to serve a capable open model, rising quickly with model size and throughput, and you pay it whether the model is busy or idle. On top of that is engineering time to deploy, monitor, patch, and maintain it, which is the larger long-term cost. An API path, by contrast, is billed per token, a few dollars per million tokens for input and more for output, with no infrastructure to run, which is why it wins until scale or a hard requirement changes the math.
Last updated: June 26, 2026














