How Much Data Does ChatGPT Store About You? A Technical Audit

2026-04-02 · PebbleFlow Team

If you use ChatGPT at work, your employer probably has questions. If you are the employer, you definitely should.

This is a technical audit of what OpenAI collects, how long they keep it, and what it means for teams in regulated industries. We will also look at how Bring Your Own Key (BYOK) architecture changes the equation entirely.

What ChatGPT Stores: The Full Picture

OpenAI's own help center documentation states that chats are "saved to your account until you delete them manually." When you do delete a chat, it is "removed from your account immediately and scheduled for permanent deletion from OpenAI systems within 30 days" -- unless it has already been de-identified, or OpenAI must retain it for security or legal obligations.

But conversations are only part of the story. OpenAI's privacy policy describes extensive automatic data collection:

  • Log data -- IP address, browser type and settings, date and time of requests, and how you interact with the services
  • Usage data -- types of content you view or engage with, features you use, actions you take, and feedback you submit
  • Device information -- device name, operating system, device identifiers, and browser type
  • Account information -- name, contact details, payment information, and transaction history

This metadata creates a detailed behavioral profile for every user. Even if you never share sensitive information in a prompt, your usage patterns alone reveal significant information about your work.

The Court Order That Changed Everything

In May 2025, a federal court order (Case No. 1:23-cv-11195, S.D.N.Y.) from Magistrate Judge Ona T. Wang in the New York Times v. OpenAI copyright litigation required OpenAI to "retain and segregate all output log data that would otherwise be deleted" -- indefinitely. OpenAI's motion for reconsideration was denied on May 16, 2025.

What this means in practice: even if you deleted your chats, OpenAI may be legally required to preserve them. The 30-day deletion window described in their privacy policy is overridden by court mandate. In January 2026, District Judge Sidney Stein upheld the order, requiring OpenAI to produce a sample of 20 million de-identified user logs -- prompts and outputs -- as discovery evidence.

Your Conversations May Train Future Models

By default, conversations on consumer plans are used for model training. OpenAI's own help center documentation states: "When you use our services for individuals such as ChatGPT, Codex, and Sora, we may use your content to train our models." You can opt out via the privacy portal or through Settings > Data Controls -- but as OpenAI notes, "once you opt out, new conversations will not be used to train our models." Any data already submitted remains in the training pipeline.

Opting out also does not change how long your data is stored. It only changes whether it is used to improve models.

Enterprise vs. Individual: A Two-Tier System

OpenAI operates a clear two-tier privacy model:

                      Individual (Free/Plus/Pro)       Enterprise/Edu
  Data retention      Indefinite (court-ordered)       Admin-controlled
  Training use        Default yes (opt-out available)  No default training
  Court order exempt? No                               Yes
  Admin controls      None                             Full retention policies
  Deletion timeline   30 days (when permitted)         30 days, admin-configurable

For individual users, there is no way to guarantee your data is actually deleted. For Enterprise customers, workspace administrators control retention, and data is not used for training by default.

The problem for small and mid-size teams: Enterprise plans come with seat minimums and pricing aimed at large organizations. A 10-person consulting firm cannot access Enterprise-tier privacy controls.

GDPR Compliance: An Open Question

ChatGPT's indefinite retention practices raise serious questions about GDPR compliance, particularly around data minimization and storage limitation principles. In March 2023, Italy's data protection authority (Garante per la Protezione dei Dati Personali) issued an emergency order temporarily banning ChatGPT, citing violations of GDPR Articles 5, 6, 8, 13, and 25 -- including absence of a legal basis for data collection, no privacy notice to users, and no age verification. The ban was lifted in April 2023 after OpenAI implemented changes, but the broader regulatory picture remains unsettled.

For teams handling client data in regulated industries -- legal, healthcare, financial services -- using ChatGPT means accepting OpenAI as a data processor. That triggers GDPR Article 28 obligations: you need a Data Processing Agreement (DPA), you need to document processing activities, and you need to ensure the processor meets your security requirements.

The Hidden Cost: Compliance Overhead

When your team uses ChatGPT, OpenAI becomes a data processor in your compliance chain. This means:

  1. A Data Processing Agreement (DPA) is required -- defining how OpenAI handles personal data on your behalf, specifying security measures, sub-processing limits, and breach notification duties.
  2. GDPR Article 28 obligations activate -- processors must only process data per your instructions, ensure confidentiality, implement security measures, notify breaches, and allow audits.
  3. Your security review must include OpenAI -- every vendor risk assessment, every SOC 2 audit question, every client security questionnaire now has an additional dependency.

For a 20-person law firm or healthcare practice, this compliance overhead can be more expensive than the subscription itself.

What BYOK Architecture Changes

Bring Your Own Key (BYOK) is an architectural pattern where the AI tool never touches your data. Instead:

  1. You provide your own API key from the AI provider (OpenAI, Anthropic, Google, etc.)
  2. Queries route directly from your device to the provider -- the tool is never an intermediary
  3. The tool stores nothing -- no conversations, no metadata, no behavioral profiles
  4. No DPA is required with the tool vendor -- because it is not a data processor
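The four steps above can be sketched in a few lines of code. This is a minimal illustration, not PebbleFlow's implementation: the chat-completions endpoint and Bearer-token header are OpenAI's documented API, while the function names and the model choice are placeholders. The point is structural -- the user's own key travels straight from their device to the provider, with no tool server in between.

```python
import json
import urllib.request

# The provider's public API endpoint. The request goes straight from the
# user's device to this URL -- there is no intermediary server.
OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a direct HTTPS request using the user's own API key.

    The key is supplied by the user and never leaves their device except
    to authenticate with the provider itself.
    """
    body = json.dumps({
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENAI_API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # BYOK: the user's key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_key: str, prompt: str) -> str:
    """Send the prompt directly to the provider and return the reply text."""
    with urllib.request.urlopen(build_request(api_key, prompt)) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

Because the tool only constructs the request and never relays it through its own infrastructure, there is nothing for the tool vendor to log, store, or produce in discovery.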

The data flow difference is fundamental:

                             ChatGPT (Cloud-Hosted)                        BYOK Architecture
  Data path                  You > OpenAI servers > Model > OpenAI > You   You > Provider API directly > You
  Intermediary               OpenAI handles all queries                    None -- direct API calls
  Data visibility            Platform logs and sees all queries            Provider sees API call only
  Storage                    On OpenAI's servers, indefinitely             Local device only
  Tool vendor as processor?  Yes                                           No

With BYOK, your compliance relationship is only with the AI provider you choose, on terms you negotiate directly. The tool itself is invisible to your compliance chain.
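The "Local device only" row above is worth making concrete. Local-first storage can be as simple as a file on the user's machine; the sketch below is a hypothetical illustration (the file name and function names are invented for this example), but it shows the property that matters for compliance: deletion is an operation you perform and can verify yourself, not a request you submit to a vendor.

```python
import json
from pathlib import Path

# Conversation history lives only on the user's device.
HISTORY_FILE = Path("chat_history.json")


def load_history() -> list:
    """Read the local history; an absent file simply means no history."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []


def append_message(role: str, content: str) -> None:
    """Append one message to the local file; no server ever sees it."""
    history = load_history()
    history.append({"role": role, "content": content})
    HISTORY_FILE.write_text(json.dumps(history, indent=2))


def delete_history() -> None:
    """Deletion is immediate and verifiable: remove the local file.

    Contrast with a cloud vendor's "scheduled for permanent deletion
    within 30 days" -- here there is no retention window at all.
    """
    HISTORY_FILE.unlink(missing_ok=True)
```

With this pattern, a data-subject deletion request under GDPR reduces to deleting a file the team already controls.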

What This Means for Your Team

A 2026 survey of 2,600 privacy and security professionals found that 64% worry about inadvertently sharing sensitive data through generative AI tools -- yet roughly half admit to inputting personal or non-public data anyway. The gap between concern and behavior is where risk lives.

If you are responsible for AI governance at your organization, here are the questions to ask:

  1. Is your AI tool a data processor? If yes, you need a DPA, compliance documentation, and ongoing vendor risk assessment.
  2. Where does conversation data reside? Cloud-hosted means the vendor's servers. BYOK with local-first storage means your devices.
  3. Can you guarantee deletion? With ChatGPT's court-ordered retention, the answer is currently no for consumer plans.
  4. Does your team have consistent tooling? Individual subscriptions across 20 people mean 20 separate compliance relationships.
  5. What is the total cost? Include not just subscription fees, but compliance overhead, DPA negotiation, and vendor risk assessment time.

The Bottom Line

ChatGPT is a powerful tool. It is also a data processor that retains your conversations indefinitely, uses them for training by default, and operates under a court order that overrides its own deletion policies.

For individuals, these tradeoffs may be acceptable. For teams handling client confidential information, patient data, financial records, or competitive intelligence, they represent real risk.

BYOK architecture eliminates the tool vendor from the compliance equation entirely. Your data never touches an intermediary. No DPA required. No GDPR Article 28 obligations with the tool vendor. No indefinite retention of your team's conversations on a third party's servers.

The choice is not between AI and no AI. It is between AI with governance and AI without it.


PebbleFlow uses BYOK architecture with local-first storage. Your conversations never touch our servers. Learn more about our privacy architecture or get started for free.