Your WordPress database knows more about your visitors than you do. Every comment stores an IP address, a user agent string, a name, an email, and a URL — indefinitely, by default. WooCommerce ships with every data retention field blank, which does not mean “nothing” — it means “forever.” Gravity Forms logs IP addresses on every submission unless you explicitly tell it not to. And that Action Scheduler table you have never opened? On production WooCommerce stores, it routinely balloons to tens of gigabytes, with the args column quietly holding email addresses and user IDs from years of completed scheduled tasks.
This is not a theoretical compliance risk. This is what regulators now penalize. In December 2024, the Irish Data Protection Commission fined Meta €251 million — with €110 million of that specifically for Article 25(2) violations, the operational expression of data minimization as a default. In 2019, the Danish data authority recommended a fine against a taxi company for retaining customer phone numbers just three years beyond their purpose. Not credit card numbers. Not medical records. Phone numbers. One field, one violation, one fine.
Data minimization is not about doing less. It is about knowing exactly what you collect, why you collect it, and when it should stop existing. And on WordPress, the gap between what site owners think they store and what their databases actually contain is where regulatory exposure lives.
What the law actually says and why the exact words matter
GDPR Article 5(1)(c) requires that personal data be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.” That phrasing replaced the weaker Directive 95/46/EC standard of data not being “excessive.” The shift from “not excessive” to “limited to what is necessary” was deliberate — it placed the burden on controllers to justify every data point they hold, not merely to avoid obvious overcollection.
Recital 39 sharpens this further: the storage period must be “limited to a strict minimum,” and data should only be processed if the processing purpose “could not reasonably be fulfilled by other means.” Article 25(2) then connects this directly to system design, requiring that by default — not by configuration, not by opt-in, but by default — only personal data necessary for each specific purpose are processed. That obligation covers the amount collected, the extent of processing, the storage period, and accessibility. The EDPB’s Guidelines 4/2019 on Article 25 (paragraphs 73-74) instruct controllers to “first of all determine whether they even need to process personal data for their relevant purposes.”
The ICO distils the principle into three tests: data must be adequate (sufficient to fulfil the purpose), relevant (rationally linked to it), and limited to what is necessary (no more than needed). Holding data that fails any of these tests is, in the ICO’s words, “likely to be unlawful.”
This is not limited to Europe. The California Privacy Rights Act, effective January 1, 2023, introduced the first explicit data minimization requirement in US law. Section 1798.100(c) requires that a business’s collection, use, retention, and sharing of personal information be “reasonably necessary and proportionate” to the purposes for which it was collected. The California Privacy Protection Agency’s Enforcement Advisory No. 2024-01 called data minimization “a foundational principle in the CCPA.” Virginia’s CDPA, Colorado’s CPA, Connecticut’s CTDPA, and Maryland’s Online Data Privacy Act all impose similar requirements. As of January 2026, nineteen US states have comprehensive privacy laws in effect, with no federal law in sight — the American Privacy Rights Act expired without reintroduction in January 2025. Maryland’s 2025 law is particularly aggressive, banning the sale of sensitive personal data outright and naming data minimization as a 2026 enforcement priority.
Brazil’s LGPD embeds the same concept through Article 6: adequacy (6(II)) and necessity (6(III)), the latter requiring “limitation of the processing to the minimum necessary.” The CJEU reinforced minimization in binding case law in C-446/21 (Schrems v Meta Platforms Ireland, October 2024), holding that the principle precludes aggregating data obtained on or outside a platform for targeted advertising “without restriction as to time.”
The legal convergence across jurisdictions is unmistakable. Minimization is not a European peculiarity. It is becoming the global default.
The enforcement record proves this is not theoretical
The fines are real, they are large, and the violations they target are precisely the kind of passive overcollection that WordPress sites exhibit by default.
H&M received a €35.3 million fine from the Hamburg Commissioner for Data Protection in October 2020. Since at least 2014, H&M managers had been conducting “Welcome Back Talks” after employee absences, recording vacation experiences, illness symptoms, family problems, and religious beliefs in a digital system accessible to approximately 50 managers. The violation was not data theft or a breach. It was collecting data that went far beyond what employment management required.
The Meta €251 million decision in December 2024 is particularly instructive for anyone building or managing systems. The Irish DPC imposed €130 million for Article 25(1) and €110 million for Article 25(2) following the 2018 Facebook breach affecting 29 million accounts. The compromised data included names, emails, phone numbers, locations, dates of birth, religion, gender, timeline posts, group memberships, and children’s data. The DPC’s position was that Meta failed to ensure that, by default, only necessary data was processed. The system design itself was the violation.
Amazon France Logistique was fined €32 million by CNIL in January 2024 (reduced to €15 million on appeal) for warehouse employee monitoring via scanners that recorded granular quality, productivity, and idle-time data at the individual level, retained for 31 days. CNIL found that aggregated weekly data would have served the same management purpose. The French Council of State upheld the data minimization violation as “central to justifying the administrative fine.”
Clearview AI was fined €20 million by CNIL (October 2022) and €30.5 million by the Dutch DPA (May 2024) for scraping over 30 billion photographs. The Dutch authority’s statement was blunt: Clearview “should never have built the database.” SAF Logistics received €200,000 from CNIL in 2023 for excessive employee data collection during recruitment. And in early 2026, CNIL fined Free Mobile €27 million for data retention practices violating minimization and storage limitation principles.
The pattern across these cases is consistent: regulators penalize not just the breach, but the existence of data that should never have been collected or retained in the first place. The WordPress parallel is direct. If your database holds commenter IP addresses from 2019 that serve no current moderation purpose, you are holding data that fails the necessity test.
What your WordPress database actually contains
Most WordPress site owners have never queried their own database. If they did, the volume of personal data stored by default would be sobering.
The wp_users table holds personal data in eight of its ten columns: user_login, user_pass (hashed), user_nicename, user_email, user_url, user_registered, user_activation_key, and display_name. The wp_usermeta table extends this with first_name, last_name, nickname, description, and the session_tokens meta key — a serialized array containing the IP address, user agent string, login timestamp, and session expiration for every active session. Here is the part that trips up site owners: expired session tokens are not automatically pruned. They persist in wp_usermeta until the user logs in again.
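One way to close that gap is a small scheduled cleanup. What follows is a minimal sketch, not core behaviour. The dm_ function names are hypothetical, it assumes sessions live in the session_tokens user meta as described above, and it piggybacks on wp_scheduled_delete, the daily cron hook core already uses for emptying the trash.

```php
<?php
/**
 * Drop expired entries from one user's session_tokens array.
 * Pure function (no WordPress calls), so it can be tried in isolation.
 */
function dm_prune_expired_sessions( array $sessions, int $now ): array {
    return array_filter(
        $sessions,
        static function ( $session ) use ( $now ) {
            // Very old installs stored a bare expiration timestamp instead of an array.
            $expiration = is_array( $session ) ? ( $session['expiration'] ?? 0 ) : $session;
            return (int) $expiration >= $now;
        }
    );
}

/**
 * Rewrite session meta for every user that has any, keeping only live sessions.
 * On very large sites this loop should be batched; it is kept simple here.
 */
function dm_prune_all_sessions(): void {
    $user_ids = get_users( array( 'meta_key' => 'session_tokens', 'fields' => 'ID' ) );
    foreach ( $user_ids as $user_id ) {
        $sessions = get_user_meta( $user_id, 'session_tokens', true );
        if ( ! is_array( $sessions ) ) {
            continue;
        }
        $live = dm_prune_expired_sessions( $sessions, time() );
        if ( $live ) {
            update_user_meta( $user_id, 'session_tokens', $live );
        } else {
            delete_user_meta( $user_id, 'session_tokens' );
        }
    }
}

// Only register the hook when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_scheduled_delete', 'dm_prune_all_sessions' );
}
```

Dropped into a must-use plugin, this removes the stored IP address and user agent of every expired session once a day, without logging anyone out of a session that is still valid.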
The wp_comments table stores five personal data fields per comment: comment_author, comment_author_email, comment_author_url, comment_author_IP, and comment_agent. All of these are stored indefinitely by default. A site with 2,000 comments is a site with up to 2,000 IP addresses and 2,000 user agent strings in a table that nobody audits.
WordPress 4.9.6 (May 2018) introduced privacy tools specifically for GDPR. The Personal Data Export tool (Tools → Export Personal Data) uses the wp_privacy_personal_data_exporters filter, where core registers exporters for user profile data, community events location, session tokens, and comments. The Personal Data Erasure tool uses wp_privacy_personal_data_erasers — and here is a critical nuance that most site owners miss: it does not delete comments. It anonymises them, setting comment_author to “Anonymous,” emptying email and URL fields, zeroing the last octet of IPv4 addresses via wp_privacy_anonymize_ip(), and clearing the user agent. The wp_privacy_anonymize_data() function handles type-specific replacements: deleted@site.invalid for emails, https://site.invalid for URLs, 0000-00-00 00:00:00 for dates, [deleted] for text.
These tools exist. But they are reactive — they respond to individual requests. They do nothing about the structural accumulation of personal data across the rest of your database.
Plugins are where minimization risk concentrates
Every plugin makes independent decisions about what to collect, where to store it, and how long to keep it. No centralised governance exists. The result is that a typical WordPress installation with ten active plugins may have ten separate data retention policies, most of which are “forever.”
Contact Form 7 is often praised for its privacy posture because it does not store submissions in the database — it emails them. But its companion plugin Flamingo stores every submission as a custom post type in wp_posts and wp_postmeta, indefinitely, with no built-in auto-delete and no integration with WordPress’s core Export/Erase Personal Data tools. WPForms stores entries in custom tables (wp_wpforms_entries, wp_wpforms_entry_meta, wp_wpforms_entry_fields) and by default collects user IP addresses, user agents, and referring URLs. These defaults are on. You must navigate to Settings → General and explicitly disable them. Gravity Forms is the most mature on this front: its per-form Personal Data tab offers configurable retention that automatically trashes or deletes entries after a set number of days, an IP storage toggle, and full integration with WordPress’s core privacy tools via the gform_personal_data filter.
WooCommerce is where the data footprint becomes genuinely complex. Customer billing and shipping addresses live as user meta in wp_usermeta (keys like billing_first_name, billing_address_1, billing_phone). Order data sits in either the legacy CPT system or the newer High-Performance Order Storage tables (wp_wc_orders, wp_wc_order_addresses, wp_wc_orders_meta) that became the default for new stores in WooCommerce 8.2 (October 2023). Session data for guest shoppers accumulates in wp_woocommerce_sessions. Analytics data spans wp_wc_order_stats, wp_wc_order_product_lookup, and wp_wc_customer_lookup. And those data retention settings at WooCommerce → Settings → Accounts & Privacy? Five configurable fields for inactive accounts, pending orders, failed orders, cancelled orders, and completed orders. All five ship blank. Blank means indefinite. Every WooCommerce store that has not touched these settings is retaining every piece of customer data it has ever collected with no defined end date, whether or not a legal basis for keeping it still exists.
Analytics and security plugins split between local and external storage. Matomo (self-hosted) stores full visit logs locally in wp_matomo_log_visit and wp_matomo_log_link_visit_action, including IP addresses, user agents, and geolocation, but offers configurable IP anonymisation, cookieless tracking, and retroactive anonymisation tools. Wordfence stores visitor IPs, login attempts, and firewall events in custom tables and by default transmits IP data to Defiant’s cloud servers via the Real-Time Security Network feature. Akismet sends comment data (IP, user agent, referrer, content) to Automattic’s servers for spam analysis, retaining it externally for 2-90 days while storing detection metadata locally in wp_commentmeta indefinitely.
Database bloat is silent noncompliance
Beyond the data plugins intentionally collect, WordPress databases accumulate personal data through structural mechanisms that most site owners never address.
Post revisions are stored in wp_posts with post_type = 'revision'. WordPress keeps unlimited revisions by default. A site with 200 posts averaging 20 revisions holds 4,000 extra rows in wp_posts, each preserving full post_content that may contain personal data if custom post types store form submissions or user-generated content. Two constants in wp-config.php rein this in: define('WP_POST_REVISIONS', 5); caps the revisions kept per post, and define('EMPTY_TRASH_DAYS', 7); shortens how long trashed posts and comments linger before permanent deletion (the default is 30 days).
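Transients in wp_options (prefixed _transient_ and _transient_timeout_) are deleted lazily: an expired transient is purged when something requests it. Since WordPress 4.9, a daily delete_expired_transients cron event also sweeps expired rows, but that depends on WP-Cron actually running, and transients set without an expiration are never swept at all. Plugins may cache user data, API responses containing PII, or geolocation data in transients.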
Action Scheduler tables, used by WooCommerce, MailPoet, WP Mail SMTP, and others, are one of the most severe bloat vectors. The wp_actionscheduler_actions and wp_actionscheduler_logs tables retain completed actions for about a month by default (31 days in current releases), and real-world reports document these tables growing to 10-60+ GB with hundreds of millions of rows. The args column may contain email addresses, user IDs, or other PII. The action_scheduler_retention_period filter can reduce this window.
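Before shortening the window, it helps to know whether your args column really holds email addresses. The sketch below is one hedged way to check. The dm_ function names are hypothetical, the regular expression is a rough heuristic rather than a full address validator, and the query assumes the standard actionscheduler_actions table with a status value of complete. Recent Action Scheduler releases may move long payloads into a separate column, which this sketch does not inspect.

```php
<?php
/**
 * Heuristic: does a stored args payload contain something shaped like an email?
 * Pure function, so it can be tried outside WordPress.
 */
function dm_args_contain_email( string $args ): bool {
    return preg_match( '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $args ) === 1;
}

/**
 * Sample the most recent completed actions and count rows whose args
 * look like they carry an email address. Requires WordPress ($wpdb).
 */
function dm_count_scheduler_rows_with_emails( int $limit = 1000 ): int {
    global $wpdb;
    $rows = $wpdb->get_col(
        $wpdb->prepare(
            "SELECT args FROM {$wpdb->prefix}actionscheduler_actions
             WHERE status = %s ORDER BY action_id DESC LIMIT %d",
            'complete',
            $limit
        )
    );
    // args can be NULL in the schema, so normalise to strings first.
    $rows = array_map( 'strval', $rows );
    return count( array_filter( $rows, 'dm_args_contain_email' ) );
}
```

A non-zero count from dm_count_scheduler_rows_with_emails() is a strong hint that the retention window for completed actions deserves the same scrutiny as any other store of personal data.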
Spam comments remain in wp_comments with their full personal data payload — name, email, IP, user agent — unless manually deleted. Orphaned metadata in wp_postmeta, wp_usermeta, and wp_commentmeta referencing deleted parent records accumulates when plugins fail to register cleanup hooks or when direct SQL deletions bypass WordPress’s deletion functions.
Little of this data serves an ongoing purpose. Wherever it identifies a person, it is personal data under GDPR, and data held with no remaining purpose fails the necessity test.
What you should actually do about it
Configure WooCommerce retention settings now if you run a store. Navigate to WooCommerce → Settings → Accounts & Privacy and populate all five fields. Completed orders should be kept only as long as your tax record requirements demand: typically 6 years for UK HMRC, and between 3 and 7 years for the US IRS depending on the record. Pending, failed, and cancelled orders can go to 30-90 days. Inactive accounts can go to 12-24 months.
Disable IP address collection where it serves no purpose. Add add_filter('pre_comment_user_ip', '__return_empty_string'); to your theme’s functions.php to stop storing commenter IPs. Disable IP and user agent collection in WPForms under Settings → General. In Gravity Forms, toggle IP storage off in each form’s Personal Data settings. In Wordfence, disable “Participate in the Real-Time Wordfence Security Network.”
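If blanking commenter IPs entirely would hurt moderation, a middle path is to store a coarsened address instead. This is a sketch under assumptions, not a recommendation that a truncated IP is always anonymous. The dm_coarsen_ip name is hypothetical, it delegates to core’s wp_privacy_anonymize_ip() when WordPress is loaded, and the fallback branch is a simplified IPv4-only mask included so the function can be tried standalone.

```php
<?php
/**
 * Return a coarsened IP address for storage with a comment.
 * Inside WordPress this defers to core's wp_privacy_anonymize_ip().
 */
function dm_coarsen_ip( string $ip ): string {
    if ( function_exists( 'wp_privacy_anonymize_ip' ) ) {
        return wp_privacy_anonymize_ip( $ip );
    }
    // Standalone fallback (assumption: IPv4 only): zero the last octet.
    if ( filter_var( $ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 ) ) {
        return preg_replace( '/\.\d+$/', '.0', $ip );
    }
    // Anything else is dropped rather than stored.
    return '';
}

// Apply the coarsening before WordPress saves the commenter's IP.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'pre_comment_user_ip', 'dm_coarsen_ip' );
}
```

Use either this filter or the __return_empty_string one, not both. Spam-filtering services that rely on the full IP may become less accurate with either approach.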
Limit post revisions and clean your database. Add define('WP_POST_REVISIONS', 5); to wp-config.php. Use WP-Optimize or Advanced Database Cleaner on a weekly schedule to clear revisions, auto-drafts, trashed items, expired transients, spam comments, and orphaned metadata. For Action Scheduler bloat, add add_filter('action_scheduler_retention_period', function() { return WEEK_IN_SECONDS; });.
Audit your REST API exposure. Visit yoursite.com/wp-json/wp/v2/users while logged out. If user data is visible, restrict the endpoint using the rest_endpoints filter or WP Cerber’s REST API blocking. Be surgical — Contact Form 7, WooCommerce, the Block Editor, and Jetpack all depend on REST API access.
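A surgical restriction can be sketched with the rest_endpoints filter: remove only the user routes, and only for logged-out requests. The dm_strip_user_routes name is hypothetical, and the sketch assumes the default /wp/v2/users route prefix. Test it against your own plugins before relying on it.

```php
<?php
/**
 * Remove the core user routes from a REST endpoint map and leave
 * everything else (posts, WooCommerce, form plugins) untouched.
 */
function dm_strip_user_routes( array $endpoints ): array {
    foreach ( array_keys( $endpoints ) as $route ) {
        if ( '/wp/v2/users' === $route || 0 === strpos( $route, '/wp/v2/users/' ) ) {
            unset( $endpoints[ $route ] );
        }
    }
    return $endpoints;
}

// Only anonymous requests lose the user routes; the Block Editor,
// which runs as a logged-in user, keeps full access.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'rest_endpoints',
        static function ( $endpoints ) {
            return is_user_logged_in() ? $endpoints : dm_strip_user_routes( $endpoints );
        }
    );
}
```

After adding it, repeat the logged-out visit to /wp-json/wp/v2/users. The route should no longer resolve, while the rest of the API keeps working.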
Run a data discovery exercise. Use WordPress’s built-in Tools → Export Personal Data with a test email to see what every installed plugin reports. Run wp db search "user@email.com" via WP-CLI to find where personal data appears across the database. Query SELECT DISTINCT meta_key FROM wp_usermeta; to identify all metadata keys and assess which contain personal data.
Set reasonable retention schedules: contact form submissions at 6-12 months, server access logs at 30-90 days, analytics data at 14-26 months, comment IP addresses anonymised after 90 days.
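A retention schedule only counts if something enforces it. As an illustration, the sketch below clears commenter IPs and user agents once a comment is older than 90 days. It blanks the fields rather than masking them, which is a stricter choice than anonymising. The dm_ names and the 90-day and 500-row values are assumptions, and the job rides on core’s daily wp_scheduled_delete cron hook.

```php
<?php
/**
 * Compute the GMT cutoff before which data has outlived its retention period.
 * Pure function, so it can be tried outside WordPress.
 */
function dm_retention_cutoff( int $days, int $now ): string {
    return gmdate( 'Y-m-d H:i:s', $now - $days * 86400 );
}

/**
 * Blank the IP and user agent on up to 500 old comments per run.
 * Rows already cleared no longer match, so each run makes progress.
 */
function dm_scrub_old_comment_ips(): void {
    global $wpdb;
    $cutoff = dm_retention_cutoff( 90, time() );
    $ids    = $wpdb->get_col(
        $wpdb->prepare(
            "SELECT comment_ID FROM {$wpdb->comments}
             WHERE comment_date_gmt < %s
               AND ( comment_author_IP <> '' OR comment_agent <> '' )
             LIMIT %d",
            $cutoff,
            500
        )
    );
    foreach ( $ids as $id ) {
        $wpdb->update(
            $wpdb->comments,
            array( 'comment_author_IP' => '', 'comment_agent' => '' ),
            array( 'comment_ID' => (int) $id )
        );
        // Keep the object cache consistent with the database.
        clean_comment_cache( (int) $id );
    }
}

// Only register the hook when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_scheduled_delete', 'dm_scrub_old_comment_ips' );
}
```

The same cutoff helper can drive other schedules in the list, such as deleting stored form submissions after 6-12 months.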
What developers building for WordPress must get right
The WordPress.org Plugin Review Team enforces Guideline 7, which prohibits plugins from contacting external servers without explicit and authorised consent via an opt-in method. Documentation on data collection must appear in the plugin’s readme with a clearly stated privacy policy. Prohibited practices include automated collection without explicit confirmation and misleading users into submitting information as a prerequisite for plugin use. Premium plugins distributed outside WordPress.org are not bound by these repository rules — an enforcement gap worth noting.
Developers handling personal data should implement the WordPress Privacy API: register data exporters via wp_privacy_personal_data_exporters, register erasers via wp_privacy_personal_data_erasers with batch processing in groups of 500, and contribute suggested privacy policy text via wp_add_privacy_policy_content() on the admin_init action. Use wp_delete_user(), wp_delete_post(), and wp_delete_comment() instead of direct SQL, because these functions fire cleanup hooks that remove associated metadata. Ship an uninstall.php file or use register_uninstall_hook() to clean up all plugin data on removal.
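The eraser side of that API can be sketched as follows. Everything about the storage is hypothetical: the sketch assumes a plugin that keeps submissions in a dm_submission post type with the submitter’s address in a _dm_email meta key. Only the filter name, the callback signature, and the shape of the returned array come from the Privacy API itself.

```php
<?php
const DM_ERASER_BATCH = 500;

/**
 * Build the array the Privacy API expects back from an eraser callback.
 * The eraser is done when a batch comes back short, or when nothing
 * could be removed (so a failing delete cannot loop forever).
 */
function dm_eraser_response( int $found, int $removed, int $batch_size ): array {
    return array(
        'items_removed'  => $removed > 0,
        'items_retained' => $removed < $found,
        'messages'       => array(),
        'done'           => $found < $batch_size || 0 === $removed,
    );
}

/**
 * Eraser callback: delete one batch of submissions for an email address.
 * $page is deliberately unused: deleted posts drop out of the result set,
 * so each call simply takes the first remaining batch.
 */
function dm_erase_submissions( string $email_address, int $page = 1 ): array {
    $ids = get_posts(
        array(
            'post_type'      => 'dm_submission', // hypothetical post type
            'post_status'    => 'any',
            'meta_key'       => '_dm_email',     // hypothetical meta key
            'meta_value'     => $email_address,
            'posts_per_page' => DM_ERASER_BATCH,
            'fields'         => 'ids',
        )
    );
    $removed = 0;
    foreach ( $ids as $id ) {
        // wp_delete_post() fires the hooks that clean up post meta.
        if ( wp_delete_post( $id, true ) ) {
            $removed++;
        }
    }
    return dm_eraser_response( count( $ids ), $removed, DM_ERASER_BATCH );
}

// Register the eraser so Tools → Erase Personal Data calls it.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'wp_privacy_personal_data_erasers',
        static function ( array $erasers ): array {
            $erasers['dm-submissions'] = array(
                'eraser_friendly_name' => 'DM form submissions',
                'callback'             => 'dm_erase_submissions',
            );
            return $erasers;
        }
    );
}
```

An exporter follows the same pattern through wp_privacy_personal_data_exporters, returning the data it finds instead of deleting it.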
Loading Google Fonts from Google’s servers led a Munich regional court to award a site visitor €100 in damages in January 2022, because the request disclosed the visitor’s IP address to Google without consent. Third-party resource loading can constitute a personal data transfer. Self-host your fonts, disclose all external requests, and declare all cookies your theme sets.
The trajectory is unambiguous
CalPrivacy’s largest settlement reached $2.75 million in February 2026. Texas established itself as a major enforcement jurisdiction with an over $1 billion settlement. The UK’s Data (Use and Access) Act 2025 increased PECR fines to up to £17.5 million or 4% of global turnover. Global Privacy Control signals are now legally mandated in over ten US states.
The direction is clear across every jurisdiction: regulators are moving from penalising breaches to penalising the conditions that make breaches consequential. Collecting less data inherently reduces multi-jurisdictional compliance risk. Every plugin that stores personal data creates a regulatory surface. Every blank retention setting represents an indefinite storage commitment that regulators increasingly view as a violation in itself.
Your WordPress database is not a neutral archive. It is a liability ledger. Every row of personal data you hold without a defined purpose and a defined expiry is a row that exists in violation of a legal principle now codified across dozens of jurisdictions. The Danish taxi company learned this over phone numbers. Meta learned it at a cost of €251 million. The question is not whether data minimization applies to your WordPress site. It is how much unnecessary data your database has already accumulated while you were not looking.