Data Protection Impact Assessment (DPIA)
GDPR Article 35 | Version 1.1 | March 2026
⚖️ Legal Notice: This DPIA is published in line with GDPR Article 35(9), which requires data controllers to seek the views of data subjects where appropriate. We welcome feedback via our contact form.
1. Processing Operation Description
1.1 Purpose
HumanKey provides AI bot detection and traffic classification services for website publishers. The processing systematically monitors website visitors to distinguish between human users and automated agents (AI crawlers, scrapers, bots) without user interaction or consent.
1.2 Data Processed
| Data Type | Format | Personal Data? |
|---|---|---|
| IP Address | SHA-256 hash with daily rotating salt | Pseudonymized |
| User-Agent String | Truncated to 200 characters | Reduced PII |
| Page URL | Full URL (excluding query params with PII) | Potentially personal |
| Referrer URL | Full referrer string | Potentially personal |
| Visit Timestamp | ISO 8601 UTC | Yes (temporal pattern) |
| Session Duration | Milliseconds | Non-personal |
| Behavioral Signals | Scroll depth, JS execution timing, page dwell time (aggregated session-level signals, no keylogging) | Pseudonymized (session-scoped) |
| AI Crawler Identity | Bot name, company, purpose (e.g. GPTBot / OpenAI / training) | Non-personal (automated agent data) |
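The IP and User-Agent handling in the table above can be illustrated with a minimal sketch. It assumes the salt is a random 32-byte value regenerated once per day and prepended to the IP before hashing; the function names and the salt generation method are hypothetical, since this document only states "SHA-256 hash with daily rotating salt" and "Truncated to 200 characters".

```python
import hashlib
import secrets

MAX_UA_LENGTH = 200  # truncation limit stated in the Data Processed table


def new_daily_salt() -> bytes:
    """Generate a fresh random salt (assumption: called once per day, old salt discarded)."""
    return secrets.token_bytes(32)


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + IP so the raw IP is never stored."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def truncate_user_agent(user_agent: str) -> str:
    """Keep at most the first 200 characters of the User-Agent string."""
    return user_agent[:MAX_UA_LENGTH]
```

Under this assumption the same visitor produces the same hash within one day, and a different hash once the salt has rotated.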
1.3 Processing Methods
- User-Agent Matching: Comparison against 50+ known AI bot signatures (GPTBot, ClaudeBot, Perplexity, Google-Extended, etc.)
- IP Datacenter Detection: Lookup against public datacenter IP ranges (AWS, Google Cloud, Azure, OVH)
- Behavioral Signals: Scroll depth, JS execution timing, page dwell time, navigation sequences — used to distinguish human interaction from automated scripts
- Session Analysis: Page view duration, navigation sequences, dwell time patterns
- Confidence Scoring: Algorithmic scoring (0-100%) based on combined signal strength — no automated blocking decisions made solely on score
- AI Crawler Taxonomy: Classification of crawlers by company, purpose and permission level (GPTBot/OpenAI/training, ClaudeBot/Anthropic/training, Googlebot/Google/indexing, etc.)
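As an illustration of how the methods listed above could combine, the sketch below matches a User-Agent against a small signature table, checks an IP against datacenter ranges, and sums integer weights into a 0-100 score. The three signatures are taken from the examples in this document; the IP ranges are reserved documentation ranges standing in for real provider ranges, and the weights and function names are hypothetical, not HumanKey's actual scoring model.

```python
import ipaddress

# Illustrative subset only; the full 50+ signature list is not given in this document.
KNOWN_AI_BOTS = {
    "gptbot": ("GPTBot", "OpenAI", "training"),
    "claudebot": ("ClaudeBot", "Anthropic", "training"),
    "googlebot": ("Googlebot", "Google", "indexing"),
}

# Placeholder ranges (RFC 5737 documentation blocks), not real provider ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical integer weights that sum to 100.
WEIGHTS = {"ua_match": 60, "datacenter_ip": 25, "no_scroll": 15}


def match_bot(user_agent: str):
    """Return (bot name, company, purpose) if a known signature appears in the UA, else None."""
    ua = user_agent.lower()
    for token, identity in KNOWN_AI_BOTS.items():
        if token in ua:
            return identity
    return None


def is_datacenter_ip(ip: str) -> bool:
    """True if the address falls inside any configured datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DATACENTER_RANGES)


def bot_confidence(user_agent: str, ip: str, scroll_depth: int) -> int:
    """Combine the signals into a 0-100 score; the score is reported, never used to block."""
    score = 0
    if match_bot(user_agent) is not None:
        score += WEIGHTS["ua_match"]
    if is_datacenter_ip(ip):
        score += WEIGHTS["datacenter_ip"]
    if scroll_depth == 0:
        score += WEIGHTS["no_scroll"]
    return score
```

In a real deployment the raw IP would be checked against the ranges before pseudonymization, since a hashed IP can no longer be matched to a network range.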
2. Legal Basis & Necessity
2.1 Legal Basis (GDPR Art. 6)
Legitimate Interest (Art. 6(1)(f)): Website publishers have a legitimate interest in:
- Protecting their content from unauthorized scraping and copyright infringement
- Measuring accurate traffic metrics (excluding bots from analytics)
- Preventing server overload from aggressive crawlers
- Monetizing AI training data usage (licensing to model providers)
2.2 Balancing Test
| Consideration | Assessment |
|---|---|
| Publisher Interest | High — Content protection, accurate analytics, revenue attribution |
| User Impact | Low — No visible impact, no decisions made about humans, no discrimination |
| Data Sensitivity | Low — IP hashed, UA truncated, no special category data (Art. 9) |
| Reasonable Expectations | Met — Bot detection is standard practice, disclosed in privacy policies |
Conclusion: The legitimate interest of protecting website content and measuring traffic outweighs the minimal privacy impact on users. Processing is proportionate and necessary.
2.3 Why Not Consent?
Consent (Art. 6(1)(a)) is not suitable for bot detection because:
- Bots do not provide consent (cannot tick checkboxes or interact with banners)
- Requiring consent would defeat the purpose (scrapers would simply decline and continue scraping)
- Server-side processing occurs before page load (no opportunity to request consent)
3. Risk Assessment
3.1 Identified Risks
✅ LOW RISK: False Positive (Human Classified as Bot)
Impact: Human user incorrectly flagged as bot → no functional impact (users still access content), statistical error only
Likelihood: Low (confidence threshold tuned to 85%+)
Mitigation: No blocking decisions made based on classification; publishers can manually review edge cases
✅ LOW RISK: Data Breach (Hashed IPs Exposed)
Impact: SHA-256 hashes with daily salt exposed → computationally infeasible to reverse without the salt, which rotates every 24h
Likelihood: Low (encrypted infrastructure, AES-256 at rest, TLS in transit)
Mitigation: TLS in transit, AES-256 at rest (Neon default), access controls, annual penetration testing
⚠️ MEDIUM RISK: URL Privacy Leakage
Impact: URLs containing session tokens or personal identifiers stored in analytics → potential re-identification
Likelihood: Medium (some publishers may use URL params for user IDs)
Mitigation: (1) Publisher documentation warns against passing PII in URLs, (2) Future enhancement: automatic query param stripping for common patterns (?user_id=, ?email=)
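The query parameter stripping described as a future enhancement could look like the following sketch. The parameter names `user_id` and `email` come from the mitigation above; the function name and the exact matching rule (case-insensitive match on the parameter name) are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names named in the mitigation; a real list would likely be longer.
SENSITIVE_PARAMS = {"user_id", "email"}


def strip_sensitive_params(url: str) -> str:
    """Remove query parameters whose name is in SENSITIVE_PARAMS, keeping the rest of the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SENSITIVE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Applying this before storage means the page URL and referrer are cleaned regardless of how the publisher builds its links.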
✅ LOW RISK: Third-Party Sub-Processor Access
Impact: Sub-processors (Neon, Railway, Stripe) access pseudonymized data
Likelihood: Inherent (sub-processors required for service delivery)
Mitigation: DPAs with all processors, EU data residency (Neon Germany/Frankfurt, Railway Netherlands), SCCs for US transfers, enterprise-grade certified vendors only
3.2 Overall Risk Rating
🟡 MEDIUM RISK (elevated from LOW due to email delivery via a US sub-processor and Google OAuth data flows)
No high-risk automated decision-making (Art. 22), no special category data (Art. 9), no large-scale monitoring of publicly accessible areas, no vulnerable data subjects. Risk elevation is due to transactional email processing by Resend, Inc. (USA) and Google OAuth authentication flows, both covered by SCCs.
4. Safeguards & Measures
4.1 Privacy by Design (GDPR Art. 25)
- Pseudonymization: IP addresses hashed with SHA-256 + daily salt before storage (Art. 32(1)(a))
- Data Minimization: User-Agent truncated to 200 chars (browser family only, not full fingerprint)
- Storage Limitation: Automatic deletion after retention period (Free: 7d, Pro: 30d, Business: 90d, Enterprise: 365d)
- Purpose Limitation: Data used only for bot detection, not sold or used for marketing
- Transparency: Processing disclosed in publisher privacy policies + HumanKey privacy policy
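The storage limitation rule above can be expressed as a small retention check. The day counts per plan are the ones stated in this section; the plan keys, function names, and the idea of running the check from a scheduled cleanup job are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Retention periods in days, as stated under Storage Limitation.
RETENTION_DAYS = {"free": 7, "pro": 30, "business": 90, "enterprise": 365}


def retention_cutoff(plan: str, now: datetime) -> datetime:
    """Return the oldest timestamp that may still be kept for the given plan."""
    return now - timedelta(days=RETENTION_DAYS[plan])


def is_expired(visit_time: datetime, plan: str, now: datetime) -> bool:
    """True if a visit record is older than the plan's retention period."""
    return visit_time < retention_cutoff(plan, now)


# Example: a visit from 10 days ago is past retention on Free but not on Pro.
now = datetime(2026, 3, 15, tzinfo=timezone.utc)
visit = now - timedelta(days=10)
print(is_expired(visit, "free", now), is_expired(visit, "pro", now))
```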
4.2 Security Measures (GDPR Art. 32)
| Measure | Implementation |
|---|---|
| Encryption in transit | TLS 1.3, HSTS preload, certificate pinning |
| Encryption at rest | AES-256 (Neon PostgreSQL default) |
| Access control | Row-level security (RLS), JWT auth, bcrypt password hashing (12 rounds) |
| Rate limiting | Redis-backed, mandatory in production (200 req/min/IP for detection endpoint) |
| Monitoring | Sentry error tracking (PII stripped), structured logging (Pino), admin audit log |
| Backups | Daily automated backups (Neon), 30-day retention, encrypted |
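To make the rate limiting row concrete, here is a minimal fixed-window sketch of the 200 requests per minute per IP limit. It uses an in-memory dictionary in place of Redis, so it is not the production mechanism; the class name, the fixed-window algorithm, and the use of a caller-supplied key (for example the hashed IP) are all assumptions.

```python
from collections import defaultdict

LIMIT_PER_MINUTE = 200  # limit stated for the detection endpoint
WINDOW_SECONDS = 60


class FixedWindowLimiter:
    """Count requests per client per 60-second window (in-memory stand-in for Redis)."""

    def __init__(self, limit: int = LIMIT_PER_MINUTE, window: int = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # Old windows are never evicted here; a Redis-backed version would rely on key expiry.
        self._counts = defaultdict(int)

    def allow(self, client_key: str, now: float) -> bool:
        """Return True and count the request if the client is under the limit for this window."""
        bucket = (client_key, int(now // self.window))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True
```

A fixed window is the simplest variant; it can admit up to twice the limit across a window boundary, which a sliding-window design would avoid.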
4.3 Data Subject Rights (GDPR Art. 15-22)
| Right | Implementation |
|---|---|
| Right to Access (Art. 15) | Dashboard → Settings → Export Data (JSON download) |
| Right to Erasure (Art. 17) | Dashboard → Settings → Delete Account (cascading deletion + audit log) |
| Right to Rectification (Art. 16) | Dashboard → Settings → Update Profile (name, email) |
| Right to Object (Art. 21) | Contact form (Privacy & GDPR dept., manual review, 30-day response) |
| Right to Data Portability (Art. 20) | Export includes all visit data in machine-readable JSON format |
4.4 Sub-Processors (GDPR Art. 28)
| Sub-Processor | Purpose | Transfer Basis |
|---|---|---|
| Neon Inc. | PostgreSQL database hosting (primary data store) | EU — Frankfurt, Germany |
| Railway Corp. | API server hosting and infrastructure | EU — Netherlands |
| Vercel Inc. | Frontend hosting and CDN | USA — SCCs applied |
| Sentry (Functional Software Inc.) | Error monitoring and performance tracking (PII-stripped) | USA — SCCs applied |
| Resend, Inc. | Transactional email delivery (verification, notifications) | USA — SCCs applied |
| Stripe, Inc. | Payment processing (acts as independent controller for card data) | USA — SCCs + DPF |
Additional Processing Activities
Authentication & Account Data
Data processed: Email address, hashed password (bcrypt), Google OAuth user ID and email, email verification tokens, magic link tokens.
Legal basis: Art. 6(1)(b) GDPR — processing necessary for the performance of a contract.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of active account; deleted within 30 days of account closure.
Email Communications via Resend
Data processed: Email address, email content (verification codes, password reset links, onboarding notifications).
Legal basis: Art. 6(1)(b) GDPR — necessary for account functionality and service delivery.
Sub-processor: Resend, Inc. (USA) — EU Standard Contractual Clauses applied.
Retention: Email delivery logs retained 90 days by Resend; email addresses stored in HumanKey DB for the account lifetime.
Team Member Data (Business tier)
Data processed: Email addresses of invited team members, invitation tokens, team role assignments.
Legal basis: Art. 6(1)(b) GDPR — necessary for multi-user account management.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of team membership; deleted upon team removal or account closure.
Session Recording Data (Beta)
Data processed: If enabled by the Controller, this feature records click events and scroll interactions on the Controller's website pages. No keystroke data is captured. Session recordings are stored only for sites where the Controller explicitly enables the feature.
Legal basis: Art. 6(1)(f) GDPR — legitimate interest of the Controller in understanding user experience, subject to user opt-in mechanism.
Note: This is a Beta feature, currently disabled by default. Controllers bear responsibility for informing their own users of session recording per Art. 13/14 GDPR.
Retention: 30 days (Pro), 90 days (Business), 365 days (Enterprise).
5. Consultation & Approval
5.1 Data Protection Officer (DPO)
No DPO appointed (not required, as the Art. 37 thresholds are not met). Privacy queries are handled via our contact form.
5.2 Supervisory Authority
Polish supervisory authority: Urząd Ochrony Danych Osobowych (UODO)
Website: uodo.gov.pl
5.3 Data Subject Views
This DPIA is publicly available at humankey.io/legal/dpia so that data subjects can share their views, as envisaged by Art. 35(9). We invite feedback from data subjects, civil society organizations, and privacy advocates via our contact form.
6. Review & Maintenance
| Item | Detail |
|---|---|
| Version History | 1.0 (February 2026) → 1.1 (March 2026) |
| Review Frequency | Annual (every February) or when material changes occur |
| Next Review Date | February 2027 |
| Responsible Person | Tymoteusz (Founder, Data Controller) |
Triggers for Re-Assessment: New data types collected, change in legal basis, data breach, supervisory authority inquiry, technology upgrade affecting privacy, or complaints from data subjects.
7. Conclusion
✅ DPIA OUTCOME: Processing is GDPR-compliant and may proceed.
- Risk to data subjects: MEDIUM (see Section 3.2, Overall Risk Rating)
- Legitimate interest: JUSTIFIED (content protection, traffic analytics)
- Safeguards: ADEQUATE (pseudonymization, encryption, data minimization, retention limits)
- Rights compliance: IMPLEMENTED (access, erasure, portability, rectification)
- Transparency: ACHIEVED (privacy policy, DPIA published, sub-processors disclosed)
Contact: Questions or objections regarding this DPIA? Use our contact form or write to the Polish supervisory authority (UODO).
This DPIA complies with GDPR Article 35, ICO guidelines, and WP29 Guidelines on DPIAs (wp248rev.01). Version 1.1 approved by Data Controller: Tymoteusz (March 2026).