Data Protection Impact Assessment (DPIA)

GDPR Article 35 | Version 1.1 | March 2026

⚖️ Legal Notice: This DPIA is published to support consultation under GDPR Article 35(9), which requires controllers, where appropriate, to seek the views of data subjects or their representatives. We welcome feedback via our contact form.

1. Processing Operation Description

1.1 Purpose

HumanKey provides AI bot detection and traffic classification services for website publishers. The processing systematically monitors website visitors to distinguish between human users and automated agents (AI crawlers, scrapers, bots) without user interaction or consent.

1.2 Data Processed

Data Type	Format	Personal Data?
IP Address	SHA-256 hash with daily rotating salt	Pseudonymized
User-Agent String	Truncated to 200 characters	Reduced PII
Page URL	Full URL (publishers are instructed not to include PII in query parameters)	Potentially personal
Referrer URL	Full referrer string	Potentially personal
Visit Timestamp	ISO 8601 UTC	Yes (temporal pattern)
Session Duration	Milliseconds	Non-personal
Behavioral Signals	Scroll depth, JS execution timing, page dwell time (aggregated session-level signals, no keylogging)	Pseudonymized (session-scoped)
AI Crawler Identity	Bot name, company, purpose (e.g. GPTBot / OpenAI / training)	Non-personal (automated agent data)
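To make the table concrete, a stored visit could be modelled as below. This is a minimal sketch, not HumanKey's actual schema: `VisitRecord`, `build_visit_record` and `UA_MAX_CHARS` are hypothetical names, the IP is assumed to arrive already hashed, and behavioral signals are omitted for brevity. Only the 200-character User-Agent limit and the ISO 8601 UTC timestamp are taken from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

UA_MAX_CHARS = 200  # truncation limit from the data table


@dataclass(frozen=True)
class VisitRecord:
    """Hypothetical shape of one stored visit, mirroring the data table."""
    ip_hash: str                # salted SHA-256 hex digest, never the raw IP
    user_agent: str             # at most UA_MAX_CHARS characters
    page_url: str
    referrer_url: Optional[str]
    visited_at: str             # ISO 8601, UTC
    session_duration_ms: int


def build_visit_record(ip_hash: str, user_agent: str, page_url: str,
                       referrer_url: Optional[str], visited_at: datetime,
                       session_duration_ms: int) -> VisitRecord:
    """Apply the minimization rules from the table before anything is stored."""
    if visited_at.tzinfo is None:
        raise ValueError("visited_at must be timezone-aware")
    return VisitRecord(
        ip_hash=ip_hash,
        user_agent=user_agent[:UA_MAX_CHARS],                       # truncate UA
        page_url=page_url,
        referrer_url=referrer_url,
        visited_at=visited_at.astimezone(timezone.utc).isoformat(),  # normalize to UTC
        session_duration_ms=session_duration_ms,
    )
```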

1.3 Processing Methods

User-Agent Matching: Comparison of the User-Agent header against 50+ known AI bot signatures (GPTBot, ClaudeBot, PerplexityBot, etc.)
IP Datacenter Detection: Lookup against public datacenter IP ranges (AWS, Google Cloud, Azure, OVH)
Behavioral Signals: Scroll depth, JS execution timing, page dwell time, navigation sequences — used to distinguish human interaction from automated scripts
Session Analysis: Page view duration, navigation sequences, dwell time patterns
Confidence Scoring: Algorithmic scoring (0-100%) based on combined signal strength — no automated blocking decisions made solely on score
AI Crawler Taxonomy: Classification of crawlers by company, purpose and permission level (GPTBot/OpenAI/training, ClaudeBot/Anthropic/training, Googlebot/Google/indexing, etc.)
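The methods above could be combined roughly as follows. This is a hedged sketch, not HumanKey's detector: the signature table holds only the three taxonomy examples named above, the CIDR blocks are RFC 5737 documentation prefixes standing in for real published cloud ranges, and the weights in `confidence_score` are invented for illustration. The threshold of 85 mirrors the figure cited in the risk assessment; all function names are hypothetical.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional


class CrawlerInfo(NamedTuple):
    name: str
    company: str
    purpose: str


# Illustrative subset of the crawler taxonomy; a production table would hold 50+ entries.
AI_CRAWLERS = {
    "gptbot": CrawlerInfo("GPTBot", "OpenAI", "training"),
    "claudebot": CrawlerInfo("ClaudeBot", "Anthropic", "training"),
    "googlebot": CrawlerInfo("Googlebot", "Google", "indexing"),
}

# Placeholder prefixes from RFC 5737 (documentation ranges), NOT real datacenter ranges.
DATACENTER_RANGES = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]

BOT_THRESHOLD = 85  # score (0-100) at or above which traffic is labelled "bot"


def match_crawler(user_agent: str) -> Optional[CrawlerInfo]:
    """Return taxonomy info if the User-Agent contains a known crawler token."""
    ua = user_agent.lower()
    for token, info in AI_CRAWLERS.items():
        if token in ua:
            return info
    return None


def is_datacenter_ip(ip: str, ranges: Iterable = DATACENTER_RANGES) -> bool:
    """True if the address is inside any listed range (IPv4/IPv6 mismatches yield False)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def confidence_score(datacenter: bool, ran_js: bool, scrolled: bool) -> int:
    """Hypothetical weights summing to 100; higher means more bot-like."""
    score = 0
    if datacenter:
        score += 50
    if not ran_js:
        score += 30
    if not scrolled:
        score += 20
    return score


def classify(user_agent: str, ip: str, ran_js: bool, scrolled: bool) -> str:
    """Label a request; the label is informational and triggers no blocking."""
    if match_crawler(user_agent) is not None:
        return "ai_crawler"
    score = confidence_score(is_datacenter_ip(ip), ran_js, scrolled)
    return "bot" if score >= BOT_THRESHOLD else "human"
```

User-Agent strings are easy to spoof, so a self-declared match is best treated as a label rather than proof. With these weights, a datacenter IP plus missing JavaScript scores 80 and is still labelled `human`, so the sketch errs away from false positives.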

2. Legal Basis & Necessity

2.1 Legal Basis (GDPR Art. 6)

Legitimate Interest (Art. 6(1)(f)): Website publishers have a legitimate interest in:

Protecting their content from unauthorized scraping and copyright infringement
Measuring accurate traffic metrics (excluding bots from analytics)
Preventing server overload from aggressive crawlers
Monetizing AI training data usage (licensing to model providers)

2.2 Balancing Test

Consideration	Assessment
Publisher Interest	High — Content protection, accurate analytics, revenue attribution
User Impact	Low — No visible impact, no decisions made about humans, no discrimination
Data Sensitivity	Low — IP hashed, UA truncated, no special category data (Art. 9)
Reasonable Expectations	Met — Bot detection is standard practice, disclosed in privacy policies

Conclusion: The legitimate interest of protecting website content and measuring traffic outweighs the minimal privacy impact on users. Processing is proportionate and necessary.

2.3 Why Not Consent?

Consent (Art. 6(1)(a)) is not suitable for bot detection because:

Bots do not provide consent (cannot tick checkboxes or interact with banners)
Requiring consent would defeat the purpose (scrapers would simply decline and continue scraping)
Server-side signals (User-Agent, IP address) are evaluated as each request arrives, before the page loads, leaving no opportunity to request consent

3. Risk Assessment

3.1 Identified Risks

✅ LOW RISK: False Positive (Human Classified as Bot)

Impact: Human user incorrectly flagged as bot → no functional impact (users still access content), statistical error only

Likelihood: Low (confidence threshold tuned to 85%+)

Mitigation: No blocking decisions made based on classification; publishers can manually review edge cases

✅ LOW RISK: Data Breach (Hashed IPs Exposed)

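Impact: SHA-256 hashes with daily salt exposed → not practically reversible while the salt stays secret (if a salt leaked together with its hashes, the roughly 4.3 billion IPv4 addresses could be brute-forced); the salt rotates every 24h, so hashes from different days cannot be linked to each other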

Likelihood: Low (encrypted infrastructure, AES-256 at rest, TLS in transit)

Mitigation: TLS in transit, AES-256 at rest (Neon default), access controls, annual penetration testing
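The daily-salted hashing can be sketched with Python's standard `hashlib`, `secrets` and `ipaddress` modules. Assumptions: the class name `DailySaltedIpHasher` is hypothetical, the salt is 32 random bytes held only in memory, the day boundary is UTC, and the IP is hashed in packed binary form so that equivalent spellings of one IPv6 address produce the same hash.

```python
import hashlib
import ipaddress
import secrets
from datetime import date, datetime, timezone
from typing import Optional


class DailySaltedIpHasher:
    """Hypothetical helper: SHA-256 over (daily salt || packed IP bytes)."""

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, now: datetime) -> bytes:
        day = now.astimezone(timezone.utc).date()
        if day != self._salt_day:
            # New random salt for each UTC day; the previous salt is discarded, never persisted.
            # Assumes calls arrive in chronological order: a late request stamped with the
            # previous day would trigger yet another fresh salt.
            self._salt = secrets.token_bytes(32)
            self._salt_day = day
        return self._salt

    def hash_ip(self, ip: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        packed = ipaddress.ip_address(ip).packed  # canonical bytes: "::1" and "0::1" match
        return hashlib.sha256(self._salt_for(now) + packed).hexdigest()
```

Because the IPv4 space is small, the protection rests entirely on the salt staying secret and being discarded after rotation. A keyed construction such as HMAC-SHA256 (Python's `hmac` module) is a common alternative with the same dependency on key secrecy.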

⚠️ MEDIUM RISK: URL Privacy Leakage

Impact: URLs containing session tokens or personal identifiers stored in analytics → potential re-identification

Likelihood: Medium (some publishers may use URL params for user IDs)

Mitigation: (1) Publisher documentation warns against passing PII in URLs, (2) Future enhancement: automatic query param stripping for common patterns (?user_id=, ?email=)
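The planned query-parameter stripping could look roughly like this, using only `urllib.parse`. It is a hypothetical sketch of a future feature: `strip_pii_params` and `PII_PARAMS` are illustrative names, and the denylist contains just the two examples given above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Only the two example parameters named above; a real denylist would need to be broader.
PII_PARAMS = frozenset({"user_id", "email"})


def strip_pii_params(url: str, denylist: frozenset = PII_PARAMS) -> str:
    """Drop denylisted query parameters (case-insensitive), keeping order and fragment."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in denylist]
    # An empty query makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

`urlencode` re-serializes the surviving parameters, so a value written with `%20` comes back with `+`, which form decoding treats as the same space; the stored URL may therefore differ cosmetically from the original.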

✅ LOW RISK: Third-Party Sub-Processor Access

Impact: Sub-processors (Neon, Railway, Stripe) access pseudonymized data

Likelihood: Inherent (sub-processors required for service delivery)

Mitigation: DPAs with all sub-processors, EU data residency (Neon: Frankfurt, Germany; Railway: Netherlands), SCCs for US transfers, enterprise-grade certified vendors only

3.2 Overall Risk Rating

🟡 MEDIUM RISK (elevated from LOW due to email delivery via a US sub-processor and Google OAuth data flows)

No high-risk automated decision-making (Art. 22), no special category data (Art. 9), no large-scale monitoring of publicly accessible areas, no vulnerable data subjects. Risk elevation is due to transactional email processing by Resend, Inc. (USA) and Google OAuth authentication flows, both covered by SCCs.

4. Safeguards & Measures

4.1 Privacy by Design (GDPR Art. 25)

Pseudonymization: IP addresses hashed with SHA-256 + daily salt before storage (Art. 32(1)(a))
Data Minimization: User-Agent truncated to 200 characters before storage, limiting how much of the string is retained
Storage Limitation: Automatic deletion after retention period (Free: 7d, Pro: 30d, Business: 90d, Enterprise: 365d)
Purpose Limitation: Data used only for bot detection, not sold or used for marketing
Transparency: Processing disclosed in publisher privacy policies + HumanKey privacy policy
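The storage-limitation rule reduces to a per-plan cutoff timestamp. In production this would presumably be a scheduled database deletion; the in-memory version below only illustrates the cutoff arithmetic. `RETENTION_DAYS`, `Visit`, `retention_cutoff` and `purge_expired` are hypothetical names; the day counts come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Retention periods per plan, in days, as listed above.
RETENTION_DAYS = {"free": 7, "pro": 30, "business": 90, "enterprise": 365}


@dataclass
class Visit:
    visited_at: datetime  # timezone-aware, UTC


def retention_cutoff(plan: str, now: datetime) -> datetime:
    """Oldest timestamp that may still be kept for this plan."""
    return now - timedelta(days=RETENTION_DAYS[plan])


def purge_expired(visits: List[Visit], plan: str, now: datetime) -> List[Visit]:
    """Return only the visits still inside the plan's retention window."""
    cutoff = retention_cutoff(plan, now)
    return [v for v in visits if v.visited_at >= cutoff]
```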

4.2 Security Measures (GDPR Art. 32)

Measure	Implementation
Encryption in transit	TLS 1.3, HSTS preload, certificate pinning
Encryption at rest	AES-256 (Neon PostgreSQL default)
Access control	Row-level security (RLS), JWT auth, bcrypt password hashing (cost factor 12)
Rate limiting	Redis-backed, mandatory in production (200 req/min/IP for detection endpoint)
Monitoring	Sentry error tracking (PII stripped), structured logging (Pino), admin audit log
Backups	Daily automated backups (Neon), 30-day retention, encrypted
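The rate-limiting row can be illustrated with a sliding-window log. The table specifies a Redis-backed limiter, which lets every API instance share counters; this sketch keeps state in process memory purely to show the windowing logic. `SlidingWindowLimiter` is a hypothetical name, and its defaults (200 requests per 60 seconds per key) come from the table.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """In-memory stand-in for a shared limiter: at most `limit` requests per `window_s` per key."""

    def __init__(self, limit: int = 200, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

The key would typically be the client IP (or its hash); the injectable `clock` exists so the window can be tested without waiting.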

4.3 Data Subject Rights (GDPR Art. 15-22)

Right	Implementation
Right to Access (Art. 15)	Dashboard → Settings → Export Data (JSON download)
Right to Erasure (Art. 17)	Dashboard → Settings → Delete Account (cascading deletion + audit log)
Right to Rectification (Art. 16)	Dashboard → Settings → Update Profile (name, email)
Right to Object (Art. 21)	Contact form (Privacy & GDPR dept., manual review, 30-day response)
Right to Data Portability (Art. 20)	Export includes all visit data in machine-readable JSON format
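A machine-readable export can be as simple as a single JSON document. The function name and the top-level keys (`exported_at`, `profile`, `visits`) are assumptions for illustration, not HumanKey's actual export format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def export_account_data(profile: Dict[str, Any], visits: List[Dict[str, Any]],
                        now: datetime) -> str:
    """Hypothetical Art. 15/20 export: one JSON document with profile and visit data."""
    document = {
        "exported_at": now.astimezone(timezone.utc).isoformat(),
        "profile": profile,
        "visits": visits,
    }
    # sort_keys + indent give stable, readable output; ensure_ascii=False keeps
    # non-ASCII names intact; default=str serializes any remaining datetimes.
    return json.dumps(document, indent=2, sort_keys=True, ensure_ascii=False, default=str)
```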

4.4 Sub-Processors (GDPR Art. 28)

Sub-Processor	Purpose	Location / Transfer Basis
Neon Inc.	PostgreSQL database hosting (primary data store)	EU — Frankfurt, Germany
Railway Corp.	API server hosting and infrastructure	EU — Netherlands
Vercel Inc.	Frontend hosting and CDN	USA — SCCs applied
Sentry (Functional Software Inc.)	Error monitoring and performance tracking (PII-stripped)	USA — SCCs applied
Resend, Inc.	Transactional email delivery (verification, notifications)	USA — SCCs applied
Stripe, Inc.	Payment processing (acts as independent controller for card data)	USA — SCCs + DPF

4.5 Additional Processing Activities

Authentication & Account Data

Data processed: Email address, hashed password (bcrypt), Google OAuth user ID and email, email verification tokens, magic link tokens.
Legal basis: Art. 6(1)(b) GDPR — processing necessary for the performance of a contract.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of active account; deleted within 30 days of account closure.

Email Communications via Resend

Data processed: Email address, email content (verification codes, password reset links, onboarding notifications).
Legal basis: Art. 6(1)(b) GDPR — necessary for account functionality and service delivery.
Sub-processor: Resend, Inc. (USA) — EU Standard Contractual Clauses applied.
Retention: Email delivery logs retained 90 days by Resend; email addresses stored in HumanKey DB for the account lifetime.

Team Member Data (Business tier)

Data processed: Email addresses of invited team members, invitation tokens, team role assignments.
Legal basis: Art. 6(1)(b) GDPR — necessary for multi-user account management.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of team membership; deleted upon team removal or account closure.

Session Recording Data (Beta)

Data processed: Click events and scroll interactions on the Controller's website pages, recorded only for sites where the Controller explicitly enables this feature. No keystroke data is captured.
Legal basis: Art. 6(1)(f) GDPR — legitimate interest of the Controller in understanding user experience, subject to user opt-in mechanism.
Note: This is a Beta feature, currently disabled by default. Controllers bear responsibility for informing their own users of session recording per Art. 13/14 GDPR.
Retention: 30 days (Pro), 90 days (Business), 365 days (Enterprise).

5. Consultation & Approval

5.1 Data Protection Officer (DPO)

No DPO appointed (not required, as the Art. 37 thresholds are not met). Privacy queries can be submitted via our contact form.

5.2 Supervisory Authority

Polish supervisory authority: Urząd Ochrony Danych Osobowych (UODO; Personal Data Protection Office)
Website: uodo.gov.pl

5.3 Data Subject Views

This DPIA is publicly available at humankey.io/legal/dpia so that data subjects can be consulted as contemplated by Art. 35(9). We invite feedback from data subjects, civil society organizations, and privacy advocates via our contact form.

6. Review & Maintenance

Version History	1.0 (February 2026) → 1.1 (March 2026)
Review Frequency	Annual (every February) or when material changes occur
Next Review Date	February 2027
Responsible Person	Tymoteusz (Founder, Data Controller)

Triggers for Re-Assessment: New data types collected, change in legal basis, data breach, supervisory authority inquiry, technology upgrade affecting privacy, or complaints from data subjects.

7. Conclusion

✅ DPIA OUTCOME: Processing is GDPR-compliant and may proceed.

Risk to data subjects: MEDIUM (see Section 3.2)
Legitimate interest: JUSTIFIED (content protection, traffic analytics)
Safeguards: ADEQUATE (pseudonymization, encryption, data minimization, retention limits)
Rights compliance: IMPLEMENTED (access, erasure, portability, rectification)
Transparency: ACHIEVED (privacy policy, DPIA published, sub-processors disclosed)

Contact: Questions or objections regarding this DPIA? Submit them via our contact form or write to the Polish supervisory authority (UODO).

This DPIA follows GDPR Article 35, the ICO's DPIA guidance, and the Article 29 Working Party Guidelines on DPIAs (WP 248 rev.01). Version 1.1 approved by Data Controller: Tymoteusz (March 2026).