
Data Protection Impact Assessment (DPIA)

GDPR Article 35 | Version 1.1 | March 2026

⚖️ Legal Notice: This DPIA is published in line with GDPR Article 35(9), which requires data controllers to seek the views of data subjects where appropriate. We welcome feedback via our contact form.

1. Processing Operation Description

1.1 Purpose

HumanKey provides AI bot detection and traffic classification services for website publishers. The processing systematically monitors website visitors to distinguish between human users and automated agents (AI crawlers, scrapers, bots) without user interaction or consent.

1.2 Data Processed

Data Type | Format | Personal Data?
IP Address | SHA-256 hash with daily rotating salt | Pseudonymized
User-Agent String | Truncated to 200 characters | Reduced PII
Page URL | Full URL (excluding query params with PII) | Potentially personal
Referrer URL | Full referrer string | Potentially personal
Visit Timestamp | ISO 8601 UTC | Yes (temporal pattern)
Session Duration | Milliseconds | Non-personal
Behavioral Signals | Scroll depth, JS execution timing, page dwell time (aggregated session-level signals, no keylogging) | Pseudonymized (session-scoped)
AI Crawler Identity | Bot name, company, purpose (e.g. GPTBot / OpenAI / training) | Non-personal (automated agent data)
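
As an illustration only, the IP pseudonymization and User-Agent truncation listed in the table above could look like the following sketch. It is not HumanKey's production code: the `DailySalt` class, the 32-byte random salt, and the way the salt is rotated are assumptions made for the example; the table only states that a daily rotating salt, SHA-256, and a 200-character limit are used.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailySalt:
    """Hypothetical helper: one random salt per calendar day, old salt forgotten."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt: bytes = b""

    def for_day(self, day: date) -> bytes:
        # A new day gets a fresh random salt; the previous salt is overwritten.
        if day != self._day:
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + IP, so the raw IP is never stored."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def truncate_user_agent(user_agent: str, limit: int = 200) -> str:
    """Keep at most `limit` characters of the User-Agent string."""
    return user_agent[:limit]
```

Within one day the same IP maps to the same hash, which allows session-level counting; after rotation the same IP produces a different hash.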

1.3 Processing Methods

  • User-Agent Matching: Comparison against 50+ known AI bot signatures (GPTBot, ClaudeBot, Perplexity, Google-Extended, etc.)
  • IP Datacenter Detection: Lookup against public datacenter IP ranges (AWS, Google Cloud, Azure, OVH)
  • Behavioral Signals: Scroll depth, JS execution timing, page dwell time, navigation sequences — used to distinguish human interaction from automated scripts
  • Session Analysis: Page view duration, navigation sequences, dwell time patterns
  • Confidence Scoring: Algorithmic scoring (0-100%) based on combined signal strength — no automated blocking decisions made solely on score
  • AI Crawler Taxonomy: Classification of crawlers by company, purpose and permission level (GPTBot/OpenAI/training, ClaudeBot/Anthropic/training, Googlebot/Google/indexing, etc.)
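
The processing methods listed above can be sketched as a simple signal-combination function. This is a minimal illustration under stated assumptions: the signature subset, the placeholder IP range (a reserved documentation range, not a real datacenter range), and all weights are hypothetical and do not reflect the actual scoring model.

```python
import ipaddress

# Small subset of the bot names mentioned above, lower-cased for matching.
BOT_SIGNATURES = ("gptbot", "claudebot", "perplexity")

# Placeholder only: 203.0.113.0/24 is a reserved documentation range.
DATACENTER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def matches_bot_signature(user_agent: str) -> bool:
    """True if the User-Agent contains a known bot signature."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)


def is_datacenter_ip(ip: str) -> bool:
    """True if the IP falls inside one of the listed datacenter ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)


def bot_confidence(user_agent: str, ip: str, executed_js: bool, scroll_depth: float) -> int:
    """Combine signals into a 0-100 score (weights are illustrative assumptions)."""
    score = 0
    if matches_bot_signature(user_agent):
        score += 60
    if is_datacenter_ip(ip):
        score += 20
    if not executed_js:
        score += 10
    if scroll_depth == 0:
        score += 10
    return min(score, 100)
```

The score is only a classification input; as stated above, no blocking decision is made solely on it.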

2. Legal Basis & Necessity

2.1 Legal Basis (GDPR Art. 6)

Legitimate Interest (Art. 6(1)(f)): Website publishers have a legitimate interest in:

  • Protecting their content from unauthorized scraping and copyright infringement
  • Measuring accurate traffic metrics (excluding bots from analytics)
  • Preventing server overload from aggressive crawlers
  • Monetizing AI training data usage (licensing to model providers)

2.2 Balancing Test

Consideration | Assessment
Publisher Interest | High: content protection, accurate analytics, revenue attribution
User Impact | Low: no visible impact, no decisions made about humans, no discrimination
Data Sensitivity | Low: IP hashed, UA truncated, no special category data (Art. 9)
Reasonable Expectations | Met: bot detection is standard practice, disclosed in privacy policies

Conclusion: The legitimate interest of protecting website content and measuring traffic outweighs the minimal privacy impact on users. Processing is proportionate and necessary.

2.3 Why Not Consent?

Consent (Art. 6(1)(a)) is not suitable for bot detection because:

  • Bots do not provide consent (cannot tick checkboxes or interact with banners)
  • Requiring consent would defeat the purpose (scrapers would simply decline and continue scraping)
  • Server-side processing occurs before page load (no opportunity to request consent)

3. Risk Assessment

3.1 Identified Risks

✅ LOW RISK: False Positive (Human Classified as Bot)

Impact: Human user incorrectly flagged as bot → no functional impact (users still access content), statistical error only

Likelihood: Low (confidence threshold tuned to 85%+)

Mitigation: No blocking decisions made based on classification; publishers can manually review edge cases

✅ LOW RISK: Data Breach (Hashed IPs Exposed)

Impact: Exposure of SHA-256 hashes salted with the daily rotating salt → computationally infeasible to reverse; the salt rotates every 24h, which limits linkability across days

Likelihood: Low (encrypted infrastructure, AES-256 at rest, TLS in transit)

Mitigation: TLS in transit, AES-256 at rest (Neon default), access controls, annual penetration testing

⚠️ MEDIUM RISK: URL Privacy Leakage

Impact: URLs containing session tokens or personal identifiers stored in analytics → potential re-identification

Likelihood: Medium (some publishers may use URL params for user IDs)

Mitigation: (1) Publisher documentation warns against passing PII in URLs, (2) Future enhancement: automatic query param stripping for common patterns (?user_id=, ?email=)
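
The query parameter stripping named above as a future enhancement could be approached roughly as follows. This is a sketch, not the planned implementation: the function name is hypothetical, and the parameter set contains only the two patterns mentioned in the mitigation (`user_id`, `email`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Only the two patterns named in the mitigation; a real list would be longer.
SENSITIVE_PARAMS = {"user_id", "email"}


def strip_sensitive_params(url: str) -> str:
    """Remove query parameters whose names are in SENSITIVE_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SENSITIVE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Applied before storage, this keeps the path and harmless parameters for analytics while dropping the identifying ones.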

✅ LOW RISK: Third-Party Sub-Processor Access

Impact: Sub-processors (Neon, Railway, Stripe) access pseudonymized data

Likelihood: Inherent (sub-processors required for service delivery)

Mitigation: DPAs with all processors, EU data residency (Neon Germany/Frankfurt, Railway Netherlands), SCCs for US transfers, enterprise-grade certified vendors only

3.2 Overall Risk Rating

🟡 MEDIUM RISK (elevated from LOW due to email delivery via a US sub-processor and Google OAuth data flows)

No high-risk automated decision-making (Art. 22), no special category data (Art. 9), no large-scale monitoring of publicly accessible areas, no vulnerable data subjects. Risk elevation is due to transactional email processing by Resend, Inc. (USA) and Google OAuth authentication flows, both covered by SCCs.

4. Safeguards & Measures

4.1 Privacy by Design (GDPR Art. 25)

  • Pseudonymization: IP addresses hashed with SHA-256 + daily salt before storage (Art. 32(1)(a))
  • Data Minimization: User-Agent truncated to 200 chars (browser family only, not full fingerprint)
  • Storage Limitation: Automatic deletion after retention period (Free: 7d, Pro: 30d, Business: 90d, Enterprise: 365d)
  • Purpose Limitation: Data used only for bot detection, not sold or used for marketing
  • Transparency: Processing disclosed in publisher privacy policies + HumanKey privacy policy

4.2 Security Measures (GDPR Art. 32)

Measure | Implementation
Encryption in transit | TLS 1.3, HSTS preload, certificate pinning
Encryption at rest | AES-256 (Neon PostgreSQL default)
Access control | Row-level security (RLS), JWT auth, bcrypt password hashing (12 rounds)
Rate limiting | Redis-backed, mandatory in production (200 req/min/IP for detection endpoint)
Monitoring | Sentry error tracking (PII stripped), structured logging (Pino), admin audit log
Backups | Daily automated backups (Neon), 30-day retention, encrypted
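
The rate limit in the table above (200 requests per minute per IP) can be illustrated with a fixed-window counter. This is an in-memory sketch for explanation only: the production limiter is described as Redis-backed, and the class name and windowing strategy here are assumptions.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical in-memory limiter: at most `limit` requests per window per key."""

    def __init__(self, limit: int = 200, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window = window_seconds
        # Counts per (key, window index); old windows are never purged in this sketch.
        self._counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        """Count one request for `key` at time `now` (seconds) and report if it is allowed."""
        bucket = (key, int(now // self.window))
        self._counts[bucket] += 1
        return self._counts[bucket] <= self.limit
```

The key would typically be the pseudonymized IP rather than the raw address, so the limiter itself does not need to hold personal data in clear form.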

4.3 Data Subject Rights (GDPR Art. 15-22)

Right | Implementation
Right to Access (Art. 15) | Dashboard → Settings → Export Data (JSON download)
Right to Erasure (Art. 17) | Dashboard → Settings → Delete Account (cascading deletion + audit log)
Right to Rectification (Art. 16) | Dashboard → Settings → Update Profile (name, email)
Right to Object (Art. 21) | Contact form (Privacy & GDPR dept., manual review, 30-day response)
Right to Data Portability (Art. 20) | Export includes all visit data in machine-readable JSON format

4.4 Sub-Processors (GDPR Art. 28)

Sub-Processor | Purpose | Location / Transfer Basis
Neon Inc. | PostgreSQL database hosting (primary data store) | EU (Frankfurt, Germany)
Railway Corp. | API server hosting and infrastructure | EU (Netherlands)
Vercel Inc. | Frontend hosting and CDN | USA (SCCs applied)
Sentry (Functional Software Inc.) | Error monitoring and performance tracking (PII-stripped) | USA (SCCs applied)
Resend, Inc. | Transactional email delivery (verification, notifications) | USA (SCCs applied)
Stripe, Inc. | Payment processing (acts as independent controller for card data) | USA (SCCs + DPF)

Additional Processing Activities

Authentication & Account Data

Data processed: Email address, hashed password (bcrypt), Google OAuth user ID and email, email verification tokens, magic link tokens.
Legal basis: Art. 6(1)(b) GDPR — processing necessary for the performance of a contract.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of active account; deleted within 30 days of account closure.

Email Communications via Resend

Data processed: Email address, email content (verification codes, password reset links, onboarding notifications).
Legal basis: Art. 6(1)(b) GDPR — necessary for account functionality and service delivery.
Sub-processor: Resend, Inc. (USA) — EU Standard Contractual Clauses applied.
Retention: Email delivery logs retained 90 days by Resend; email addresses stored in HumanKey DB for the account lifetime.

Team Member Data (Business tier)

Data processed: Email addresses of invited team members, invitation tokens, team role assignments.
Legal basis: Art. 6(1)(b) GDPR — necessary for multi-user account management.
Stored in: Neon DB (Frankfurt, Germany — EU).
Retention: Duration of team membership; deleted upon team removal or account closure.

Session Recording Data (Beta)

Data processed: If enabled by the Controller, the feature records click events and scroll interactions on the Controller's website pages. No keystroke data is captured. Session recordings are stored only for sites where the Controller explicitly enables this feature.
Legal basis: Art. 6(1)(f) GDPR (legitimate interest of the Controller in understanding user experience), subject to a user opt-in mechanism.
Note: This is a Beta feature, currently disabled by default. Controllers bear responsibility for informing their own users of session recording per Art. 13/14 GDPR.
Retention: 30 days (Pro), 90 days (Business), 365 days (Enterprise).

5. Consultation & Approval

5.1 Data Protection Officer (DPO)

No DPO appointed (not required, as the Art. 37 thresholds are not met). Privacy queries are handled via our contact form.

5.2 Supervisory Authority

Polish supervisory authority: Urząd Ochrony Danych Osobowych (UODO)
Website: uodo.gov.pl

5.3 Data Subject Views

This DPIA is publicly available at humankey.io/legal/dpia per Art. 35(9). We invite feedback from data subjects, civil society organizations, and privacy advocates. Contact: our contact form

6. Review & Maintenance

Initial Version | 1.0 (February 2026) → 1.1 (March 2026)
Review Frequency | Annual (every February) or when material changes occur
Next Review Date | February 2027
Responsible Person | Tymoteusz (Founder, Data Controller)

Triggers for Re-Assessment: New data types collected, change in legal basis, data breach, supervisory authority inquiry, technology upgrade affecting privacy, or complaints from data subjects.

7. Conclusion

✅ DPIA OUTCOME: Processing is GDPR-compliant and may proceed.

  • Risk to data subjects: MEDIUM (see Section 3.2)
  • Legitimate interest: JUSTIFIED (content protection, traffic analytics)
  • Safeguards: ADEQUATE (pseudonymization, encryption, data minimization, retention limits)
  • Rights compliance: IMPLEMENTED (access, erasure, portability, rectification)
  • Transparency: ACHIEVED (privacy policy, DPIA published, sub-processors disclosed)

Contact: Questions or objections regarding this DPIA? Use our contact form or write to the Polish supervisory authority (UODO).

This DPIA complies with GDPR Article 35, ICO guidelines, and WP29 Guidelines on DPIAs (wp248rev.01). Version 1.1 approved by Data Controller: Tymoteusz (March 2026).