AuditSphere - Architecture Overview
Executive Summary
AuditSphere is a multi-cloud metadata aggregation and compliance platform that provides unified security monitoring across Microsoft 365, Google Workspace, Box, Dropbox, and other enterprise data repositories. The platform collects only metadata, never document content. On top of that metadata it offers real-time audit monitoring, ML-powered anomaly detection, multi-framework compliance assessments, data classification, data retention management, and access review workflows.
System Architecture
High-Level Architecture Diagram
Component Overview
1. Frontend Layer
Technology Stack:
- Modern React-based web framework with server-side rendering
- Component-based UI architecture with reusable design system
- Utility-first CSS framework for responsive styling
- Client-side state management with data fetching optimization
- Chart visualization library for metrics and dashboards
Key Characteristics:
- Single Page Application (SPA) with server-side rendering for SEO and performance
- Responsive design supporting desktop and tablet viewports
- Real-time data updates with optimistic UI patterns
- Accessible UI components following WCAG guidelines
2. Backend Layer
Technology Stack:
- Full-stack web framework with API routes and server-side rendering
- Type-safe API layer with automatic client/server type synchronization
- Database ORM with type-safe query building
- Data validation library for request/response validation
- Structured logging framework for observability
Key Characteristics:
- Serverless-compatible architecture for cloud deployment
- Type-safe end-to-end development from database to UI
- Middleware-based request processing pipeline
- Background job processing for scheduled tasks
3. Data Layer
Primary Database:
- PostgreSQL relational database (cloud-hosted)
- Schema migrations with version control
- Encrypted storage for sensitive credentials
Token Management:
- In-memory caching of Microsoft OAuth tokens to reduce API calls
- Automatic token refresh before expiration
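The two token-management points above can be sketched as a small cache that reuses a token until it enters a refresh margin. This is a minimal illustration, not the platform's actual code: `TokenCache`, the `fetch_token` hook, the 300-second margin, and the injectable clock are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedToken:
    access_token: str
    expires_at: float  # Unix timestamp in seconds


class TokenCache:
    """In-memory token cache that refreshes shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[str], Tuple[str, float]],
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch_token(tenant_id) -> (access_token, lifetime_seconds); hypothetical hook
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._tokens: Dict[str, CachedToken] = {}

    def get(self, tenant_id: str) -> str:
        cached = self._tokens.get(tenant_id)
        now = self._clock()
        # Reuse the cached token unless it is missing or inside the refresh margin.
        if cached is None or cached.expires_at - now <= self._refresh_margin_s:
            token, lifetime_s = self._fetch_token(tenant_id)
            cached = CachedToken(token, now + lifetime_s)
            self._tokens[tenant_id] = cached
        return cached.access_token


# Usage with a fake identity provider and a controllable clock.
calls = []
now = [0.0]


def fake_fetch(tenant_id: str) -> Tuple[str, float]:
    calls.append(tenant_id)
    return f"token-{len(calls)}", 3600.0


cache = TokenCache(fake_fetch, refresh_margin_s=300.0, clock=lambda: now[0])
first = cache.get("tenant-a")   # cache miss: fetches a token
second = cache.get("tenant-a")  # cache hit: no fetch
now[0] = 3400.0                 # 200 s before expiry, inside the margin
third = cache.get("tenant-a")   # refreshed ahead of expiration
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to age.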
4. Machine Learning Service
Technology Stack:
- Python-based ML service with REST API
- Isolation Forest algorithm for anomaly detection
- Automatic model retraining from operational data
Key Characteristics:
- Separate microservice for ML workloads
- Feature engineering from audit event data
- Configurable detection thresholds and sensitivity
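To make the Isolation Forest idea concrete, here is a simplified from-scratch version using only the standard library. It is an illustrative sketch: the document does not say which library the ML service uses (a production service would more likely rely on an established implementation such as scikit-learn's), and the two features, the class name `TinyIsolationForest`, and the 0.6 threshold are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
# A node is ("leaf", size) or ("split", feature, threshold, left, right).
Node = Tuple


def _avg_path_length(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    if len(points) <= 1 or depth >= max_depth:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return ("leaf", len(points))
    threshold = rng.uniform(low, high)  # random split isolates outliers quickly
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return ("split", feature, threshold,
            _build_tree(left, depth + 1, max_depth, rng),
            _build_tree(right, depth + 1, max_depth, rng))


def _path_length(node: Node, x: Point, depth: int = 0) -> float:
    if node[0] == "leaf":
        return depth + _avg_path_length(node[1])
    _, feature, threshold, left, right = node
    child = left if x[feature] < threshold else right
    return _path_length(child, x, depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 64, seed: int = 0) -> None:
        self.n_trees = n_trees
        self.sample_size = sample_size
        self._rng = random.Random(seed)
        self._trees: List[Node] = []
        self._norm = 1.0

    def fit(self, data: List[Point]) -> "TinyIsolationForest":
        size = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(max(size, 2)))
        self._norm = _avg_path_length(max(size, 2))
        self._trees = [
            _build_tree(self._rng.sample(data, size), 0, max_depth, self._rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; shorter average paths give higher scores."""
        mean_path = sum(_path_length(t, x) for t in self._trees) / len(self._trees)
        return 2.0 ** (-mean_path / self._norm)


# Hypothetical features per event: (hour of day, files downloaded in the last hour).
data_rng = random.Random(42)
normal = [(data_rng.gauss(13, 2), data_rng.gauss(5, 2)) for _ in range(300)]
forest = TinyIsolationForest().fit(normal)
typical_score = forest.score((13.0, 5.0))
unusual_score = forest.score((3.0, 400.0))  # 3 a.m. bulk download
is_anomaly = unusual_score > 0.6            # threshold is a tunable assumption
```

The threshold on the score is where "configurable sensitivity" would plug in: lowering it flags more events, raising it flags fewer.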
5. External Integrations
Microsoft 365 Integration:
- Microsoft Graph API for SharePoint/OneDrive access
- Office 365 Management API for audit log collection
- Webhook subscriptions for real-time event notifications
Authentication:
- OAuth 2.0 / OpenID Connect identity provider
- JWT-based session management
- Multi-tenant support
Data Flow Architecture
Audit Event Collection Flow
How it works:
- Microsoft 365 sends an activity event (e.g., “John downloaded a file”) to our webhook endpoint
- Webhook Handler receives and parses the raw event into a structured format
- Event Processor does two things:
  - Stores the audit record in the database for historical tracking
  - Extracts event features (user, time, action type) and sends them to ML Detection
- ML Detection analyzes the event for anomalies (unusual patterns, suspicious behavior)
- Anomaly Alerts are generated if something suspicious is detected
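The handler and processor steps above might look roughly like the following. This is a sketch under stated assumptions: the payload keys (`UserId`, `Operation`, `ObjectId`, `CreationTime`) are modeled on Office 365 audit records but should be checked against the real schema, the list stands in for the database, and the function names and the `is_weekend` feature are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditRecord:
    user: str
    operation: str
    resource: str
    occurred_at: datetime


def parse_event(raw: Dict[str, Any]) -> AuditRecord:
    """Webhook Handler step: turn a raw payload into a structured record."""
    return AuditRecord(
        user=raw["UserId"].lower(),
        operation=raw["Operation"],
        resource=raw.get("ObjectId", ""),
        occurred_at=datetime.fromisoformat(raw["CreationTime"]),
    )


def extract_features(record: AuditRecord) -> Dict[str, Any]:
    """Event Processor step: derive user, time, and action-type features."""
    return {
        "user": record.user,
        "hour_of_day": record.occurred_at.hour,
        "is_weekend": record.occurred_at.weekday() >= 5,
        "action_type": record.operation,
    }


def process_event(raw: Dict[str, Any], store: List[AuditRecord]) -> Dict[str, Any]:
    record = parse_event(raw)
    store.append(record)             # stand-in for the database write
    return extract_features(record)  # would be sent on to ML Detection


audit_store: List[AuditRecord] = []
features = process_event(
    {
        "UserId": "John@contoso.com",
        "Operation": "FileDownloaded",
        "ObjectId": "https://contoso.sharepoint.com/docs/plan.docx",
        "CreationTime": "2025-12-06T02:15:00",
    },
    audit_store,
)
```

Here `features` describes a Saturday 2 a.m. download, the kind of input the anomaly detector would score.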
Access Review Workflow
How it works:
- Create Campaign - Admin creates a review campaign by configuring which sites and users to review
- Collect Permissions - System queries Microsoft Graph API to fetch all current permissions (who has access to what)
- Review Decisions - Reviewer goes through each permission item in the UI and decides to approve (keep) or revoke (remove)
- Execute Removals - System calls Graph API to remove access for all revoked permissions
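The four workflow steps can be modeled as a small state holder. This sketch deliberately avoids calling Microsoft Graph: permission collection and removal are injected as callables, and every class, field, and identifier shown is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Decision(Enum):
    APPROVE = "approve"
    REVOKE = "revoke"


@dataclass
class PermissionItem:
    site: str
    principal: str
    permission_id: str
    decision: Optional[Decision] = None


@dataclass
class ReviewCampaign:
    name: str
    items: List[PermissionItem] = field(default_factory=list)

    def collect(self, fetch_permissions: Callable[[], List[PermissionItem]]) -> None:
        """Collect Permissions step: snapshot who has access to what."""
        self.items = fetch_permissions()

    def decide(self, permission_id: str, decision: Decision) -> None:
        """Review Decisions step: record approve or revoke for one item."""
        for item in self.items:
            if item.permission_id == permission_id:
                item.decision = decision
                return
        raise KeyError(permission_id)

    def execute_removals(self, remove: Callable[[PermissionItem], None]) -> List[str]:
        """Execute Removals step: only revoked items are removed; undecided ones stay."""
        removed = []
        for item in self.items:
            if item.decision is Decision.REVOKE:
                remove(item)
                removed.append(item.permission_id)
        return removed


campaign = ReviewCampaign("Q4 finance sites")
campaign.collect(lambda: [
    PermissionItem("finance", "alice@example.com", "perm-1"),
    PermissionItem("finance", "guest@partner.example", "perm-2"),
])
campaign.decide("perm-1", Decision.APPROVE)
campaign.decide("perm-2", Decision.REVOKE)
removal_log: List[PermissionItem] = []
removed_ids = campaign.execute_removals(removal_log.append)
```

Leaving undecided items untouched is a conservative design choice for the example; a real campaign might instead block execution until every item has a decision.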
Security Architecture
Authentication Flow
Security Controls
| Layer | Control | Description |
|---|---|---|
| Authentication | OAuth 2.0 + PKCE | Secure user authentication via identity provider |
| Session | JWT Tokens | Stateless session management with secure cookies |
| API | Protected Procedures | Middleware-enforced authentication checks |
| Data | Encryption at Rest | Microsoft tokens encrypted in database |
| Transport | TLS 1.3 | All communications encrypted in transit |
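As an illustration of the "Protected Procedures" control, the sketch below wraps a procedure so that it runs only when the request context carries an authenticated user. The context shape (`ctx["user"]`), the `Unauthorized` exception, and the `protected` decorator are assumptions for the example, not the platform's actual middleware.

```python
from functools import wraps
from typing import Any, Callable, Dict, List


class Unauthorized(Exception):
    """Raised when a protected procedure is called without a session user."""


def protected(procedure: Callable[..., Any]) -> Callable[..., Any]:
    @wraps(procedure)
    def wrapper(ctx: Dict[str, Any], *args: Any, **kwargs: Any) -> Any:
        # The check runs before the procedure body, like middleware.
        if ctx.get("user") is None:
            raise Unauthorized("authentication required")
        return procedure(ctx, *args, **kwargs)
    return wrapper


@protected
def list_alerts(ctx: Dict[str, Any]) -> List[str]:
    return [f"alerts for {ctx['user']}"]


allowed = list_alerts({"user": "admin@example.com"})
try:
    list_alerts({"user": None})
    rejected = False
except Unauthorized:
    rejected = True
```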
Database Schema Overview
Entity Relationship Diagram
How to read this diagram:
The lines show relationships between database tables. ||--o{ means “one-to-many” (one record on the left relates to many records on the right).
| Relationship | Meaning |
|---|---|
| USER → AUDIT_EVENT | One user views many audit events from their connected Microsoft 365 tenant |
| USER → ALERT | One user has many alerts generated from detected anomalies |
| USER → ACCESS_REVIEW | One user creates many access review campaigns |
| AUDIT_EVENT → ANOMALY | One audit event can trigger an anomaly detection (e.g., a bulk download is flagged as suspicious activity) |
| ANOMALY → ALERT | One anomaly creates an alert to notify the user |
| ACCESS_REVIEW → PERMISSION_ITEM | One review campaign contains many permission items to review |
| PERMISSION_ITEM → REVIEW_DECISION | Each permission item has a decision (approve or revoke) |
| COMPLIANCE_RUN → COMPLIANCE_CHECK | One compliance run includes many individual checks (e.g., CIS benchmark checks) |
Core Entity Groups
User Management:
- Admin users with full access to all features
- Microsoft 365 connection credentials (encrypted)
- User preferences and notification settings
Audit & Monitoring:
- Audit events from Microsoft 365
- Anomaly detection results
- Security alerts and notifications
Compliance:
- Compliance check definitions and results
- Compliance run history with aggregated scores
Access Review:
- Review campaigns with configurable scope
- Permission items with review decisions
- Scheduled recurring reviews
- Designated resource owners
Reporting:
- Generated reports with file storage
- Scheduled report configurations
Deployment Architecture
Cloud Infrastructure
Scalability Considerations
- Horizontal Scaling: Serverless architecture auto-scales with demand
- Database Scaling: Connection pooling and read replicas supported
- Background Processing: Scheduled jobs run independently of user requests
Multi-Cloud Connector Architecture
AuditSphere uses a connector abstraction layer to normalize metadata from any cloud provider into a unified schema. Each connector implements the same interface, so the rest of the platform stays provider-agnostic.
Supported Providers
| Provider | Connector | APIs Used | Data Collected |
|---|---|---|---|
| Microsoft 365 | MicrosoftConnector | Graph API, O365 Management API | Sites, files, permissions, audit events, sensitivity labels |
| Google Workspace | GoogleDriveConnector | Drive API v3, Admin Reports API | Files, permissions, audit events, labels |
| Box | BoxConnector | Box Platform API, Events API | Files, collaborations, audit events, classifications |
| Dropbox | DropboxConnector | Dropbox Business API, Team Log | Files, sharing, audit events |
Connector Interface
Every connector implements:
- listFiles() / getFileMetadata() / getFilePermissions(): metadata collection
- fetchAuditEvents(): activity log collection
- getNativeLabels() / getRetentionInfo(): classification and retention
- refreshTokens() / validateConnection(): authentication
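One way to picture the shared interface is as an abstract base class. The sketch below renders the eight methods in Python with snake_case names (the source lists them in camelCase); the parameters, return types, and the toy `InMemoryConnector` are assumptions, since the document gives method names only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Metadata = Dict[str, Any]


class CloudConnector(ABC):
    """Python rendering of the connector interface; signatures are assumed."""

    # Metadata collection
    @abstractmethod
    def list_files(self) -> List[Metadata]: ...
    @abstractmethod
    def get_file_metadata(self, file_id: str) -> Metadata: ...
    @abstractmethod
    def get_file_permissions(self, file_id: str) -> List[Metadata]: ...

    # Activity log collection
    @abstractmethod
    def fetch_audit_events(self) -> List[Metadata]: ...

    # Classification and retention
    @abstractmethod
    def get_native_labels(self) -> List[Metadata]: ...
    @abstractmethod
    def get_retention_info(self, file_id: str) -> Metadata: ...

    # Authentication
    @abstractmethod
    def refresh_tokens(self) -> None: ...
    @abstractmethod
    def validate_connection(self) -> bool: ...


class InMemoryConnector(CloudConnector):
    """Toy connector backed by a dict, used only to show provider-agnostic calls."""

    def __init__(self, files: Dict[str, Metadata]) -> None:
        self._files = files

    def list_files(self): return list(self._files.values())
    def get_file_metadata(self, file_id): return self._files[file_id]
    def get_file_permissions(self, file_id): return self._files[file_id].get("permissions", [])
    def fetch_audit_events(self): return []
    def get_native_labels(self): return []
    def get_retention_info(self, file_id): return {}
    def refresh_tokens(self): return None
    def validate_connection(self): return True


def inventory(connectors: List[CloudConnector]) -> List[str]:
    """Works against any connector without knowing which provider it wraps."""
    return [f["name"] for c in connectors if c.validate_connection() for f in c.list_files()]


names = inventory([
    InMemoryConnector({"1": {"name": "budget.xlsx"}}),
    InMemoryConnector({"a": {"name": "roadmap.pdf"}}),
])
```

Because the base class is abstract, a connector that omits any of the eight methods fails at instantiation rather than at first use.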
Integration Architecture
Unified Data Model
All providers normalize into the same tables:
| Unified Table | Purpose | Key Fields |
|---|---|---|
| cloud_connections | OAuth connections per provider | provider, accountId, tokens, status |
| file_metadata | Files and folders from any provider | providerFileId, name, path, mimeType, classification |
| file_permissions | Permissions on any file | grantedTo, accessLevel, isExternal, sharingLinkScope |
| audit_activities | Activity events from any provider | operation, operationCategory, actorEmail, resourceName |
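To show what normalization into the unified tables involves, the sketch below maps a Google Drive-style permission object onto the key fields of file_permissions. The input keys (`type`, `role`, `emailAddress`, `domain`) follow the Drive permission resource as best understood here; the access-level vocabulary, the link-scope values, and the external-user test are assumptions made for the example.

```python
from typing import Any, Dict, Optional

# Assumed mapping from Drive roles to a unified accessLevel vocabulary.
ROLE_TO_ACCESS_LEVEL = {
    "owner": "owner",
    "organizer": "owner",
    "fileOrganizer": "write",
    "writer": "write",
    "commenter": "comment",
    "reader": "read",
}


def normalize_drive_permission(perm: Dict[str, Any], org_domain: str) -> Dict[str, Any]:
    """Map one Drive-style permission onto the file_permissions key fields."""
    kind = perm["type"]
    link_scope: Optional[str]
    if kind == "anyone":
        granted_to, is_external, link_scope = "anyone", True, "anonymous"
    elif kind == "domain":
        granted_to = perm.get("domain", "")
        is_external = granted_to.lower() != org_domain.lower()
        link_scope = "domain"
    else:  # "user" or "group"
        granted_to = perm.get("emailAddress") or ""
        is_external = not granted_to.lower().endswith("@" + org_domain.lower())
        link_scope = None
    return {
        "grantedTo": granted_to,
        "accessLevel": ROLE_TO_ACCESS_LEVEL.get(perm["role"], "unknown"),
        "isExternal": is_external,
        "sharingLinkScope": link_scope,
    }


partner_row = normalize_drive_permission(
    {"type": "user", "role": "writer", "emailAddress": "pat@partner.example"},
    org_domain="example.com",
)
public_row = normalize_drive_permission(
    {"type": "anyone", "role": "reader"},
    org_domain="example.com",
)
```

Each provider's connector would carry its own mapping like this one, so downstream rules only ever see the unified fields.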
Required Permissions by Provider
Microsoft 365:
- AuditLog.Read.All, Directory.Read.All, Sites.Read.All, User.Read.All, SecurityEvents.Read.All
Google Workspace:
- drive.metadata.readonly, admin.reports.audit.readonly, admin.directory.user.readonly
Box:
- Enterprise app with full access (admin events, collaborations, file metadata)
Dropbox:
- files.metadata.read, sharing.read, events.read, team_data.member
Monitoring & Observability
Application Monitoring
- Error tracking and alerting via monitoring service
- Structured logging with correlation IDs
- Performance metrics and tracing
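Structured logging with correlation IDs can be sketched with the standard library alone. In this illustration a context variable holds the current request's ID and a JSON formatter stamps it on every log line; the logger name, field names, and the `req-123` value are assumptions, and the document does not say which logging framework is actually used.

```python
import contextvars
import io
import json
import logging

# Holds the ID of the request currently being processed.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("auditsphere.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

correlation_id.set("req-123")  # typically set once per request by middleware
logger.info("webhook received")
logger.info("audit record stored")
entries = [json.loads(line) for line in stream.getvalue().splitlines()]
```

Both entries carry the same `correlation_id`, which is what lets a single request be traced across log lines.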
Health Checks
- Database connectivity verification
- External API connectivity status
- ML service availability
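The three checks above could be aggregated by a small runner like the following sketch. The probe names, the "healthy"/"degraded" labels, and the response shape are assumptions; real probes would query the database, call an external API, and ping the ML service.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Run each named probe; an exception counts as a failed check."""
    results: Dict[str, str] = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception as exc:  # a probe must never crash the health endpoint
            results[name] = f"fail: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return {"status": "healthy" if healthy else "degraded", "checks": results}


def _ml_probe() -> bool:
    # Simulates an unreachable ML service.
    raise ConnectionError("connection refused")


report = run_health_checks({
    "database": lambda: True,
    "external_api": lambda: True,
    "ml_service": _ml_probe,
})
```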
Compliance & Standards
Supported Compliance Frameworks
AuditSphere supports 8 built-in compliance frameworks with declarative, provider-agnostic rules:
| Framework | Rules | Key Focus Areas |
|---|---|---|
| GDPR | 4 | Data retention, classification, access control, audit trails |
| HIPAA | 4 | PHI access control, external sharing, audit trails, retention |
| SOX | 3 | Financial document controls, audit trails, record retention |
| NIST 800-53 | 4 | Access enforcement, audit events, media sanitization, categorization |
| ISO 27001 | 3 | Access control, information classification, logging |
| SOC 2 | 3 | Logical access, system monitoring, confidentiality |
| CCPA | 3 | Consumer data access, classification, retention |
| CIS MS365 | 4 | External sharing, anonymous links, guest access, sensitivity labels |
Declarative Rule Engine
Rules are stored as JSON configurations in the database, not hardcoded:
- Permission checks — query file_permissions for anonymous/external sharing violations
- Retention checks — verify sensitive files have retention policies applied
- Classification checks — measure unclassified file percentage
- Audit checks — detect unusual event patterns (e.g., guest permission changes)
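A declarative permission check might be stored and evaluated along these lines. The rule schema (`id`, `type`, `table`, `where`, `maxViolations`) is invented for this sketch, since the document does not show the real JSON format, and plain lists of dicts stand in for database tables.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical rule document, as it might be stored in a JSON database column.
RULE_JSON = """
{
  "id": "no-anonymous-links",
  "type": "permission",
  "table": "file_permissions",
  "where": {"sharingLinkScope": "anonymous"},
  "maxViolations": 0
}
"""


def evaluate_rule(rule: Dict[str, Any], tables: Dict[str, List[Row]]) -> Dict[str, Any]:
    """Count rows matching every field in `where`; pass if within maxViolations."""
    rows = tables.get(rule["table"], [])
    violations = [
        row for row in rows
        if all(row.get(key) == value for key, value in rule["where"].items())
    ]
    return {
        "ruleId": rule["id"],
        "violations": len(violations),
        "passed": len(violations) <= rule["maxViolations"],
    }


tables = {"file_permissions": [
    {"grantedTo": "alice@example.com", "sharingLinkScope": None},
    {"grantedTo": "anyone", "sharingLinkScope": "anonymous"},
]}
result = evaluate_rule(json.loads(RULE_JSON), tables)
```

Because the rule refers only to unified table and field names, the same JSON applies to data from any provider.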
Data Classification
Four-level taxonomy: Public → Internal → Confidential → Restricted
Provider-native labels (Microsoft Sensitivity Labels, Google Drive Labels, Box Classifications) are mapped to the unified taxonomy.
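The mapping from native labels to the four-level taxonomy can be sketched as an ordered enum plus a lookup table. The label names below are hypothetical examples, since real sensitivity labels are defined per tenant, and the provider keys are illustrative.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Classification(IntEnum):
    """Unified taxonomy; higher values are more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical (provider, native label) pairs; real label names are tenant-specific.
LABEL_MAP: Dict[Tuple[str, str], Classification] = {
    ("microsoft", "General"): Classification.INTERNAL,
    ("microsoft", "Highly Confidential"): Classification.RESTRICTED,
    ("google", "Confidential"): Classification.CONFIDENTIAL,
    ("box", "Public"): Classification.PUBLIC,
}


def to_unified(provider: str, native_label: Optional[str]) -> Optional[Classification]:
    """Return the unified level, or None when the file is unlabeled or unmapped."""
    if native_label is None:
        return None
    return LABEL_MAP.get((provider, native_label))


level = to_unified("microsoft", "Highly Confidential")
needs_strict_handling = level is not None and level >= Classification.CONFIDENTIAL
```

Using an ordered enum means checks such as "Confidential or above" become simple comparisons.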
Data Retention
- Policy-based retention with a configurable retention period (in days) and a configurable action (delete, archive, or review)
- Automated violation detection: expired-not-deleted, missing-retention, expired-not-archived
- Classification-level filtering (e.g., require retention for confidential + restricted)
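The three violation types can be detected with a small function like the one sketched here. It assumes the retention period is measured from the file's creation date and that retention is required for confidential and restricted files; the data classes and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RetentionPolicy:
    days: int
    action: str  # "delete", "archive", or "review"


@dataclass
class FileRecord:
    name: str
    classification: str
    created: date
    policy: Optional[RetentionPolicy] = None
    deleted: bool = False
    archived: bool = False


# Classification-level filter: these levels must have a retention policy.
REQUIRE_RETENTION = {"confidential", "restricted"}


def find_violation(f: FileRecord, today: date) -> Optional[str]:
    if f.policy is None:
        return "missing-retention" if f.classification in REQUIRE_RETENTION else None
    expired = today > f.created + timedelta(days=f.policy.days)
    if expired and f.policy.action == "delete" and not f.deleted:
        return "expired-not-deleted"
    if expired and f.policy.action == "archive" and not f.archived:
        return "expired-not-archived"
    return None


today = date(2025, 12, 1)
files = [
    FileRecord("contract.pdf", "restricted", date(2025, 1, 1)),
    FileRecord("old-export.csv", "internal", date(2024, 1, 1), RetentionPolicy(365, "delete")),
    FileRecord("minutes.docx", "confidential", date(2025, 11, 1), RetentionPolicy(365, "archive")),
]
violations = {f.name: find_violation(f, today) for f in files}
```

In this sketch, files whose policy action is "review" never produce a violation; how expired review items are surfaced is left open.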
Audit Capabilities
- Complete audit trail across all providers
- Immutable event logging in unified format
- Report generation for compliance evidence
- Cross-provider anomaly detection
Technology Summary
| Component | Category | Purpose |
|---|---|---|
| Web Framework | Full-stack JavaScript | Server-side rendering, API routes, deployment |
| UI Library | Component Framework | Reusable, accessible UI components |
| Styling | Utility CSS | Responsive, maintainable styling |
| API Layer | Type-safe RPC | End-to-end type safety between client and server |
| Database ORM | Query Builder | Type-safe database access with migrations |
| Validation | Schema Library | Request/response validation with type inference |
| Authentication | OAuth Provider | Secure user authentication |
| ML Framework | Python ML | Anomaly detection and model training |
| Monitoring | Error Tracking | Application health and error monitoring |
Document Information
| Property | Value |
|---|---|
| Version | 1.0 |
| Last Updated | December 2025 |
| Classification | Client Documentation |