
AuditSphere - Architecture Overview

Executive Summary

AuditSphere is a multi-cloud metadata aggregation and compliance platform that provides unified security monitoring across Microsoft 365, Google Workspace, Box, Dropbox, and other enterprise data repositories. The platform collects only metadata, never document content. On top of that metadata it offers real-time audit monitoring, ML-powered anomaly detection, multi-framework compliance assessments, data classification, data retention management, and access review workflows.


System Architecture

High-Level Architecture Diagram


Component Overview

1. Frontend Layer

Technology Stack:

  • Modern React-based web framework with server-side rendering
  • Component-based UI architecture with reusable design system
  • Utility-first CSS framework for responsive styling
  • Client-side state management with data fetching optimization
  • Chart visualization library for metrics and dashboards

Key Characteristics:

  • Single Page Application (SPA) with server-side rendering for SEO and performance
  • Responsive design supporting desktop and tablet viewports
  • Real-time data updates with optimistic UI patterns
  • Accessible UI components following WCAG guidelines

2. Backend Layer

Technology Stack:

  • Full-stack web framework with API routes and server-side rendering
  • Type-safe API layer with automatic client/server type synchronization
  • Database ORM with type-safe query building
  • Data validation library for request/response validation
  • Structured logging framework for observability

Key Characteristics:

  • Serverless-compatible architecture for cloud deployment
  • Type-safe end-to-end development from database to UI
  • Middleware-based request processing pipeline
  • Background job processing for scheduled tasks

3. Data Layer

Primary Database:

  • PostgreSQL relational database (cloud-hosted)
  • Schema migrations with version control
  • Encrypted storage for sensitive credentials

Token Management:

  • In-memory caching of Microsoft OAuth tokens to reduce API calls
  • Automatic token refresh before expiration
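
The caching and refresh behavior described above can be illustrated with a minimal sketch. It is not the platform's actual implementation: the `TokenCache` class, the `fetch_token` callback (assumed to return a token and its lifetime in seconds), and the five-minute refresh margin are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CachedToken:
    access_token: str
    expires_at: float  # Unix timestamp in seconds


class TokenCache:
    """In-memory token cache that refreshes shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[str], Tuple[str, float]],  # hypothetical callback
        refresh_margin: float = 300.0,                    # assumed margin (seconds)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._tokens: Dict[str, CachedToken] = {}

    def get(self, tenant_id: str) -> str:
        now = self._clock()
        cached = self._tokens.get(tenant_id)
        # Reuse the cached token unless it is missing or about to expire.
        if cached is not None and cached.expires_at - now > self._refresh_margin:
            return cached.access_token
        token, lifetime = self._fetch_token(tenant_id)
        self._tokens[tenant_id] = CachedToken(token, now + lifetime)
        return token


# Demonstration with a fake token endpoint and a controllable clock.
calls: List[str] = []


def fake_fetch(tenant_id: str) -> Tuple[str, float]:
    calls.append(tenant_id)
    return f"token-{len(calls)}", 3600.0


now = [0.0]
cache = TokenCache(fake_fetch, refresh_margin=300.0, clock=lambda: now[0])
first = cache.get("tenant-a")    # fetched from the fake endpoint
second = cache.get("tenant-a")   # served from memory
now[0] = 3400.0                  # 200 s before expiry, inside the margin
third = cache.get("tenant-a")    # refreshed ahead of expiration
```

Injecting the clock keeps the sketch testable without sleeping; a real service would also need locking if several requests can refresh the same token concurrently.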

4. Machine Learning Service

Technology Stack:

  • Python-based ML service with REST API
  • Isolation Forest algorithm for anomaly detection
  • Automatic model retraining from operational data

Key Characteristics:

  • Separate microservice for ML workloads
  • Feature engineering from audit event data
  • Configurable detection thresholds and sensitivity
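
As a rough illustration of feature engineering from audit event data, the sketch below turns one event into a numeric vector. The action vocabulary, the event field names, and the choice of features (hour of day, weekend flag, recent activity count, one-hot action) are assumptions made for this example, not the service's actual feature set. The model itself is omitted to keep the sketch dependency-free; a vector like this could be passed to an Isolation Forest implementation such as scikit-learn's `IsolationForest`.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical action vocabulary used for one-hot encoding.
ACTIONS = ["FileAccessed", "FileDownloaded", "FileDeleted", "SharingSet"]


def extract_features(event: Dict[str, str], recent_event_count: int) -> List[float]:
    """Convert one audit event into a fixed-length numeric feature vector."""
    # Timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.
    ts = datetime.fromisoformat(event["timestamp"]).astimezone(timezone.utc)
    one_hot = [1.0 if event["action"] == action else 0.0 for action in ACTIONS]
    return [
        float(ts.hour),                       # hour of day in UTC
        1.0 if ts.weekday() >= 5 else 0.0,    # Saturday or Sunday
        float(recent_event_count),            # activity volume supplied by the caller
    ] + one_hot


sample_event = {
    "user": "john@example.com",
    "action": "FileDownloaded",
    "timestamp": "2025-12-01T03:15:00+00:00",
}
features = extract_features(sample_event, recent_event_count=42)
```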

5. External Integrations

Microsoft 365 Integration:

  • Microsoft Graph API for SharePoint/OneDrive access
  • Office 365 Management API for audit log collection
  • Webhook subscriptions for real-time event notifications

Authentication:

  • OAuth 2.0 / OpenID Connect identity provider
  • JWT-based session management
  • Multi-tenant support

Data Flow Architecture

Audit Event Collection Flow

How it works:

  1. Microsoft 365 sends an activity event (e.g., “John downloaded a file”) to our webhook endpoint
  2. Webhook Handler receives and parses the raw event into a structured format
  3. Event Processor does two things:
    • Stores the audit record in the database for historical tracking
    • Extracts event features (user, time, action type) and sends them to ML Detection
  4. ML Detection analyzes the event for anomalies (unusual patterns, suspicious behavior)
  5. Anomaly Alerts are generated if something suspicious is detected
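
The five steps above can be sketched as a small pipeline. This is an illustrative outline only: the `EventProcessor` class and the in-memory lists standing in for the database and alert store are hypothetical, the raw field names (`UserId`, `Operation`, `CreationTime`) are assumed payload keys, and `toy_score` is a placeholder rule, not the Isolation Forest model used by the ML service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AuditRecord:
    user: str
    action: str
    timestamp: str


@dataclass
class EventProcessor:
    score: Callable[[AuditRecord], float]   # stand-in for the ML detection call
    threshold: float = 0.5                  # assumed alerting threshold
    stored: List[AuditRecord] = field(default_factory=list)  # stand-in for the database
    alerts: List[str] = field(default_factory=list)

    def handle_webhook(self, raw: Dict[str, str]) -> AuditRecord:
        # Step 2: parse the raw event into a structured record.
        record = AuditRecord(raw["UserId"], raw["Operation"], raw["CreationTime"])
        # Step 3a: store the record for historical tracking.
        self.stored.append(record)
        # Steps 3b-5: score the event and raise an alert if it looks anomalous.
        if self.score(record) >= self.threshold:
            self.alerts.append(f"{record.user}: {record.action}")
        return record


def toy_score(record: AuditRecord) -> float:
    """Placeholder scoring rule used only to make the sketch runnable."""
    return 0.9 if record.action == "FileDownloaded" else 0.1


processor = EventProcessor(score=toy_score)
processor.handle_webhook(
    {"UserId": "john@example.com", "Operation": "FileDownloaded",
     "CreationTime": "2025-12-01T03:15:00"}
)
processor.handle_webhook(
    {"UserId": "jane@example.com", "Operation": "FileAccessed",
     "CreationTime": "2025-12-01T09:00:00"}
)
```

Every event is stored regardless of its score, so the historical record stays complete even when nothing is flagged.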

Access Review Workflow

How it works:

  1. Create Campaign - Admin creates a review campaign by configuring which sites and users to review
  2. Collect Permissions - System queries Microsoft Graph API to fetch all current permissions (who has access to what)
  3. Review Decisions - Reviewer goes through each permission item in the UI and decides to approve (keep) or revoke (remove)
  4. Execute Removals - System calls Graph API to remove access for all revoked permissions
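
A minimal sketch of this workflow's data model follows. The class and field names are hypothetical, and the `remove` callback stands in for the Graph API call that would actually revoke access; no real API is invoked here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Decision(Enum):
    APPROVE = "approve"   # keep the permission
    REVOKE = "revoke"     # remove the permission


@dataclass
class PermissionItem:
    permission_id: str
    site: str
    granted_to: str
    decision: Optional[Decision] = None   # None until a reviewer decides


@dataclass
class Campaign:
    name: str
    items: List[PermissionItem] = field(default_factory=list)

    def pending(self) -> List[PermissionItem]:
        return [item for item in self.items if item.decision is None]

    def execute_removals(self, remove: Callable[[PermissionItem], None]) -> List[str]:
        """Call `remove` for every revoked item and return the removed IDs."""
        if self.pending():
            raise ValueError("all items must be reviewed before execution")
        removed = []
        for item in self.items:
            if item.decision is Decision.REVOKE:
                remove(item)   # hypothetical hook for the provider API call
                removed.append(item.permission_id)
        return removed


campaign = Campaign(
    name="Q4 finance review",
    items=[
        PermissionItem("p1", "Finance", "alice@example.com"),
        PermissionItem("p2", "Finance", "contractor@example.org"),
    ],
)
campaign.items[0].decision = Decision.APPROVE
campaign.items[1].decision = Decision.REVOKE

revoked_users: List[str] = []
removed_ids = campaign.execute_removals(lambda item: revoked_users.append(item.granted_to))
```

Refusing to execute while items are still pending is a design choice of this sketch; a real system might instead allow partial execution.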

Security Architecture

Authentication Flow

Security Controls

| Layer | Control | Description |
| --- | --- | --- |
| Authentication | OAuth 2.0 + PKCE | Secure user authentication via identity provider |
| Session | JWT Tokens | Stateless session management with secure cookies |
| API | Protected Procedures | Middleware-enforced authentication checks |
| Data | Encryption at Rest | Microsoft tokens encrypted in database |
| Transport | TLS 1.3 | All communications encrypted in transit |
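
The "Protected Procedures" control can be pictured as a wrapper that rejects calls without an authenticated session before the procedure body runs. The sketch below shows the general pattern only; the `protected` decorator, the `ctx` dictionary shape, and the `Unauthorized` exception are hypothetical and are not the platform's actual middleware.

```python
from functools import wraps
from typing import Any, Callable, Dict, List


class Unauthorized(Exception):
    """Raised when a protected procedure is called without a session."""


def protected(procedure: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a procedure so it only runs for authenticated callers."""

    @wraps(procedure)
    def wrapper(ctx: Dict[str, Any], *args: Any, **kwargs: Any) -> Any:
        # The request context is assumed to carry a validated session, if any.
        if not ctx.get("session"):
            raise Unauthorized("authentication required")
        return procedure(ctx, *args, **kwargs)

    return wrapper


@protected
def list_alerts(ctx: Dict[str, Any]) -> List[str]:
    return [f"alert for {ctx['session']['user']}"]


authorized_result = list_alerts({"session": {"user": "admin@example.com"}})
```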

Database Schema Overview

Entity Relationship Diagram

How to read this diagram:

The lines show relationships between database tables. ||--o{ means “one-to-many” (one record on the left relates to many records on the right).

| Relationship | Meaning |
| --- | --- |
| USER → AUDIT_EVENT | One user views many audit events from their connected Microsoft 365 tenant |
| USER → ALERT | One user has many alerts generated from detected anomalies |
| USER → ACCESS_REVIEW | One user creates many access review campaigns |
| AUDIT_EVENT → ANOMALY | One audit event can trigger anomaly detection (e.g., bulk download flags suspicious activity) |
| ANOMALY → ALERT | One anomaly creates an alert to notify the user |
| ACCESS_REVIEW → PERMISSION_ITEM | One review campaign contains many permission items to review |
| PERMISSION_ITEM → REVIEW_DECISION | Each permission item has a decision (approve or revoke) |
| COMPLIANCE_RUN → COMPLIANCE_CHECK | One compliance run includes many individual checks (e.g., CIS benchmark checks) |

Core Entity Groups

User Management:

  • Admin users with full access to all features
  • Microsoft 365 connection credentials (encrypted)
  • User preferences and notification settings

Audit & Monitoring:

  • Audit events from Microsoft 365
  • Anomaly detection results
  • Security alerts and notifications

Compliance:

  • Compliance check definitions and results
  • Compliance run history with aggregated scores

Access Review:

  • Review campaigns with configurable scope
  • Permission items with review decisions
  • Scheduled recurring reviews
  • Designated resource owners

Reporting:

  • Generated reports with file storage
  • Scheduled report configurations

Deployment Architecture

Cloud Infrastructure

Scalability Considerations

  • Horizontal Scaling: Serverless architecture auto-scales with demand
  • Database Scaling: Connection pooling and read replicas supported
  • Background Processing: Scheduled jobs run independently of user requests

Multi-Cloud Connector Architecture

AuditSphere uses a connector abstraction layer to normalize metadata from any cloud provider into a unified schema. Each connector implements the same interface, so the rest of the platform stays provider-agnostic.

Supported Providers

| Provider | Connector | APIs Used | Data Collected |
| --- | --- | --- | --- |
| Microsoft 365 | MicrosoftConnector | Graph API, O365 Management API | Sites, files, permissions, audit events, sensitivity labels |
| Google Workspace | GoogleDriveConnector | Drive API v3, Admin Reports API | Files, permissions, audit events, labels |
| Box | BoxConnector | Box Platform API, Events API | Files, collaborations, audit events, classifications |
| Dropbox | DropboxConnector | Dropbox Business API, Team Log | Files, sharing, audit events |

Connector Interface

Every connector implements:

  • listFiles() / getFileMetadata() / getFilePermissions() — metadata collection
  • fetchAuditEvents() — activity log collection
  • getNativeLabels() / getRetentionInfo() — classification and retention
  • refreshTokens() / validateConnection() — authentication
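
The interface above could be expressed as an abstract base class along these lines. This is a sketch in Python for illustration: the method names are snake_case renderings of the names listed above, while the parameters and return types are assumptions, since the source lists names only. `InMemoryConnector` and `collect_metadata` are hypothetical helpers added to show provider-agnostic use.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class CloudConnector(ABC):
    """Provider-agnostic connector contract (signatures are assumed)."""

    @abstractmethod
    def list_files(self) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def get_file_metadata(self, file_id: str) -> Dict[str, Any]: ...
    @abstractmethod
    def get_file_permissions(self, file_id: str) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def fetch_audit_events(self, since: str) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def get_native_labels(self, file_id: str) -> List[str]: ...
    @abstractmethod
    def get_retention_info(self, file_id: str) -> Dict[str, Any]: ...
    @abstractmethod
    def refresh_tokens(self) -> None: ...
    @abstractmethod
    def validate_connection(self) -> bool: ...


class InMemoryConnector(CloudConnector):
    """Toy connector backed by a dict, for illustration only."""

    def __init__(self, provider: str, files: Dict[str, Dict[str, Any]]) -> None:
        self.provider = provider
        self._files = files

    def list_files(self) -> List[Dict[str, Any]]:
        return [dict(meta, providerFileId=fid) for fid, meta in self._files.items()]

    def get_file_metadata(self, file_id: str) -> Dict[str, Any]:
        return dict(self._files[file_id], providerFileId=file_id)

    def get_file_permissions(self, file_id: str) -> List[Dict[str, Any]]:
        return []

    def fetch_audit_events(self, since: str) -> List[Dict[str, Any]]:
        return []

    def get_native_labels(self, file_id: str) -> List[str]:
        return []

    def get_retention_info(self, file_id: str) -> Dict[str, Any]:
        return {}

    def refresh_tokens(self) -> None:
        pass

    def validate_connection(self) -> bool:
        return True


def collect_metadata(connectors: List[CloudConnector]) -> List[Dict[str, Any]]:
    """Gather file metadata without knowing which provider each connector wraps."""
    return [f for c in connectors if c.validate_connection() for f in c.list_files()]


all_files = collect_metadata([
    InMemoryConnector("box", {"b1": {"name": "plan.docx"}}),
    InMemoryConnector("dropbox", {"d1": {"name": "notes.txt"}}),
])
```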

Integration Architecture

Unified Data Model

All providers normalize into the same tables:

| Unified Table | Purpose | Key Fields |
| --- | --- | --- |
| cloud_connections | OAuth connections per provider | provider, accountId, tokens, status |
| file_metadata | Files and folders from any provider | providerFileId, name, path, mimeType, classification |
| file_permissions | Permissions on any file | grantedTo, accessLevel, isExternal, sharingLinkScope |
| audit_activities | Activity events from any provider | operation, operationCategory, actorEmail, resourceName |
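
To make the normalization concrete, the sketch below maps a simplified permission entry, loosely modeled on a Google Drive permission, onto the `file_permissions` key fields. The input shape, the role mapping, the value vocabulary (`read`, `write`, `owner`, `anonymous`), and the domain-based external check are assumptions for illustration; only user or group entries and "anyone" links are handled.

```python
from typing import Any, Dict, Optional

# Assumed mapping from provider roles to a unified access level.
ROLE_TO_ACCESS = {"reader": "read", "commenter": "read", "writer": "write", "owner": "owner"}


def normalize_drive_permission(perm: Dict[str, Any], org_domain: str) -> Dict[str, Any]:
    """Map one simplified Drive-style permission to a file_permissions row."""
    scope: Optional[str]
    if perm["type"] == "anyone":
        # Link sharing with no named grantee is treated as anonymous and external.
        granted_to, is_external, scope = "anyone", True, "anonymous"
    else:
        granted_to = perm["emailAddress"]
        is_external = not granted_to.lower().endswith("@" + org_domain.lower())
        scope = None
    return {
        "grantedTo": granted_to,
        "accessLevel": ROLE_TO_ACCESS.get(perm["role"], "unknown"),
        "isExternal": is_external,
        "sharingLinkScope": scope,
    }


internal_row = normalize_drive_permission(
    {"type": "user", "role": "writer", "emailAddress": "alice@example.com"}, "example.com"
)
external_row = normalize_drive_permission(
    {"type": "user", "role": "reader", "emailAddress": "bob@partner.org"}, "example.com"
)
link_row = normalize_drive_permission({"type": "anyone", "role": "reader"}, "example.com")
```

Each provider would need its own mapping function of this kind, all producing rows with the same four fields.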

Required Permissions by Provider

Microsoft 365:

  • AuditLog.Read.All, Directory.Read.All, Sites.Read.All, User.Read.All, SecurityEvents.Read.All

Google Workspace:

  • drive.metadata.readonly, admin.reports.audit.readonly, admin.directory.user.readonly

Box:

  • Enterprise app with full access (admin events, collaborations, file metadata)

Dropbox:

  • files.metadata.read, sharing.read, events.read, team_data.member

Monitoring & Observability

Application Monitoring

  • Error tracking and alerting via monitoring service
  • Structured logging with correlation IDs
  • Performance metrics and tracing

Health Checks

  • Database connectivity verification
  • External API connectivity status
  • ML service availability
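
A health endpoint covering these three checks might aggregate them as in the sketch below. The check names, the `healthy`/`degraded` status values, and the probe functions are hypothetical; real probes would query the database, call an external API, and ping the ML service.

```python
from typing import Any, Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, Any]:
    """Run each probe and report an overall status plus per-check results."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A probe that raises is reported as failed instead of crashing the endpoint.
            results[name] = "fail"
    overall = "healthy" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": overall, "checks": results}


def ml_service_probe() -> bool:
    """Stand-in probe that simulates an unreachable ML service."""
    raise ConnectionError("ML service unreachable")


report = run_health_checks({
    "database": lambda: True,
    "external_api": lambda: True,
    "ml_service": ml_service_probe,
})
```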

Compliance & Standards

Supported Compliance Frameworks

AuditSphere supports 8 built-in compliance frameworks with declarative, provider-agnostic rules:

| Framework | Rules | Key Focus Areas |
| --- | --- | --- |
| GDPR | 4 | Data retention, classification, access control, audit trails |
| HIPAA | 4 | PHI access control, external sharing, audit trails, retention |
| SOX | 3 | Financial document controls, audit trails, record retention |
| NIST 800-53 | 4 | Access enforcement, audit events, media sanitization, categorization |
| ISO 27001 | 3 | Access control, information classification, logging |
| SOC 2 | 3 | Logical access, system monitoring, confidentiality |
| CCPA | 3 | Consumer data access, classification, retention |
| CIS MS365 | 4 | External sharing, anonymous links, guest access, sensitivity labels |

Declarative Rule Engine

Rules are stored as JSON configurations in the database, not hardcoded:

  • Permission checks — query file_permissions for anonymous/external sharing violations
  • Retention checks — verify sensitive files have retention policies applied
  • Classification checks — measure unclassified file percentage
  • Audit checks — detect unusual event patterns (e.g., guest permission changes)
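
The sketch below shows how a JSON-configured rule might be evaluated against the unified tables. The rule schema (`type`, `where`, `maxViolations`, `maxUnclassifiedPercent`) is invented for this example, and only the permission and classification check types are covered; the platform's actual rule format is not documented here.

```python
import json
from typing import Any, Dict, List

# Hypothetical rule definitions as they might be stored in the database.
RULES_JSON = """
[
  {"id": "no-anonymous-links", "type": "permission",
   "where": {"sharingLinkScope": "anonymous"}, "maxViolations": 0},
  {"id": "classification-coverage", "type": "classification",
   "maxUnclassifiedPercent": 20.0}
]
"""


def evaluate(rule: Dict[str, Any], permissions: List[Dict[str, Any]],
             files: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Evaluate one declarative rule and return a pass/fail result."""
    if rule["type"] == "permission":
        # A row violates the rule when it matches every field in "where".
        hits = [p for p in permissions
                if all(p.get(k) == v for k, v in rule["where"].items())]
        return {"ruleId": rule["id"], "passed": len(hits) <= rule["maxViolations"],
                "value": len(hits)}
    if rule["type"] == "classification":
        unclassified = sum(1 for f in files if not f.get("classification"))
        percent = 100.0 * unclassified / len(files) if files else 0.0
        return {"ruleId": rule["id"], "passed": percent <= rule["maxUnclassifiedPercent"],
                "value": percent}
    raise ValueError(f"unsupported rule type: {rule['type']}")


permissions = [
    {"grantedTo": "alice@example.com", "sharingLinkScope": None},
    {"grantedTo": "anyone", "sharingLinkScope": "anonymous"},
]
files = [
    {"name": "a.docx", "classification": "Internal"},
    {"name": "b.xlsx", "classification": "Confidential"},
    {"name": "c.pdf", "classification": "Public"},
    {"name": "d.txt", "classification": None},
]
results = [evaluate(rule, permissions, files) for rule in json.loads(RULES_JSON)]
```

Because the rules are data, adding a check of an existing type means inserting a new JSON record; only a new check type requires new code.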

Data Classification

Four-level taxonomy: Public, Internal, Confidential, Restricted

Provider-native labels (Microsoft Sensitivity Labels, Google Drive Labels, Box Classifications) are mapped to the unified taxonomy.
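
A label mapping of this kind can be sketched as a lookup table. The native label names below are hypothetical examples (tenants typically define their own labels), and treating the four levels as ordered by sensitivity is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Classification(IntEnum):
    """Unified taxonomy, assumed to be ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical (provider, native label) pairs mapped to the unified taxonomy.
LABEL_MAP: Dict[Tuple[str, str], Classification] = {
    ("microsoft", "General"): Classification.INTERNAL,
    ("microsoft", "Highly Confidential"): Classification.RESTRICTED,
    ("google", "Public"): Classification.PUBLIC,
    ("box", "Confidential"): Classification.CONFIDENTIAL,
}


def to_unified(provider: str, native_label: str) -> Optional[Classification]:
    """Return the unified level, or None when the label has no mapping."""
    return LABEL_MAP.get((provider.lower(), native_label))


mapped = to_unified("Microsoft", "Highly Confidential")
unmapped = to_unified("box", "Project X")
```

Returning `None` for unmapped labels lets such files be counted as unclassified instead of being assigned a guessed level.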

Data Retention

  • Policy-based retention with a configurable retention period (in days) and action (delete, archive, or review)
  • Automated violation detection: expired-not-deleted, missing-retention, expired-not-archived
  • Classification-level filtering (e.g., require retention for confidential + restricted)
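
The three violation types can be illustrated with a small check function. The `RetentionPolicy` and `FileRecord` shapes are hypothetical, and the sketch assumes the retention period is measured from the file's creation date, which the source does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RetentionPolicy:
    days: int                          # retention period
    action: str                        # "delete", "archive", or "review"
    classifications: FrozenSet[str]    # levels the policy applies to


@dataclass
class FileRecord:
    name: str
    classification: str
    created: date
    has_retention: bool                # whether a retention setting is applied
    archived: bool = False


def find_violation(file: FileRecord, policy: RetentionPolicy, today: date) -> Optional[str]:
    """Return the violation type for a file that still exists, or None."""
    if file.classification not in policy.classifications:
        return None                    # policy does not apply to this level
    if not file.has_retention:
        return "missing-retention"
    expired = today > file.created + timedelta(days=policy.days)
    if expired and policy.action == "delete":
        return "expired-not-deleted"   # the file is past its period but still present
    if expired and policy.action == "archive" and not file.archived:
        return "expired-not-archived"
    return None


policy = RetentionPolicy(days=365, action="delete",
                         classifications=frozenset({"Confidential", "Restricted"}))
today = date(2025, 6, 1)
old_file = FileRecord("contract.pdf", "Confidential", date(2024, 1, 1), has_retention=True)
unlabeled = FileRecord("payroll.xlsx", "Restricted", date(2025, 5, 1), has_retention=False)
public_file = FileRecord("flyer.pdf", "Public", date(2020, 1, 1), has_retention=False)

violations = [find_violation(f, policy, today) for f in (old_file, unlabeled, public_file)]
```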

Audit Capabilities

  • Complete audit trail across all providers
  • Immutable event logging in unified format
  • Report generation for compliance evidence
  • Cross-provider anomaly detection

Technology Summary

| Component | Category | Purpose |
| --- | --- | --- |
| Web Framework | Full-stack JavaScript | Server-side rendering, API routes, deployment |
| UI Library | Component Framework | Reusable, accessible UI components |
| Styling | Utility CSS | Responsive, maintainable styling |
| API Layer | Type-safe RPC | End-to-end type safety between client and server |
| Database ORM | Query Builder | Type-safe database access with migrations |
| Validation | Schema Library | Request/response validation with type inference |
| Authentication | OAuth Provider | Secure user authentication |
| ML Framework | Python ML | Anomaly detection and model training |
| Monitoring | Error Tracking | Application health and error monitoring |

Document Information

| Property | Value |
| --- | --- |
| Version | 1.0 |
| Last Updated | December 2025 |
| Classification | Client Documentation |