AuditSphere ML Service - Overview

Introduction

The AuditSphere ML Service is a machine-learning anomaly detection microservice that analyzes SharePoint and OneDrive audit events in near real time (detection runs every 15 minutes). It uses the Isolation Forest algorithm to learn what “normal” activity looks like for your organization and flags events that deviate from those established patterns.


Architecture

How it works:

| Connection | Purpose |
|---|---|
| Cron → ML Service | Scheduled job triggers anomaly detection every 15 minutes |
| API → ML Service | Sends batch of audit events for analysis |
| ML Service → Database | Fetches historical events for model training |
| ML Service → API | Returns anomaly detection results |

What It Detects

The ML service identifies six categories of suspicious activity:

1. Unusual Timing

Activity occurring outside normal business hours for a user.

| Indicator | Example |
|---|---|
| Off-hours access | Employee downloading files at 2 AM |
| Weekend activity | File access by a user with no history of weekend activity |
| Holiday access | Operations during company holidays |

2. Bulk File Operations

Unusually large numbers of file operations in a short time.

| Indicator | Example |
|---|---|
| Mass downloads | 500 files downloaded in 10 minutes |
| Bulk deletions | Rapid deletion from shared folders |
| Data staging | Copying files to personal folders |
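A bulk-operation indicator can be approximated by counting each user's operations inside a sliding time window. The sketch below is an illustration only: `BulkOperationTracker` is a hypothetical helper, and the 10-minute window and 100-operation threshold are assumed values, not documented service parameters. It also assumes each user's events arrive in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class BulkOperationTracker:
    """Counts a user's file operations inside a sliding time window (hypothetical helper)."""

    def __init__(self, window: timedelta = timedelta(minutes=10), threshold: int = 100):
        self.window = window        # assumed window length
        self.threshold = threshold  # assumed "bulk" cutoff
        self._events: dict[str, deque] = {}

    def record(self, user_id: str, timestamp: datetime) -> bool:
        """Record one operation; return True if the user's recent volume looks like a bulk operation."""
        times = self._events.setdefault(user_id, deque())
        times.append(timestamp)
        # Evict operations older than the window (events must arrive in time order).
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```

With these assumed settings, a user downloading 500 files in under 10 minutes is flagged once the 100th operation lands inside the window, while one operation per minute never accumulates more than 11 events in the window.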

3. External & Guest Access

Suspicious activity from users outside the organization.

| Indicator | Example |
|---|---|
| Guest access | Guest user accessing HR documents |
| External contractor | Contractor downloading from restricted areas |
| External markers | Users with #EXT# in their user principal name accessing sensitive data |
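Guest accounts invited through Microsoft Entra ID (B2B collaboration) typically carry the `#EXT#` marker in their user principal name, which makes the guest flag a simple string check. A minimal sketch, assuming a hypothetical `org_domains` set listing the organization's own domains; these helpers are illustrative and not the service's actual code.

```python
def is_guest_user(user_id: str) -> bool:
    """Guest (B2B) UPNs usually contain '#EXT#', e.g. 'jane_partner.com#EXT#@company.onmicrosoft.com'."""
    return "#EXT#" in user_id.upper()


def is_external_user(user_id: str, org_domains: set[str]) -> bool:
    """Treat a user as external if they are a guest or their UPN domain is not an org domain."""
    if is_guest_user(user_id):
        return True
    domain = user_id.rsplit("@", 1)[-1].lower()
    return domain not in {d.lower() for d in org_domains}
```

Both functions return booleans, so they can be cast with `int(...)` to serve as the numeric guest and external-access flags listed under Feature Extraction.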

4. Suspicious Location

Access from unexpected geographic locations or devices.

| Indicator | Example |
|---|---|
| Location change | NY user suddenly accessing from Eastern Europe |
| Impossible travel | Access from multiple countries within a few hours |
| New device | Unknown device accessing sensitive files |
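Impossible travel is usually checked by comparing the great-circle distance between two consecutive access locations with the time between them. The sketch below assumes the client IP has already been geolocated to latitude/longitude (the service's geolocation source is not documented here). The 900 km/h ceiling (roughly airliner speed) and the 100 km noise floor are hypothetical thresholds, and this rule is an illustration, separate from the Isolation Forest model.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(prev: tuple[float, float, datetime],
                         curr: tuple[float, float, datetime],
                         max_speed_kmh: float = 900.0,
                         min_distance_km: float = 100.0) -> bool:
    """Flag two (lat, lon, time) observations that imply faster-than-plausible travel.

    Both thresholds are hypothetical; the distance floor tolerates the
    imprecision of IP-based geolocation.
    """
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if distance < min_distance_km:
        return False
    hours = abs((curr[2] - prev[2]).total_seconds()) / 3600
    if hours == 0:
        return True
    return distance / hours > max_speed_kmh
```

For example, New York to Warsaw is about 6,850 km, so the same account appearing in both cities two hours apart implies over 3,000 km/h and is flagged, whereas the same pair a day apart is not.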

5. Sensitive Operations

High-risk actions that could expose data.

| Indicator | Example |
|---|---|
| Public sharing | Creating anonymous links for confidential files |
| Admin changes | Adding new site collection administrators |
| Permission changes | Broadening access to restricted folders |
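The sensitive-operation flag can be expressed as a lookup on the audit event's `operation` field. The operation names below are illustrative examples of SharePoint audit operations chosen to match the three indicators above; the service's actual list is not documented here, so treat `SENSITIVE_OPERATIONS` as a hypothetical placeholder and verify the names against your tenant's audit log.

```python
from typing import Optional

# Hypothetical, illustrative mapping from audit operation names to the
# sensitive-operation categories above; not the service's real list.
SENSITIVE_OPERATIONS = {
    "AnonymousLinkCreated": "public_sharing",
    "SiteCollectionAdminAdded": "admin_change",
    "SharingSet": "permission_change",
    "SharingInvitationCreated": "permission_change",
}


def sensitive_operation_flag(operation: str) -> int:
    """Return 1 if the operation is considered sensitive, else 0 (a numeric model feature)."""
    return int(operation in SENSITIVE_OPERATIONS)


def sensitive_category(operation: str) -> Optional[str]:
    """Return the sensitive category for an operation, or None if it is not sensitive."""
    return SENSITIVE_OPERATIONS.get(operation)
```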

6. Unusual Volume

Activity levels far exceeding normal patterns.

| Indicator | Example |
|---|---|
| Spike in access | A user who normally accesses 5 files/day suddenly accesses 200 |
| Inactive user surge | A rarely active user performing hundreds of operations |
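One simple way to quantify a volume spike is to compare today's activity with the user's historical daily baseline. The documented service feeds volume into the Isolation Forest through its event-count features; this rule only illustrates the idea. It assumes a hypothetical list of the user's prior daily event counts and a hypothetical 10x spike ratio.

```python
from statistics import median


def volume_spike_ratio(history: list[int], today: int) -> float:
    """Ratio of today's activity to the user's median daily activity.

    A baseline floor of 1 avoids division by zero and makes a sudden surge
    from a rarely active account produce a large ratio.
    """
    baseline = max(median(history), 1) if history else 1
    return today / baseline


def is_volume_spike(history: list[int], today: int, ratio_threshold: float = 10.0) -> bool:
    """True if today's count is at least ratio_threshold times the baseline (hypothetical cutoff)."""
    return volume_spike_ratio(history, today) >= ratio_threshold
```

For the "5 files/day user suddenly accessing 200" example, `volume_spike_ratio([5, 5, 4, 6, 5], 200)` gives a ratio of 40. Using the median rather than the mean keeps a single past burst from inflating the baseline.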

Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Framework | FastAPI | High-performance async API server |
| ML Algorithm | Isolation Forest | Unsupervised anomaly detection |
| ML Library | scikit-learn | Model training and inference |
| Runtime | Python 3.12+ | Service execution |
| Deployment | Railway | Cloud hosting |

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Service status |
| /health | GET | Health check |
| /api/v1/anomaly/detect | POST | Batch anomaly detection |
| /api/v1/anomaly/score | POST | Single-event scoring |
| /api/v1/anomaly/train | POST | Retrain the model |
| /api/v1/anomaly/model/status | GET | Model status |
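For callers written in Python, the batch detection endpoint can be reached with the standard library alone. A minimal client sketch, assuming the placeholder host `https://ml-service.example.com` used in this document's examples; `build_detect_request` and `detect` are hypothetical helper names, and the request body follows the `{"events": [...]}` shape of the curl example.

```python
import json
import urllib.request

BASE_URL = "https://ml-service.example.com"  # placeholder host from this document's examples


def build_detect_request(events: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the batch detection endpoint."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/anomaly/detect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(events: list[dict], base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send a batch of events and return the parsed JSON response."""
    with urllib.request.urlopen(build_detect_request(events, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the payload testable without network access; `detect(...)` returns the same structure as the Response example.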

How Detection Works

How it works:

  1. Audit Event - A raw event arrives (user, operation, timestamp, file, IP, etc.)
  2. Feature Extraction - Transforms the event into 16 numeric features (hour, day, guest flag, bulk operation, etc.)
  3. Isolation Forest - The ML model scores how “isolated” (unusual) the event is relative to the training data
  4. Detection Result - Returns the anomaly flag, anomaly score, confidence, and anomaly type (unusual_timing, bulk_operation, etc.)

Feature Extraction

The model analyzes 16 features from each audit event:

Temporal Features

  • Hour of day
  • Day of week
  • Weekend flag
  • Business hours flag
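The four temporal features can be derived directly from the event's `creation_time`. A sketch using the BUSINESS_HOURS_START and BUSINESS_HOURS_END defaults from the Configuration table; treating weekends as outside business hours, evaluating hours in the timestamp's own time zone (UTC in the examples), and the feature key names are all assumptions.

```python
from datetime import datetime

BUSINESS_HOURS_START = 9   # default from the Configuration table
BUSINESS_HOURS_END = 18    # default from the Configuration table


def temporal_features(creation_time: str) -> dict[str, int]:
    """Derive hour, weekday, weekend and business-hours features from an ISO 8601 timestamp."""
    # Replace a trailing 'Z' so fromisoformat also works on Python versions before 3.11.
    ts = datetime.fromisoformat(creation_time.replace("Z", "+00:00"))
    is_weekend = ts.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
    in_business_hours = (BUSINESS_HOURS_START <= ts.hour < BUSINESS_HOURS_END) and not is_weekend
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),
        "is_weekend": int(is_weekend),
        "is_business_hours": int(in_business_hours),
    }
```

Applied to the 2 AM download in the detect example (`2025-01-15T02:30:00Z`, a Wednesday), this yields hour 2, weekday 2, and both flags 0.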

User Behavior

  • Event count (1h window)
  • Event count (24h window)
  • Unique sites accessed
  • Unique operations performed
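The user-behavior features are rolling aggregates over a user's recent events. The example detect request later in this document already carries `event_count_1h` and `event_count_24h`, so at least those counts may be computed by the caller. A sketch, assuming a hypothetical event shape (dicts with `creation_time` as a `datetime`, plus `site_url` and `operation`) and assuming the unique-site and unique-operation counts use the 24-hour window.

```python
from datetime import datetime, timedelta


def user_behavior_features(user_events: list[dict], now: datetime) -> dict[str, int]:
    """Compute rolling behavior features from one user's events up to `now`.

    Event shape and the 24h window for unique counts are assumptions.
    """
    last_1h = [e for e in user_events if now - timedelta(hours=1) <= e["creation_time"] <= now]
    last_24h = [e for e in user_events if now - timedelta(hours=24) <= e["creation_time"] <= now]
    return {
        "event_count_1h": len(last_1h),
        "event_count_24h": len(last_24h),
        "unique_sites": len({e["site_url"] for e in last_24h}),
        "unique_operations": len({e["operation"] for e in last_24h}),
    }
```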

Access Patterns

  • Guest user flag
  • External access flag
  • New IP address
  • Unusual location

Operation Context

  • Operation category
  • Sensitive operation flag
  • File type category
  • Bulk operation flag

Scoring

Each detected anomaly includes:

| Field | Description |
|---|---|
| anomaly_score | How unusual the activity is (0-1, higher = more suspicious) |
| confidence | Model certainty about the detection |
| anomaly_type | Category of suspicious behavior detected |
| contributing_factors | Specific reasons the event was flagged |
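A 0-1 score where higher means more suspicious matches the anomaly score from the original Isolation Forest paper (Liu, Ting and Zhou), s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the event's average path length across the trees and c(n) normalizes by the expected path length for a sample of size n. scikit-learn's `IsolationForest.score_samples` returns the negation of this score. How the service derives `anomaly_score` exactly is not documented, so the sketch below shows the textbook formula only.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful binary-search-tree lookup among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, sample_size: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); close to 1 = anomalous, well below 0.5 = normal."""
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))
```

An event isolated after about 2 splits in trees built on 256 samples scores roughly 0.87, while one needing 15 splits scores roughly 0.36; a path length equal to c(n) gives exactly 0.5.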

Model Training

Auto-Training

The service supports automatic training from the AuditSphere database:

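  1. On Startup: If the model is untrained, the service fetches up to 2000 recent events from the database and trains automatically (controlled by AUTO_TRAIN_ON_STARTUP; requires DATABASE_URL)
  2. On Detection: If detection is requested while the model is untrained, the service trains first, then runs detection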
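The startup rule, combined with the AUTO_TRAIN_ON_STARTUP, DATABASE_URL and MIN_TRAINING_EVENTS settings from the Configuration table, can be sketched as a single decision function. `should_auto_train` is a hypothetical helper that mirrors the documented behavior; the service's actual control flow may differ, for example in how it reacts when fewer than 100 events are available.

```python
from typing import Optional

MIN_TRAINING_EVENTS = 100   # default from the Configuration table
STARTUP_FETCH_LIMIT = 2000  # "up to 2000 recent events" on startup


def should_auto_train(model_trained: bool,
                      auto_train_enabled: bool,
                      database_url: Optional[str],
                      available_events: int,
                      min_events: int = MIN_TRAINING_EVENTS) -> bool:
    """Decide whether startup auto-training can run (hypothetical helper)."""
    if model_trained or not auto_train_enabled or not database_url:
        return False
    # Only the most recent STARTUP_FETCH_LIMIT events are fetched for training.
    return min(available_events, STARTUP_FETCH_LIMIT) >= min_events
```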

Training Requirements

| Requirement | Value |
|---|---|
| Minimum events | 100 |
| Recommended events | 1000+ |
| Data freshness | Recent activity preferred |

Configuration

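| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 8000 |
| DATABASE_URL | PostgreSQL connection string used for auto-training | (none) |
| AUTO_TRAIN_ON_STARTUP | Enable auto-training on startup | true |
| MIN_TRAINING_EVENTS | Minimum number of events required for training | 100 |
| CONTAMINATION | Expected proportion of anomalies in the data | 0.10 |
| N_ESTIMATORS | Number of trees in the forest | 200 |
| BUSINESS_HOURS_START | Start of business hours (hour of day) | 9 |
| BUSINESS_HOURS_END | End of business hours (hour of day) | 18 |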
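These variables can be loaded into a typed settings object with the documented defaults. A sketch using only the standard library; the `Settings` class and `_env_bool` helper are hypothetical. The validation mirrors a real constraint: scikit-learn's `IsolationForest` accepts a float `contamination` only in the range (0, 0.5], and `n_estimators` and `contamination` are the names of its corresponding constructor parameters.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _env_bool(name: str, default: bool) -> bool:
    """Parse common truthy strings; fall back to the default when the variable is unset."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    database_url: Optional[str] = None
    auto_train_on_startup: bool = True
    min_training_events: int = 100
    contamination: float = 0.10
    n_estimators: int = 200
    business_hours_start: int = 9
    business_hours_end: int = 18

    def __post_init__(self) -> None:
        # IsolationForest rejects a float contamination outside (0, 0.5].
        if not 0.0 < self.contamination <= 0.5:
            raise ValueError("CONTAMINATION must be in (0, 0.5]")
        if not 0 <= self.business_hours_start < self.business_hours_end <= 24:
            raise ValueError("business hours must satisfy 0 <= start < end <= 24")

    @classmethod
    def from_env(cls) -> "Settings":
        """Read the documented environment variables, using the documented defaults."""
        env = os.environ
        return cls(
            port=int(env.get("PORT", 8000)),
            database_url=env.get("DATABASE_URL"),
            auto_train_on_startup=_env_bool("AUTO_TRAIN_ON_STARTUP", True),
            min_training_events=int(env.get("MIN_TRAINING_EVENTS", 100)),
            contamination=float(env.get("CONTAMINATION", 0.10)),
            n_estimators=int(env.get("N_ESTIMATORS", 200)),
            business_hours_start=int(env.get("BUSINESS_HOURS_START", 9)),
            business_hours_end=int(env.get("BUSINESS_HOURS_END", 18)),
        )
```

Validating in `__post_init__` makes a misconfigured CONTAMINATION fail at startup instead of when the model is first trained.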

Integration with AuditSphere

Detection Flow

  1. Cron job triggers detection every 15 minutes
  2. AuditSphere API fetches unprocessed audit events
  3. ML Service receives batch of events
  4. Feature extraction transforms events into 16 numeric features
  5. Isolation Forest scores each event
  6. Results returned with anomaly flags and scores
  7. Anomalies stored in database for user review
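Steps 5 and 6 can be sketched as a function that scores a batch and wraps the per-event results in the envelope used by the Response example (`results`, `total_events`, `anomalies_detected`, `processing_time_ms`). The `score_event` callable is a hypothetical stand-in for feature extraction plus Isolation Forest scoring; it is assumed to return a dict with at least `event_id` and `is_anomaly`.

```python
import time
from typing import Callable


def run_batch(events: list[dict], score_event: Callable[[dict], dict]) -> dict:
    """Score each event and build the batch response envelope."""
    started = time.perf_counter()
    results = [score_event(e) for e in events]
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {
        "results": results,
        "total_events": len(results),
        "anomalies_detected": sum(1 for r in results if r["is_anomaly"]),
        "processing_time_ms": round(elapsed_ms),
    }
```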

API Usage Examples

Detect Anomalies

curl -X POST https://ml-service.example.com/api/v1/anomaly/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {
        "event_id": "evt_123",
        "operation": "FileDownloaded",
        "user_id": "user@company.com",
        "creation_time": "2025-01-15T02:30:00Z",
        "site_url": "https://company.sharepoint.com/sites/hr",
        "source_file_name": "salaries.xlsx",
        "client_ip": "203.0.113.50",
        "user_type": 0,
        "event_count_1h": 5,
        "event_count_24h": 50
      }
    ]
  }'

Response

{
  "results": [
    {
      "event_id": "evt_123",
      "is_anomaly": true,
      "anomaly_score": 0.85,
      "confidence": 0.92,
      "anomaly_type": "unusual_timing"
    }
  ],
  "total_events": 1,
  "anomalies_detected": 1,
  "processing_time_ms": 45
}

Check Model Status

curl https://ml-service.example.com/api/v1/anomaly/model/status

Response

{
  "model_loaded": true,
  "training_samples": 2000,
  "contamination": 0.1,
  "last_trained": "2025-12-15T10:30:00"
}

Deployment

The ML service is deployed separately from the main AuditSphere application:

| Platform | Configuration |
|---|---|
| Railway | Auto-detected Python deployment |
| Docker | Available via Dockerfile |
| Manual | uvicorn app.main:app --port 8000 |

Environment Setup

# Required
DATABASE_URL=postgresql://user:pass@host/db?sslmode=require

# Optional
PORT=8000
AUTO_TRAIN_ON_STARTUP=true
CONTAMINATION=0.10
N_ESTIMATORS=200

Document Information

| Property | Value |
|---|---|
| Version | 1.0 |
| Last Updated | December 2025 |
| Classification | Client Documentation |