S3 storage service overview

This page provides an overview of the S3-compatible storage service available on the platform
and explains when and how it should be used.

The storage service is designed to support biomedical and clinical research,
with particular attention to security, scalability, and data governance.


What this storage service is

The platform provides a geodistributed, S3-compatible object storage system.

Key characteristics:

  • Object storage (not a traditional filesystem)
  • S3 API compatibility
  • Geodistributed across multiple university sites
  • Designed for large datasets and persistent storage
  • Suitable for collaboration and data sharing within approved projects

This storage service complements, but does not replace, the HPC filesystems
(HOME, WORK, SCRATCH).


Typical use cases

The S3 storage service is suitable for:

  • Storage of research datasets
  • Biomedical and clinical data (subject to project approval)
  • Large dataset transfers (hundreds of GB to TB scale)
  • Data sharing within approved research projects
  • Storage of results beyond HPC job execution

It is not intended for:

  • High-frequency small I/O operations
  • Temporary files during job execution
  • Replacing SCRATCH or WORK filesystems

Object storage vs. filesystem storage

It is important to understand the difference between the two models:

Object storage (S3)

  • Data is stored as objects inside buckets
  • No directories in the traditional sense
  • Accessed via APIs or dedicated tools
  • Optimized for scalability and durability
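For example, what looks like a nested directory path is in fact a flat object key inside a bucket (the names below are hypothetical):

```
full address:  s3://proj-bucket/genomics/run01/sample01.fastq.gz
bucket:        proj-bucket
object key:    genomics/run01/sample01.fastq.gz
```

The prefix genomics/run01/ resembles a folder, but it is simply part of the key; tools such as rclone and mc render shared prefixes as pseudo-directories for convenience.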

HPC filesystems

  • Traditional POSIX filesystems
  • Optimized for high-performance parallel I/O
  • Used directly by compute jobs

In practice:

  • Use HPC filesystems for computation
  • Use S3 storage for dataset storage and controlled data sharing

Buckets and projects

Access to S3 storage is project-based.

  • Each project is assigned one bucket
  • Permissions are restricted to authorized users
  • Buckets are logically isolated from each other

Users can only access the buckets explicitly assigned to their project.


Security and compliance

The storage service is operated in accordance with:

  • GDPR requirements
  • Institutional data protection policies
  • ISO/IEC 27001-aligned information security practices

Security measures include:

  • Access control based on identity and project authorization
  • Logical isolation between projects
  • Logging of relevant access events
  • Geodistribution for resilience and availability

Users are responsible for handling data in accordance with project approvals
and applicable requirements.


Data lifecycle and retention

Data stored in S3 is persistent for the duration of the authorized project.

  • Access to storage is granted for a defined period
  • At the end of the authorization period, access may be revoked
  • Data may be removed unless the authorization is renewed or extended

Users must request any extension of storage usage before the expiration of the authorization period.

Do not assume indefinite storage without prior agreement.


Performance considerations

S3 storage is optimized for:

  • Large, sequential data transfers
  • High-throughput data movement

For best performance:

  • Transfer large files rather than many small files
  • Use multipart uploads for large datasets
  • Avoid frequent overwrites of the same objects

How users access S3 storage

Users do not access S3 storage via a mounted filesystem.

Instead, access is provided through:

  • Command-line tools (e.g. rclone, s3cmd, mc)
  • Programmatic access via S3 APIs
  • Controlled integrations with the HPC cluster

Details are provided in First steps with S3 storage.
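As an illustration, an rclone remote for the service might be configured as follows; the remote name, endpoint URL, and credentials are placeholders to be replaced with the values issued for your project:

```ini
# ~/.config/rclone/rclone.conf -- hypothetical values throughout
[s3remote]
type = s3
provider = Other
endpoint = https://s3.example-university.edu   # replace with the real endpoint
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
```

With such a remote in place, `rclone lsd s3remote:` would list the buckets your project is authorized to access.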


When to combine S3 storage and HPC

A common pattern is:

  1. Store raw datasets in S3 storage
  2. Stage required data to HPC WORK or SCRATCH
  3. Run computations on the HPC cluster
  4. Store final results back to S3

This approach balances performance and data persistence.
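The four steps above can be sketched as a shell script. The remote, bucket, and job names are hypothetical, and the script only echoes its transfer commands by default (set RCLONE=rclone to perform real transfers):

```shell
#!/bin/sh
set -eu

# Hypothetical names: "s3remote" is an rclone remote pointing at the service,
# "proj-bucket" is the bucket assigned to the project.
RCLONE="${RCLONE:-echo [dry-run] rclone}"   # dry-run by default
WORKDIR="${WORK:-/tmp}/myproject"
mkdir -p "$WORKDIR/input" "$WORKDIR/output"

# 1-2. Raw data lives in S3; stage what the job needs to WORK or SCRATCH.
$RCLONE copy s3remote:proj-bucket/raw "$WORKDIR/input"

# 3. Run the computation against the staged copy (scheduler submission
#    shown as a comment, since job scripts are site-specific):
# sbatch run_analysis.sh "$WORKDIR/input" "$WORKDIR/output"

# 4. Persist final results back to S3.
$RCLONE copy "$WORKDIR/output" s3remote:proj-bucket/results
```

Keeping only the staged working copy on the HPC filesystems, and treating S3 as the authoritative copy, also makes SCRATCH cleanup policies harmless to the project's data.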


Next steps