First steps with S3 storage¶
This tutorial guides you through the first practical steps to access and use the S3-compatible storage service.
You will learn how to:
- Configure an S3 client
- Access your project bucket
- Upload and download data
- Use S3 storage together with the HPC cluster
Prerequisites¶
Before starting, make sure you have:
- An approved project with S3 storage enabled
- Valid S3 credentials (Access Key ID and Secret Access Key) provided by the administrators
- The S3 endpoint provided by the administrators
- The name of your assigned bucket
Recommended tool: rclone¶
The recommended tool for interacting with S3 storage is rclone.
- Command-line usage on Linux and macOS
- GUI alternatives:
    - Commander One (macOS)
    - S3 Browser (Windows)
rclone is:
- Widely used in research environments
- Actively maintained
- Compatible with S3 APIs
- Available on most Linux systems and on the HPC cluster
Step 1 — Configure rclone¶
Start the interactive configuration:
rclone config
Choose:
- New remote
- Name it (e.g. unibo-s3)
- Storage type: s3
When prompted, select:
- S3 provider: Other
- Access Key ID: (provided by administrators)
- Secret Access Key: (provided by administrators)
- Endpoint: (provided by administrators)
- Region: leave empty unless instructed otherwise
Save the configuration.
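If you prefer to script the setup, the same remote can also be created non-interactively with rclone config create. A minimal sketch, assuming the remote name unibo-s3 and placeholder values for the credentials:
rclone config create unibo-s3 s3 provider=Other access_key_id=<your-access-key-id> secret_access_key=<your-secret-access-key> endpoint=<your-endpoint>
Note that command-line arguments may end up in your shell history; the interactive configuration avoids this.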
The configuration file is stored in:
~/.config/rclone/rclone.conf
Ensure this file is readable only by you.
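On Linux and macOS you can restrict the permissions with:
chmod 600 ~/.config/rclone/rclone.conf
For reference, the saved remote looks roughly like this (an illustration with placeholder values, not your actual configuration):
[unibo-s3]
type = s3
provider = Other
access_key_id = <your-access-key-id>
secret_access_key = <your-secret-access-key>
endpoint = <your-endpoint>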
Accessing your bucket¶
Access to S3 storage is restricted to the buckets assigned to your project.
You must specify the bucket name explicitly in all commands.
Example:
unibo-s3:<your-bucket-name>
Listing all buckets may not be permitted.
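As a quick sanity check that the remote and bucket name are correct, you can list the top-level entries of your bucket:
rclone lsd unibo-s3:<your-bucket-name>
If this fails with an access error, verify the bucket name and your credentials before proceeding.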
Step 2 — Browse bucket contents¶
List the contents of your bucket:
rclone ls unibo-s3:<your-bucket-name>
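Related listing commands can be more convenient in some situations: rclone lsl also shows sizes and modification times, and rclone lsf prints one entry per line, which is easier to use in scripts. For example:
rclone lsl unibo-s3:<your-bucket-name>
rclone lsf unibo-s3:<your-bucket-name> --dirs-only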
Step 3 — Upload data to S3¶
Upload a file:
rclone copy local_file.dat unibo-s3:<your-bucket-name>/
Upload a directory:
rclone copy local_directory/ unibo-s3:<your-bucket-name>/data/
rclone transfers only new or modified files.
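To preview what would be transferred without actually copying anything, add the global --dry-run flag:
rclone copy local_directory/ unibo-s3:<your-bucket-name>/data/ --dry-run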
Step 4 — Download data from S3¶
Download a file:
rclone copy unibo-s3:<your-bucket-name>/results.dat .
Download a directory:
rclone copy unibo-s3:<your-bucket-name>/data/ local_data/
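After a download you can verify that the local and remote copies match with rclone check, which compares file sizes and checksums where available:
rclone check unibo-s3:<your-bucket-name>/data/ local_data/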
Step 5 — Synchronization (use with care)¶
To synchronize a local directory with a bucket:
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/
⚠️ Warning: sync deletes files on the destination that do not exist on the source.
Use it only if you fully understand the implications.
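A safer pattern is to preview the operation first, and optionally move files that would be deleted into a backup area instead of removing them. A sketch, assuming a backup/ prefix in the same bucket:
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/ --dry-run
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/ --backup-dir unibo-s3:<your-bucket-name>/backup/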
Using S3 storage with the HPC cluster¶
A common workflow is:
- Store raw datasets in S3
- Copy required data from S3 to $WORK/<project_name>
- Run HPC jobs
- Copy final results back to S3
Example (from the HPC login node):
rclone copy unibo-s3:<your-bucket-name>/input/ $WORK/<project_name>/input/
After job completion:
rclone copy $WORK/<project_name>/results/ unibo-s3:<your-bucket-name>/results/
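Putting the steps together, a staging script run on the login node might look like the sketch below. The job submission command is an assumption (here a Slurm scheduler with sbatch --wait, which blocks until the job finishes); adapt it to your cluster.
#!/bin/bash
# Stage input data from S3 into the project work area
rclone copy unibo-s3:<your-bucket-name>/input/ $WORK/<project_name>/input/
# Submit the job and wait for it to complete (hypothetical job script)
sbatch --wait job.sh
# Stage results back to S3
rclone copy $WORK/<project_name>/results/ unibo-s3:<your-bucket-name>/results/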
Performance tips¶
For best performance:
- Transfer large files rather than many small files
- Use --progress to monitor transfers
- Use multipart uploads for large datasets
- Avoid frequent overwrites of the same objects
Example:
rclone copy large_file.dat unibo-s3:<your-bucket-name>/ --progress
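Transfer behavior can be tuned further with standard rclone flags: --transfers sets the number of files copied in parallel, while --s3-chunk-size and --s3-upload-concurrency control multipart uploads. A sketch with illustrative values, to be adjusted for your network and data:
rclone copy large_dataset/ unibo-s3:<your-bucket-name>/ --progress --transfers 8 --s3-chunk-size 64M --s3-upload-concurrency 8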
Security reminders¶
- Never share S3 credentials
- Do not store credentials in scripts or notebooks (see the sketch after this list)
- Ensure data usage is consistent with project authorization
- Remove temporary local copies when no longer needed
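One way to keep the secret key out of the configuration file is the s3 backend's env_auth option, which tells rclone to read credentials from the standard AWS environment variables at runtime. A sketch:
# In ~/.config/rclone/rclone.conf, set env_auth = true and omit the keys
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
rclone ls unibo-s3:<your-bucket-name>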
Troubleshooting¶
Access denied¶
Check:
- Correct bucket name
- Valid credentials
- Project authorization
Slow transfers¶
Check:
- Network connectivity
- File size and number
- Use of appropriate rclone options (see Performance tips above)
If problems persist, contact support with error messages.
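Verbose logs usually contain the relevant error message. To capture them for a support request, run the failing command with the standard -vv and --log-file flags:
rclone copy local_file.dat unibo-s3:<your-bucket-name>/ -vv --log-file rclone.log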