First steps with S3 storage¶
This tutorial guides you through the first practical steps to access and use the S3-compatible storage service.
You will learn how to:
- Configure an S3 client
- Access your project bucket
- Upload and download data
- Use S3 storage together with the HPC cluster
Prerequisites¶
Before starting, make sure you have:
- An approved project with S3 storage enabled
- Valid S3 credentials (Access Key ID and Secret Access Key) provided by the administrators
- The S3 endpoint provided by the administrators
- The name of your assigned bucket
Recommended tool: rclone¶
The recommended tool for interacting with S3 storage is rclone.
- Command-line usage on Linux and macOS
- GUI alternatives:
    - Commander One (macOS)
    - S3 Browser (Windows)
rclone is:
- Widely used in research environments
- Actively maintained
- Compatible with S3 APIs
- Available on most Linux systems and on the HPC cluster
Step 1 — Configure rclone¶
Start the interactive configuration:
rclone config
Choose:
- New remote
- Name it (e.g. unibo-s3)
- Storage type: s3
When prompted, select:
- S3 provider: Other
- Access Key ID: (provided by administrators)
- Secret Access Key: (provided by administrators)
- Endpoint: (provided by administrators)
- Region: leave empty unless instructed otherwise
Save the configuration.
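If you prefer to script the setup, the same remote can also be created non-interactively with rclone config create. A minimal sketch, assuming the remote name unibo-s3 and placeholder values for the credentials:
rclone config create unibo-s3 s3 provider=Other access_key_id=<your-access-key-id> secret_access_key=<your-secret-access-key> endpoint=<your-endpoint>
Note that command-line arguments may end up in your shell history; the interactive configuration avoids this.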
The configuration file is stored in:
~/.config/rclone/rclone.conf
Ensure this file is readable only by you.
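On Linux and macOS you can restrict the permissions with:
chmod 600 ~/.config/rclone/rclone.conf
For reference, the saved remote looks roughly like this (an illustration with placeholder values, not your actual configuration):
[unibo-s3]
type = s3
provider = Other
access_key_id = <your-access-key-id>
secret_access_key = <your-secret-access-key>
endpoint = <your-endpoint>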
Accessing your bucket¶
Access to S3 storage is restricted to the buckets assigned to your project.
You must specify the bucket name explicitly in all commands.
Example:
unibo-s3:<your-bucket-name>
Listing all buckets may not be permitted.
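As a quick sanity check that the remote and bucket name are correct, you can list the top-level entries of your bucket:
rclone lsd unibo-s3:<your-bucket-name>
If this fails with an access error, verify the bucket name and your credentials before proceeding.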
Step 2 — Browse bucket contents¶
List the contents of your bucket:
rclone ls unibo-s3:<your-bucket-name>
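Related listing commands can be more convenient in some situations: rclone lsl also shows sizes and modification times, and rclone lsf prints one entry per line, which is easier to use in scripts. For example:
rclone lsl unibo-s3:<your-bucket-name>
rclone lsf unibo-s3:<your-bucket-name> --dirs-only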
Step 3 — Upload data to S3¶
Upload a file:
rclone copy local_file.dat unibo-s3:<your-bucket-name>/
Upload a directory:
rclone copy local_directory/ unibo-s3:<your-bucket-name>/data/
rclone transfers only new or modified files.
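To preview what would be transferred without actually copying anything, add the global --dry-run flag:
rclone copy local_directory/ unibo-s3:<your-bucket-name>/data/ --dry-run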
Step 4 — Download data from S3¶
Download a file:
rclone copy unibo-s3:<your-bucket-name>/results.dat .
Download a directory:
rclone copy unibo-s3:<your-bucket-name>/data/ local_data/
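After a download you can verify that the local and remote copies match with rclone check, which compares file sizes and checksums where available:
rclone check unibo-s3:<your-bucket-name>/data/ local_data/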
Step 5 — Synchronization (use with care)¶
To synchronize a local directory with a bucket:
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/
⚠️ Warning: sync deletes files on the destination that do not exist on the source.
Use it only if you fully understand the implications.
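A safer pattern is to preview the operation first, and optionally move files that would be deleted into a backup area instead of removing them. A sketch, assuming a backup/ prefix in the same bucket:
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/ --dry-run
rclone sync local_directory/ unibo-s3:<your-bucket-name>/data/ --backup-dir unibo-s3:<your-bucket-name>/backup/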
Using S3 storage with the HPC cluster¶
A common workflow is:
- Store raw datasets in S3
- Copy required data from S3 to $WORK/<project_name>
- Run HPC jobs
- Copy final results back to S3
Example (from the HPC login node):
rclone copy unibo-s3:<your-bucket-name>/input/ $WORK/<project_name>/input/
After job completion:
rclone copy $WORK/<project_name>/results/ unibo-s3:<your-bucket-name>/results/
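Putting the steps together, a staging script run on the login node might look like the sketch below. The job submission command is an assumption (here a Slurm scheduler with sbatch --wait, which blocks until the job finishes); adapt it to your cluster.
#!/bin/bash
# Stage input data from S3 into the project work area
rclone copy unibo-s3:<your-bucket-name>/input/ $WORK/<project_name>/input/
# Submit the job and wait for it to complete (hypothetical job script)
sbatch --wait job.sh
# Stage results back to S3
rclone copy $WORK/<project_name>/results/ unibo-s3:<your-bucket-name>/results/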
Performance tips¶
For best performance:
- Transfer large files rather than many small files
- Use --progress to monitor transfers
- Use multipart uploads for large datasets
- Avoid frequent overwrites of the same objects
Example:
rclone copy large_file.dat unibo-s3:<your-bucket-name>/ --progress
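Transfer behavior can be tuned further with standard rclone flags: --transfers sets the number of files copied in parallel, while --s3-chunk-size and --s3-upload-concurrency control multipart uploads. A sketch with illustrative values, to be adjusted for your network and data:
rclone copy large_dataset/ unibo-s3:<your-bucket-name>/ --progress --transfers 8 --s3-chunk-size 64M --s3-upload-concurrency 8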
Security reminders¶
- Never share S3 credentials
- Do not store credentials in scripts or notebooks (see the sketch after this list)
- Ensure data usage is consistent with project authorization
- Remove temporary local copies when no longer needed
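One way to keep the secret key out of the configuration file is the s3 backend's env_auth option, which tells rclone to read credentials from the standard AWS environment variables at runtime. A sketch:
# In ~/.config/rclone/rclone.conf, set env_auth = true and omit the keys
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
rclone ls unibo-s3:<your-bucket-name>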
Troubleshooting¶
Access denied¶
Check:
- Correct bucket name
- Valid credentials
- Project authorization
Slow transfers¶
Check:
- Network connectivity
- File size and number
- Use of appropriate rclone options (see Performance tips above)
If problems persist, contact support with error messages.
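Verbose logs usually contain the relevant error message. To capture them for a support request, run the failing command with the standard -vv and --log-file flags:
rclone copy local_file.dat unibo-s3:<your-bucket-name>/ -vv --log-file rclone.log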