Configuration Reference#
Scythe is configured entirely through environment variables, using pydantic-settings for validation and type coercion.
Storage Settings#
Configured via ScytheStorageSettings (prefix: SCYTHE_STORAGE_):
| Variable | Type | Default | Required | Description |
|---|---|---|---|---|
SCYTHE_STORAGE_BUCKET |
str |
-- | Yes | The S3 bucket name for storing experiment data |
SCYTHE_STORAGE_BUCKET_PREFIX |
str |
"scythe" |
No | Key prefix within the bucket for all Scythe data |
All experiment data is stored under s3://<BUCKET>/<BUCKET_PREFIX>/.
Timeout Settings#
Configured via TimeoutSettings (prefix: SCYTHE_TIMEOUT_):
| Variable | Type | Default | Description |
|---|---|---|---|
SCYTHE_TIMEOUT_SCATTER_GATHER_SCHEDULE |
Duration |
1h |
Maximum time for a scatter/gather task to be picked up by a worker |
SCYTHE_TIMEOUT_SCATTER_GATHER_EXECUTION |
Duration |
1h |
Maximum execution time for a scatter/gather task |
SCYTHE_TIMEOUT_EXPERIMENT_SCHEDULE |
Duration |
1h |
Maximum time for an experiment task to be picked up by a worker |
SCYTHE_TIMEOUT_EXPERIMENT_EXECUTION |
Duration |
1m |
Maximum execution time for an experiment task |
Duration values accept Hatchet duration strings like "30s", "5m", "2h", "1d", or Python timedelta objects.
Note
The EXPERIMENT_SCHEDULE and EXPERIMENT_EXECUTION timeouts are defaults used by @ExperimentRegistry.Register(). You can override them per-experiment using the schedule_timeout and execution_timeout parameters of the decorator.
Timeout Guidance#
- Schedule timeouts should be generous -- they cover the time a task waits in the queue. If workers are scaling up from zero, this can take minutes.
- Execution timeouts should be set based on the expected runtime of your longest simulation, with some headroom.
- Scatter/gather timeouts should account for the time to run all child tasks, since the parent waits for all children to complete.
Worker Settings#
Configured via ScytheWorkerConfig (prefix: SCYTHE_WORKER_):
| Variable | Type | Default | Description |
|---|---|---|---|
SCYTHE_WORKER_NAME |
str \| None |
Auto-detected | Custom name for the worker (shown in Hatchet dashboard) |
SCYTHE_WORKER_SLOTS |
int \| None |
CPU count | Number of concurrent task slots |
SCYTHE_WORKER_DURABLE_SLOTS |
int \| None |
1000 |
Number of durable (persistent) task slots |
SCYTHE_WORKER_HIGH_MEMORY |
bool |
False |
Advertise this worker as having high memory |
SCYTHE_WORKER_HIGH_CPU |
bool |
False |
Advertise this worker as having high CPU |
SCYTHE_WORKER_HAS_GPU |
bool |
False |
Advertise this worker as having a GPU |
SCYTHE_WORKER_DOES_FAN |
bool |
True |
Whether this worker handles scatter/gather tasks |
SCYTHE_WORKER_DOES_LEAF |
bool |
True |
Whether this worker handles experiment (leaf) tasks |
Worker Name Auto-Detection#
If SCYTHE_WORKER_NAME is not set, the name is generated based on the hosting environment:
| Environment Variable | Detection | Name Pattern |
|---|---|---|
AWS_BATCH_JOB_ARRAY_INDEX |
AWS Batch | ScytheWorker--AWSBatch<index> |
COPILOT_ENVIRONMENT_NAME |
AWS Copilot | ScytheWorker--AWSCopilot<ENV> |
FLY_REGION |
Fly.io | ScytheWorker--Fly<REGION> |
| (none) | Local | ScytheWorker--Local |
Slot Calculation#
If SCYTHE_WORKER_SLOTS is not set:
- Machines with < 8 CPUs:
slots = cpu_count - Machines with >= 8 CPUs:
slots = cpu_count - 1
Hatchet Settings#
These are standard Hatchet SDK environment variables (not Scythe-specific):
| Variable | Description |
|---|---|
HATCHET_CLIENT_TOKEN |
Authentication token for the Hatchet instance |
HATCHET_CLIENT_TLS_STRATEGY |
TLS strategy (none, tls, mtls) |
See the Hatchet SDK documentation for the full list of client configuration options.
AWS Settings#
Standard AWS SDK environment variables used by boto3 for S3 access:
| Variable | Description |
|---|---|
AWS_REGION |
AWS region for the S3 bucket |
AWS_ACCESS_KEY_ID |
AWS access key |
AWS_SECRET_ACCESS_KEY |
AWS secret key |
AWS_SESSION_TOKEN |
(Optional) Session token for temporary credentials |
AWS_PROFILE |
(Optional) Named profile from ~/.aws/credentials |
Workers running on AWS (ECS, EC2, Lambda) can use IAM roles instead of explicit credentials.
Example .env File#
# Hatchet
HATCHET_CLIENT_TOKEN=eyJhbGciOiJFUzI1NiIs...
# AWS
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
# Storage
SCYTHE_STORAGE_BUCKET=my-research-bucket
SCYTHE_STORAGE_BUCKET_PREFIX=scythe
# Timeouts
SCYTHE_TIMEOUT_SCATTER_GATHER_SCHEDULE=10m
SCYTHE_TIMEOUT_SCATTER_GATHER_EXECUTION=20m
SCYTHE_TIMEOUT_EXPERIMENT_SCHEDULE=10m
SCYTHE_TIMEOUT_EXPERIMENT_EXECUTION=5m
# Worker (optional -- defaults are usually fine for local dev)
# SCYTHE_WORKER_DOES_LEAF=True
# SCYTHE_WORKER_DOES_FAN=True
# SCYTHE_WORKER_SLOTS=4
# SCYTHE_WORKER_HAS_GPU=False