Steps to reproduce
- Run any configuration.
The issue happens intermittently. So far I have managed to reproduce it on Linux with tasks and services on VM-based backends after provisioning a new instance (not reusing an idle one).
Sample configuration:
type: task
commands:
- pip install torch
Actual behaviour
The configuration runs, but the CLI cannot attach to it.
> dstack apply -f tasks/pytorch.dstack.yml --spot -b aws -y
Configuration tasks/pytorch.dstack.yml
Project main
User admin
Pool default-pool
Min resources 2..xCPU, 8GB.., 100GB.. (disk)
Max price -
Max duration 72h
Spot policy spot
Retry policy no
Creation policy reuse-or-create
Termination policy destroy-after-idle
Termination idle time 5m
# BACKEND REGION INSTANCE RESOURCES SPOT PRICE
1 aws eu-north-1 m5.large 2xCPU, 8GB, 100.0GB (disk) yes $0.0333
2 aws us-west-2 m5.large 2xCPU, 8GB, 100.0GB (disk) yes $0.0334
3 aws us-east-1 m5.large 2xCPU, 8GB, 100.0GB (disk) yes $0.0337
...
Shown 3 of 390 offers, $50.824 max
ordinary-earwig-1 provisioning completed (running)
Traceback (most recent call last):
File "/home/jvstme/git/dstack/dstack/venv/bin/dstack", line 8, in <module>
sys.exit(main())
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/main.py", line 77, in main
args.func(args)
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/commands/apply.py", line 70, in _command
configurator.apply_configuration(
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/services/configurators/run.py", line 147, in apply_configuration
if run.attach(bind_address=bind_address):
File "/home/jvstme/git/dstack/dstack/src/dstack/api/_public/runs.py", line 255, in attach
ports_lock = SSHAttach.reuse_ports_lock(run_name=self.name)
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/services/ssh/attach.py", line 50, in reuse_ports_lock
output = subprocess.check_output(
File "/usr/lib64/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib64/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['grep', '--', '-S /home/jvstme/.dstack/ssh/ordinary-earwig-1.control.sock']' returned non-zero exit status 1.
Attaching with dstack logs may also fail with the same error.
> dstack logs -a ordinary-earwig-1
Traceback (most recent call last):
File "/home/jvstme/git/dstack/dstack/venv/bin/dstack", line 8, in <module>
sys.exit(main())
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/main.py", line 77, in main
args.func(args)
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/commands/logs.py", line 52, in _command
run.attach(args.ssh_identity_file)
File "/home/jvstme/git/dstack/dstack/src/dstack/api/_public/runs.py", line 255, in attach
ports_lock = SSHAttach.reuse_ports_lock(run_name=self.name)
File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/services/ssh/attach.py", line 50, in reuse_ports_lock
output = subprocess.check_output(
File "/usr/lib64/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib64/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['grep', '--', '-S /home/jvstme/.dstack/ssh/ordinary-earwig-1.control.sock']' returned non-zero exit status 1.
After a few attempts, dstack logs may or may not succeed in attaching.
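Both tracebacks end in subprocess.check_output raising CalledProcessError because grep returned exit status 1. grep uses status 1 to mean "no line matched", so the error is consistent with the control socket's ssh process not being found at the moment the CLI looks for it. The tracebacks do not show what is piped into grep, so the following is only a minimal sketch of how the "no match" case could be tolerated, not the actual dstack code. The function name grep_lines, the example socket path, and the -F flag (fixed-string matching) are assumptions made for illustration.

```python
import subprocess
from typing import List


def grep_lines(text: str, pattern: str) -> List[str]:
    """Return the lines of `text` that contain `pattern`.

    Hypothetical helper: unlike subprocess.check_output, it treats
    grep's exit status 1 ("no line matched") as an empty result.
    """
    proc = subprocess.run(
        # "--" stops option parsing so a pattern starting with "-S" is not
        # read as a flag; "-F" (an assumption) matches it as a fixed string.
        ["grep", "-F", "--", pattern],
        input=text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode == 0:
        return proc.stdout.splitlines()
    if proc.returncode == 1:
        # No match: not an error for this use case.
        return []
    # Status 2 or higher is a real grep failure, so surface it.
    raise subprocess.CalledProcessError(
        proc.returncode, proc.args, proc.stdout, proc.stderr
    )


# Hypothetical process listing and socket path, for illustration only.
pattern = "-S /tmp/example-run.control.sock"
listing = "ssh -S /tmp/example-run.control.sock -N host\nbash\n"

matches = grep_lines(listing, pattern)
no_matches = grep_lines("bash\n", pattern)
```

With a helper like this, a caller could treat an empty result as "no existing SSH connection to reuse" and go on to open a new one instead of crashing.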
Expected behaviour
The CLI attaches to the run on the first try.
dstack version
0.18.13