[Bug]: Cannot attach to runs: Command '['grep', ...]' returned non-zero exit status 1. #1689

@jvstme

Description

Steps to reproduce

  1. Run any configuration.

The issue occurs intermittently. So far I have reproduced it only on Linux, with tasks and services on VM-based backends, right after a new instance was provisioned (not when an idle instance was reused).

Sample configuration:

type: task
commands:
  - pip install torch

Actual behaviour

The run starts, but the CLI cannot attach to it.

> dstack apply -f tasks/pytorch.dstack.yml --spot -b aws -y
 Configuration          tasks/pytorch.dstack.yml       
 Project                main                           
 User                   admin                          
 Pool                   default-pool                   
 Min resources          2..xCPU, 8GB.., 100GB.. (disk) 
 Max price              -                              
 Max duration           72h                            
 Spot policy            spot                           
 Retry policy           no                             
 Creation policy        reuse-or-create                
 Termination policy     destroy-after-idle             
 Termination idle time  5m                             

 #  BACKEND  REGION      INSTANCE  RESOURCES                   SPOT  PRICE     
 1  aws      eu-north-1  m5.large  2xCPU, 8GB, 100.0GB (disk)  yes   $0.0333   
 2  aws      us-west-2   m5.large  2xCPU, 8GB, 100.0GB (disk)  yes   $0.0334   
 3  aws      us-east-1   m5.large  2xCPU, 8GB, 100.0GB (disk)  yes   $0.0337   
    ...                                                                        
 Shown 3 of 390 offers, $50.824 max

ordinary-earwig-1 provisioning completed (running)
Traceback (most recent call last):
  File "/home/jvstme/git/dstack/dstack/venv/bin/dstack", line 8, in <module>
    sys.exit(main())
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/main.py", line 77, in main
    args.func(args)
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/commands/apply.py", line 70, in _command
    configurator.apply_configuration(
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/services/configurators/run.py", line 147, in apply_configuration
    if run.attach(bind_address=bind_address):
  File "/home/jvstme/git/dstack/dstack/src/dstack/api/_public/runs.py", line 255, in attach
    ports_lock = SSHAttach.reuse_ports_lock(run_name=self.name)
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/services/ssh/attach.py", line 50, in reuse_ports_lock
    output = subprocess.check_output(
  File "/usr/lib64/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib64/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['grep', '--', '-S /home/jvstme/.dstack/ssh/ordinary-earwig-1.control.sock']' returned non-zero exit status 1.

Attaching with dstack logs -a may also fail with the same error:

> dstack logs -a ordinary-earwig-1
Traceback (most recent call last):
  File "/home/jvstme/git/dstack/dstack/venv/bin/dstack", line 8, in <module>
    sys.exit(main())
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/main.py", line 77, in main
    args.func(args)
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/cli/commands/logs.py", line 52, in _command
    run.attach(args.ssh_identity_file)
  File "/home/jvstme/git/dstack/dstack/src/dstack/api/_public/runs.py", line 255, in attach
    ports_lock = SSHAttach.reuse_ports_lock(run_name=self.name)
  File "/home/jvstme/git/dstack/dstack/src/dstack/_internal/core/services/ssh/attach.py", line 50, in reuse_ports_lock
    output = subprocess.check_output(
  File "/usr/lib64/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib64/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['grep', '--', '-S /home/jvstme/.dstack/ssh/ordinary-earwig-1.control.sock']' returned non-zero exit status 1.

Retrying dstack logs -a a few times sometimes succeeds in attaching, but not reliably.
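Both tracebacks end in SSHAttach.reuse_ports_lock, where subprocess.check_output runs grep -- '-S <control socket path>' (presumably with a process listing piped to its stdin, since no file argument is given). grep exits with status 1 when no line matches, and check_output raises CalledProcessError for any non-zero status, so "no matching line" surfaces as a crash. The following is a minimal sketch, not dstack's actual code: grep_lines and find_lines_in_python are hypothetical helpers that treat exit status 1 as an empty result and raise only for status 2 and above, which grep uses for real errors.

```python
import subprocess
from typing import List


def grep_lines(pattern: str, text: str) -> List[str]:
    """Hypothetical helper: return the lines of `text` matching `pattern` via grep.

    grep exits with 0 if some line matched, 1 if no line matched, and
    2 or more on an actual error. Only the last case is raised here.
    """
    # "--" ends option parsing, so a pattern starting with "-S" is not read as a flag.
    args = ["grep", "--", pattern]
    result = subprocess.run(args, input=text, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout.splitlines()
    if result.returncode == 1:
        # No matching lines: an expected outcome for a lookup, not a failure.
        return []
    raise subprocess.CalledProcessError(
        result.returncode, args, output=result.stdout, stderr=result.stderr
    )


def find_lines_in_python(needle: str, text: str) -> List[str]:
    """Hypothetical alternative: plain substring search with no subprocess."""
    return [line for line in text.splitlines() if needle in line]


if __name__ == "__main__":
    listing = "ssh -S /tmp/run.control.sock host\nbash\n"
    print(grep_lines("-S /tmp/run.control.sock", listing))    # ['ssh -S /tmp/run.control.sock host']
    print(grep_lines("-S /tmp/other.control.sock", listing))  # []
```

Filtering the listing in Python, as in find_lines_in_python, avoids both the extra process and the exit-code question; note that it does a fixed-string match, whereas grep without -F interprets the pattern as a basic regular expression.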

Expected behaviour

The CLI attaches to the run on the first try.

dstack version

0.18.13

Labels: bug