Cylc task event handler is not working correctly for batch system = pbs

We are trying to capture events from Cylc suite while running it with batch system as pbs. Even though we mentioned multiple events to capture (handler events = submitted, submission failed, submission timeout, submission retry, started, succeeded, warning, critical, failed, retry), we were able to capture only, submitted event.

Here is our suite.rc file.

[meta]
  # Suite metadata.
  title = TEST SUITE TITLE
  url = www.testsuiteurl.com
  uuid = 148ec4de-4a89-11e9-8646-d663bd873d84
  priority = HIGH
  color = blue
[cylc]
  UTC mode = True
[scheduling]
  initial cycle point = "now"
  [[special tasks]]
    clock-trigger = bar(+PT1M)
  [[dependencies]]
    [[[PT1M]]]
      graph = bar
[runtime]
  [[bar]]
    script = "qsub -- /bin/sleep 1"
    [[[job]]]
      batch system = pbs
    [[[meta]]]
      # Task metadata.
      title = TEST TASK TITLE
      priority = LOW
      url = www.testtaskurl.com
      omit = false
      color = red
    [[[events]]]
      handler events = submitted, submission failed, submission timeout, submission retry, started, succeeded, warning, critical, failed, retry
      handlers = handler.py event=%(event)s suite=%(suite)s  point=%(point)s  task_name=%(name)s  submit_num=%(submit_num)s  id=%(id)s  suite_url=%(suite_url)s  batch_sys_name=%(batch_sys_name)s  batch_sys_job_id=%(batch_sys_job_id)s start_time=%(start_time)s end_time=%(finish_time)s uuid=%(suite_uuid)s user_at_host=%(user@host)s msg=%(message)s

Below are events that were captured by handler.py.

{"event": "submitted", "suite": "Acc_upd", "point": "20190709T1207Z", "task_name": "bar", "submit_num": "1", "id": "bar.20190709T1207Z", "suite_url": "www.testsuiteurl.com", "batch_sys_name": "pbs", "batch_sys_job_id": "19.pbsserver", "start_time": "None", "end_time": "None", "uuid": "148ec4de-4a89-11e9-8646-d663bd873d84", "user_at_host": "pbsuser@localhost", "msg": "job submitted"}
{"event": "submitted", "suite": "Acc_upd", "point": "20190709T1209Z", "task_name": "bar", "submit_num": "1", "id": "bar.20190709T1209Z", "suite_url": "www.testsuiteurl.com", "batch_sys_name": "pbs", "batch_sys_job_id": "20.pbsserver", "start_time": "None", "end_time": "None", "uuid": "148ec4de-4a89-11e9-8646-d663bd873d84", "user_at_host": "pbsuser@localhost", "msg": "job submitted"}
{"event": "submitted", "suite": "Acc_upd", "point": "20190709T1210Z", "task_name": "bar", "submit_num": "1", "id": "bar.20190709T1210Z", "suite_url": "www.testsuiteurl.com", "batch_sys_name": "pbs", "batch_sys_job_id": "21.pbsserver", "start_time": "None", "end_time": "None", "uuid": "148ec4de-4a89-11e9-8646-d663bd873d84", "user_at_host": "pbsuser@localhost", "msg": "job submitted"}

When we tried with the batch system as background, we were able to capture all events.
Any ideas on where is it going wrong and what can I do to debug it further?

Amaresh C.

Hello,

Is there any information in the suite log, there may be some form of traceback comming from the handler.py script.

Here is the information in suite log.

REGISTERED msas_pbs -> /home/pbsuser/cylc-master/pbs-job-suite
Valid for cylc-7.8.1
2019-07-10T16:42:35Z DEBUG - Loading site/user global config files
2019-07-10T16:42:35Z DEBUG - Generated /home/pbsuser/cylc-run/msas_pbs/.service/passphrase
2019-07-10T16:42:36Z DEBUG - Generated /home/pbsuser/cylc-run/msas_pbs/.service/ssl.pem
2019-07-10T16:42:36Z DEBUG - Generated /home/pbsuser/cylc-run/msas_pbs/.service/ssl.cert
2019-07-10T16:42:36Z DEBUG - Loading site/user global config files
            ._.                                                       
            | |                 The Cylc Suite Engine [7.8.1]         
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY;
      .___! |          see `cylc warranty`.  It is free software, you 
      !_____!           are welcome to redistribute it under certain  
2019-07-10T16:42:36Z DEBUG - creating suite run directory: /home/pbsuser/cylc-run/msas_pbs
2019-07-10T16:42:36Z DEBUG - creating suite log directory: /home/pbsuser/cylc-run/msas_pbs/log/suite
2019-07-10T16:42:36Z DEBUG - creating suite job log directory: /home/pbsuser/cylc-run/msas_pbs/log/job
2019-07-10T16:42:36Z DEBUG - creating suite config log directory: /home/pbsuser/cylc-run/msas_pbs/log/suiterc
2019-07-10T16:42:36Z DEBUG - creating suite work directory: /home/pbsuser/cylc-run/msas_pbs/work
2019-07-10T16:42:36Z DEBUG - creating suite share directory: /home/pbsuser/cylc-run/msas_pbs/share
2019-07-10T16:42:36Z DEBUG - Reading file /home/pbsuser/bom-cylc-master/pbs-job-suite/suite.rc
2019-07-10T16:42:36Z DEBUG - Processed configuration dumped: /home/pbsuser/cylc-run/msas_pbs/suite.rc.processed
2019-07-10T16:42:36Z DEBUG - Expanding [runtime] namespace lists and parameters
2019-07-10T16:42:36Z DEBUG - Parsing the runtime namespace hierarchy
2019-07-10T16:42:36Z DEBUG - Parsing [special tasks]
2019-07-10T16:42:36Z DEBUG - Parsing the dependency graph
2019-07-10T16:42:36Z DEBUG - Configuring internal queues
2019-07-10T16:42:36Z INFO - Suite server: url=https://pbsserver:43087/ pid=2335
2019-07-10T16:42:36Z INFO - Run: (re)start=0 log=1
2019-07-10T16:42:36Z INFO - Cylc version: 7.8.1
2019-07-10T16:42:36Z INFO - Run mode: live
2019-07-10T16:42:36Z INFO - Initial point: 20190710T1642Z
2019-07-10T16:42:36Z INFO - Final point: None
2019-07-10T16:42:36Z INFO - Cold Start 20190710T1642Z
2019-07-10T16:42:36Z DEBUG - [bar.20190710T1642Z] -released to the task pool
2019-07-10T16:42:36Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:42:36Z DEBUG - 0 task(s) de-queued
2019-07-10T16:42:36Z DEBUG - END TASK PROCESSING (took 0.00154805183411 seconds)
2019-07-10T16:42:36Z DEBUG - Performing suite health check
2019-07-10T16:42:36Z DEBUG - Loading site/user global config files
2019-07-10T16:42:37Z DEBUG - Performing suite health check
2019-07-10T16:42:38Z DEBUG - Performing suite health check
2019-07-10T16:42:39Z DEBUG - Performing suite health check
2019-07-10T16:42:40Z DEBUG - Performing suite health check
2019-07-10T16:42:41Z DEBUG - Performing suite health check
2019-07-10T16:42:42Z DEBUG - Performing suite health check
2019-07-10T16:42:43Z DEBUG - Performing suite health check
2019-07-10T16:42:44Z DEBUG - Performing suite health check
2019-07-10T16:42:45Z DEBUG - Performing suite health check
2019-07-10T16:42:46Z DEBUG - Performing suite health check
2019-07-10T16:42:47Z DEBUG - Performing suite health check
2019-07-10T16:42:48Z DEBUG - Performing suite health check
2019-07-10T16:42:49Z DEBUG - Performing suite health check
2019-07-10T16:42:50Z DEBUG - Performing suite health check
2019-07-10T16:42:51Z DEBUG - Performing suite health check
2019-07-10T16:42:52Z DEBUG - Performing suite health check
2019-07-10T16:42:53Z DEBUG - Performing suite health check
2019-07-10T16:42:54Z DEBUG - Performing suite health check
2019-07-10T16:42:55Z DEBUG - Performing suite health check
2019-07-10T16:42:56Z DEBUG - Performing suite health check
2019-07-10T16:42:57Z DEBUG - Performing suite health check
2019-07-10T16:42:58Z DEBUG - Performing suite health check
2019-07-10T16:42:59Z DEBUG - Performing suite health check
2019-07-10T16:43:00Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:43:00Z DEBUG - [bar.20190710T1642Z] -waiting => queued
2019-07-10T16:43:00Z DEBUG - 1 task(s) de-queued
2019-07-10T16:43:00Z INFO - [bar.20190710T1642Z] -submit-num=1, owner@host=localhost
2019-07-10T16:43:00Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job'] ... # will invoke in batches, sizes=[1]
2019-07-10T16:43:00Z DEBUG - [bar.20190710T1642Z] -queued => ready
2019-07-10T16:43:00Z DEBUG - END TASK PROCESSING (took 0.0437498092651 seconds)
2019-07-10T16:43:00Z DEBUG - Performing suite health check
2019-07-10T16:43:01Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job', '20190710T1642Z/bar/01']
2019-07-10T16:43:01Z DEBUG - Performing suite health check
2019-07-10T16:43:01Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode -- /home/pbsuser/cylc-run/msas_pbs/log/job 20190710T1642Z/bar/01
	[jobs-submit ret_code] 0
	[jobs-submit out]
	[TASK JOB SUMMARY]2019-07-10T16:43:01Z|20190710T1642Z/bar/01|0|5.pbsserver
	[TASK JOB COMMAND]2019-07-10T16:43:01Z|20190710T1642Z/bar/01|[STDOUT] 5.pbsserver
2019-07-10T16:43:01Z INFO - [bar.20190710T1642Z] -(current:ready) submitted at 2019-07-10T16:43:01Z
2019-07-10T16:43:01Z DEBUG - [bar.20190710T1642Z] -ready => submitted
2019-07-10T16:43:01Z DEBUG - [bar.20190710T1642Z] -Queueing submitted handler: handler.py event=submitted suite=msas_pbs  point=20190710T1642Z  task_name=bar  submit_num=1  id=bar.20190710T1642Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=5.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:43:01Z INFO - [bar.20190710T1642Z] -health check settings: submission timeout=None
2019-07-10T16:43:01Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:43:01Z DEBUG - 0 task(s) de-queued
2019-07-10T16:43:01Z DEBUG - [bar.20190710T1642Z] -forced spawning
2019-07-10T16:43:01Z DEBUG - END TASK PROCESSING (took 0.00335788726807 seconds)
2019-07-10T16:43:01Z DEBUG - Performing suite health check
2019-07-10T16:43:02Z DEBUG - [bar.20190710T1643Z] -released to the task pool
2019-07-10T16:43:02Z DEBUG - handler.py event=submitted suite=msas_pbs  point=20190710T1642Z  task_name=bar  submit_num=1  id=bar.20190710T1642Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=5.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:43:02Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:43:02Z DEBUG - 0 task(s) de-queued
2019-07-10T16:43:02Z DEBUG - END TASK PROCESSING (took 0.00166988372803 seconds)
2019-07-10T16:43:02Z DEBUG - Performing suite health check
2019-07-10T16:43:02Z DEBUG - [(('event-handler-00', 'submitted'), 1) cmd] handler.py event=submitted suite=msas_pbs  point=20190710T1642Z  task_name=bar  submit_num=1  id=bar.20190710T1642Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=5.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
	[(('event-handler-00', 'submitted'), 1) ret_code] 0
2019-07-10T16:43:02Z DEBUG - Performing suite health check
2019-07-10T16:43:03Z DEBUG - Performing suite health check
2019-07-10T16:43:04Z DEBUG - Performing suite health check
2019-07-10T16:43:05Z DEBUG - Performing suite health check
2019-07-10T16:43:06Z DEBUG - Performing suite health check
2019-07-10T16:43:07Z DEBUG - Performing suite health check
2019-07-10T16:43:08Z DEBUG - Performing suite health check
2019-07-10T16:43:09Z DEBUG - Performing suite health check
2019-07-10T16:43:10Z DEBUG - Performing suite health check
2019-07-10T16:43:11Z DEBUG - Performing suite health check
2019-07-10T16:43:12Z DEBUG - Performing suite health check
2019-07-10T16:43:13Z DEBUG - Performing suite health check
2019-07-10T16:43:14Z DEBUG - Performing suite health check
2019-07-10T16:43:15Z DEBUG - Performing suite health check
2019-07-10T16:43:16Z DEBUG - Performing suite health check
2019-07-10T16:43:17Z DEBUG - Performing suite health check
2019-07-10T16:43:18Z DEBUG - Performing suite health check
2019-07-10T16:43:19Z DEBUG - Performing suite health check
2019-07-10T16:43:20Z DEBUG - Performing suite health check
2019-07-10T16:43:21Z DEBUG - Performing suite health check
2019-07-10T16:43:22Z DEBUG - Performing suite health check
2019-07-10T16:43:23Z DEBUG - Performing suite health check
2019-07-10T16:43:24Z DEBUG - Performing suite health check
2019-07-10T16:43:25Z DEBUG - Performing suite health check
2019-07-10T16:43:26Z DEBUG - Performing suite health check
2019-07-10T16:43:27Z DEBUG - Performing suite health check
2019-07-10T16:43:28Z DEBUG - Performing suite health check
2019-07-10T16:43:29Z DEBUG - Performing suite health check
2019-07-10T16:43:30Z DEBUG - Performing suite health check
2019-07-10T16:43:31Z DEBUG - Performing suite health check
2019-07-10T16:43:32Z DEBUG - Performing suite health check
2019-07-10T16:43:33Z DEBUG - Performing suite health check
2019-07-10T16:43:34Z DEBUG - Performing suite health check
2019-07-10T16:43:35Z DEBUG - Performing suite health check
2019-07-10T16:43:36Z DEBUG - Performing suite health check
2019-07-10T16:43:37Z DEBUG - Performing suite health check
2019-07-10T16:43:38Z DEBUG - Performing suite health check
2019-07-10T16:43:39Z DEBUG - Performing suite health check
2019-07-10T16:43:40Z DEBUG - Performing suite health check
2019-07-10T16:43:41Z DEBUG - Performing suite health check
2019-07-10T16:43:42Z DEBUG - Performing suite health check
2019-07-10T16:43:43Z DEBUG - Performing suite health check
2019-07-10T16:43:44Z DEBUG - Performing suite health check
2019-07-10T16:43:45Z DEBUG - Performing suite health check
2019-07-10T16:43:46Z DEBUG - Performing suite health check
2019-07-10T16:43:47Z DEBUG - Performing suite health check
2019-07-10T16:43:48Z DEBUG - Performing suite health check
2019-07-10T16:43:49Z DEBUG - Performing suite health check
2019-07-10T16:43:50Z DEBUG - Performing suite health check
2019-07-10T16:43:51Z DEBUG - Performing suite health check
2019-07-10T16:43:52Z DEBUG - Performing suite health check
2019-07-10T16:43:53Z DEBUG - Performing suite health check
2019-07-10T16:43:54Z DEBUG - Performing suite health check
2019-07-10T16:43:55Z DEBUG - Performing suite health check
2019-07-10T16:43:56Z DEBUG - Performing suite health check
2019-07-10T16:43:57Z DEBUG - Performing suite health check
2019-07-10T16:43:58Z DEBUG - Performing suite health check
2019-07-10T16:43:59Z DEBUG - Performing suite health check
2019-07-10T16:44:00Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:44:00Z DEBUG - [bar.20190710T1643Z] -waiting => queued
2019-07-10T16:44:00Z DEBUG - 1 task(s) de-queued
2019-07-10T16:44:00Z INFO - [bar.20190710T1643Z] -submit-num=1, owner@host=localhost
2019-07-10T16:44:00Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job'] ... # will invoke in batches, sizes=[1]
2019-07-10T16:44:00Z DEBUG - [bar.20190710T1643Z] -queued => ready
2019-07-10T16:44:00Z DEBUG - END TASK PROCESSING (took 0.0428490638733 seconds)
2019-07-10T16:44:00Z DEBUG - Performing suite health check
2019-07-10T16:44:01Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job', '20190710T1643Z/bar/01']
2019-07-10T16:44:01Z DEBUG - Performing suite health check
2019-07-10T16:44:01Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode -- /home/pbsuser/cylc-run/msas_pbs/log/job 20190710T1643Z/bar/01
	[jobs-submit ret_code] 0
	[jobs-submit out]
	[TASK JOB SUMMARY]2019-07-10T16:44:01Z|20190710T1643Z/bar/01|0|6.pbsserver
	[TASK JOB COMMAND]2019-07-10T16:44:01Z|20190710T1643Z/bar/01|[STDOUT] 6.pbsserver
2019-07-10T16:44:01Z INFO - [bar.20190710T1643Z] -(current:ready) submitted at 2019-07-10T16:44:01Z
2019-07-10T16:44:01Z DEBUG - [bar.20190710T1643Z] -ready => submitted
2019-07-10T16:44:01Z DEBUG - [bar.20190710T1643Z] -Queueing submitted handler: handler.py event=submitted suite=msas_pbs  point=20190710T1643Z  task_name=bar  submit_num=1  id=bar.20190710T1643Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=6.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:44:01Z INFO - [bar.20190710T1643Z] -health check settings: submission timeout=None
2019-07-10T16:44:01Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:44:01Z DEBUG - 0 task(s) de-queued
2019-07-10T16:44:01Z DEBUG - [bar.20190710T1643Z] -forced spawning
2019-07-10T16:44:01Z DEBUG - END TASK PROCESSING (took 0.00356602668762 seconds)
2019-07-10T16:44:01Z DEBUG - Performing suite health check
2019-07-10T16:44:02Z DEBUG - [bar.20190710T1644Z] -released to the task pool
2019-07-10T16:44:02Z DEBUG - handler.py event=submitted suite=msas_pbs  point=20190710T1643Z  task_name=bar  submit_num=1  id=bar.20190710T1643Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=6.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:44:02Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:44:02Z DEBUG - 0 task(s) de-queued
2019-07-10T16:44:02Z DEBUG - END TASK PROCESSING (took 0.00160813331604 seconds)
2019-07-10T16:44:02Z DEBUG - Performing suite health check
2019-07-10T16:44:02Z DEBUG - [(('event-handler-00', 'submitted'), 1) cmd] handler.py event=submitted suite=msas_pbs  point=20190710T1643Z  task_name=bar  submit_num=1  id=bar.20190710T1643Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=6.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
	[(('event-handler-00', 'submitted'), 1) ret_code] 0
2019-07-10T16:44:02Z DEBUG - Performing suite health check
2019-07-10T16:44:03Z DEBUG - Performing suite health check
2019-07-10T16:44:04Z DEBUG - Performing suite health check
2019-07-10T16:44:05Z DEBUG - Performing suite health check
2019-07-10T16:44:06Z DEBUG - Performing suite health check
2019-07-10T16:44:07Z DEBUG - Performing suite health check
2019-07-10T16:44:08Z DEBUG - Performing suite health check
2019-07-10T16:44:09Z DEBUG - Performing suite health check
2019-07-10T16:44:10Z DEBUG - Performing suite health check
2019-07-10T16:44:11Z DEBUG - Performing suite health check
2019-07-10T16:44:12Z DEBUG - Performing suite health check
2019-07-10T16:44:13Z DEBUG - Performing suite health check
2019-07-10T16:44:14Z DEBUG - Performing suite health check
2019-07-10T16:44:15Z DEBUG - Performing suite health check
2019-07-10T16:44:16Z DEBUG - Performing suite health check
2019-07-10T16:44:17Z DEBUG - Performing suite health check
2019-07-10T16:44:18Z DEBUG - Performing suite health check
2019-07-10T16:44:19Z DEBUG - Performing suite health check
2019-07-10T16:44:20Z DEBUG - Performing suite health check
2019-07-10T16:44:21Z DEBUG - Performing suite health check
2019-07-10T16:44:22Z DEBUG - Performing suite health check
2019-07-10T16:44:23Z DEBUG - Performing suite health check
2019-07-10T16:44:24Z DEBUG - Performing suite health check
2019-07-10T16:44:25Z DEBUG - Performing suite health check
2019-07-10T16:44:26Z DEBUG - Performing suite health check
2019-07-10T16:44:27Z DEBUG - Performing suite health check
2019-07-10T16:44:28Z DEBUG - Performing suite health check
2019-07-10T16:44:29Z DEBUG - Performing suite health check
2019-07-10T16:44:30Z DEBUG - Performing suite health check
2019-07-10T16:44:31Z DEBUG - Performing suite health check
2019-07-10T16:44:32Z DEBUG - Performing suite health check
2019-07-10T16:44:33Z DEBUG - Performing suite health check
2019-07-10T16:44:34Z DEBUG - Performing suite health check
2019-07-10T16:44:35Z DEBUG - Performing suite health check
2019-07-10T16:44:36Z DEBUG - Performing suite health check
2019-07-10T16:44:37Z DEBUG - Performing suite health check
2019-07-10T16:44:38Z DEBUG - Performing suite health check
2019-07-10T16:44:39Z DEBUG - Performing suite health check
2019-07-10T16:44:40Z DEBUG - Performing suite health check
2019-07-10T16:44:41Z DEBUG - Performing suite health check
2019-07-10T16:44:42Z DEBUG - Performing suite health check
2019-07-10T16:44:43Z DEBUG - Performing suite health check
2019-07-10T16:44:44Z DEBUG - Performing suite health check
2019-07-10T16:44:45Z DEBUG - Performing suite health check
2019-07-10T16:44:46Z DEBUG - Performing suite health check
2019-07-10T16:44:47Z DEBUG - Performing suite health check
2019-07-10T16:44:48Z DEBUG - Performing suite health check
2019-07-10T16:44:49Z DEBUG - Performing suite health check
2019-07-10T16:44:50Z DEBUG - Performing suite health check
2019-07-10T16:44:51Z DEBUG - Performing suite health check
2019-07-10T16:44:52Z DEBUG - Performing suite health check
2019-07-10T16:44:53Z DEBUG - Performing suite health check
2019-07-10T16:44:54Z DEBUG - Performing suite health check
2019-07-10T16:44:55Z DEBUG - Performing suite health check
2019-07-10T16:44:56Z DEBUG - Performing suite health check
2019-07-10T16:44:57Z DEBUG - Performing suite health check
2019-07-10T16:44:58Z DEBUG - Performing suite health check
2019-07-10T16:44:59Z DEBUG - Performing suite health check
2019-07-10T16:45:00Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:45:00Z DEBUG - [bar.20190710T1644Z] -waiting => queued
2019-07-10T16:45:00Z DEBUG - 1 task(s) de-queued
2019-07-10T16:45:00Z INFO - [bar.20190710T1644Z] -submit-num=1, owner@host=localhost
2019-07-10T16:45:00Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job'] ... # will invoke in batches, sizes=[1]
2019-07-10T16:45:00Z DEBUG - [bar.20190710T1644Z] -queued => ready
2019-07-10T16:45:00Z DEBUG - END TASK PROCESSING (took 0.0340039730072 seconds)
2019-07-10T16:45:00Z DEBUG - Performing suite health check
2019-07-10T16:45:01Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/pbsuser/cylc-run/msas_pbs/log/job', '20190710T1644Z/bar/01']
2019-07-10T16:45:01Z DEBUG - Performing suite health check
2019-07-10T16:45:01Z DEBUG - Performing suite health check
2019-07-10T16:45:02Z DEBUG - Performing suite health check
2019-07-10T16:45:02Z DEBUG - Performing suite health check
2019-07-10T16:45:03Z DEBUG - Performing suite health check
2019-07-10T16:45:03Z DEBUG - Performing suite health check
2019-07-10T16:45:04Z DEBUG - Performing suite health check
2019-07-10T16:45:04Z DEBUG - Performing suite health check
2019-07-10T16:45:05Z DEBUG - Performing suite health check
2019-07-10T16:45:05Z DEBUG - Performing suite health check
2019-07-10T16:45:06Z DEBUG - Performing suite health check
2019-07-10T16:45:06Z DEBUG - Performing suite health check
2019-07-10T16:45:07Z DEBUG - Performing suite health check
2019-07-10T16:45:07Z DEBUG - Performing suite health check
2019-07-10T16:45:08Z DEBUG - Performing suite health check
2019-07-10T16:45:08Z DEBUG - Performing suite health check
2019-07-10T16:45:09Z DEBUG - Performing suite health check
2019-07-10T16:45:09Z DEBUG - Performing suite health check
2019-07-10T16:45:10Z DEBUG - Performing suite health check
2019-07-10T16:45:10Z DEBUG - Performing suite health check
2019-07-10T16:45:11Z DEBUG - Performing suite health check
2019-07-10T16:45:11Z DEBUG - Performing suite health check
2019-07-10T16:45:12Z DEBUG - Performing suite health check
2019-07-10T16:45:12Z DEBUG - Performing suite health check
2019-07-10T16:45:13Z DEBUG - Performing suite health check
2019-07-10T16:45:13Z DEBUG - Performing suite health check
2019-07-10T16:45:14Z DEBUG - Performing suite health check
2019-07-10T16:45:14Z DEBUG - Performing suite health check
2019-07-10T16:45:15Z DEBUG - Performing suite health check
2019-07-10T16:45:15Z DEBUG - Performing suite health check
2019-07-10T16:45:16Z DEBUG - Performing suite health check
2019-07-10T16:45:16Z DEBUG - Performing suite health check
2019-07-10T16:45:17Z DEBUG - Performing suite health check
2019-07-10T16:45:17Z DEBUG - Performing suite health check
2019-07-10T16:45:18Z DEBUG - Performing suite health check
2019-07-10T16:45:18Z DEBUG - Performing suite health check
2019-07-10T16:45:19Z DEBUG - Performing suite health check
2019-07-10T16:45:19Z DEBUG - Performing suite health check
2019-07-10T16:45:20Z DEBUG - Performing suite health check
2019-07-10T16:45:20Z DEBUG - Performing suite health check
2019-07-10T16:45:21Z DEBUG - Performing suite health check
2019-07-10T16:45:21Z DEBUG - Performing suite health check
2019-07-10T16:45:22Z DEBUG - Performing suite health check
2019-07-10T16:45:22Z DEBUG - Performing suite health check
2019-07-10T16:45:23Z DEBUG - Performing suite health check
2019-07-10T16:45:23Z DEBUG - Performing suite health check
2019-07-10T16:45:24Z DEBUG - Performing suite health check
2019-07-10T16:45:24Z DEBUG - Performing suite health check
2019-07-10T16:45:25Z DEBUG - Performing suite health check
2019-07-10T16:45:25Z DEBUG - Performing suite health check
2019-07-10T16:45:26Z DEBUG - Performing suite health check
2019-07-10T16:45:26Z DEBUG - Performing suite health check
2019-07-10T16:45:27Z DEBUG - Performing suite health check
2019-07-10T16:45:27Z DEBUG - Performing suite health check
2019-07-10T16:45:28Z DEBUG - Performing suite health check
2019-07-10T16:45:28Z DEBUG - Performing suite health check
2019-07-10T16:45:29Z DEBUG - Performing suite health check
2019-07-10T16:45:29Z DEBUG - Performing suite health check
2019-07-10T16:45:30Z DEBUG - Performing suite health check
2019-07-10T16:45:30Z DEBUG - Performing suite health check
2019-07-10T16:45:31Z DEBUG - Performing suite health check
2019-07-10T16:45:31Z DEBUG - Performing suite health check
2019-07-10T16:45:32Z DEBUG - Performing suite health check
2019-07-10T16:45:32Z DEBUG - Performing suite health check
2019-07-10T16:45:33Z DEBUG - Performing suite health check
2019-07-10T16:45:33Z DEBUG - Performing suite health check
2019-07-10T16:45:34Z DEBUG - Performing suite health check
2019-07-10T16:45:34Z DEBUG - Performing suite health check
2019-07-10T16:45:35Z DEBUG - Performing suite health check
2019-07-10T16:45:35Z DEBUG - Performing suite health check
2019-07-10T16:45:36Z DEBUG - Performing suite health check
2019-07-10T16:45:36Z DEBUG - Performing suite health check
2019-07-10T16:45:37Z DEBUG - Performing suite health check
2019-07-10T16:45:37Z DEBUG - Performing suite health check
2019-07-10T16:45:38Z DEBUG - Performing suite health check
2019-07-10T16:45:38Z DEBUG - Performing suite health check
2019-07-10T16:45:39Z DEBUG - Performing suite health check
2019-07-10T16:45:39Z DEBUG - Performing suite health check
2019-07-10T16:45:40Z DEBUG - Performing suite health check
2019-07-10T16:45:40Z DEBUG - Performing suite health check
2019-07-10T16:45:41Z DEBUG - Performing suite health check
2019-07-10T16:45:41Z DEBUG - Performing suite health check
2019-07-10T16:45:42Z DEBUG - Performing suite health check
2019-07-10T16:45:42Z DEBUG - Performing suite health check
2019-07-10T16:45:43Z DEBUG - Performing suite health check
2019-07-10T16:45:43Z DEBUG - Performing suite health check
2019-07-10T16:45:44Z DEBUG - Performing suite health check
2019-07-10T16:45:44Z DEBUG - Performing suite health check
2019-07-10T16:45:45Z DEBUG - Performing suite health check
2019-07-10T16:45:45Z DEBUG - Performing suite health check
2019-07-10T16:45:46Z DEBUG - Performing suite health check
2019-07-10T16:45:46Z DEBUG - Performing suite health check
2019-07-10T16:45:47Z DEBUG - Performing suite health check
2019-07-10T16:45:47Z DEBUG - Performing suite health check
2019-07-10T16:45:48Z DEBUG - Performing suite health check
2019-07-10T16:45:48Z DEBUG - Performing suite health check
2019-07-10T16:45:49Z DEBUG - Performing suite health check
2019-07-10T16:45:49Z DEBUG - Performing suite health check
2019-07-10T16:45:50Z DEBUG - Performing suite health check
2019-07-10T16:45:50Z DEBUG - Performing suite health check
2019-07-10T16:45:51Z DEBUG - Performing suite health check
2019-07-10T16:45:51Z DEBUG - Performing suite health check
2019-07-10T16:45:52Z DEBUG - Performing suite health check
2019-07-10T16:45:52Z DEBUG - Performing suite health check
2019-07-10T16:45:53Z DEBUG - Performing suite health check
2019-07-10T16:45:53Z DEBUG - Performing suite health check
2019-07-10T16:45:54Z DEBUG - Performing suite health check
2019-07-10T16:45:54Z DEBUG - Performing suite health check
2019-07-10T16:45:55Z DEBUG - Performing suite health check
2019-07-10T16:45:55Z DEBUG - Performing suite health check
2019-07-10T16:45:56Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode -- /home/pbsuser/cylc-run/msas_pbs/log/job 20190710T1644Z/bar/01
	[jobs-submit ret_code] 0
	[jobs-submit out]
	[TASK JOB SUMMARY]2019-07-10T16:45:01Z|20190710T1644Z/bar/01|0|7.pbsserver
	[TASK JOB COMMAND]2019-07-10T16:45:01Z|20190710T1644Z/bar/01|[STDOUT] 7.pbsserver
2019-07-10T16:45:56Z INFO - [bar.20190710T1644Z] -(current:ready) submitted at 2019-07-10T16:45:01Z
2019-07-10T16:45:56Z DEBUG - [bar.20190710T1644Z] -ready => submitted
2019-07-10T16:45:56Z DEBUG - [bar.20190710T1644Z] -Queueing submitted handler: handler.py event=submitted suite=msas_pbs  point=20190710T1644Z  task_name=bar  submit_num=1  id=bar.20190710T1644Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=7.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:45:56Z INFO - [bar.20190710T1644Z] -health check settings: submission timeout=None
2019-07-10T16:45:56Z DEBUG - BEGIN TASK PROCESSING
2019-07-10T16:45:56Z DEBUG - 0 task(s) de-queued
2019-07-10T16:45:56Z DEBUG - [bar.20190710T1644Z] -forced spawning
2019-07-10T16:45:56Z DEBUG - END TASK PROCESSING (took 0.00327706336975 seconds)
2019-07-10T16:45:56Z DEBUG - Performing suite health check
2019-07-10T16:45:56Z DEBUG - handler.py event=submitted suite=msas_pbs  point=20190710T1644Z  task_name=bar  submit_num=1  id=bar.20190710T1644Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=7.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
2019-07-10T16:45:56Z DEBUG - Performing suite health check
2019-07-10T16:45:57Z DEBUG - [(('event-handler-00', 'submitted'), 1) cmd] handler.py event=submitted suite=msas_pbs  point=20190710T1644Z  task_name=bar  submit_num=1  id=bar.20190710T1644Z  suite_url=www.testsuiteurl.com  batch_sys_name=pbs  batch_sys_job_id=7.pbsserver start_time=None end_time=None uuid=148ec4de-4a89-11e9-8646-d663bd873d84 user_at_host=pbsuser@localhost msg='job submitted'
	[(('event-handler-00', 'submitted'), 1) ret_code] 0
2019-07-10T16:45:57Z DEBUG - Performing suite health check
2019-07-10T16:45:58Z DEBUG - Performing suite health check
2019-07-10T16:45:59Z DEBUG - Performing suite health check
2019-07-10T16:46:00Z DEBUG - Performing suite health check
2019-07-10T16:46:01Z DEBUG - Performing suite health check
2019-07-10T16:46:02Z DEBUG - Performing suite health check
2019-07-10T16:46:03Z DEBUG - Performing suite health check
2019-07-10T16:46:04Z DEBUG - Performing suite health check
2019-07-10T16:46:05Z DEBUG - Performing suite health check
2019-07-10T16:46:06Z DEBUG - Performing suite health check
2019-07-10T16:46:07Z DEBUG - Performing suite health check

Also, handler.py file contents.

#!/usr/bin/env python3

"""
A generic Cylc event handler.

"""

import sys
import json
import os
from inspect import cleandoc


def main():
    """
    A generic Cylc event handler: it takes one or more kwargs and pretty-prints
    them.

    USAGE:
       handler.py key1=val1 key2=val2 ...

    Can be partnered with the generic cylc_kafka_consumer external trigger
    function, for triggering downstream suites.

    """

    if len(sys.argv) < 2:
        print("ERROR: one or more args required.\n")
        print(cleandoc(main.__doc__))
        sys.exit(1)

    if 'help' in sys.argv[1]:
        print(cleandoc(main.__doc__))
        sys.exit(0)

    # Construct a dict from kwargs.
    dmsg = dict([k.split('=') for k in sys.argv[1:]])

    # Write to a file
    # datalist = []
    if not os.path.isfile('data.json'):
        with open('data.json', 'w') as f:
                json.dump(dmsg, f)
    else:
        with open('data.json', mode='a') as f:
                f.write('\n')
                f.write(json.dumps(dmsg))




if __name__ == "__main__":
    main()

Taking a quick look at the suite.rc file

script

You are including qsub in the task’s script:

    script = "qsub -- /bin/sleep 1"

Cylc handles job submission for you, the script should just be what you want to execute.

    script = "/bin/sleep 1"
host

Also as no host has been specified Cylc will attempt to submit the job locally, this may be what you intended if your pbs setup supports this, however, if your setup requires you to SSH to the host to submit the job you will need to tell Cylc which host to do this on:

[[[remote]]]
host = name-of-host

See https://cylc.github.io/doc/built-sphinx/appendices/suiterc-config-ref.html#runtime-name-remote-host for details.

Taking a look at the suite log

According to Cylc the task bar never actually ran and is just sitting there in the submitted state. Cylc has only fired the submitted event handler as that is the only event to have occurred.

This could be because:

  • The job is sitting in the PBS queue so hasn’t actually run yet.
    • Check with the pbs system to see if this is the case.
  • The job has run but was unable to communicate with Cylc.

@amaresh494 - did you get your suite (and event handlers) working?

Yes, @hilary.j.oliver. It is working.

Job has run but was unable to communicate with Cylc.
Thank you @oliver.sanders.

1 Like