[Rose?]: Testing overall suite design - sensible approach for integration/end-to-end tests

Possibly more Rose-related; I can’t (manually) see anything obvious in previous qs - Discourse cleverly picks up the tangentially-related “Pass command line env vars…” q (and a few false positives) - but I believe there’s still enough new stuff here to warrant its own q.

I’m wanting to have an overall integration/end-to-end “test” mode for the suite I’m designing. This is to ensure things like the overall graph/intercycle dependencies etc. work as intended - I’ve got some quite thorny suicide/retry/restart aspects to manage.

I’m hoping to use a “test-driven development” approach as I build this, as I reckon it might help avoid tears later: using placeholder scripts & local compute to get faster feedback than if I iteratively developed against my real applications - spending ages waiting for runs to complete, needlessly hammering shared compute resources, and so on!

For a given app “task-a” I’m pretty happy I can set up some dummy scripts “task-a_test_bad-data.sh”, “task-a_test_datagap.sh” which mimic the expected failure modes of the real “task-a.sh” but don’t involve actual computation, so they fail as fast as I want, and cheaply. So I can check the design works as intended.

And that per-app I can use optional configurations to call these different dummy scripts.

And that I can effectively use this to create ~unit/low-level integration tests for given bits by leveraging rose app-run --opt-conf-key
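
Concretely, I’m imagining something like the following (file and opt-conf key names are just placeholders I’ve made up, and this is untested):

# app/task-a/rose-app.conf - the real app
[command]
default=task-a.sh

# app/task-a/opt/rose-app-test-bad-data.conf - dummy “bad data” failure mode
[command]
default=task-a_test_bad-data.sh

# app/task-a/opt/rose-app-test-datagap.conf - dummy “data gap” failure mode
[command]
default=task-a_test_datagap.sh

with the dummy scripts sitting alongside task-a.sh in the app’s bin/ directory, and a low-level check run locally as e.g.

rose app-run -C app/task-a -O test-bad-data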

However, I’m interested in higher-level integration/end-to-end tests. I.e. probably only picking a few pathways through the myriad combinations of snafus with “dummy” scripts (middle tier of the test pyramid), or a single pathway through with the real scripts using canned data (top tier).

I’m not clear how I can achieve this cleanly (i.e. with as little Jinja/duplication as possible).
I think this might be done using top-level rose-suite.conf suite configuration(s), which switch on the various tests.
And that this/these configurations could:

  • enable various ROSE_APP_OPT_CONF_KEYS in the apps - e.g. an end-to-end rose-suite.conf might set something (???) in the [[root]] family in suite.rc which would (where applicable) switch on the configs suitable for an end-to-end test in all apps
  • let me specify alternative hosts for tasks (e.g. just localhost), so I can avoid hitting shared/expensive compute hosts (I’ve had a go at sketching both of these just below)
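
Very much a guess at this stage (made-up names, untested): a suite-level optional configuration sets a Jinja2 switch, which suite.rc then uses in [[root]]:

# opt/rose-suite-end-to-end.conf (selected with "rose suite-run -O end-to-end")
[jinja2:suite.rc]
TEST="end-to-end"

# suite.rc fragment
[runtime]
    [[root]]
        [[[environment]]]
            # each test-aware app provides an opt/rose-app-end-to-end.conf;
            # with TEST unset this is empty and the apps run as normal
            ROSE_APP_OPT_CONF_KEYS = {{ TEST | default('') }}
{% if TEST | default('') %}
        [[[remote]]]
            # keep test runs off the shared compute hosts (tasks/families that
            # set their own host would still need their own override)
            host = localhost
{% endif %}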

And that overall, I might (though I haven’t given this much thought) even be able to conveniently expose my desired test harness level (all unit / all integration / end-to-end) on the command line, allowing usage like rose suite-run -S 'TEST=end-to-end', using @oliver.sanders’ point in the “Pass command line env vars…” q.
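
If so, that might look something like the following (again untested; I think the value needs the inner quotes because -S defines a Jinja2 variable):

# define the test switch directly on the command line
rose suite-run -S 'TEST="end-to-end"'

# or pick up the suite-level optional configuration sketched above
rose suite-run -O end-to-end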

However, I’d be grateful for any pointers here as to what the best approach would be (including “you’re thinking about this incorrectly”!).
Especially keen on anything that avoids too much Jinja / duplication!

I think it’s useful to think about the following when it comes to the test:

  • are you concerned with the workflow itself behaving as intended?
  • are you concerned with a particular part of the suite being broken?
  • are the components of your suite tested elsewhere?
  • are you using the suite to test all your components work together correctly?

and are you trying to answer all these things in one go, or just some of them?

In GloSea, for instance, we have a lot of separate testing of the components that go into the suite, but we also have a final cut-down version of it that makes sure all the bits work together when they are stacked one after another in the workflow. At the same time we can run in, say, dummy or simulation mode to make sure our graphing is behaving as intended (our test mode also loads in some opt configs specific to testing). That tends to need some manual checking to get things to behave properly, but it’s easier than setting up lots of different “if mode==test1” type items in the suite.

We’ve also debated having additional tasks in there to check the output of those tasks is as intended - not implemented at present, but it’s been on the todo list for a while. Said tasks would be on a Jinja2 switch.
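
For reference, the kind of invocation I mean (Cylc 7, from memory, so treat this as a sketch; the “test” opt-conf key is just an example name):

cylc run --mode=simulation MY_SUITE    # no real jobs are submitted at all
cylc run --mode=dummy MY_SUITE         # trivial dummy jobs are submitted in place of the real ones

# args after "--" are passed through to "cylc run";
# -O pulls in a test-specific optional suite configuration
rose suite-run -O test -- --mode=simulation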

2 Likes

Thanks @arjclark - good points. Yup, my current scope is mostly around:

  • the workflow behaving as intended (via running in dummy/simulation mode)

The core part of the suite is a well-tested, externally-developed package, and one of my tasks runs the test battery for that package. So I’m not wanting to test that component.

However, I’ll be writing some of my own components to:

  • manage passing input data to the core task
  • create semaphore files used by the core task so that it dies cleanly when the input data is such that it would otherwise cause the core task to fail (a ~common occurrence, which I want to recover from automatically)
  • route out and archive output data cleanly, given the core task is interruptible

And I’ll generally be trying to ensure the workflow graphs/suite.rc have sensible retries/suicide triggers.

As I’m ~new to rose/cylc, I’m expecting to make lots of mistakes in the workflow/logic along the way, hence thinking it sensible to protect myself with dummy tasks that give “coverage” of failure modes I might not experience during suite design if I just used the real core app / live data. And to speed development up!

A secondary concern is to test the “written-by-me” components (as they’re otherwise untested!), but I think I’m ~happy to hive that off separately.

Thanks - will have a look at GloSea’s implementation!

1 Like

You’ll want to look at GloSea’s rose-stem as well as its suite. Hopefully you can see the progression from unit tests to rose-stem to the suite in test (prePS) mode. Feel free to ping me an email if you need something explained in that pile.

1 Like

Separating graphs for test and development from the main graph is really useful for speeding up work on parts of your suite. I give each graph its own file (and think of the graph as the “main programme” when testing). A test graph “A” is then just included in the suite.rc file as

{# DEFINE GRAPH OF SUITE #}
{% if TEST %}
{% include ['suite-graph-test_', TEST, '.rc'] | join %}
{% else %}
{% include 'suite-graph-main.rc' %}
{% endif %}

and is defined in two additional files: suite-graph-test_A.rc and opt/rose-suite-A.conf.
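
For concreteness, those two files might look something like this (exact contents obviously depend on the suite; this assumes the {% include %} above sits inside [scheduling], uses a one-off non-cycling graph for simplicity, and the task names are placeholders):

# suite-graph-test_A.rc - a small graph exercising just the pathway under test
[[dependencies]]
    graph = """
        stage_test_data => task_a => archive_test
    """

# opt/rose-suite-A.conf - sets the TEST switch used by the {% if TEST %} above
[jinja2:suite.rc]
TEST="A"

and the test run is then selected with rose suite-run -O A.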

1 Like

I agree with @arjclark’s excellent list of bullet points on what to consider in this context.

For this one, you should be able to rely on Cylc’s own test battery to ensure that all triggering works as advertised in a complex workflow. However, I realize that’s not quite the same as seeing and understanding it yourself, without waiting ages for real task jobs to run, when developing a complex workflow.

BTW @oliver.sanders has some ideas on how to build some proper “test harness” functionality into Cylc itself in the future, although I think that would be more for workflows that test many component tasks (a la rose stem) and automatically collate the results, than for testing the workflow itself.

2 Likes

Workflows are by their nature full of moving parts, making them a right pain to test.

Tooling

There are some things we can do in Cylc to make testing workflows, sub-assemblages within workflows, etc. easier. We have some ideas on how to put this into practice; however, for the time being the tools at your disposal are:

  • cylc validate
    • Has a --strict option (note rose suite-run calls cylc validate --strict before calling cylc run).
    • Grepping for deprecated syntax messages may be a good idea (will make this easier in the future).
  • Rose optional configurations
    • These in effect allow you to define multiple run modes for apps or entire suites.
  • Jinja2 (can provide similar functionality to optional configurations).
  • cylc get-config
    • Which can be used to extract the value of settings from the config (e.g. cylc get-config <reg> -i '[runtime][root][environment]'). Some example command lines are given just below this list.
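
For example (Cylc 7 command lines; the grep is a crude heuristic, as the exact wording of deprecation warnings varies):

# validate with strict checking, and hunt for deprecation warnings
cylc validate --strict <suite>
cylc validate -v <suite> 2>&1 | grep -i deprecat

# pull a setting out of the parsed configuration
cylc get-config <suite> -i '[runtime][root][environment]'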

Any tools to add to this list?

End-To-End / Data-Pipeline Testing

Using a test mode as suggested above can be a good way forward: you can use Rose optional configurations to change the behaviour of tasks in the suite. You could configure simple tasks to just pass, or to make minor modifications to the filesystem, in order to avoid major Jinja2 changes.
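
For instance, an app-level optional configuration could swap the real command for something near-instant (the app, file and key names here are made up):

# app/task-x/opt/rose-app-end-to-end.conf
[command]
# succeed immediately, leaving a placeholder file where output is expected
default=touch dummy_output

This would then be picked up whenever ROSE_APP_OPT_CONF_KEYS includes end-to-end, or via rose app-run -O end-to-end / rose task-run -O end-to-end.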

Graph Testing

For simulation / dummy mode tests I’m guessing what you really want to test is that the “graph” has been configured correctly. Unfortunately this is rather difficult, since from Cylc’s perspective it’s impossible to distinguish a correct graph from an incorrect one - Cylc doesn’t understand the intention behind the graph. The main kinds of testing I can think of would be:

  • Check that a certain task appears in a certain cycle (or recurrence).
  • Check that there is a pathway between one task and another (e.g. a task which runs at the beginning of a cycle and one which runs at the end).
  • Check that graph “heads” have clock triggers (for real-time workflows).

Unfortunately Cylc does not expose an API for graph traversal, so answering these kinds of questions is quite difficult. We plan to open up a Python API for suite configuration with Cylc9, which should address problems like this; however, Cylc9 is a long way in the future.
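
That said, one crude text-based check is possible today: I believe cylc graph has a --reference option which prints a flat node/edge listing you can grep. It can tell you that a task or an edge is present, but not whether a pathway exists. Roughly (suite name, dates and task names are placeholders, and the exact line format may vary between Cylc versions):

cylc graph --reference <suite> 20200101T00Z 20200105T00Z > graph.ref

# the task appears in the expected cycle:
grep 'task_a\.20200102' graph.ref

# a specific dependency edge is present (edge lines contain both task names):
grep 'edge.*prep\..*task_a\.' graph.ref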

1 Like