PurpleTeam TLS Tester Implementation
Tuesday, September 7, 2021
The PurpleTeam TLS Tester is now implemented. I’ve written this post to highlight the learnings, and to talk about the various significant changes that were made as part of the release. All core components were released as version 1.0.0-alpha.3.
The details of the above video can be found here.
All of the release notes can be accessed from the GitHub issue.
Massive set of releases just gone live around the new #SSL #TLS #Tester https://t.co/f0bPNRBjUh
— PurpleTeam (@purpleteamlabs) September 1, 2021
Documentation
- The Definitions were updated
- The Log and Outcomes files page was created, providing details of the Outcomes archive, what’s in it and how to read specific files. You can also see the contents of the Outcomes archive for this Job file
- The main architecture diagrams for cloud and local have been updated
- The Job file schema has been documented
- A FAQ page was created
- Setting up the Tls Tester, although this is trivial
- Debugging the Tls Tester
Work items created
As a result of the Tls Tester Implementation
- Re-work orchestrator.js
- Create Tester reset for “Tester failure:” occurrences
- Improve orchestrator Tester model error handling
- Re-work App and Tls Tester models
- Re-work Dockerfiles
- Extract common code into package
- Blog post on the TLS Scanner
Synchronisation
There ended up being quite a bit of work done around synchronisation of the components, and there is still work to be done. There were architectural decisions made several years ago that needed some modification, and as you can see from the Work items created there is ongoing work that needs to be done.
For example, near the end of the implementation I discovered another edge case: the state of a given Tester can be incorrect if a different Tester is in a “Tester failure:” state.
You can read about the issue here. We will be addressing this one soon.
There is also a lack of retry issue in the orchestrator Tester models, which was found near the end of the TLS implementation work as well. It probably won’t occur very often (we have never witnessed it), but it still needs to be fixed.
Before we get started discussing the synchronisation of components, you will need some understanding of the various relevant time-outs in the code base.
Time-outs
Many of the time-out issues with AWS just don’t exist when running locally. AWS API Gateway does not support streaming, so we need to use long polling (lp) between the CLI and the orchestrator in the cloud environment.
CLI
For the test command
The initial request to the orchestrator for the test command has a set of time-outs, but it must stop trying before the back-end fails due to:
- Stage Two containers not being up and responsive within the current 120000 (s2Containers.serviceDiscoveryServiceInstances.timeoutToBeAvailable) + 30000 (s2Containers.responsive.timeout) duration
- The Stage Two container service discovery services not being up and responsive within the same duration as above
If the CLI continues to retry after a back-end time-out, it may do so indefinitely if unsupervised, as is likely when running in noUi mode.
The time-out series for the test command currently looks like the following for the cloud environment. The CLI doesn’t time out at all for local:
Tries:
- 23000,
- 15000,
- 15000,
- 10010,
- 10010,
- 10010,
- 10010,
- 10010,
- 10010,
- 10010,
- 10010,
- 10010,
- 0 // Cancel
This adds up to 143090 ms plus some request and response latency, a little short of the back-end’s 150000 ms (120000 + 30000) plus some comms latency within AWS.
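The arithmetic above can be checked with a few lines of JavaScript. This is only a sketch: the variable names are illustrative and not taken from the PurpleTeam code base, and the numbers are the ones quoted in this post.

```javascript
// Retry schedule for the test command in the cloud environment (milliseconds).
// The trailing 0 in the series means "cancel", so it is left out of the sum.
const tries = [23000, 15000, 15000, ...Array(9).fill(10010)];

// Back-end budget: timeoutToBeAvailable + responsive.timeout (defaults quoted in this post).
const backEndBudget = 120000 + 30000;

// Total time the CLI spends retrying, excluding request and response latency.
const cliBudget = tries.reduce((total, ms) => total + ms, 0);

console.log(cliBudget); // 143090
console.log(backEndBudget - cliBudget); // 6910
```

The 6910 ms difference is the headroom the CLI leaves before the back-end would give up, before any latency is taken into account.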
For tester[Progress|PctComplete|BugCount] updates
Five long-poll request attempts with no data returned from the orchestrator and the CLI gives up.
// ...,
testerFeedbackComms: {
  longPoll: {
    nullProgressMaxRetries: {
      doc: 'The number of times (sequentially receiving an event with a data object containing a property with a null value) to poll the backend when the orchestrator is not receiving feedback from the testers.',
      format: 'int',
      default: 5
    }
  }
},
// ...
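A minimal sketch of how a client could count sequential empty long-poll responses follows. It is not the PurpleTeam CLI code: the function names and the event shape are assumptions made for illustration, and only the default of 5 and the "data object containing a property with a null value" rule come from the configuration above.

```javascript
// Hypothetical sketch: names and event shapes are assumptions, not the PurpleTeam CLI code.
const nullProgressMaxRetries = 5; // Default from the configuration above.

// An event counts as "null progress" when its data object has a property with a null value.
const isNullProgress = (event) =>
  Object.values(event.data ?? {}).some((value) => value === null);

// Returns how many poll responses were consumed before giving up,
// or responses.length if the limit was never reached.
const pollsUntilGiveUp = (responses, maxRetries = nullProgressMaxRetries) => {
  let sequentialNulls = 0;
  for (let i = 0; i < responses.length; i += 1) {
    // Any real feedback resets the counter, so only sequential nulls count.
    sequentialNulls = isNullProgress(responses[i]) ? sequentialNulls + 1 : 0;
    if (sequentialNulls >= maxRetries) return i + 1;
  }
  return responses.length;
};

const empty = { data: { progress: null } };
const real = { data: { progress: 'Tester is running.' } };

console.log(pollsUntilGiveUp([empty, empty, real, empty, empty, empty, empty, empty])); // 8
console.log(pollsUntilGiveUp([empty, empty, empty, empty, empty, real])); // 5
```

Resetting the counter on real feedback is what makes the limit apply to sequential empty responses rather than to the total number of empty responses in a Test Run.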
Orchestrator
The following is used in the testerWatcher and needs to be well under the AWS API Gateway timeout, which is 30 seconds:
// ...,
testerFeedbackComms: {
  // ...,
  longPoll: {
    timeout: {
      doc: 'A double that expresses seconds to wait for blocking Redis commands. We need to timeout well before the AWS Api Gateway timeout.',
      format: Number,
      default: 20.0
    }
  }
}
App Tester
// ...,
s2Containers: {
  serviceDiscoveryServiceInstances: {
    timeoutToBeAvailable: {
      doc: 'The duration in milliseconds before giving up on waiting for the s2 Service Discovery Service Instances to be available.',
      format: 'duration',
      default: 120000
    },
    retryIntervalToBeAvailable: {
      doc: 'The retry interval in milliseconds for the s2 Service Discovery Service Instances to be available.',
      format: 'duration',
      default: 5000
    }
  },
  responsive: {
    timeout: {
      doc: 'The duration in milliseconds before giving up on waiting for the s2 containers to be responsive.',
      format: 'duration',
      default: 30000
    },
    retryInterval: {
      doc: 'The retry interval in milliseconds for the s2 containers to be responsive.',
      format: 'duration',
      default: 2000
    }
  }
},
// ...
The emissary.apiFeedbackSpeed is used to send the CLI the following message types: testerProgress, testerPctComplete and testerBugCount, thus keeping the lp alive. This duration needs to be less than the orchestrator’s 20 second testerFeedbackComms.longPoll.timeout.
emissary: {
  // ...,
  apiFeedbackSpeed: {
    doc: 'The speed to poll the Zap API for feedback of test progress',
    format: 'duration',
    default: 5000
  },
  // ...
TLS Tester
If we don’t receive any update from the TLS Emissary within this duration (messageChannelHeartBeatInterval) then the TLS Tester sends the CLI a testerProgress message with the textData: Tester is awaiting Emissary feedback... This duration needs to be less than the orchestrator’s 20 second testerFeedbackComms.longPoll.timeout to make sure the CLI continues to poll the orchestrator for tester[Progress|PctComplete|BugCount] updates.
// ...,
messageChannelHeartBeatInterval: {
  doc: 'This is used to send heart beat messages every n milliseconds. Primarily to keep the orchestrator\'s testerWatcher longPoll timeout from being reached.',
  format: 'duration',
  default: 15000
},
// ...
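The relationships between these time-outs can be written down as simple checks. The following is a sketch only: the object layout and rule descriptions are illustrative, and the values are the defaults quoted in this post, converted to milliseconds where the configuration uses seconds.

```javascript
// All values are the defaults quoted in this post; the object layout is illustrative only.
const timeouts = {
  apiGatewayMs: 30 * 1000, // AWS API Gateway timeout
  longPollMs: 20.0 * 1000, // orchestrator testerFeedbackComms.longPoll.timeout (seconds in the config)
  apiFeedbackSpeedMs: 5000, // App Tester emissary.apiFeedbackSpeed
  heartBeatIntervalMs: 15000 // TLS Tester messageChannelHeartBeatInterval
};

// Each rule pairs a description with the condition that must hold.
const rules = [
  ['long poll returns before API Gateway times out', timeouts.longPollMs < timeouts.apiGatewayMs],
  ['App Tester feedback arrives within one long poll', timeouts.apiFeedbackSpeedMs < timeouts.longPollMs],
  ['TLS Tester heart beat arrives within one long poll', timeouts.heartBeatIntervalMs < timeouts.longPollMs]
];

// Collect the descriptions of any rules that do not hold.
const violations = rules.filter(([, holds]) => !holds).map(([description]) => description);

console.log(violations.length === 0 ? 'All time-out relationships hold' : violations);
```

A check like this could run wherever the defaults are changed, so that a larger heart beat interval or a shorter long poll is caught before it causes the CLI to give up early.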
Message flows
There are two flow types in play between the orchestrator and the CLI, namely Server Sent Events (sse) and Long Polling (lp).
Before reading this section, dive over to the orchestrator README for a quick run-down on how PurpleTeam is using sse and lp.
Before the TLS implementation, the testerFeedbackComms.medium was defined in the configuration for both the orchestrator and the CLI. Both configurations had to match. If they didn’t, the orchestrator would respond with an error message. Now this is defined in the orchestrator only, and the orchestrator tells the CLI which medium it should use before starting either sse or lp.
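As a rough illustration of the CLI following the orchestrator’s lead, the sketch below maps the medium to one of the two Tester feedback routes described further down in this post. The function name and argument shape are assumptions; only the two route templates come from this post.

```javascript
// Hypothetical sketch: the function name and argument shape are assumptions.
// Only the two route templates come from this post.
const feedbackRoutes = new Map([
  ['sse', '/tester-feedback/{testerName}/{sessionId}'],
  ['lp', '/poll-tester-feedback/{testerName}/{sessionId}']
]);

// Builds the feedback route for the medium that the orchestrator told the CLI to use.
const feedbackRouteFor = ({ medium, testerName, sessionId }) => {
  const template = feedbackRoutes.get(medium);
  if (!template) throw new Error(`Unsupported testerFeedbackComms medium: ${medium}`);
  return template
    .replace('{testerName}', encodeURIComponent(testerName))
    .replace('{sessionId}', encodeURIComponent(sessionId));
};

console.log(feedbackRouteFor({ medium: 'lp', testerName: 'app', sessionId: 'lowPrivUser' }));
// /poll-tester-feedback/app/lowPrivUser
console.log(feedbackRouteFor({ medium: 'sse', testerName: 'tls', sessionId: 'NA' }));
// /tester-feedback/tls/NA
```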
When the CLI runs the test command, there are three significant sequential events. I’ll brush over or omit less significant events to make the flow easier to explain. If you’d rather just read the code, it’s here:
- CLI makes a POST request to the orchestrator’s /test route with the Job, and continues to do so according to its retry schedule.
  The orchestrator’s testTeamAttack routine is where a lot of the decision making occurs:
  - If a Test Run is already in progress (initTesterResponsesForCli is defined) and the orchestrator already has the responses from the requests to the Testers’ /init-tester route (initTesterResponsesForCli has a length), whether the Testers were successfully initialised or not, then the Tester responses along with whether to use sse or lp to subscribe to Tester feedback are returned to the CLI
  - If a Test Run is already in progress (initTesterResponsesForCli is defined), the orchestrator causes a client-side time-out because a response from the request to the Testers’ /init-tester route has not yet been received, and the orchestrator wants the CLI to try again once it times out
  - If execution gets past the above then a Test Run is not currently in progress, so the orchestrator:
    - Sets an in-progress flag
    - Asks its Tester models to initialise their Testers and waits for the responses
    - Once all of the responses are received, the orchestrator populates a failedTesterInitialisations array with any Tester failure: … messages
    - The orchestrator creates a startTesters boolean and assigns it true if every active Tester has its state set to Tester initialised. … (not Awaiting Job., Initialising Tester., or [App|Tls] tests are running.), otherwise false is assigned
    - If there were any failedTesterInitialisations or startTesters is false:
      - initTesterResponsesForCli is populated with the responses from trying to initialise the Testers (both successful and/or unsuccessful)
      - A response is returned to the CLI with initTesterResponsesForCli and whether the orchestrator expects the CLI to use sse or lp
    - Otherwise:
      - The orchestrator invokes each Tester’s /start-tester route
      - If we are running in cloud, the orchestrator warms up the Test Session message (Redis) channels and lists. This waits for all Testers of the represented Test Sessions to provide their first message set. These message sets are assigned to an array called warmUpTestSessionMessageSets, which looks like the following before being populated with messages: [ { channelName: 'app-lowPrivUser', testerMessageSet: [] }, { channelName: 'app-adminUser', testerMessageSet: [] }, { channelName: 'tls-NA', testerMessageSet: [] } ]
        If Testers are started and the orchestrator did not subscribe to the Test Session message channels, it would never know when the Test Sessions are finished in order to clean up, so this subscription must occur
      - initTesterResponsesForCli is populated with the responses from trying to initialise the Testers (only successful)
      - A response is returned to the CLI with initTesterResponsesForCli and whether the orchestrator expects the CLI to use sse or lp
- CLI makes a GET request to either of the following (currently this happens whether or not all Testers were initialised successfully. There is no point in this happening if any Tester failure: messages were returned from any Testers, so we will change this soon):
  - If using sse? /tester-feedback/{testerName}/{sessionId}:
    In this case messages from the Test Sessions continue to flow through the Redis channels and the orchestrator continues to push them to the CLI
  - If using lp? /poll-tester-feedback/{testerName}/{sessionId}:
    In this case the CLI starts the long-poll process. The orchestrator checks to see if warmUpTestSessionMessageSets contains an element for the given channel name (channel names are constructed like ${testerName}-${sessionId}); this will only happen in the cloud environment. If so, it is spliced out and returned. If not, the pollTesterMessages of the testerWatcher is invoked. pollTesterMessages is responsible for providing a callback to each Redis channel which, when invoked, takes the given message from a Tester’s Test Session and pushes it on to the tail of a Redis list with the same name as the Redis channel that the message was received from. Each time the CLI requests a message set for a given Test Session, if no messages are yet available it waits (on Redis blpop (blocking head pop)); if messages are available, they are popped (Redis lpop (non-blocking head pop)) from the head of the Redis list
- CLI makes a GET request to the /outcomes route
  - This happens once the CLI receives a message starting with All Test Sessions of all Testers are finished. By the time this happens, the orchestrator has already cleaned up the Testers and created the Outcomes archive based on the results and reports generated by the Testers
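The decision making described for the /test route can be condensed into a small function. This is a hypothetical sketch and not the orchestrator’s testTeamAttack code: the function name, the state object, the shape of the Tester responses (active, message) and the returned action names are all assumptions made for illustration. Only initTesterResponsesForCli, failedTesterInitialisations, startTesters and the Tester state prefixes come from this post.

```javascript
// Hypothetical sketch of the decision flow described above, not the orchestrator's code.
// state.initTesterResponsesForCli being defined stands in for "a Test Run is in progress".
const decideTestResponse = (state, initResponses) => {
  const { medium } = state; // 'sse' or 'lp', defined in the orchestrator only.

  // Test Run in progress and the /init-tester responses are already available.
  if (state.initTesterResponsesForCli && state.initTesterResponsesForCli.length) {
    return { action: 'respond', initTesterResponsesForCli: state.initTesterResponsesForCli, medium };
  }
  // Test Run in progress but still waiting on /init-tester: let the CLI time out and retry.
  if (state.initTesterResponsesForCli) return { action: 'letClientTimeOut' };

  // No Test Run in progress: flag one as started, then evaluate the initialisation responses.
  state.initTesterResponsesForCli = [];
  const failedTesterInitialisations = initResponses
    .filter((response) => response.message.startsWith('Tester failure:'));
  const startTesters = initResponses
    .filter((response) => response.active)
    .every((response) => response.message.startsWith('Tester initialised.'));

  state.initTesterResponsesForCli = initResponses;
  if (failedTesterInitialisations.length || !startTesters) {
    // Successful and unsuccessful responses go back to the CLI, no Testers are started.
    return { action: 'respond', initTesterResponsesForCli: initResponses, medium };
  }
  // Every active Tester initialised: start the Testers, then respond to the CLI.
  return { action: 'startTestersThenRespond', initTesterResponsesForCli: initResponses, medium };
};

const ok = { active: true, message: 'Tester initialised.' };
const failed = { active: true, message: 'Tester failure: example' };

console.log(decideTestResponse({ medium: 'lp' }, [ok, failed]).action); // respond
console.log(decideTestResponse({ medium: 'lp' }, [ok, ok]).action); // startTestersThenRespond
console.log(decideTestResponse({ medium: 'lp', initTesterResponsesForCli: [] }).action); // letClientTimeOut
```

In the real flow the orchestrator waits for the Tester responses between setting the in-progress flag and evaluating them. The sketch takes the responses as an argument to keep the branching easy to follow.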
TLS Tester Implementation
Unlike the App Tester (app-scanner) which supervises an external Emissary (Zaproxy), the TLS Tester (tls-scanner) supervises an embedded Emissary (testssl.sh). This means that the TLS Emissary runs within the same container as the TLS Tester.
The Job file which the Build User provides to the CLI contains everything required to get the TLS Emissary running and targeting your website or web API.
The implementation of the TLS Tester was actually the easy part of this release. An additional stage one container image was required for local, and for cloud the Terraform configuration needed an AWS ECS Task Definition modification. The AWS ECR deployment script also needed to be extended.
The new TLS Tester isn’t that different from the App Tester, other than being a lot simpler because we don’t have to bring up stage two containers or deal with all the potential synchronisation issues around external Emissaries.
The execution flow goes from the /init-tester and /start-tester routes to the model.
/init-tester basically sets the Tester up with the Build User supplied Job and sets the status.
/start-tester starts (spawns) the Cucumber CLI, which initialises the Cucumber world, which is where most of the domain specific parts are glued together, and the actual Cucumber Steps (tests) are run.
The following are added to the Cucumber world:
- The messagePublisher (pushes messages onto Redis ${testerName}-${sessionId} channels)
- sut (System Under Test) domain object
- testssl domain object
The testssl.sh process is spawned.
Whenever the TLS Emissary writes to stdout the Tester deals with it here.