"a skilled backdoor-writer can defeat skilled auditors"?
Hi there,

in a different thread, Cam posted a link containing this gem:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
In short several very skilled security auditors examined a small Python program — about 100 lines of code — into which three bugs had been inserted by the authors. There was an “easy,” “medium,” and “hard” backdoor. There were three or four teams of auditors.

1. One auditor found the “easy” and the “medium” ones in about 70 minutes, and then spent the rest of the day failing to find any other bugs.
2. One team of two auditors found the “easy” bug in about five hours, and spent the rest of the day failing to find any other bugs.
3. One auditor found the “easy” bug in about four hours, and then stopped.
4. One auditor either found no bugs or else was on a team with the third auditor — the report is unclear.

See Chapter 7 of Yee’s report for these details.

I should emphasize that I personally consider these people to be extremely skilled. One possible conclusion that could be drawn from this experience is that a skilled backdoor-writer can defeat skilled auditors. This hypothesis holds that only accidental bugs can be reliably detected by auditors, not deliberately hidden bugs.

Anyway, as far as I understand the bugs you folks left in were accidental bugs that you then deliberately didn’t-fix, rather than bugs that you intentionally made hard-to-spot.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

https://blog.spideroak.com/20140220090004-responsibly-bringing-new-cryptogra...

I have no problem believing it is thus, but can't help wondering if there are any ways to mitigate it.

--
Pozdr
rysiek
On Wed, Jun 04, 2014 at 12:35:20AM +0200, rysiek wrote:
In short several very skilled security auditors examined a small Python program — about 100 lines of code — into which three bugs had been inserted by the authors. There was an “easy,” “medium,” and “hard” backdoor. There were three or four teams of auditors.
1. One auditor found the “easy” and the “medium” ones in about 70 minutes, and then spent the rest of the day failing to find any other bugs.
2. One team of two auditors found the “easy” bug in about five hours, and spent the rest of the day failing to find any other bugs.
3. One auditor found the “easy” bug in about four hours, and then stopped.
4. One auditor either found no bugs or else was on a team with the third auditor — the report is unclear.
See Chapter 7 of Yee’s report for these details.
I should emphasize that I personally consider these people to be extremely skilled. One possible conclusion that could be drawn from this experience is that a skilled backdoor-writer can defeat skilled auditors. This hypothesis holds that only accidental bugs can be reliably detected by auditors, not deliberately hidden bugs.
Anyway, as far as I understand the bugs you folks left in were accidental bugs that you then deliberately didn’t-fix, rather than bugs that you intentionally made hard-to-spot.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - https://blog.spideroak.com/20140220090004-responsibly-bringing-new-cryptogra...
I have no problem believing it is thus, but can't help wondering if there are any ways to mitigate it.
My mitigation would be to make auditing a default-deny rule, rather than a default-allow.

Security auditing needs to be a holistic analysis, starting by re-engaging with the requirements, verifying that the design is a sensible and minimal approach to addressing the requirements, and verifying that the implementation is a sensible, safe, auditable, version controlled approach to the design.

If the auditor at any point says "Well, I wouldn't have *recommended* that you implement your JSON parsing in ad-hoc C with pointer arithmetic and poor and misleading comments, but I can't find any *bugs* so I guess it must be OK" then that is an immediate fail.

This is the default deny: we default to assuming the system is insecure, and any sign that this might be true results in a failure.

Versus the current auditing method of default-allow: we run the audit, and if no *concrete* exploits or bugs are found before the auditors run out of time, then we trumpet that the system "has passed its audit".

Only if the design is sane, the implementation is sane, the development team is following best practices and defensive coding strategies, with a cryptographically and procedurally audited edit trail (immutable git commit logs signed and committed to W/O media) in a development environment that is safe by default rather than risky by default ...

... then you *might* have a chance of catching the intentional backdoor inserted by the APT malware on your team member's workstation.

Current efforts in this direction fall *very* far short of the utopia I describe.

-andy
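To make the "ad-hoc JSON parsing in C with pointer arithmetic" example concrete, here is a minimal hypothetical sketch of the kind of code in question (the function name and the "token" field are invented for illustration, not taken from any real audit). Nothing in it is an obvious bug, which is exactly the point: under a default-deny audit it fails anyway, because the reviewer has to verify every pointer step instead of reviewing one call into a well-tested JSON library.

    #include <string.h>   /* strstr, strchr, memcpy */

    /* Ad-hoc extraction of {"token":"..."} by pointer walking.  It
     * compiles and may even pass its tests, but the auditor must check
     * every step: missing closing quote?  value longer than the output
     * buffer?  escaped quotes inside the token?                        */
    int extract_token(const char *json, char *out, size_t outlen)
    {
        const char *p = strstr(json, "\"token\":\"");
        if (p == NULL)
            return -1;
        p += 9;                             /* skip past "token":"      */
        const char *q = strchr(p, '"');     /* find the closing quote   */
        if (q == NULL || (size_t)(q - p) >= outlen)
            return -1;                      /* unterminated or too long */
        memcpy(out, p, (size_t)(q - p));
        out[q - p] = '\0';
        return 0;   /* silently mishandles escaped quotes and whitespace
                       around the colon: correctness bugs, not crashes  */
    }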
Message of 04/06/14 00:58 From: "Andy Isaacson"
On Wed, Jun 04, 2014 at 12:35:20AM +0200, rysiek wrote:
In short several very skilled security auditors examined a small Python program — about 100 lines of code — into which three bugs had been inserted by the authors. There was an “easy,” “medium,” and “hard” backdoor. There were three or four teams of auditors.
1. One auditor found the “easy” and the “medium” ones in about 70 minutes, and then spent the rest of the day failing to find any other bugs.
2. One team of two auditors found the “easy” bug in about five hours, and spent the rest of the day failing to find any other bugs.
3. One auditor found the “easy” bug in about four hours, and then stopped.
4. One auditor either found no bugs or else was on a team with the third auditor — the report is unclear.
See Chapter 7 of Yee’s report for these details.
I should emphasize that I personally consider these people to be extremely skilled. One possible conclusion that could be drawn from this experience is that a skilled backdoor-writer can defeat skilled auditors. This hypothesis holds that only accidental bugs can be reliably detected by auditors, not deliberately hidden bugs.
Anyway, as far as I understand the bugs you folks left in were accidental bugs that you then deliberately didn’t-fix, rather than bugs that you intentionally made hard-to-spot.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - https://blog.spideroak.com/20140220090004-responsibly-bringing-new-cryptogra...
I have no problem believing it is thus, but can't help wondering if there are any ways to mitigate it.
My mitigation would be to make auditing a default-deny rule, rather than a default-allow.
Security auditing needs to be a holistic analysis, starting by re-engaging with the requirements, verifying that the design is a sensible and minimal approach to addressing the requirements, and verifying that the implementation is a sensible, safe, auditable, version controlled, approach to the design.
If the auditor at any point says "Well, I wouldn't have *recommended* that you implement your JSON parsing in ad-hoc C with pointer arithmetic and poor and misleading comments, but I can't find any *bugs* so I guess it must be OK" then that is an immediate fail.
This is the default deny: we default to assuming the system is insecure, and any sign that this might be true results in a failure.
Versus the current auditing method of default-allow: we run the audit, and if no *concrete* exploits or bugs are found before the auditors run out of time, then we trumpet that the system "has passed its audit".
Only if the design is sane, the implementation is sane, the development team is following best practices and defensive coding strategies, with a cryptographically and procedurally audited edit trail (immutable git commit logs signed and committed to W/O media) in a development environment that is safe by default rather than risky by default ...
... then you *might* have a chance of catching the intentional backdoor inserted by the APT malware on your team member's workstation.
Current efforts in this direction fall *very* far short of the utopia I describe.
-andy
Your proposal would cause 99% of software currently in use to be rejected and make the development costs increase as astronomically as to be compared to medical research. That would also smother the hopes of all the people who see coding and computers as a way out of poverty. Like, outsourcing stuff to Asia would grind to a halt.

I agree your proposal is good and doable, yet at a cost the world doesn't wish to pay. It wouldn't reduce innovation; it would probably increase it. Also it would filter out all incompetents and posers, forcing them to adapt or look at burger flipping in McDonald's with other eyes ...
On Wed, Jun 04, 2014 at 03:06:43AM +0200, tpb-crypto@laposte.net wrote:
Your proposal would cause 99% of software currently in use to be rejected
That seems like a feature... (note that I don't think most software should be audited as security critical. We can reduce the Trusted Computing Base and audit only those bits.)
and make the development costs increase as astronomically as to be compared to medical research.
I like to compare our current situation to the Steam Age. There was an enormous amount of innovation in steam power, heating, etc in the 1800s. There was a concomitant lack of standardized safety measures, and occasionally boilers exploded taking entire apartment buildings with them.

Over time the rate of innovation decreased, standardization set in, safety measures were instituted, and now we have boring steam radiators in apartment buildings rather than exciting steam-powered Difference Engines in our pockets.

-andy
On Tuesday, 3 June 2014 at 18:32:52, you wrote:
On Wed, Jun 04, 2014 at 03:06:43AM +0200, tpb-crypto@laposte.net wrote:
Your proposal would cause 99% of software currently in use to be rejected
That seems like a feature...
(note that I don't think most software should be audited as security critical. We can reduce the Trusted Computing Base and audit only those bits.)
and make the development costs increase as astronomically as to be compared to medical research.
I like to compare our current situation to the Steam Age. There was an enormous amount of innovation in steam power, heating, etc in the 1800s. There was a concomitant lack of standardized safety measures, and occasionally boilers exploded taking entire apartment buildings with them.
Over time the rate of innovation decreased, standardization set in, safety measures were instituted, and now we have boring steam radiators in apartment buildings rather than exciting steam-powered Difference Engines in our pockets.
I love that analogy. I usually went with "one of the reasons bridges are safe today is because we have safety standards and not everybody can build one", but yours is much better.

--
Pozdr
rysiek
On Tue, Jun 3, 2014 at 6:06 PM, <tpb-crypto@laposte.net> wrote:
... Your proposal [building meaningful security in from the start] would cause 99% of software currently in use to be rejected and make the development costs increase as astronomically as to be compared to medical research.
1% making the cut is a far too generous estimate, perhaps 1% of 1%. as for the cost issue, which must be paid somewhere, you make two assumptions:

first, assuming the externalities of insecure systems are simply non-existent. the costs of our pervasive vulnerability are gargantuan, yet the complexity and cost of robust alternatives instills paralysis. (this lack of significant progress in development of secure systems feeds your defeatist observations; it's ok ;)

second, that the schedules and styles of development as we currently practice it will always be. if you solved a core (commodity) infosec problem once, very well, in a way that could be widely adopted, you would only need to implement it once! (then spending five years and ten fold cost building to last becomes reasonable)

for now, it appears stasis and external costs are the status quo. the future, if here at all, is clearly not yet widely distributed...

best regards,
Message of 04/06/14 05:40 From: "coderman"
On Tue, Jun 3, 2014 at 6:06 PM, wrote:
... Your proposal [building meaningful security in from the start] would cause 99% of software currently in use to be rejected and make the development costs increase as astronomically as to be compared to medical research.
1% making the cut is a far too generous estimate, perhaps 1% of 1%. as for the cost issue, which must be paid somewhere,
you make two assumptions:
first, assuming the externalities of insecure systems are simply non-existent. the costs of our pervasive vulnerability are gargantuan, yet the complexity and cost of robust alternatives instills paralysis. (this lack of significant progress in development of secure systems feeds your defeatist observations; it's ok ;)
I kind of feel like an ant looking at the task of moving a mountain.
second, that the schedules and styles of development as we currently practice it will always be. if you solved a core (commodity) infosec problem once, very well, in a way that could be widely adopted, you would only need to implement it once! (then spending five years and ten fold cost building to last becomes reasonable)
Yah no, we never know when a problem is really solved. We may consider it solved, then someone comes and breaks it for us. Not even formal proofs stand forever.
On 2014-06-04, 00:53, Andy Isaacson wrote:
If the auditor at any point says "Well, I wouldn't have *recommended* that you implement your JSON parsing in ad-hoc C with pointer arithmetic and poor and misleading comments, but I can't find any *bugs* so I guess it must be OK" then that is an immediate fail.
And that I think is going too far. There might be perfectly valid reasons to do what the developer did, and saying post-hoc that you fail the audit because you don't like some design choices opens the door to personal biases. (Good luck, for example, trying to write nontrivial C without at least some form of pointer arithmetic.)

If you fail the audit, it's your duty as a professional auditor to provide evidence that there is something actually wrong with the software. It's OK to single out some pieces of code for closer inspection because of code smells, but if you try your darnedest to find something wrong with it and can't, then either the code is OK or you're not good enough an auditor. In either case, you can flag the code, you can recommend rewriting it according to what you think is better style, but you can't in good conscience fail the audit.

Fun,

Stephan
On Tue, Jun 3, 2014 at 10:54 PM, Stephan Neuhaus <stephan.neuhaus@tik.ee.ethz.ch> wrote:
... And that I think is going too far. There might be perfectly valid reasons to do what the developer did, and saying post-hoc that you fail the audit because you don't like some design choices opens the door to personal biases. (Good luck, for example, trying to write nontrivial C without at least some form of pointer arithmetic.)
there is a significant difference between engineering for safety, conservatively, and sloppy, error-prone techniques indicating haste and carelessness.

pointer arithmetic in C may be unavoidable, yet using it consistently with thoughtfulness and robustness is always a great idea.

defensive designs and conservative implementations are not "personal biases" in any form!

[ what is defensive and conservative? well, i know it when i see it! *grin* ]
If you fail the audit, it's your duty as a professional auditor to provide evidence that there is something actually wrong with the software.
"why do I need to add braces around my if clause? that's your opinion about style, who cares??" 'you don't have to, but a trivial edit error could bleed you for years if you don't!' if (what) { goto fail; goto fail; /* fail safer! */ }
On 2014-06-04, 09:46, coderman wrote:
there is a significant difference between engineering for safety, conservatively, and sloppy, error-prone techniques indicating haste and carelessness.
pointer arithmetic in C may be unavoidable, yet using it consistently with thoughtfulness and robustness is always a great idea.
Absolutely. My gripe was with the "automatic fail" of the OP. It's perfectly fine to say "this code doesn't look as if it was engineered for safety and you should consider rewriting it", and you can say "I can't audit this code, it's too complex for me", but you can't, IMHO, say "I fail this code's audit because it has a number of code smells" unless absence of code smells was a design requirement or there is evidence that these code smells are associated with security problems.

Fun,

Stephan
On 4 June 2014 01:54, Stephan Neuhaus <stephan.neuhaus@tik.ee.ethz.ch> wrote:
If you fail the audit, it's your duty as a professional auditor to provide evidence that there is something actually wrong with the software. It's OK to single out some pieces of code for closer inspection because of code smells, but if you try your darnedest to find something wrong with it and can't, then either the code is OK or you're not good enough an auditor. In either case, you can flag the code, you can recommend rewriting it according to what you think is better style, but you can't in good conscience fail the audit.
Perhaps this is getting too far into nits and wording, but I audit software for my day job (iSEC Partners). I'm not speaking for my employer.

But, with very few exceptions (we have a compliance arm, for example), one does not 'Pass' or 'Fail' one of our audits. (Perhaps they might be better termed 'security assessments', like we call them internally, but we're speaking in common English, and people tend to use the terms synonymously.)

Our customers are (mostly) on board with that too. They never ask us if they 'passed' or 'failed' - I'm certain some of them look at a report where we failed to 'steal the crown jewels' as a successful audit - but the expectation we set with them, and they sign on with, is not one of 'Pass/Fail'. And engagements where they want a statement saying they're secure, we turn down - we're not in the business of rubber stamps*.

Our goal is to review software, identify bugs, and provide recommendations to fix each issue and prevent it from occurring again. AND, in addition to the specific bugs, provide general recommendations for the team to make their application and environment more secure - provide defense in depth. Maybe I didn't find a bug that let me do X, but if there's a layer of defense you can put in that would stop someone who did, and you're missing that layer, I would recommend it.

Examples: I audited an application that had no Mass Assignment bugs - but no defenses against it either. Blacklists preventing XSS instead of whitelist approaches, and, like Andy said, homebrew C code parsing JSON. We 'flagged' all of that, and told them they should rewrite, rearchitect, or add layered defenses - even if we couldn't find bugs or bypasses.

So the notion of 'Passing' or 'Failing' an audit is pretty foreign to me. Perhaps people mean a different type of work (compliance?) than the one I do.

-tom

* The closest we get is one where we say 'We tested X as of [Date] for Y amount of time for the following classes of vulnerabilities, reported them, retested them Z months later, and confirmed they were fixed.' And we do this very rarely, very selectively, for clients we've dealt with before.
On Wed, Jun 04, 2014 at 08:50:14AM -0400, Tom Ritter wrote:
On 4 June 2014 01:54, Stephan Neuhaus <stephan.neuhaus@tik.ee.ethz.ch> wrote:
If you fail the audit, it's your duty as a professional auditor to provide evidence that there is something actually wrong with the software. It's OK to single out some pieces of code for closer inspection because of code smells, but if you try your darnedest to find something wrong with it and can't, then either the code is OK or you're not good enough an auditor. In either case, you can flag the code, you can recommend rewriting it according to what you think is better style, but you can't in good conscience fail the audit.
Stephan,

I strongly disagree. There are implementations that are Just Too Complicated and are Impossible To Audit. Such implementation choices *do*, empirically, provide cover for bugs; and as we as a society build more and more software into the fabric of our life-critical systems it's imperative that "the implementor liked this complexity and refuses to change it" gives way to the larger goals at stake. The auditor absolutely must have leeway to say "no you don't get to write your own string processing, you are going to use the standard ones."

This kind of feedback is precisely what happens in the higher quality audits that are becoming standard practice for security-critical software.
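A hypothetical illustration of the "use the standard ones" point (invented function names, a sketch rather than code from any audit): the hand-rolled copy below may well be correct, but proving that takes work; the boring version can be checked at a glance.

    #include <stdio.h>    /* snprintf */
    #include <stddef.h>   /* size_t   */

    /* Hand-rolled copy with pointer arithmetic.  Subtle trap: if dstlen
     * is 0, dstlen - 1 wraps around to SIZE_MAX and the loop overruns
     * dst.  Whether any caller ever passes 0 is exactly the kind of
     * whole-program question an auditor cannot settle from this file. */
    void copy_name_adhoc(char *dst, const char *src, size_t dstlen)
    {
        char *d = dst;
        while (*src != '\0' && (size_t)(d - dst) < dstlen - 1)
            *d++ = *src++;
        *d = '\0';
    }

    /* The boring alternative: bounded, always NUL-terminated, and one
     * standard call whose behaviour is documented and widely tested.  */
    void copy_name_std(char *dst, const char *src, size_t dstlen)
    {
        snprintf(dst, dstlen, "%s", src);
    }

The point is not that the first version is exploitable, only that every such function adds reviewer work that the standard call does not.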
Perhaps this is getting too far into nits and wording, but I audit software for my day job (iSEC Partners). I'm not speaking for my employer. But, with very few exceptions (we have a compliance arm for example), one does not 'Pass' or 'Fail' one of our audits. (Perhaps they might be better termed as 'security assessments' then, like we call them internally, but we're speaking in common english, and people tend to use them synonymously.)
As a satisfied iSec customer (at a previous job), I have a bit of insight here. iSec is a leader in this space and definitely leads by example. Across the industry, the average quality of discourse in the source auditing business is pretty good in my experience; only the bottom-skimming truly awful auditors reduce their customer-facing feedback to just a binary pass/fail.

However, inevitably, in the societal analysis of software quality for practical purposes, reductive reasoning happens. (This is not a bad thing, it's absolutely necessary -- we humans don't have the cognitive capacity to hold a complete decision tree in our head while doing this reasoning.) Thus statements like "you should use $OSS_CRYPTO_PACKAGE, it has passed its audits" end up playing a role in the discourse.

We as domain experts have an obligation to ensure that our contribution is given appropriate weight in the debate and decisions -- in both directions. For example, if an auditor sees their results being misinterpreted in customer marketing material or media coverage, the auditor has a moral obligation to correct that and insist that the mischaracterization stop. (And yes, I believe that this moral obligation would override an NDA between the customer and the auditor; the contract should be structured to recognize this fact.)

-andy
On 2014-06-04, 20:22, Andy Isaacson wrote:
On Wed, Jun 04, 2014 at 08:50:14AM -0400, Tom Ritter wrote:
On 4 June 2014 01:54, Stephan Neuhaus <stephan.neuhaus@tik.ee.ethz.ch> wrote:
If you fail the audit, it's your duty as a professional auditor to provide evidence that there is something actually wrong with the software. It's OK to single out some pieces of code for closer inspection because of code smells, but if you try your darnedest to find something wrong with it and can't, then either the code is OK or you're not good enough an auditor. In either case, you can flag the code, you can recommend rewriting it according to what you think is better style, but you can't in good conscience fail the audit.
Stephan,
I strongly disagree. There are implementations that are Just Too Complicated and are Impossible To Audit. Such implementation choices *do*, empirically, provide cover for bugs; and as we as a society build more and more software into the fabric of our life-critical systems it's imperative that "the implementor liked this complexity and refuses to change it" gives way to the larger goals at stake. The auditor absolutely must have leeway to say "no you don't get to write your own string processing, you are going to use the standard ones."
I think that we are mostly in agreement, except perhaps in wording. We both agree that auditors rarely "pass/fail" software in a binary fashion. And as I wrote, the auditor absolutely has the leeway to recommend rewriting.

But my gripe was with the "automatic fail" in the original post, to which I said that this was "going too far". If you do go that far (i.e., don't just recommend changes, but "fail" the audit), your verdict must be founded on evidence. For example, if it were actually true that complexity, "empirically, provides cover for bugs", that would be a perfectly good argument in favour of failing an audit. It's just that I've worked for a few years in precisely this field and all the studies I saw simply failed to show the necessary correlations. (The best study I know, by Yonghee Shin and Laurie Williams, shows rho <= 0.3, and that on the vulnerability-infested Mozilla JavaScript engine. See http://collaboration.csc.ncsu.edu/laurie/Papers/p47-shin.pdf)

This shows, I think, that auditors must be extra careful not to confuse folklore with evidence. You can say "this code is too complex for me to audit", and you can add "this should give you food for thought and you should consider rewriting it in a simpler style", but *as the auditor* you cannot say "I fail the code because I can't audit it" unless auditability was a design requirement. (For the *owners* of the code, their options are of course much greater, but we were talking about this from the auditor's perspective, and the OP talked about an "automatic fail" if the code turned out to have certain smells. If a smell isn't backed up by evidence, it's just a personal prejudice or folklore. Which, incidentally, would be excellent new terms to replace "Best Practice" in many cases.)

Again, please note that I agree with you that auditability and simplicity, using braces even for single-line if-statements, library functions rather than self-made string libraries, and all these other things ought to be design requirements, especially for security-critical software, because they make auditing easier. It's just that if they weren't, then you can fault the design requirements (though that may be outside your remit as auditor), but you can't "automatically fail" the implementation.

Fun,

Stephan
On 2014-06-04 08:35, rysiek wrote:
Hi there,
in a different thread, Cam posted a link containing this gem:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
In short several very skilled security auditors examined a small Python program — about 100 lines of code — into which three bugs had been inserted by the authors. There was an “easy,” “medium,” and “hard” backdoor. There were three or four teams of auditors.
1. One auditor found the “easy” and the “medium” ones in about 70 minutes, and then spent the rest of the day failing to find any other bugs.
2. One team of two auditors found the “easy” bug in about five hours, and spent the rest of the day failing to find any other bugs.
3. One auditor found the “easy” bug in about four hours, and then stopped.
4. One auditor either found no bugs or else was on a team with the third auditor — the report is unclear.
See Chapter 7 of Yee’s report for these details.
I should emphasize that I personally consider these people to be extremely skilled. One possible conclusion that could be drawn from this experience is that a skilled backdoor-writer can defeat skilled auditors. This hypothesis holds that only accidental bugs can be reliably detected by auditors, not deliberately hidden bugs.
Anyway, as far as I understand the bugs you folks left in were accidental bugs that you then deliberately didn’t-fix, rather than bugs that you intentionally made hard-to-spot.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - https://blog.spideroak.com/20140220090004-responsibly-bringing-new-cryptogra...
I have no problem believing it is thus, but can't help wondering if there are any ways to mitigate it.
The Underhanded C Contest produced stuff that was pretty easy to detect. Maybe Python supports more subtle bugs, or maybe the auditors sucked.
participants (7)

- Andy Isaacson
- coderman
- James A. Donald
- rysiek
- Stephan Neuhaus
- Tom Ritter
- tpb-crypto@laposte.net