Text-based specification and ncval
How do we apply a text-based specification to sequences of bytes?
https://code.google.com/p/nativeclient/issues/detail?id=3453
Basically, it is a handful of Python functions that accept a disassembly listing and say whether it is correct from the sandboxing point of view.
This executable specification is intended to be more or less readable and automatically up-to-date.
Note that there is no goal of reflecting every validator quirk in the spec. For example, for a purely technical reason the old validator (and consequently the new one) rejects 16-bit atomics. We plan to allow them eventually, so the specification allows them from the beginning.
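For illustration, a text-based check for a single instruction might look like this (a toy sketch with a hypothetical function name and mnemonic whitelist; the real checks live in validator_ragel/spec.py and are far more complete):

def validate_instruction(disassembly):
    # Toy sketch: `disassembly` is the textual assembly of one
    # instruction, e.g. 'nopw 0x0(%eax,%eax,1)'. Accept a tiny
    # whitelist of mnemonics and report everything else, mirroring
    # the SAFE / 'unrecognized instruction' verdicts used in the
    # tests below.
    mnemonic = disassembly.split()[0]
    if mnemonic in ('nop', 'nopw', 'mov', 'add'):
        return 'SAFE'
    return "unrecognized instruction '%s'" % disassembly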
This specification is not a test per se, but it is used in two sets of tests: targeted tests and exhaustive validator tests.
https://code.google.com/p/nativeclient/issues/detail?id=3037
https://code.google.com/p/nativeclient/issues/detail?id=3452
These tests serve several purposes:
There are about 200 manually written test files (incorporating ~300 test cases) for the 32- and 64-bit validators. They originate from the tests Karl used for the old validator. Since then they have been supplemented with tests for most (hopefully, all) defects and subtle behaviours we encountered while working on the RDFA validator. They also include the tests Mark used for his prototype DFA-based validator. They are somewhat poorly structured, so it may not always be obvious where to find the test case for a specific problem, but there is a lot of material there.
Test format. Test files (validator_ragel/testdata/32/*.test, validator_ragel/testdata/64/*.test) consist of one or more test cases separated by ‘------------’. Each test case consists of sections. Example:
@hex:
# This is the correct nop case.
# nopw 0x0(%eax,%eax,1)
66 0f 1f 44 00 00
@dis:
0: 66 0f 1f 44 00 00 nopw 0x0(%eax,%eax,1)
@rdfa_output:
return code: 0
@spec:
SAFE
----------------------------------------------------------------------
@hex:
# This is an example where we have incremented the nop by 1.
66 0f 1f 44 00 01
@dis:
0: 66 0f 1f 44 00 01 nopw 0x1(%eax,%eax,1)
@rdfa_output:
0: [0] unrecognized instruction
return code: 1
@spec:
0: unrecognized instruction 'nopw 0x1(%eax,%eax,1)'
The ‘@hex’ section contains the input data as a sequence of bytes. By convention, in complex cases the bytes corresponding to an instruction are preceded with a comment giving the assembly form of that instruction. This correspondence is not enforced, though, so in principle the comments can lie. To make it easier to spot errors in the comments, the ‘@dis’ section contains the output of nacl-objdump on the given input.
The ‘@spec’ section contains the expected output of the text-based ncval (which receives its input from the @dis section).
The ‘@rdfa_output’ section contains the expected output of the RDFA-based ncval (actually, the output is processed in a certain way; see https://code.google.com/p/nativeclient/issues/detail?id=3037 under the ‘tricky part’ paragraph). In an ideal world this simulation of error recovery would not be needed, as each test would contain at most one violation of the sandboxing rules, but for historical reasons we have tests with several errors in a single chunk of code and we want to check them all. New tests should exhibit only one failure per test case.
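For illustration, a minimal parser for this format might look as follows (a hypothetical helper, not the actual test-runner code):

import re

def parse_test_file(text):
    # Split the file into test cases on separator lines of dashes,
    # then split each case into its '@'-prefixed sections.
    cases = []
    for chunk in re.split(r'\n-{10,}\n', text):
        sections, current = {}, None
        for line in chunk.splitlines():
            if line.startswith('@'):
                current = line[1:].rstrip(':')
                sections[current] = []
            elif current is not None:
                sections[current].append(line)
        if sections:
            cases.append(sections)
    return cases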
For each of these three sections there is a dedicated scons target: run_dis_section_test_32/64 for @dis, run_rdfa_targeted_tests_32/64 for @rdfa_output, and run_spec_val_test_32/64 for @spec.
All these test targets are run on bots.
If the ‘regenerate_golden=1’ option is passed to scons with any of these targets, the content of the corresponding section is replaced with the actual output of the tool. This is helpful when tests are edited. Of course, each such change has to be manually reviewed.
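For example, to refresh the golden @rdfa_output sections of the 32-bit targeted tests (using one of the targets above):
./scons run_rdfa_targeted_tests_32 regenerate_golden=1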
TODO: describe when @rdfa_output and @spec can disagree and how to check for it.
Note: there is legacy stuff in validator_x86/testdata.
How to run all these tests:
./scons small_tests
(or, more specifically,
./scons run_dis_section_test_32 run_rdfa_targeted_tests_32 run_spec_val_test_32
and same for 64)
https://code.google.com/p/nativeclient/issues/detail?id=3154
The primary purpose of this test is to find errors in our instruction definition files.
We enumerate all instruction sequences accepted by the decoder automaton and compare the output of our decoder with that of a specific version of objdump.
We do not enumerate all possible values of immediates (there are far too many of them). Transitions corresponding to immediate bytes (as well as direct jump/call targets and relative offsets, collectively ‘anybytes’) are marked in the automaton, so we recognize them during the traversal and generate only one representative instance of each immediate.
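A rough sketch of that traversal (over a hypothetical DFA representation; the real automaton is generated by ragel and driven by the validator_ragel scripts):

def enumerate_accepted(state, prefix, emit):
    # Depth-first walk over the decoder automaton, emitting every
    # accepted byte sequence.
    if state.is_final:
        emit(bytes(prefix))
    if state.any_byte:
        # This transition consumes an immediate/offset byte
        # ('anybyte'): generate one representative value (a small
        # positive number) instead of all 256 possibilities.
        enumerate_accepted(state.transitions[0x01], prefix + [0x01], emit)
    else:
        for byte, next_state in sorted(state.transitions.items()):
            enumerate_accepted(next_state, prefix + [byte], emit)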
Our decoder behaves differently from objdump when it comes to the fwait instruction (https://code.google.com/p/nativeclient/issues/detail?id=3251); for example, the ‘FWAIT; FNINIT’ sequence is decoded as a single ‘FINIT’ instruction by objdump. We decided not to reproduce this behavior and instead took precautions to ensure that the FWAIT instruction is always followed by a NOP in the stream we generate.
Also, the RDFA decoder does not sign-extend negative immediates (https://code.google.com/p/nativeclient/issues/detail?id=3164), but this does not show up in this test because we use positive values for ‘anybytes’.
How to run:
./scons dfacheckdecoder
Since this test requires ragel, it can only be run on Linux. Also, it takes a while (about an hour on a z620), so we do not even attempt to run it on bots.
There are ~250 million 32-bit instructions and ~4 billion 64-bit instructions accepted by the decoder.
https://code.google.com/p/nativeclient/issues/detail?id=3167
This test is designed to catch the following problems:
(requires ragel, platform=x86-64 (because of the python/validator integration), and the old ncval built)
Basically, we are solving a kind of ‘inverse Kleene star problem’: given a DFA, we attempt to find a set of words such that a word is accepted by this DFA iff it is a concatenation of words from this set. We do not know how to solve this problem efficiently in general, so we use an algorithm that makes certain assumptions about the DFA structure (and verifies these assumptions along the way). The corresponding code lives in validator_ragel/verify_validators_dfa.py.
For technical reasons we subdivide all such words into two categories: regular instructions and superinstructions.
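To make the property concrete, here is a small hypothetical checker for one direction of it, namely whether a given word is a concatenation of words from a candidate set (the actual algorithm in validator_ragel/verify_validators_dfa.py works on the DFA itself, not word by word):

def decomposes(word, word_set):
    # Dynamic programming over prefixes: ok[i] is True iff the
    # first i bytes split into a concatenation of words from
    # word_set.
    ok = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        ok[end] = any(ok[start] and word[start:end] in word_set
                      for start in range(end))
    return ok[len(word)]

The inverse problem is then to find a word_set such that decomposes(word, word_set) holds for exactly the words the DFA accepts.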
Superinstructions enumerated by validator_ragel/verify_validators_dfa.py are checked in validator_ragel/verify_superinstructions.py. For each byte sequence we call objdump to make sure that it does not end mid-instruction. Then we parse the disassembly listing to determine whether it is indeed a valid superinstruction (since part of the validation logic resides in DFA actions, some of the byte sequences accepted by the automaton are invalid from the sandboxing point of view). Finally, we invoke the validator itself through the python interface and check that it accepts or rejects the given byte sequence in accordance with the sandboxing rules. There are fewer than a dozen types of superinstructions and they are relatively easy to parse (this is done in the functions ValidateSuperinstruction32/64 in validator_ragel/spec.py), so for simplicity we do not bother to compare against the old validator.
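Schematically, the per-superinstruction check looks something like this (a hypothetical harness; only ValidateSuperinstruction32/64 is a real name from spec.py):

def check_superinstruction(byte_sequence, disassemble, spec_verdict, rdfa_verdict):
    # disassemble: objdump-style wrapper; returns None if the
    #   sequence ends mid-instruction.
    # spec_verdict: parses the listing and applies the sandboxing
    #   rules (e.g. via ValidateSuperinstruction32 from spec.py).
    # rdfa_verdict: calls the RDFA validator via its python interface.
    listing = disassemble(byte_sequence)
    assert listing is not None, 'sequence ends mid-instruction'
    # The automaton may accept sequences that the sandbox must
    # reject, since part of the validation logic lives in DFA
    # actions; the two verdicts must nevertheless agree.
    assert spec_verdict(listing) == rdfa_verdict(byte_sequence)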
Regular instructions are enumerated and checked by validator_ragel/verify_regular_instructions.py. There are about 4M 32-bit instructions and 70M 64-bit instructions accepted by the DFA, so it is quite a costly test (about an hour on a z620). For each instruction, we call objdump to ensure that it is indeed a single instruction. If the text-based specification rejects an instruction, we make sure the RDFA validator rejects it as well. There is no point in checking the other direction, because we only ever enumerate byte sequences the RDFA accepts (enumerating all sequences would be impossible).
Actually, this scheme works as-is only for the 32-bit validator. The 64-bit one is additionally complicated by the fact that information flows between instructions (in the form of the currently restricted register), so we also have to ensure that the specification and the RDFA validator agree on each instruction's pre- and postconditions.
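Schematically, the extra 64-bit check might look like this (hypothetical names; both callbacks are assumed to return a (verdict, restricted_register_after) pair):

def check_agreement_64(bytecode, spec_effect, rdfa_effect, gp_registers):
    # For every possible restricted register on entry (including
    # none), the specification and the RDFA validator must agree
    # both on the verdict and on the restricted register on exit.
    for restricted_in in [None] + list(gp_registers):
        assert (spec_effect(bytecode, restricted_in) ==
                rdfa_effect(bytecode, restricted_in))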
Also, just as in the exhaustive decoder test, we do not try all possible values for ‘anybytes’ (and direct jump targets fall into this category). Checking the jump-target logic is not the goal of this test anyway (we rely on the manually written targeted tests for jumps instead).
There is a similar test in validator_ragel/verify_regular_instruction_old.py. Instead of comparing against the text-based specification, it compares against the old validator (and objdump is additionally used to check that the instruction length is determined correctly). Hopefully we will be able to get rid of it soon.
How to run:
./scons dfacheckvalidator platform=x86-64
This test requires ragel and takes a lot of time, so it can only be run on Linux. It uses the python interface to the validator (implemented as a DSO), so the supplied value of the ‘platform’ parameter should match the python bitness. Additionally, since it uses both the 32-bit and the 64-bit ncval, the following commands should be run manually beforehand:
./scons ncval platform=x86-32
./scons ncval platform=x86-64
(this requirement can’t be represented as scons dependencies because these targets span different platform configurations)
Of course, we could use objdump to get the disassembly, but that raises the question of how reliable objdump is in the presence of invalid instructions (which is not its intended use case).
Objdump is well tested for instructions which make sense and which the CPU accepts (whenever someone adds an instruction to gas, it is added to objdump along with the appropriate tests), but it is not all that good with incorrect instructions (especially ones similar to other, existing instructions). E.g.:
0: 66 0f 78 c0 02 01 extrq $0x1,$0x2,%xmm0
6: c5 f8 28 d1 vmovaps %xmm1,%xmm2
a: c5 f2 2a d0 vcvtsi2ss %eax,%xmm1,%xmm2
e: 66 0f 78 00 extrq $0x0,$0x78,(bad)
12: c5 fa 28 d1 vmovaps %xmm1,%xmm2
16: c5 f0 2a (bad)
19: d0 .byte 0xd0
The first three instructions are correct; the instructions after that point are minor modifications of existing instructions (extrq with a register instead of a memory operand, vmovaps with vex.pp changed from 00 to 10, and vcvtsi2ss with vex.pp changed from 10 to 00). Objdump can declare an instruction "(bad)", it can detect a "(bad)" in the middle of an instruction, or it can simply confuse it with a different, real instruction!
Suppose that for some reason the DFA accepts a completely meaningless sequence of bytes, but objdump incorrectly decodes it as an innocent instruction which the text-based specification allows. This situation is undesirable, and that is why we use our own RDFA decoder (which is tailored to mimic objdump output) instead of objdump itself.
The RDFA decoder is designed to never accept invalid instructions (it always produces the same ‘unrecognized instruction’ message where objdump might attempt some guesswork). For a decoding problem to go unnoticed by the exhaustive decoder test, the RDFA decoder would have to accept an invalid sequence, objdump would have to accept the same invalid sequence, and both would have to produce identically incorrect output. So using the RDFA decoder in the exhaustive validator test makes it extremely unlikely that the text-based specification ever has to deal with an incorrectly decoded instruction.
The script resides in validator_ragel/PRESUBMIT.py and performs two checks: