Dr. Stephen Henson | 9d6b1ce | 2000-12-08 19:09:35 +0000

OpenSSL ASN1 Revision
=====================

This document describes some of the issues relating to the new ASN1 code.

Previous OpenSSL ASN1 problems
==============================

OK, why did the OpenSSL ASN1 code need revising in the first place? Well,
there are lots of reasons, some of which are listed below...

1. The code is difficult to read and write. For every single ASN1 structure
(e.g. SEQUENCE) four functions need to be written for new, free, encode and
decode operations. This is a very painful and error-prone operation. Very few
people have ever written any OpenSSL ASN1 and those that have usually wish
they hadn't.

2. Partly because of 1, the code is bloated and takes up a disproportionate
amount of space. The SEQUENCE encoder is particularly bad: it essentially
contains two copies of the same operation, one to compute the SEQUENCE length
and the other to encode it.

3. The code is memory based: that is, it expects to be able to read the whole
structure from memory. This is fine for small structures but if you have a
(say) 1GB PKCS#7 signedData structure it isn't such a good idea...

4. The code for the ASN1 IMPLICIT tag is evil. It is handled by temporarily
changing the tag to the expected one, attempting to read it, then changing it
back again. This means that decode buffers have to be writable even though they
are ultimately unchanged. This gets in the way of constification.

5. The handling of EXPLICIT isn't much better. It adds a chunk of code into
the decoder and encoder for every EXPLICIT tag.

6. APPLICATION and PRIVATE tags aren't even supported at all.

7. Even IMPLICIT isn't complete: there is no support for implicitly tagged
types that are not OPTIONAL.

8. Much of the code assumes that a tag will fit in a single octet. This is
only true if the tag is 30 or less (mercifully, tags over 30 are rare).

9. The ASN1 CHOICE type has to be largely handled manually: there aren't any
macros that properly support it.

10. Encoders have no concept of OPTIONAL and have no error checking. If the
passed structure contains a NULL in a mandatory field, the field is silently
left out of the encoding, resulting in an invalid structure.

11. It is tricky to add ASN1 encoders and decoders to external applications.
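Point 8 can be made concrete with a small sketch. The helper below is
hypothetical (encode_tag and its parameters are names invented for this
illustration, not OpenSSL functions). It writes the identifier octets for a
tag, using the single octet form for tags up to 30 and the multi-octet form
for anything larger.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenSSL function): write the BER/DER
 * identifier octets for a tag into out and return how many were written.
 * cls is the tag class (0 = UNIVERSAL, 1 = APPLICATION, 2 = context
 * specific, 3 = PRIVATE); constructed is non-zero for constructed
 * encodings. out must have room for at least 6 octets.
 */
static size_t encode_tag(unsigned char *out, unsigned cls, int constructed,
                         unsigned long tag)
{
    unsigned char lead;
    unsigned char tmp[5];
    size_t n = 0, i = 0;

    assert(out != NULL && cls <= 3 && tag <= 0xffffffffUL);
    lead = (unsigned char)((cls << 6) | (constructed ? 0x20 : 0));

    if (tag <= 30) {
        /* Low tag number form: the tag fits in the low five bits. */
        out[0] = (unsigned char)(lead | tag);
        return 1;
    }

    /* High tag number form: low five bits all ones, then base-128 digits. */
    out[n++] = (unsigned char)(lead | 0x1f);
    do {
        tmp[i++] = (unsigned char)(tag & 0x7f);
        tag >>= 7;
    } while (tag != 0);

    /* Most significant digit first; all but the last have the top bit set. */
    while (i > 1)
        out[n++] = (unsigned char)(tmp[--i] | 0x80);
    out[n++] = tmp[0];
    return n;
}
```

With this sketch, context specific tag 128 comes out as the three octets
0x9f 0x81 0x00, which is the case that code assuming a single octet cannot
handle.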

Template model
==============

One of the major problems with revision is the sheer volume of the ASN1 code.
Attempts to change (for example) the IMPLICIT behaviour would result in a
modification of *every* single decode function.

I decided to adopt a template based approach. I'm using the term 'template'
in a manner similar to SNACC templates: it has nothing to do with C++
templates.

A template is a description of an ASN1 module as several constant C structures.
It describes in a machine readable way exactly how the ASN1 structure should
behave. If this template contains enough detail then it is possible to write
versions of new, free, encode, decode (and possibly other operations) that
operate on templates.

Instead of having to write code to handle each operation, only a single
template needs to be written. If new operations are needed (such as a 'print'
operation) only a single new template based function needs to be written,
which will then automatically handle all existing templates.
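To illustrate the idea only, here is a minimal sketch of a template driven
'print' operation. Every name in it (field_template, struct_template,
demo_info, template_print) is hypothetical and much simpler than the real
OpenSSL template structures; it assumes all fields are plain ints so that the
generic function stays short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All names here are hypothetical; these are NOT OpenSSL's template types. */
typedef enum { FIELD_INTEGER, FIELD_BOOLEAN } field_kind;

typedef struct {
    const char *name;   /* field name, used by the print operation */
    field_kind kind;    /* how the field should be interpreted */
    size_t offset;      /* where the field lives inside the C structure */
} field_template;

typedef struct {
    const char *name;
    const field_template *fields;
    size_t nfields;
} struct_template;

/* An example structure and the single constant template describing it. */
typedef struct { int version; int critical; } demo_info;

static const field_template demo_info_fields[] = {
    { "version",  FIELD_INTEGER, offsetof(demo_info, version)  },
    { "critical", FIELD_BOOLEAN, offsetof(demo_info, critical) },
};
static const struct_template demo_info_template = {
    "demo_info", demo_info_fields, 2
};

/*
 * One generic 'print' operation that works for any structure described by
 * a template. Returns the number of characters written, or -1 if out is
 * too small. This sketch assumes every field is stored as an int.
 */
static int template_print(char *out, size_t outlen,
                          const struct_template *tt, const void *obj)
{
    size_t used, i;
    int n;

    assert(out != NULL && outlen > 0 && tt != NULL && obj != NULL);
    n = snprintf(out, outlen, "%s:", tt->name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    used = (size_t)n;

    for (i = 0; i < tt->nfields; i++) {
        const field_template *f = &tt->fields[i];
        int v = *(const int *)((const char *)obj + f->offset);

        if (f->kind == FIELD_BOOLEAN)
            n = snprintf(out + used, outlen - used, " %s=%s",
                         f->name, v ? "TRUE" : "FALSE");
        else
            n = snprintf(out + used, outlen - used, " %s=%d", f->name, v);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Adding another structure only requires another constant template;
template_print itself does not change, which is the property the template
model is after.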

Plans for revision
==================

The revision will consist of the following steps. Other than the first two,
these can be handled in any order.

o Design and write template new, free, encode and decode operations, initially
  memory based. *DONE*

o Convert existing ASN1 code to template form. *IN PROGRESS*

o Convert an existing ASN1 compiler (probably SNACC) to output templates
  in OpenSSL form.

o Add support for BIO based ASN1 encoders and decoders to handle large
  structures, initially blocking I/O.

o Add support for non-blocking I/O: this is quite a bit harder than blocking
  I/O.

o Add new ASN1 structures, such as OCSP, CRMF, S/MIME v3 (CMS), attribute
  certificates etc.

Description of major changes
============================

The BOOLEAN type now takes three values. 0xff is TRUE, 0 is FALSE and -1 is
absent. The meaning of absent depends on the context. If, for example, the
boolean type is DEFAULT FALSE (as in the case of the critical flag for
certificate extensions) then -1 is FALSE; if DEFAULT TRUE then -1 is TRUE.
Usually the value will only ever be read via an API which will hide this from
an application.
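The three-valued rule can be sketched as a small accessor. The macro names and
bool_field_value() are hypothetical names chosen for this example; only the
stored values 0xff, 0 and -1 come from the description above.

```c
#include <assert.h>

/* Stored values as described in the text; the macro names are made up. */
#define BOOL_TRUE    0xff
#define BOOL_FALSE   0
#define BOOL_ABSENT  (-1)

/*
 * Hypothetical accessor: returns 1 for TRUE and 0 for FALSE. default_true
 * is non-zero if the field is declared DEFAULT TRUE in the ASN1 definition
 * and zero if it is DEFAULT FALSE.
 */
static int bool_field_value(int stored, int default_true)
{
    assert(stored == BOOL_TRUE || stored == BOOL_FALSE ||
           stored == BOOL_ABSENT);
    if (stored == BOOL_ABSENT)
        return default_true ? 1 : 0;
    return stored == BOOL_TRUE;
}
```

For a DEFAULT FALSE field such as the extension critical flag, an absent
value (-1) reads as FALSE, which matches the behaviour described above.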

There is an evil bug in the old ASN1 code that mishandles OPTIONAL with
SEQUENCE OF or SET OF. These are both implemented as a STACK structure. The
old code would omit the structure if the STACK was NULL (which is fine) or if
it had zero elements (which is NOT OK). This causes problems because an empty
SEQUENCE OF or SET OF will result in an empty STACK when it is decoded but when
it is encoded it will be omitted, resulting in different encodings. The new code
only omits the encoding if the STACK is NULL; if it contains zero elements it
is encoded as empty.

There is an additional problem though: because an empty STACK was omitted,
sometimes the corresponding *_new() function would initialize the STACK to
empty so an application could immediately use it. With the new code the field
starts out as NULL instead, so using it immediately won't work: a new STACK
should be allocated first. One instance of this is the X509_CRL list of
revoked certificates: a helper function X509_CRL_add0_revoked() has been added
for this purpose.
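The difference between the two omission rules can be sketched as follows.
demo_stack and the three functions are hypothetical stand-ins written for
this example; they model only the "omit or encode" decision, not the real
STACK code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a STACK: only the element count matters here. */
typedef struct { size_t num; } demo_stack;

/* Old behaviour: both NULL and empty stacks are left out of the encoding. */
static int old_code_omits(const demo_stack *sk)
{
    return sk == NULL || sk->num == 0;
}

/* New behaviour: only a NULL stack is left out. */
static int new_code_omits(const demo_stack *sk)
{
    return sk == NULL;
}

/*
 * Decoding an empty SEQUENCE OF gives an empty, non-NULL stack. The
 * encoding round-trips only if the encoder does not then omit it.
 */
static int empty_roundtrips(int (*omits)(const demo_stack *))
{
    demo_stack empty = { 0 };

    assert(omits != NULL);
    return !omits(&empty);
}
```

Under the old rule an empty stack does not round-trip; under the new rule it
does, and NULL remains the only way to say "absent".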

The X509_ATTRIBUTE structure used to have an element called 'set' which took
the value 1 if the attribute value was a SET OF or 0 if it was a single value.
Due to the behaviour of CHOICE in the new code this has been changed to a field
called 'single' which is 0 for a SET OF and 1 for a single value. The old field
has been deleted to deliberately break source compatibility. Since this
structure is normally accessed via higher level functions this shouldn't break
too much.

The X509_REQ_INFO certificate request info structure no longer has a field
called 'req_kludge'. This used to be set to 1 if the attributes field was
(incorrectly) omitted. You can check to see if the field is omitted now by
checking if the attributes field is NULL. Similarly, if you need to omit
the field then free attributes and set it to NULL.

The top level 'detached' field in the PKCS7 structure is no longer set when
a PKCS#7 structure is read in. PKCS7_is_detached() should be called instead.
The behaviour of PKCS7_get_detached() is unaffected.

The values of 'type' in the GENERAL_NAME structure have changed. This is
because the old code used the ASN1 initial octet as the selector. The new
code uses the index in the ASN1_CHOICE template.

The DIST_POINT_NAME structure has changed to be a true CHOICE type.

    typedef struct DIST_POINT_NAME_st {
        int type;
        union {
            STACK_OF(GENERAL_NAME) *fullname;
            STACK_OF(X509_NAME_ENTRY) *relativename;
        } name;
    } DIST_POINT_NAME;

This means that name.fullname or name.relativename should be set
and type reflects the option. That is, if name.fullname is set then
type is 0 and if name.relativename is set type is 1.
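A sketch of keeping the selector and the union in step is shown below. It
uses stand-in types so that it compiles without the OpenSSL headers; the
demo_* names and the two setter functions are hypothetical and are not part
of the OpenSSL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two STACK_OF() types. */
typedef struct { int unused; } demo_general_names;
typedef struct { int unused; } demo_name_entries;

/* Same shape as DIST_POINT_NAME, built from the stand-in types. */
typedef struct {
    int type;
    union {
        demo_general_names *fullname;
        demo_name_entries *relativename;
    } name;
} demo_dist_point_name;

/* Select the fullname option: type must be 0. */
static void demo_set_fullname(demo_dist_point_name *dpn,
                              demo_general_names *gens)
{
    assert(dpn != NULL);
    dpn->type = 0;
    dpn->name.fullname = gens;
}

/* Select the relativename option: type must be 1. */
static void demo_set_relativename(demo_dist_point_name *dpn,
                                  demo_name_entries *rdn)
{
    assert(dpn != NULL);
    dpn->type = 1;
    dpn->name.relativename = rdn;
}
```

Setting the union member and type together in one place avoids the selector
and the stored pointer disagreeing.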

With the old code, using the i2d functions would typically involve:

    unsigned char *buf, *p;
    int len;
    /* Find length of encoding */
    len = i2d_SOMETHING(x, NULL);
    /* Allocate buffer */
    buf = OPENSSL_malloc(len);
    if(buf == NULL) {
        /* Malloc error */
    }
    /* Use temp variable because p gets updated to point to end of
     * encoding.
     */
    p = buf;
    i2d_SOMETHING(x, &p);

Using the new i2d you can also do:

    unsigned char *buf = NULL;
    int len;
    len = i2d_SOMETHING(x, &buf);
    if(len < 0) {
        /* Malloc error */
    }

and it will automatically allocate and populate a buffer with the
encoding. After this call 'buf' will point to the start of the
encoding which is len bytes long.
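Both calling conventions can be tried out with a toy encoder that follows the
same pointer rules. toy_i2d() and toy_value are hypothetical and written only
for this illustration: the encoder emits a three octet INTEGER, and plain
malloc() stands in for OPENSSL_malloc() so that the sketch builds without
OpenSSL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy type: a single value in the range 0..127. */
typedef struct { unsigned char value; } toy_value;

/*
 * Toy encoder following the i2d conventions described above:
 *   pp == NULL   return the encoded length only;
 *   *pp == NULL  allocate a buffer, leave *pp at its start;
 *   otherwise    write at *pp and advance *pp past the encoding.
 * Returns the encoded length, or -1 on allocation failure.
 */
static int toy_i2d(const toy_value *x, unsigned char **pp)
{
    const int len = 3;
    unsigned char *p;

    assert(x != NULL && x->value < 0x80);
    if (pp == NULL)
        return len;
    if (*pp == NULL) {
        p = malloc((size_t)len);
        if (p == NULL)
            return -1;
        *pp = p;            /* caller's pointer stays at the start */
    } else {
        p = *pp;
        *pp += len;         /* caller's pointer moves to the end */
    }
    p[0] = 0x02;            /* INTEGER tag */
    p[1] = 0x01;            /* length: one content octet */
    p[2] = x->value;
    return len;
}
```

In the allocating form the caller owns the returned buffer and must free it;
with real OpenSSL i2d functions that means OPENSSL_free(), while this sketch
uses free() to match its malloc().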