Java Cards

Checking and Signing XML Documents on Java Smart Cards- Challenges and Opportunities

2020-12-07 18:18:40 M&W SmartCard 11


Challenges and Opportunities

Nils Gruschka, Florian Reuter and Norbert Luttenberger

Christian-Albrechts-University of Kiel


  • One major challenge for digitally signing a document is the so called “what you see is what you sign” problem. XML as a meta language for encoding semistructured data offers new opportunities for a solution. The possibility for checking fundamental properties of XML-encoded documents (well-formedness, validity) can be used to improve the security of the signing process for such documents. In this paper we present an architecture for checking and signing XML documents on a smart card in order to enhance the control over the documents to be signed. The proposed architecture has successfully been used to implement a secure, smart card based electronic banking application for the financial transactions system FinTS.


Smart card assistance for generating digital signatures  is current state of the art and best practice. This is mainly due to the fact that smart cards nowadays have enough processing power to produce digital signatures for documents by on-card resources (processor and memory) only. This way the owner’s private signing key never has to leave the smart card: The signing key is and remains permanently stored in a tamper-proof environment.

A closer look at the signing process however reveals a still existing major security problem: the problem known as the “what you see is what you sign” problem. Before signing a document the signer usually wants to check the document’s syntactic and semantic correctness.

When compared to the traditional process of signing a paper document with a handwritten signature, the difference can easily be identified: In the traditional case, it is relatively easy for the user to assert the correctness, be-cause syntactic and semantic document checking and signature generation are in immediate context. Digitally signing an electronic document is com-pletely different, because checking and signature generation are executed in two different fundamentally environments, exposing different characteris-tics—different with respect to security on the one hand and processor, mem-ory, and display resources on the other hand.

Traditionally, the signing application computes the document’s digest us-ing a one-way hash function and sends the result to the smart card. The card encrypts the digest by an asymmetric cipher using the signing key stored on the card. The resulting value is the digital signature of the document. But, what is really signed is beyond the user’s control. It might—for instance—be the digest for a manipulated document. Even if the smart card can be re-garded as tamper-proof, the terminal (e.g. a PC) and the programs running on it are vulnerable to viruses and Trojan horses    . Such evil-doers might obviously also affect signing applications and let them produce valid signatures for—from the user’s perspective—invalid documents. Such incidents invalidate the signing process in total. To solve this problem the whole or at least the major part of the signing process has be executed in a trusted environment. While trusted PCs are not yet available, one could use a signing application running on a trusted card reader (e.g. FINREAD ) or a smart card, which is our approach.

We propose an enhanced architecture which performs checking and sign-ing of XML documents on Java smart cards, called JXCS architecture. The basic idea of JXCS is to shift the syntactic validation and hash value genera-tion from the vulnerable PC to the trusted smart card. Syntactic validation imposes the following challenges and opportunities: Challenging is the need of processing XML documents on resource constraint Java smart cards. The opportunity of the approach is the possibility to perform syntactic and even semantic checks on the XML document in a tamper-proof environment which improves the security of the signing process.

We propose the need for three major checks on the XML documents to be signed: Well-formedness, validity and content acknowledgement using a class 3 card reader (i.e. a card reader including a display and a keypad). Taken together all three checks can defeat “what you see is what you sign” attacks.

The paper is organized as follows: Section 2 introduces relevant XML fundamentals. The checks are explained more detailed in chapter 3. Section 4 then proposes the JXCS architecture. Section 5 shows a sample implemen-tation for this architecture. Finally section 6 gives the conclusion.


2.1  XML well-formedness

An XML  document encodes semi structured data . It is said to be well-formed if it contains at least one element and fulfills the well-formedness constraints given in the XML specification. Well-formedness is related to the syntactic format of markup, content, comments, document type definition, and so on, and it also ensures the usage of proper Unicode charac-ters and the specification of their encoding. According to the XML specifica-tion every XML processing entity has to check and assert the well-formedness property.

The most important constraint is the logical structure of element tags. The tags must be properly braced, meaning that every start element has a corresponding end element, together forming a unique tree structure. For example <a><b>Hello</b></a> is well-formed, while <a>Goodand</c> and <a><b>Bye</a></b> are not well-formed.

Mingwah Aohan Smart Card

Figure 1: Possible interpretations for the not well-formed XML document:<ul><li>1<li>2</ul>

Well-formedness ensures the uniqueness of the XML documents’ inter-pretation. Consider for example the following not well-formed XML fragment:<ul><li>1<li>2</ul>. The document allows the following two interpretations shown in figure 1:<ul><li>1</li></li></ul>for a) and <ul><li>1<li>2</li></li></ul>for b). In case of well-formedness the interpretation would be unique.

2.2  XML validity

A more restricting property compared to well-formedness of XML documents is its validity. An XML document is said to be valid if it is valid with respect to an associated document type declaration or schema  . Document type definitions resp. schemata express tree grammars, against which the XML documents are checked.

A (regular) tree grammar is a 5-tupleMingwah Aohan Smart Card where,

  • Mingwah Aohan Smart Card withMingwah Aohan Smart Card is a finite set of element types andMingwah Aohan Smart Card is a finite set of attribute types,

  • D is a finite set of data types,

  • N is a finite set of non-terminals,

  • P is a finite set  of production rules  of the form nMingwah Aohan Smart Card a(r), where Mingwah Aohan Smart Card is a non-terminal,Mingwah Aohan Smart Card is a symbol and r is either an element of D or a regular expression over the alphabet N withMingwah Aohan Smart Card Mingwah Aohan Smart Card Finally

  • * is the starting non-terminal

Consider for example the following schema fragment

Mingwah Aohan Smart Card

defining instances of the form:

Mingwah Aohan Smart Card

The corresponding tree grammar for the above schema fragment has the following form:

Mingwah Aohan Smart Card withMingwah Aohan Smart Card {Amount, Value, Currency}, D = {xsd:string, xsd:decimal},Mingwah Aohan Smart Card and P containing the following production rules:

Mingwah Aohan Smart Card

An import property of tree grammars is single-typeness, which means that for each production rule nMingwah Aohan Smart Card a(r), non-terminals in its content model do not compete with each other. Single-typeness ensures unique parsing. Thus, in the following all tree grammars are assumed to be single-typed .

W3C XML Schema offers a rich variety of built-in data types which can be customized using,ands. This allows to impose content-related constraints, e.g. the restriction of decimals to an upper bound:

Mingwah Aohan Smart Card

2.3   XML Signature

XML Signature is a W3C recommendation  for digital signatures us-ing an XML format. It specifies the required XML syntax and the processing rules for creating and representing XML signatures. In the context of the JXCS architecture we used this format and its related processing rules.

Mingwah Aohan Smart Card

Figure 2: XML Signature format (simplified)

An XML Signature is represented by a signature element which has a structure as given in figure 2. A signature element carries a sequence of three elements: a Signedlnfo element, a signatureValue element, and a KeyInfo element.

The Signedlnfo element contains a sequence of a Canonicaliza-tionMethod element, a signatureMethodelement and one or more Ref-erenceparticles. A Reference particle is created for each signed object (XML document or arbitrary other content, the latter not regarded in this paper). The Reference particle includes a reference URI to the data object, which may be found externally or included in the same document. The Ref-erenceelement further contains the digest value for the object (Digest-Value), and the digest algorithm used (DigestMethod). For all signed data objects given in the Reference particles, a single signature algorithm element (signatureMethod) and a canonicalization method element (Can-onicalizationMethod) is available. (Canonicalization e.g. removes re-dundant whitespaces, sorts attributes, normalizes namespaces, etc.)

The signature value given in the SignatureValue element refers to the whole signedlnfo element using the signature algorithm and the user’s signing key. The user’s signing key may be included in the XML Signature in the Keylnfo element.

A signature according to the XML Signature recommendation is com-puted in five steps:

  1. canonicalization of the document to be signed,

  2. computing the digest of the canonicalized document,

  3. generation of the signedlnfo element,

  4. computation of the signature value,

  5. building of the signature element.

2.4  APIs for XML processing

There exist two different types of APIs for XML processing : tree-based APIs like DOM or event-based APIs like SAX  or StAX . Treebased APIs in fact load the complete XML document into main mem-ory which then can be accessed as an XML Infoset  like tree. Such kind of APIs put a great strain on system resources, especially if the document is large.

Event-based APIs report parsing events directly to the application through callbacks, and do not usually build internal trees. Applications im-plement handlers to deal with the different events, much like handling events in a graphical user interface.

With respect to the limited resources of a smart card the only useful choice for on card XML processing is an event-based API. We propose a SAX-like event-based API which produces the following events: begin-element(a), end-element(a), begin-attribute(a), end-attribute(a), charcontent(c) whereMingwah Aohan Smart Card The above XML document is represented by the following events:

  • begin-element(Amount)

  • begin-element(Value)

  • char-content(“100”)

  • end-element(Value)

  • begin-element(Currency)

  • char-content(“EUR”)

  • end-element(Currency)

  • end-element( Amount)


Why relying on XML for solving the “what you see is what you sign” problem? Our ideas can be summarized in two points:

  1. If a document to be signed is either not well-formed in the sense of XML, or not valid in the sense of its accompanying schema, or both, than it must strictly be assumed that the document has been manipulated. In consequence, it has to be dropped, and the user has to be notified.

  2. A smart card application can extract certain content items for dis-play on the smart card reader from a structured and formally de-scribed document. The extraction and display operations are fully controlled by the tamper-proof smart card—which is the same environment that generates the digital signature.

The fundamental property of XML documents is well-formedness. Ac-cording to the XML specification every XML processing entity has to check and assert this property. Regarding digital signing well-formedness is impor-tant, since it ensures the uniqueness of the XML documents’ interpretation. Well-formedness also ensures the usage of proper Unicode characters and the specification of their encoding. This is also very important regarding digital signatures, since character set manipulation can be used to perform “what you see is what you sign” attacks .

Validity is a much more restrictive property of XML documents com-pared to well-formedness. A smart card which checks validity of XML documents with respect to a given schema before signing ensures due to the tamper resistance of the smart card that only certain types of XML docu-ments are signed. Consider for example a smart card which contains your private key, but only signs XML documents which are valid with respect to a purchase order schema. You could give this card to your secretary being sure, that nothing else than purchase order is signed using your signature. Using additional constraints in the schema, e.g. the restriction of the maxi-mum amount to 100 Euro eliminates the last chance of misusage.

When operated in a card reader containing a display the card can print se-lected parts of the content and request user confirmation. This brings a great approach to the solution of the “what you see is what you sign” problem.

Obviously, XML processing is not an easy task to perform on resource-constrained smart cards. The following table summarizes the challenging XML properties and the resulting opportunities for improving the signing process:

Mingwah Aohan Smart Card


Mingwah Aohan Smart Card

Figure 3: JXCS Architecture Overview

The JXCS architecture offloads all security-critical processing tasks to the tamper-proof smart card environment including:

  • well-formedness and validity checking,

  • document canonicalization and hashing, and

  • signature value computation.

Figure 3 gives an overview over the JXCS architecture. The terminal runs an XML parser which analyzes the document to be signed. The parsing events are coded into APDUs and sent to the smart card. The events are for-warded to a chain of event handlers. These handlers process the total XML document sequentially event by event. Different handlers can be used in this chain according to the specific requirements (see for example figure 3, a) to c)). The chain normally contains event handlers for checking XML docu-ment properties like well-formedness or validity, and of course a handler for creating the signature. If an event causes one of the checks to fail, instantly an exception is thrown. If all checks on all events are successful, the signa-ture value of the document is finally returned. From this value the XML Sig-nature is created by the terminal computer. The following paragraphs de-scribe the JXCS architecture handlers in detail.

4.1   Signing Handler

The signing handler’s purpose is to digest the canonicalized document, to create the signedInfo element and to compute the signatureValue.

For the purpose of digesting the signing handler reconstructs the canoni-calized document from the received parsing event; there is no need to send the original document to the smart card. This approach—based on a single event stream—enables the efficient combination of both checking and sign-ing XML documents on resource constraint smart cards.

Mingwah Aohan Smart Card

Table 2 shows how the canonicalized document is reconstructed from parsing events.

For each reconstructed document element the hash generator of the sign-ing handler is triggered which that way digests the whole document piece by piece. Most hashing functions (like MD5  or SHA-1 ) do not store the whole input data, but process the input stream block by block and only need a small additional digest buffer. Thus the hashing “piece by piece” is very space saving.

If the document has been completely parsed and transmitted to the card, the endDocument event is triggered and the signing object starts the actual signing process. The signer creates the signedInfo (see above) element including the signature, canonicalization and digest algorithm. Furthermore the document’s digest and the reference (created from the document’s root element) is inserted.

The resulting SignedInfo fragment is signed using the user’s private key, stored on the smart card. The resulting value is used to create the entire XML Signature.

As parsing events and the canonicalized document are semantically equivalent, the canonicalized document can be reconstructed from the pars-ing events, and the digital signature can be generated from the reconstructed document, and the signature is valid only for the parsed document and not for any other document.

4.2    Well-Formedness Checker

The following algorithm is used in the JXCS architecture to check wellformedness:

begin-element(a): Push a on the stack

end-element(a):Let a’ be the stack’s topmost element. 

  • If a = a’, pop a’ from stack.

  • Otherwise throw a not-wellformed exception,

begin-attribute(a):do nothing

end-attribute(a): do nothing

char-content(c): check if all characters in c are allowed characters

4.3  Validity Checker

The validity of XML documents with respect to a tree grammar (i.e. a document type definition or a schema) can easily be checked by the follow-ing algorithm using the proposed event-based API and a stack. The algo-rithm starts with an empty stack.


  • If the stack is empty, find a production ruleMingwah Aohan Smart Card and push it onto the stack.

  • Otherwise let n’ Mingwah Aohan Smart Card a’(r’) be the stack’s topmost production rule.Search the production rule nMingwah Aohan Smart Card a(r) whereMingwah Aohan Smart Card

  •     If none such exist, throw a not-valid exception.

  •     Otherwise change the actual production n’Mingwah Aohan Smart Card a’(r’) ton’Mingwah Aohan Smart Carda’(n’\r’) and then push nMingwah Aohan Smart Carda(r) onto the stack.

end-element(a):Let n’Mingwah Aohan Smart Card a’(r’) be the stack’s topmost production rule.

  • IfMingwah Aohan Smart Card throw a not-wellformed exception.

  • Otherwise

        IfMingwah Aohan Smart Card throw a not-valid exception.

        Otherwise pop the production rule from the stack.

begin-attribute(a): same as for begin-element

end-attribute(a): same as for end-element

char-content(c):Let nMingwah Aohan Smart Card a(r) be the stack’s topmost production rule.

  • IfMingwah Aohan Smart Card throw a not-valid exception.

  • IfMingwah Aohan Smart Card throw a not-valid exception.

The setMingwah Aohan Smart Card there exists aMingwah Aohan Smart Card withMingwah Aohan Smart Card describes the set of letters accepted next and x\a describes the derivative of x with re-spect to a, i.e. the expression which remains from a after parsing x. Both next(r) and x\a can be computed efficiently using a simple lookup table . The above algorithm is correct, due to the fact that for a single type tree grammar the next non-terminal can be chosen uniquely.

We use a compressed encoding of the production rules based on adja-cency lists. Here the running time is O(m n) with m is the length of the larg-est adjacency list. For a fixed tree grammar m is constant, so the running time is still O(n) for a fixed schema or document type definition.

The space consumption is O(h) where h is the height of the parsed tree. For non-recursive tree grammars the maximum height of an XML document is fixed, so the space consumption is limited for a fixed tree grammar. There-fore the memory consumption of the algorithm will not exceed the smart card’s resources.

The adjacency list representation for an XML schema is generated off-card and installed on the card. A card can handle one or more schemata.

4.4   Visual inspector for Class 3 Reader

A smart card reader with own display and numerical keyboard (often called “class 3 reader”) offers further possibilities for checking the document before signing it. The data exchanged between card reader and smart card can neither be read nor changed by the terminal or any Trojan horse running on it. So unless the reader’s firmware has not been modified, the card reader can be used as a secure display for the information send to the card.

A simple solution to the “what you see is what you sign” problem would be showing the complete document on the display prior to sending it to the smart card. This would indeed be useless for at most all practical purposes. These displays normally only have 1 to 3 rows with at most 20 characters each. Most users’ acceptance for viewing a complete document on such a display would be very low.

Displaying selected parts of arbitrary XML documents otherwise is inef-fective for checking the document to be signed. A single element may have totally different semantics in different contexts. If the user acknowledges e.g. the prompt

Mingwah Aohan Smart Card

he does not know the denotation of this element. He could buy a very cheap car or loose a lot of money by transferring it to the Nigeria connection. Even if all ancestors of the element are displayed additionally, the semantic is generally not unique.

This is solved by validating the XML document to a specific schema. In this case the context and semantic of an element is unambiguous. If the vali-dator validates the documents e.g. against a schema for cash remittance, the user can be sure to sign a transfer of 1000 Euro, if he acknowledges the ele-ment shown above.

The functionality of the display component is the following. When the schema is transferred to the smart card, the most “critical” elements are marked for displaying. The display component reads the content of these elements from the event stream. These values have to be acknowledged by the user. If one is rejected an exception is thrown.


5.1   Sample Application: FinTS

FinTS  (formerly named HBCI) is a home banking computer inter-face developed by the German Central Banking Committee (ZKA). It de-fines a standard for home banking and specifies the relationship between customer products and bank systems. FinTS allows more flexible and convenient online banking than other systems. To ensure secure transactions over open networks, cryptographic functions and smart cards are used.

The actual version’s communication is based completely on XML. All messages exchanged between the FinTS server and the FinTS clients are XML documents. Orders, like cash remittances, are signed using a personal smart card and coded as XML signature. The whole transaction message – containing one or more orders – is encrypted into an XML encryption docu-ment.

Banking transactions are obviously extreme secure relevant and a profit-able target for attacks. The user wants assurance, that the document he cre-ates using his online banking program is actually the one signed by the card and sent to his bank.

The assumed scenario for our sample application is an online-banking user who wants to sign cash remittances using a smart card and a class 3 reader. The signed remittances will then be sent to his bank’s FinTS server.

5.2  Implementation

The proposed architecture was utilized to implement a client with a se-cure signing process for FinTS banking transactions. We implemented a cli-ent program on the terminal creating FinTS remittance transactions and sending it to the smart card. The smart card contains the user’s private key and a signing component. It also contains a validator, validating the input document against a modified FinTS schema for a remittance. The schema has been altered in the way that the remittance value must be less or equal 100 and the currency must be EUR. Thus the smart card will sign only documents being a valid FinTS remittance with a maximum of 100 Euro. Thus the user has the assurance, that his signing card will never sign any-thing else, no matter what a Trojan horse would perform to the documents. Additionally the display component let the user acknowledge the remit-tance’s most important content values like target account number. Thus even modifications on these values would be detected.

The sample client program is a GUI that creates a FinTS cash remittance document from the input values. This XML document is parsed into XML events, which are coded as TLV into request APDUs. Due to efficiency rea-sons not single events are sent to the card. Instead the amount of events fit-ting into an APDU is transferred, improving performance by up to 20%.

On the Java smart card a main applet and the following event handlers are implemented: validator, displayer and signer. The XML events are de-coded from the APDUs by the applet, which calls the event handler’s appro-priate event methods. Once an exception is thrown by one of the handlers, an error code is returned to the client. In order to simplify the implementation the algorithms for hashing, signing, canonicalization are set statically. The final application needs approximately 13000 bytes plus the binary schema representation. The signing time increases linear with the document’s size. For a typical cash remittance (ca. 1.2 KB) approximately 15 seconds are needed.

The validator is an implementation of the above described validity check-ing algorithm. In our sample application we provided the validator on the smart card with the FinTS schema. The original schema has 5537 bytes, the binary tree grammar only just 954 bytes. As shown above the stack’s size is limited for a fixed schema. The stack can be implemented as list of fixed length. Thus dynamic object instantiating can be avoided, which is critical for a Java card without garbage collection. For the FinTS schema the maxi-mum size needed for the validator’s stack is 200 bytes.

The display component will be configured when uploading the schema. The simple type elements, that shall be displayed, are specified by an XPath  like expression. The displayer collects the content for these elements from the charContent events. He then waits for the acknowledgement for these values from the card reader. Due to the stringent master-slave-relationship between the terminal and the smart card, the card can not re-quest the card reader for displaying these values. Instead, the PC sends the to-be-acknowledged values to the card reader contained in a special com-mand. This command instructs the reader to display the value and send it to the card if it is acknowledged by the user. The displayer compares these val-ues with the content extracted from the events. If this fails an exception is thrown. Figure 4 shows a sample output on the card reader’s display.

Mingwah Aohan Smart Card

Figure 4: Card reader waiting for user acknowledgement

As pointed out above the signing component creates the signature fragment from the events. In order to minimize the communication between PC and card, not the total Signature fragment is returned to the client but only the documents digest and the signature of the signedInfo fragment. As the algorithms are fixed, the client program can create the same signature fragment from the digest and the signature values. And of course this is no lack of security.

The Java smart cards used are JCOP 2lid from IBM. These cards are compliant to JavaCard 2.1.1‚ OpenPlatform 2.0.1’ and FIPS 140-2 level 3. They have 30 Kbytes EEPROM as persistent Java heap‚ 590 bytes RAM as transient Java heap and 200 bytes RAM as Java stack. The card reader used is a class 3 reader (2 x 16 character display‚ numerical keypad) from Rein-erSCT.


In this article we have shown how processing XML documents on a smart card arises new opportunities for signing them. By checking properties like well-formedness and validity the user gains more control over the documents signed using his private key. We have also shown‚ that checking a document’s validity according to a specific grammar‚ allows showing sin-gle elements from the document on a reader’s display without losing the element’s semantic. This way we can greatly improve the security of the signing process for XML document and even approach a solution for the “what you see is what you sign” problem.

Future research will focus on applying the XML processing smart card technology on XML communication protocols to improve e.g. Web Service security.


We would like to thank Karsten Strunk and Jesper Zedlitz for implement-ing the secure FinTS signing client including an interface to a FinTS server system. We also like to thank Christian Friberg and Rainer Segebrecht of PPI Financial Services Kiel for supporting the implementation process.

Java Card is an open standard from Sun Microsystems for a smart card developmentplatform. Smart cards created using the Java Card platform have Java applets stored on them. The applets can be added to or changed after the card is issued.

There are two basic types of smart cards. The memory smart card is the familiar removable memory device; it usually features read and write capabilities and perhaps security features. The more complex version, the processor smart card, is a very small and extremely portable computing device that could be carried in your wallet. Java-based smart cards belong to the latter category. They store data on an integrated microprocessor chip. Applets are loaded into the memory of the microprocessor and run by the Java Virtual Machine. Similarly to MULTOS, another smart card development technology, Java Card enables multiple application programs to be installed and coexist independently. Individual applets are protected by a firewall to preserve their integrity and prevent tampering. Applications can be updated dynamically.

In the United States, the Department of Defense, Visa, and American Express are among the organizations creating Java Card-based applications.

Contact us