java.lang.Object
org.apache.jena.ext.xerces.util.XercesXMLChar
This class defines the basic XML character properties. The data in this class can be used to verify that a character is a valid XML character, or whether it is a space, name start, or name character.
A series of convenience methods is supplied to ease the burden on the developer. Because inlining the checks can improve per-character performance, the tables of character properties are public. Using the character as an index into the CHARS array and applying the appropriate mask flag (e.g. MASK_VALID) yields the same results as calling the convenience methods. There is one exception: check the comments for the isValid method for details.
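The table-plus-mask technique described above can be sketched as follows. This is a minimal, self-contained illustration and does not use the real class: the class name MaskLookupSketch, the flag values, and the 128-entry table are assumptions made for the example, and the actual MASK_* values and CHARS contents may differ.

```java
/** Sketch of a mask-based character property table. All values here are illustrative. */
final class MaskLookupSketch {
    // Hypothetical flag values; the real MASK_* constants may differ.
    static final int MASK_VALID = 0x01;
    static final int MASK_SPACE = 0x02;

    // Demo table covering ASCII only; one entry of property flags per character.
    static final byte[] CHARS = new byte[128];

    static {
        // Tab, line feed, carriage return, and space are valid and are space characters.
        for (int c : new int[] {0x09, 0x0A, 0x0D, 0x20}) {
            CHARS[c] = (byte) (MASK_VALID | MASK_SPACE);
        }
        // The remaining ASCII characters from 0x21 to 0x7F are valid but not space.
        for (int c = 0x21; c <= 0x7F; c++) {
            CHARS[c] = (byte) MASK_VALID;
        }
    }

    /** Convenience-method style check with a bounds guard. */
    static boolean isSpace(int c) {
        return c >= 0 && c < CHARS.length && (CHARS[c] & MASK_SPACE) != 0;
    }

    public static void main(String[] args) {
        char c = ' ';
        // Inlined check: index the table with the character and apply the mask.
        boolean inlined = (CHARS[c] & MASK_SPACE) != 0;
        System.out.println(inlined == isSpace(c));
    }
}
```

The inlined form skips a method call per character, which is the performance motivation given above; the caller then owns the bounds check that the convenience method would otherwise perform.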
- Version:
- $Id: XMLChar.java 674378 2008-07-07 00:52:45Z mrglavas $
- Author:
- Glenn Marcy, IBM, Andy Clark, IBM, Eric Ye, IBM, Arnaud Le Hors, IBM, Michael Glavassevich, IBM, Rahul Srivastava, Sun Microsystems Inc.
-
Field Summary
Modifier and Type  Field              Description
static final int   MASK_CONTENT       Content character mask.
static final int   MASK_NAME          Name character mask.
static final int   MASK_NAME_START    Name start character mask.
static final int   MASK_NCNAME        NCName character mask.
static final int   MASK_NCNAME_START  NCName start character mask.
static final int   MASK_PUBID         Pubid character mask.
static final int   MASK_SPACE         Space character mask.
static final int   MASK_VALID         Valid character mask.
-
Constructor Summary
XercesXMLChar()
-
Method Summary
Modifier and Type  Method                                    Description
static char        highSurrogate(int c)                      Returns the high surrogate of a supplemental character.
static boolean     isContent(int c)                          Returns true if the specified character can be considered content.
static boolean     isHighSurrogate(int c)                    Returns whether the given character is a high surrogate.
static boolean     isInvalid(int c)                          Returns true if the specified character is invalid.
static boolean     isLowSurrogate(int c)                     Returns whether the given character is a low surrogate.
static boolean     isMarkup(int c)                           Returns true if the specified character can be considered markup.
static boolean     isName(int c)                             Returns true if the specified character is a valid name character as defined by production [4] in the XML 1.0 specification.
static boolean     isNameStart(int c)                        Returns true if the specified character is a valid name start character as defined by production [5] in the XML 1.0 specification.
static boolean     isNCName(int c)                           Returns true if the specified character is a valid NCName character as defined by production [5] in the Namespaces in XML recommendation.
static boolean     isNCNameStart(int c)                      Returns true if the specified character is a valid NCName start character as defined by production [4] in the Namespaces in XML recommendation.
static boolean     isPubid(int c)                            Returns true if the specified character is a valid Pubid character as defined by production [13] in the XML 1.0 specification.
static boolean     isSpace(int c)                            Returns true if the specified character is a space character as defined by production [3] in the XML 1.0 specification.
static boolean     isSupplemental(int c)                     Returns true if the specified character is a supplemental character.
static boolean     isValid(int c)                            Returns true if the specified character is valid.
static boolean     isValidIANAEncoding(String ianaEncoding)  Returns true if the encoding name is a valid IANA encoding.
static boolean     isValidJavaEncoding(String javaEncoding)  Returns true if the encoding name is a valid Java encoding.
static boolean     isValidName(String name)                  Checks whether a string is a valid Name according to [5] in the XML 1.0 Recommendation.
static boolean     isValidNCName(String ncName)              Checks whether a string is a valid NCName according to [4] from the XML Namespaces 1.0 Recommendation.
static boolean     isValidNmtoken(String nmtoken)            Checks whether a string is a valid Nmtoken according to [7] in the XML 1.0 Recommendation.
static char        lowSurrogate(int c)                       Returns the low surrogate of a supplemental character.
static int         supplemental(char h, char l)              Returns the supplemental character corresponding to the given surrogates.
static String      trim(String value)                        Trims space characters as defined by production [3] in the XML 1.0 specification from both ends of the given string.
-
Field Details
-
MASK_VALID
public static final int MASK_VALID
Valid character mask.
-
MASK_SPACE
public static final int MASK_SPACE
Space character mask.
-
MASK_NAME_START
public static final int MASK_NAME_START
Name start character mask.
-
MASK_NAME
public static final int MASK_NAME
Name character mask.
-
MASK_PUBID
public static final int MASK_PUBID
Pubid character mask.
-
MASK_CONTENT
public static final int MASK_CONTENT
Content character mask. Special characters are those that can be considered the start of markup, such as '<' and '&'. The various newline characters are considered special as well. All other valid XML characters can be considered content.
This is an optimization for the inner loop of character scanning.
-
MASK_NCNAME_START
public static final int MASK_NCNAME_START
NCName start character mask.
-
MASK_NCNAME
public static final int MASK_NCNAME
NCName character mask.
-
-
Constructor Details
-
XercesXMLChar
public XercesXMLChar()
-
-
Method Details
-
isSupplemental
public static boolean isSupplemental(int c)
Returns true if the specified character is a supplemental character.
Parameters:
c - The character to check.
-
supplemental
public static int supplemental(char h, char l)
Returns the supplemental character corresponding to the given surrogates.
Parameters:
h - The high surrogate.
l - The low surrogate.
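As an illustration of what combining a surrogate pair involves, here is a minimal sketch based on standard UTF-16 arithmetic. The class name SurrogateCombineSketch is hypothetical and the code is not taken from XercesXMLChar; it only assumes that "supplemental character" means a code point in the range 0x10000 to 0x10FFFF.

```java
/** Sketch of surrogate-pair combination using standard UTF-16 arithmetic. */
final class SurrogateCombineSketch {
    /** True for code points above the Basic Multilingual Plane. */
    static boolean isSupplemental(int c) {
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    /** Combines a high and a low surrogate into one code point. */
    static int supplemental(char h, char l) {
        // Each surrogate contributes 10 bits; the result is offset by 0x10000.
        return (h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = '\uD83D';
        char low = '\uDE00';
        int cp = supplemental(high, low);
        System.out.println(Integer.toHexString(cp)); // prints "1f600"
        // Cross-check against the JDK's own conversion.
        System.out.println(cp == Character.toCodePoint(high, low)); // prints "true"
    }
}
```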
-
highSurrogate
public static char highSurrogate(int c)
Returns the high surrogate of a supplemental character.
Parameters:
c - The supplemental character to "split".
-
lowSurrogate
public static char lowSurrogate(int c)
Returns the low surrogate of a supplemental character.
Parameters:
c - The supplemental character to "split".
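Splitting a supplemental character into its two surrogates can be sketched with the inverse arithmetic. The class SurrogateSplitSketch below is a hypothetical stand-in written for this example, not the library's code, and it assumes the argument is in the range 0x10000 to 0x10FFFF.

```java
/** Sketch of splitting a supplemental code point into UTF-16 surrogates. */
final class SurrogateSplitSketch {
    /** Upper 10 bits of the offset value, mapped into the high-surrogate range. */
    static char highSurrogate(int c) {
        return (char) (((c - 0x10000) >> 10) + 0xD800);
    }

    /** Lower 10 bits of the offset value, mapped into the low-surrogate range. */
    static char lowSurrogate(int c) {
        return (char) (((c - 0x10000) & 0x3FF) + 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char h = highSurrogate(cp);
        char l = lowSurrogate(cp);
        System.out.println(Integer.toHexString(h) + " " + Integer.toHexString(l)); // prints "d83d de00"
        // Round trip through the JDK to confirm the pair encodes the same code point.
        System.out.println(Character.toCodePoint(h, l) == cp); // prints "true"
    }
}
```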
-
isHighSurrogate
public static boolean isHighSurrogate(int c)
Returns whether the given character is a high surrogate.
Parameters:
c - The character to check.
-
isLowSurrogate
public static boolean isLowSurrogate(int c)
Returns whether the given character is a low surrogate.
Parameters:
c - The character to check.
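The two surrogate checks amount to range tests on the UTF-16 surrogate blocks. The sketch below assumes the standard ranges (0xD800 to 0xDBFF for high, 0xDC00 to 0xDFFF for low); the class name SurrogateRangeSketch and the countPairs helper are hypothetical and exist only to show the checks in use.

```java
/** Sketch of surrogate range checks and a small scanner built on them. */
final class SurrogateRangeSketch {
    static boolean isHighSurrogate(int c) {
        return 0xD800 <= c && c <= 0xDBFF;
    }

    static boolean isLowSurrogate(int c) {
        return 0xDC00 <= c && c <= 0xDFFF;
    }

    /** Counts well-formed surrogate pairs (high followed by low) in a UTF-16 string. */
    static int countPairs(String s) {
        int pairs = 0;
        for (int i = 0; i + 1 < s.length(); i++) {
            if (isHighSurrogate(s.charAt(i)) && isLowSurrogate(s.charAt(i + 1))) {
                pairs++;
                i++; // skip the low surrogate that was just consumed
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(countPairs("a\uD83D\uDE00b")); // prints "1"
    }
}
```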
-
isValid
public static boolean isValid(int c)
Returns true if the specified character is valid. This method also checks the surrogate character range from 0x10000 to 0x10FFFF.
If the program chooses to apply the mask directly to the CHARS array, then it is responsible for checking the surrogate character range.
Parameters:
c - The character to check.
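To show why a direct table lookup needs an extra range check, here is a sketch of a validity test written against the Char production [2] of XML 1.0. The class ValidCharSketch is hypothetical; isValidBmp plays the role of the table lookup, which can only answer for 16-bit values, while isValid adds the 0x10000 to 0x10FFFF range.

```java
/** Sketch of XML 1.0 character validity, split into BMP and supplementary parts. */
final class ValidCharSketch {
    /** Stand-in for a table lookup: only defined for 16-bit character values. */
    static boolean isValidBmp(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    /** Full check: the lookup for BMP values, plus the supplementary range. */
    static boolean isValid(int c) {
        if (c >= 0 && c < 0x10000) {
            return isValidBmp((char) c);
        }
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    public static void main(String[] args) {
        System.out.println(isValid('A'));      // prints "true"
        System.out.println(isValid(0x1F600));  // prints "true"
        System.out.println(isValid(0xFFFE));   // prints "false"
    }
}
```

Code that inlines only the lookup part would have to add the second branch itself, which is the exception the description above refers to.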
-
isInvalid
public static boolean isInvalid(int c)
Returns true if the specified character is invalid.
Parameters:
c - The character to check.
-
isContent
public static boolean isContent(int c)
Returns true if the specified character can be considered content.
Parameters:
c - The character to check.
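A content check of this kind is typically used in a scanner's inner loop to skip over runs of ordinary text. The sketch below is a simplified illustration only: the class ContentScanSketch and its scanContent helper are hypothetical, and its isContent treats just '<', '&', line feed, and carriage return as special, which may not match the full set of characters the real method excludes.

```java
/** Sketch of a content-scanning inner loop driven by a content predicate. */
final class ContentScanSketch {
    /** Simplified stand-in: everything except a few special characters is content. */
    static boolean isContent(int c) {
        return c != '<' && c != '&' && c != '\n' && c != '\r';
    }

    /** Returns the index of the first non-content character at or after start. */
    static int scanContent(String s, int start) {
        int i = start;
        while (i < s.length() && isContent(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "hello &amp; world";
        int end = scanContent(text, 0);
        // The run of plain content stops at the '&' that starts the entity reference.
        System.out.println(end);                    // prints "6"
        System.out.println(text.substring(0, end)); // prints "hello "
    }
}
```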
-
isMarkup
public static boolean isMarkup(int c)
Returns true if the specified character can be considered markup. Markup characters include '<', '&', and '%'.
Parameters:
c - The character to check.
-
isSpace
public static boolean isSpace(int c)
Returns true if the specified character is a space character as defined by production [3] in the XML 1.0 specification.
Parameters:
c - The character to check.
-
isNameStart
public static boolean isNameStart(int c)
Returns true if the specified character is a valid name start character as defined by production [5] in the XML 1.0 specification.
Parameters:
c - The character to check.
-
isName
public static boolean isName(int c)
Returns true if the specified character is a valid name character as defined by production [4] in the XML 1.0 specification.
Parameters:
c - The character to check.
-
isNCNameStart
public static boolean isNCNameStart(int c)
Returns true if the specified character is a valid NCName start character as defined by production [4] in the Namespaces in XML recommendation.
Parameters:
c - The character to check.
-
isNCName
public static boolean isNCName(int c)
Returns true if the specified character is a valid NCName character as defined by production [5] in the Namespaces in XML recommendation.
Parameters:
c - The character to check.
-
isPubid
public static boolean isPubid(int c)
Returns true if the specified character is a valid Pubid character as defined by production [13] in the XML 1.0 specification.
Parameters:
c - The character to check.
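As a concrete reading of production [13] (PubidChar) from XML 1.0, the Pubid check can be sketched as below. The class PubidSketch and the isPubidLiteral helper are hypothetical names introduced for the example; the character set is transcribed from the specification rather than from the library's table.

```java
/** Sketch of the PubidChar production [13] from XML 1.0. */
final class PubidSketch {
    // Punctuation allowed by PubidChar, in addition to space, CR, LF, letters, and digits.
    private static final String PUNCTUATION = "-'()+,./:=?;!*#@$_%";

    static boolean isPubid(int c) {
        return c == 0x20 || c == 0xD || c == 0xA
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || (c >= 0 && PUNCTUATION.indexOf(c) >= 0);
    }

    /** True if every character of the string is a Pubid character. */
    static boolean isPubidLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isPubid(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPubidLiteral("-//W3C//DTD XHTML 1.0 Strict//EN")); // prints "true"
        System.out.println(isPubid('<'));                                        // prints "false"
    }
}
```

Note that tab (0x9) is a space character under production [3] but is not a Pubid character, so the two checks are not interchangeable.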
-
isValidName
public static boolean isValidName(String name)
Checks whether a string is a valid Name according to [5] in the XML 1.0 Recommendation.
Parameters:
name - string to check
Returns:
true if name is a valid Name
-
isValidNCName
public static boolean isValidNCName(String ncName)
Checks whether a string is a valid NCName according to [4] from the XML Namespaces 1.0 Recommendation.
Parameters:
ncName - string to check
Returns:
true if ncName is a valid NCName
-
isValidNmtoken
public static boolean isValidNmtoken(String nmtoken)
Checks whether a string is a valid Nmtoken according to [7] in the XML 1.0 Recommendation.
Parameters:
nmtoken - string to check
Returns:
true if nmtoken is a valid Nmtoken
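The difference between Name, NCName, and Nmtoken can be sketched on an ASCII-only subset. The class NameSketch is hypothetical and deliberately ignores the non-ASCII letters, combining characters, and extenders that the real productions allow, so it would reject some names the library accepts.

```java
/** ASCII-only sketch of the Name, NCName, and Nmtoken string checks. */
final class NameSketch {
    /** ASCII subset of name start characters: letters, '_' and ':'. */
    static boolean isNameStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':';
    }

    /** ASCII subset of name characters: name start characters plus digits, '.' and '-'. */
    static boolean isName(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    /** Name: one name start character followed by any number of name characters. */
    static boolean isValidName(String s) {
        if (s.isEmpty() || !isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isName(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** NCName: a Name that contains no colon. */
    static boolean isValidNCName(String s) {
        return isValidName(s) && s.indexOf(':') < 0;
    }

    /** Nmtoken: one or more name characters, with no restriction on the first. */
    static boolean isValidNmtoken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isName(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("xml:lang"));   // prints "true"
        System.out.println(isValidNCName("xml:lang")); // prints "false"
        System.out.println(isValidNmtoken("1abc"));    // prints "true"
    }
}
```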
-
isValidIANAEncoding
public static boolean isValidIANAEncoding(String ianaEncoding)
Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.
Parameters:
ianaEncoding - The IANA encoding name.
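A purely syntactic check of this kind can be sketched against the EncName production [81] of XML 1.0, which requires an ASCII letter followed by letters, digits, '.', '_', or '-'. Whether the real method applies exactly this rule is an assumption; the class EncodingNameSketch is hypothetical.

```java
/** Sketch of a syntactic encoding-name check based on the XML 1.0 EncName production. */
final class EncodingNameSketch {
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Checks characters only; it does not look up whether a decoder exists. */
    static boolean isValidEncName(String name) {
        if (name == null || name.isEmpty() || !isAsciiLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '.' || c == '_' || c == '-';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidEncName("UTF-8"));        // prints "true"
        System.out.println(isValidEncName("8859_1"));       // prints "false"
        System.out.println(isValidEncName("no-such-code")); // prints "true"
    }
}
```

The last call shows the limitation stated above: a well-formed name passes even when no such encoding exists.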
-
isValidJavaEncoding
public static boolean isValidJavaEncoding(String javaEncoding)
Returns true if the encoding name is a valid Java encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for a Java encoding name.
Parameters:
javaEncoding - The Java encoding name.
-
trim
public static String trim(String value)
Trims space characters as defined by production [3] in the XML 1.0 specification from both ends of the given string.
Parameters:
value - the string to be trimmed
Returns:
the given string with the space characters trimmed from both ends
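Trimming by the XML definition of space differs from java.lang.String.trim(), which removes every character up to and including U+0020. The sketch below shows the idea under the assumption that production [3] covers exactly space, tab, carriage return, and line feed; the class XmlTrimSketch is hypothetical and is not the library's implementation.

```java
/** Sketch of trimming with the XML 1.0 definition of space (production [3]). */
final class XmlTrimSketch {
    /** Space, tab, carriage return, and line feed are the only XML space characters. */
    static boolean isSpace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    static String trim(String value) {
        int start = 0;
        int end = value.length();
        // Advance past leading XML space characters.
        while (start < end && isSpace(value.charAt(start))) {
            start++;
        }
        // Back up over trailing XML space characters.
        while (end > start && isSpace(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("\t a b \r\n") + "]"); // prints "[a b]"
        // A form feed is not an XML space character, so it is kept here,
        // whereas String.trim() would strip it.
        System.out.println(trim("\fx").length());  // prints "2"
        System.out.println("\fx".trim().length()); // prints "1"
    }
}
```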
-