class Hermes::Entities
Translate HTML and XML character entities: "&" to "&amp;" and vice versa.
What actually happens
HTML pages usually come in with characters encoded: &lt; for < and &euro; for €.
Further, they may contain a meta tag in the header like this:
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta charset="utf-8" />                    (HTML5)
or
  <?xml version="1.0" encoding="UTF-8" ?>     (XHTML)
When charset is utf-8 and the file contains the byte sequence "\303\244"/"\xc3\xa4", the character "ä" will be displayed.
When charset is iso8859-15 and the file contains the byte sequence "\344"/"\xe4", the character "ä" will be displayed, too.
The sequence "&auml;" will produce an "ä" in any case.
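The byte sequences quoted above can be checked with plain Ruby string transcoding. The following is a small sketch that does not use this library at all; `hex_bytes` is a hypothetical helper introduced only for illustration.

```ruby
# Render a string's raw bytes as lowercase hex, separated by spaces.
# (hex_bytes is a hypothetical helper, not part of Hermes.)
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# "ä" is U+00E4; the \u escape yields a UTF-8 string.
A_UMLAUT_UTF8 = "\u00e4"
# The same character transcoded to ISO-8859-15.
A_UMLAUT_LATIN9 = A_UMLAUT_UTF8.encode("ISO-8859-15")

puts hex_bytes(A_UMLAUT_UTF8)     # c3 a4
puts hex_bytes(A_UMLAUT_LATIN9)   # e4
```

Both strings represent the same character; only the bytes on disk differ, which is why the declared charset matters.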
What you should do
When generating your own HTML pages, you will always be safe if you only produce entity references such as &auml; and &euro;, or &#x00e4; and &#x20ac; respectively.
What this module does
This module translates strings to an HTML-masked version. The encoding will not be changed, and you may request that 8-bit characters be kept.
Examples
  Entities.encode "<"                    #=> "&lt;"
  Entities.decode "&lt;"                 #=> "<"
  Entities.encode "äöü"                  #=> "&auml;&ouml;&uuml;"
  Entities.decode "&auml;&ouml;&uuml;"   #=> "äöü"
Attributes
Public Class Methods
Creates an Entities converter.
The parameter may be given as one value or as a hash.
  ent = Entities.new true
  ent = Entities.new :keep_8bit => true

  # File lib/hermes/escape.rb, line 131
  def initialize keep_8bit = nil
    @keep_8bit = case keep_8bit
      when Hash then keep_8bit[ :keep_8bit]
      else           keep_8bit
    end
  end
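The "one value or a hash" parameter convention can be tried out in isolation. The class below is a hypothetical stand-in (`KeepFlagDemo` is not part of the library); it only mirrors the dispatch on the argument type shown in the listing above.

```ruby
# Hypothetical stand-in class illustrating the value-or-hash parameter style.
class KeepFlagDemo
  attr_reader :keep_8bit

  def initialize(keep_8bit = nil)
    # A Hash argument is unpacked; anything else is taken as the flag itself.
    @keep_8bit = keep_8bit.is_a?(Hash) ? keep_8bit[:keep_8bit] : keep_8bit
  end
end

p KeepFlagDemo.new(true).keep_8bit                  # true
p KeepFlagDemo.new(:keep_8bit => true).keep_8bit    # true
p KeepFlagDemo.new.keep_8bit                        # nil
```

Accepting both forms keeps short calls short while still allowing a self-describing option name at the call site.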
Public Instance Methods
  # File lib/hermes/escape.rb, line 172
  def decode str
    self.class.decode str
  end
Creates a string whose characters are masked in HTML style:
  ent = Entities.new
  ent.encode "&<\""    #=> "&amp;&lt;&quot;"
  ent.encode "äöü"     #=> "&auml;&ouml;&uuml;"
The result will be in the same encoding as the source, even if it does not contain any 8-bit characters (8-bit characters can remain in the result only when keep_8bit is set).
  ent = Entities.new true
  uml = "<ä>".encode "UTF-8"
  ent.encode uml       #=> "&lt;\xc3\xa4&gt;" in UTF-8
  uml = "<ä>".encode "ISO-8859-1"
  ent.encode uml       #=> "&lt;\xe4&gt;" in ISO-8859-1

  # File lib/hermes/escape.rb, line 159
  def encode str
    r = str.new_string
    r.gsub! RE_ASC do |x| "&#{SPECIAL_ASC[ x]};" end
    unless @keep_8bit then
      r.gsub! /[^\0-\x7f]/ do |c|
        c.encode! ENCODING
        s = SPECIAL[ c] || ("#x%04x" % c.ord)
        "&#{s};"
      end
    end
    r
  end
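The two-pass approach in the listing above (first the special ASCII characters, then everything outside 7-bit ASCII) can be sketched without the library. This is a simplified illustration, not the library's implementation: `demo_encode`, `DEMO_ASCII` and `DEMO_NAMED` are hypothetical names, the tables are deliberately tiny, and the input is assumed to be UTF-8.

```ruby
# Hypothetical, heavily shortened lookup tables (assumptions for this sketch).
DEMO_ASCII = { "&" => "amp", "<" => "lt", ">" => "gt", '"' => "quot" }.freeze
DEMO_NAMED = { "\u00e4" => "auml", "\u00f6" => "ouml", "\u00fc" => "uuml" }.freeze

# Mask a UTF-8 string the HTML way; with keep_8bit, non-ASCII stays untouched.
def demo_encode(str, keep_8bit: false)
  # Pass 1: characters that are special in HTML markup.
  result = str.gsub(/[&<>"]/) { |ch| "&#{DEMO_ASCII[ch]};" }
  return result if keep_8bit

  # Pass 2: everything outside 7-bit ASCII becomes a named entity if the
  # table knows one, otherwise a hexadecimal numeric reference.
  result.gsub(/[^\0-\x7f]/) do |ch|
    name = DEMO_NAMED[ch] || format("#x%04x", ch.ord)
    "&#{name};"
  end
end

puts demo_encode("<\u00e4>")                    # &lt;&auml;&gt;
puts demo_encode("\u20ac")                      # &#x20ac;
puts demo_encode("<\u00e4>", keep_8bit: true)   # &lt;ä&gt;
```

Running the ASCII pass first matters: the ampersands introduced by the second pass must not be masked again.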
  # File lib/hermes/escape.rb, line 180
  def std
    @std ||= new
  end
Private Instance Methods
  # File lib/hermes/escape.rb, line 217
  def named_decode s
    c = NAMES[ s]
    if c then
      if c.encoding != s.encoding then
        c.encode s.encoding
      else
        c
      end
    end
  end

  # File lib/hermes/escape.rb, line 228
  def numeric_decode s
    if s =~ /\A#(?:(\d+)|x([0-9a-f]+))\z/i then
      c = ($1 ? $1.to_i : ($2.to_i 0x10)).chr ENCODING
      c.encode! s.encoding
    end
  end
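How a numeric reference is resolved can be shown with core Ruby alone. The helper below is a hypothetical sketch modeled on the listing above (`demo_numeric_decode` is not a library method); it assumes the argument is the text between "&" and ";" and always returns UTF-8 rather than the encoding of the input.

```ruby
# Hypothetical helper: turn "#228" or "#xe4" into the character it denotes.
# Returns nil when the argument is not a numeric reference.
def demo_numeric_decode(ref)
  m = /\A#(?:(\d+)|x([0-9a-f]+))\z/i.match(ref)
  return nil unless m

  # Group 1 holds a decimal code point, group 2 a hexadecimal one.
  code = m[1] ? m[1].to_i : m[2].to_i(16)
  code.chr(Encoding::UTF_8)
end

puts demo_numeric_decode("#228")     # ä
puts demo_numeric_decode("#xe4")     # ä
puts demo_numeric_decode("#x20AC")   # €
p    demo_numeric_decode("auml")     # nil
```

Note that `Integer#chr` raises a RangeError for values that are not valid code points in the target encoding, so a caller handling untrusted input may want to rescue that.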