class Hermes::Entities

Translate HTML and XML character entities: "&" to "&amp;" and vice versa.

What actually happens

HTML pages usually arrive with special characters encoded as entities, such as &lt; for < and &euro; for €.

Further, they may contain a meta tag in the header like this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta charset="utf-8" />                        (HTML5)

or

<?xml version="1.0" encoding="UTF-8" ?>         (XHTML)

When the charset is utf-8 and the file contains the byte sequence "\303\244"/"\xc3\xa4", the character "ä" is displayed.

When the charset is iso8859-15 and the file contains the byte "\344"/"\xe4", the character "ä" is displayed as well.

The sequence "&auml;" will produce an "ä" in any case.
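
The byte sequences named above can be checked in plain Ruby, without this module. The following is a minimal sketch using only core String methods; the helper name bytes_in is made up for this illustration.

```ruby
# Hypothetical helper: the bytes of +str+ after transcoding to +encoding+.
# "\u00e4" is the character "ä", written as an escape so that the snippet
# does not depend on the encoding of the source file.
def bytes_in str, encoding
  str.encode( encoding).bytes
end

bytes_in "\u00e4", "UTF-8"          # two bytes: 0xc3 0xa4
bytes_in "\u00e4", "ISO-8859-15"    # one byte:  0xe4
```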

What you should do

When generating your own HTML pages, you will always be safe if you emit non-ASCII characters only as entities such as &auml; and &euro;, or &#x00e4; and &#x20ac; respectively.
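
As an illustration of the numeric form, here is a rough sketch in plain Ruby that replaces every non-ASCII character with a hexadecimal entity. It does not use this module; the method name numeric_entities is an assumption made for the example.

```ruby
# Hypothetical helper: replace every non-ASCII character by a
# hexadecimal numeric entity. ASCII characters are left alone.
def numeric_entities str
  str.gsub( /[^\0-\x7f]/) { |c| "&#x%04x;" % c.ord }
end

numeric_entities "\u00e4 \u20ac"    #=> "&#x00e4; &#x20ac;"
```

The output is pure ASCII, so it displays the same way whatever charset the page declares.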

What this module does

This module translates strings to an HTML-masked version. The encoding is not changed, and you may ask to keep 8-bit characters.

Examples

Entities.encode "<"                           #=> "&lt;"
Entities.decode "&lt;"                        #=> "<"
Entities.encode "äöü"                         #=> "&auml;&ouml;&uuml;"
Entities.decode "&auml;&ouml;&uuml;"          #=> "äöü"

Attributes

keep_8bit[RW]

Public Class Methods

new( keep_8bit = nil) → ent
new( :keep_8bit => val) → ent

Creates an Entities converter.

The parameter may be given as one value or as a hash.

ent = Entities.new true
ent = Entities.new :keep_8bit => true
# File lib/hermes/escape.rb, line 131
def initialize keep_8bit = nil
  @keep_8bit = case keep_8bit
    when Hash then keep_8bit[ :keep_8bit]
    else           keep_8bit
  end
end

Public Instance Methods

decode(str)
# File lib/hermes/escape.rb, line 172
def decode str
  self.class.decode str
end
encode( str) → str

Create a string whose characters are masked HTML style:

ent = Entities.new
ent.encode "&<\""    #=> "&amp;&lt;&quot;"
ent.encode "äöü"     #=> "&auml;&ouml;&uuml;"

The result has the same encoding as the source, even if it contains no 8-bit characters. 8-bit characters can only remain in the result when keep_8bit is set.

ent = Entities.new true

uml = "<ä>".encode "UTF-8"
ent.encode uml             #=> "&lt;\xc3\xa4&gt;" in UTF-8

uml = "<ä>".encode "ISO-8859-1"
ent.encode uml             #=> "&lt;\xe4&gt;"     in ISO-8859-1
# File lib/hermes/escape.rb, line 159
def encode str
  r = str.new_string
  r.gsub! RE_ASC do |x| "&#{SPECIAL_ASC[ x]};" end
  unless @keep_8bit then
    r.gsub! /[^\0-\x7f]/ do |c|
      c.encode! ENCODING
      s = SPECIAL[ c] || ("#x%04x" % c.ord)
      "&#{s};"
    end
  end
  r
end
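
The encoding behaviour described for encode can be reproduced with core Ruby alone. The sketch below is not the module's code: mask_ascii and its small lookup table are hypothetical stand-ins that mask only the ASCII specials, roughly what happens when keep_8bit is set.

```ruby
# Hypothetical stand-in for encode with keep_8bit set: mask only the
# ASCII specials and let 8-bit characters pass through untouched.
def mask_ascii str
  names = { "&" => "amp", "<" => "lt", ">" => "gt", '"' => "quot" }
  str.gsub( /[&<>"]/) { |x| "&#{names[ x]};" }
end

latin = "<\u00e4>".encode "ISO-8859-1"
r = mask_ascii latin
r.encoding    #=> #<Encoding:ISO-8859-1>
r.bytes       # the bytes of "&lt;", then 0xe4, then the bytes of "&gt;"
```

String#gsub returns a string in the receiver's encoding, which is why the result stays in ISO-8859-1 here.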
std()
# File lib/hermes/escape.rb, line 180
def std
  @std ||= new
end

Private Instance Methods

named_decode(s)
# File lib/hermes/escape.rb, line 217
def named_decode s
  c = NAMES[ s]
  if c then
    if c.encoding != s.encoding then
      c.encode s.encoding
    else
      c
    end
  end
end
numeric_decode(s)
# File lib/hermes/escape.rb, line 228
def numeric_decode s
  if s =~ /\A#(?:(\d+)|x([0-9a-f]+))\z/i then
    c = ($1 ? $1.to_i : ($2.to_i 0x10)).chr ENCODING
    c.encode! s.encoding
  end
end
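
To see what numeric_decode does with decimal and hexadecimal references, here is a standalone sketch of the same idea. It assumes that the ENCODING constant used above is UTF-8, which this page does not state; the name numeric_decode_sketch is made up for the example.

```ruby
# Hypothetical standalone variant of numeric_decode. +s+ is the body of
# a reference without "&" and ";", e.g. "#228" or "#xe4".
# Assumption: the intermediate encoding is UTF-8.
# Returns nil if +s+ is not a numeric reference.
def numeric_decode_sketch s
  if s =~ /\A#(?:(\d+)|x([0-9a-f]+))\z/i then
    code = $1 ? $1.to_i : $2.to_i( 16)
    code.chr( Encoding::UTF_8).encode s.encoding
  end
end

numeric_decode_sketch "#228"    # the character "ä"
numeric_decode_sketch "#xe4"    # the same character
numeric_decode_sketch "auml"    #=> nil
```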