class MsCsv

When Microsoft Excel exports a file as CSV, hardly a convention goes unviolated. Fields that contain newlines are exported in quotes, and the newlines are not escaped: each one is written as a bare "\x0a". The records themselves, however, are still separated by MS newlines, that is "\x0d\x0a".
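To make the quirk concrete, here is a made-up sample of what such an export could look like, using ";" as separator. Splitting on "\r\n" keeps the multi-line field inside its record, while splitting on every "\n" tears it apart. This is only an illustration: it still leaves quotes and separators unhandled, and it would break if a quoted field ever did contain "\r\n".

```ruby
# Made-up sample: ";"-separated, records end in "\r\n", but the quoted
# field keeps its embedded newline as a bare "\n".
data = "id;note\r\n1;\"first line\nsecond line\"\r\n2;plain\r\n"

# Splitting on the MS newline keeps the multi-line field in one record.
records = data.split("\r\n")
# Splitting on every "\n" (as String#lines does) breaks the field apart.
naive = data.lines

puts records.length  # 3
puts naive.length    # 4
```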

Constants

QUOTE
SEP
VERSION

Attributes

encoding[RW]

Public Class Methods

new( file, sep = nil, quote = nil) → obj

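The sep and quote parameters default to ";" and "\"" respectively. Either may contain more than one character; each of its characters is then accepted as a separator or as a quote (a quoted field must be closed by the same character that opened it).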

# File lib/mscsv.rb, line 50
def initialize file, sep = nil, quote = nil
  @file = file
  @sep, @quote = sep||SEP, quote||QUOTE
end
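Because each character read is tested with String#include?, a multi-character sep means "any of these characters". A quote-free sketch of that rule, using a hypothetical helper split_any that is not part of MsCsv:

```ruby
# Hypothetical helper: split a line at every character contained in sep,
# the way each_record tests @sep.include? c (quotes are ignored here).
def split_any(line, sep = ";")
  fields, field = [], +""
  line.each_char do |c|
    if sep.include?(c)
      fields << field
      field = +""
    else
      field << c
    end
  end
  fields << field  # the last field has no separator after it
end

p split_any("a;b\tc", ";\t")  # ["a", "b", "c"]
p split_any("a;;b")           # ["a", "", "b"]
```

As in each_record, an empty line still yields one (empty) field.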

Public Instance Methods

each_as( recdef) { |rec,err| ... } → nil

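Yields each record with its fields converted to the types given by recdef (a String such as "ndsscb" or an Array, one letter per field). If any field in a record cannot be converted, err contains a message of the form "index: message", where index is the 1-based position of the offending field, and the fields are returned unconverted as strings. At present, the following types are supported: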

s => string
S => stripped string, nil if empty
n => integer
c => currency   # class +Currency+ must respond to +parse+ (*)
d => date       # class +Date+ must respond to +strptime+ (*)
t => time       # class +Time+ must respond to +parse+ (*)
b => boolean

(*) You must +require+ the appropriate library yourself.
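The internal format table (FORMATS) is not documented here, so the following is only a sketch of how type letters could map to converters. The CONVERTERS hash, the helper convert, the date pattern "%d.%m.%Y" and the accepted boolean spellings are all hypothetical; currency and time are left out because they depend on classes you supply yourself.

```ruby
require "date"

# Hypothetical converter table in the spirit of the type letters above.
# The date pattern and boolean spellings are assumptions, not MsCsv's.
CONVERTERS = {
  "s" => ->(s) { s },
  "S" => ->(s) { t = s.strip; t.empty? ? nil : t },
  "n" => ->(s) { Integer(s, 10) },
  "d" => ->(s) { Date.strptime(s, "%d.%m.%Y") },
  "b" => lambda { |s|
    case s.strip.upcase
    when "TRUE", "1" then true
    when "FALSE", "0", "" then false
    else raise ArgumentError, "not a boolean: #{s.inspect}"
    end
  }
}

# Convert one record of strings according to a definition like "nSd".
# Missing fields (nil) stay nil; unknown letters raise KeyError.
def convert(record, recdef)
  recdef.scan(/\S/).zip(record).map do |fmt, fld|
    fld && CONVERTERS.fetch(fmt).call(fld)
  end
end

p convert(["42", "  Bob ", "24.12.2023"], "nSd").map(&:to_s)
# ["42", "Bob", "2023-12-24"]
```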

Example:

MsCsv.open "somefile.csv" do |f|
  f.each_as "ndsscb" do |r,err|
    puts "#{r.length}  #{r.inspect}" unless err
  end
end

# File lib/mscsv.rb, line 240
def each_as recdef
  # A String such as "ndsscb" is split into one format letter per field.
  recdef = recdef.scan(/\S/) if String === recdef
  d = recdef.map { |f| FORMATS[ f] }
  each_notempty do |r|
    begin
      i = 0
      r = (d.zip r).map do |(fmt,fld)|
        i += 1
        if fld then
          f = fmt.new fld
          f.val
        end
      end
    rescue
      # Conversion failed: return the raw fields and "index: message".
      r = r.map { |x| x.notempty? }
      err = "#{i}: #$!"
    end
    yield r, err
  end
end
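The fallback in each_as keeps a running field index so the error message can name the field that failed. A stripped-down sketch of that pattern; convert_or_fallback and the converter lambdas are hypothetical stand-ins, and unlike each_as it returns the raw strings unchanged:

```ruby
# Convert every field, or, if any conversion raises, keep the original
# strings and report "index: message" with a 1-based field index.
def convert_or_fallback(record, converters)
  i = 0
  values = converters.zip(record).map do |conv, fld|
    i += 1
    fld && conv.call(fld)
  end
  [values, nil]
rescue StandardError => e
  [record, "#{i}: #{e.message}"]
end

int = ->(s) { Integer(s, 10) }
p convert_or_fallback(["1", "2"], [int, int])  # [[1, 2], nil]
vals, err = convert_or_fallback(["1", "x"], [int, int])
p vals  # ["1", "x"]
```

Here err starts with "2: " because the second field failed.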

each_notempty { |f,g,...| ... } → nil

Same as each_record except that records containing only nil fields will be skipped.

# File lib/mscsv.rb, line 105
def each_notempty
  each_record { |r|
    yield r unless r.compact.empty?
  }
end
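The test r.compact.empty? drops only records in which every field is nil; an empty string still counts as content. A small illustration with made-up records:

```ruby
# Made-up records: nil marks a missing field, "" an empty string.
records = [["a", "b"], [nil], [nil, nil], ["", nil]]

# Same test as each_notempty: skip a record if nothing is left after compact.
kept = records.reject { |r| r.compact.empty? }

p kept  # [["a", "b"], ["", nil]]
```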

each_record { |ary| ... } → nil

Iterate through the CSV file. The fields will be returned as strings or nil. A line consists of at least one field, so an empty line will yield a one-element array containing nil.

# File lib/mscsv.rb, line 62
def each_record
  # Ruby 1.8 has no Encoding class: transcode with Iconv instead.
  unless defined? Encoding then
    require "iconv"
    @iconv = Iconv.new "utf-8", @encoding||"ms-ansi"
  end
  while l = read_line_utf8 do
    record, field = [], ""
    until l =~ /^$/ do
      c = l.eat 1
      if    @sep.include? c   then
        record.push field
        field = ""
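      elsif @quote.include? c then
        # Quoted field: copy characters, reading further lines if needed,
        # until the matching closing quote; a doubled quote is a literal one.
        q = c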
        while l.notempty? or (l = read_line_utf8) do
          c = l.eat 1
          if c == q then
            d = l.head 1
            if q == d then
              field << q
              l.eat 1
            else
              break
            end
          else
            field << c
          end
        end
      else
        field << c
      end
    end
    record.push field
    yield record
  end
end
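The quote handling above can be sketched as a small state machine over a whole string instead of line by line. This is a hypothetical illustration, not the library's code: parse_ms_csv is an invented name, and it assumes that outside quotes a record ends at "\r\n" while inside quotes every character, including a bare "\n", belongs to the field.

```ruby
# Hypothetical whole-string version of the quote handling in each_record.
def parse_ms_csv(text, sep = ";", quote = "\"")
  records, record, field = [], [], +""
  chars = text.chars
  i = 0
  while i < chars.length
    c = chars[i]
    if sep.include?(c)
      record << field
      field = +""
    elsif quote.include?(c)
      q = c
      i += 1
      while i < chars.length
        if chars[i] == q
          break unless chars[i + 1] == q  # a single quote closes the field
          field << q                      # a doubled quote is a literal one
          i += 1
        else
          field << chars[i]               # includes bare "\n"
        end
        i += 1
      end
    elsif c == "\r" && chars[i + 1] == "\n"
      record << field                     # MS newline ends the record
      records << record
      record, field = [], +""
      i += 1
    else
      field << c
    end
    i += 1
  end
  # Keep a last record that is not terminated by "\r\n".
  records << (record << field) unless record.empty? && field.empty?
  records
end

data = "a;\"x\ny\";\"say \"\"hi\"\"\"\r\nb;c\r\n"
p parse_ms_csv(data)
# [["a", "x\ny", "say \"hi\""], ["b", "c"]]
```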

open( filename, sep = nil, quote = nil) { |csv| ... } → obj

Opens a file for reading only; I refuse to write such weird formats.

The sep and quote parameters have the same meaning as in new.

# File lib/mscsv.rb, line 29
def open name, sep = nil, quote = nil
  File.open name do |f|
    i = new f, sep, quote
    yield i
  end
end
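open relies on the block form of File.open, which closes the file when the block finishes (even if it raises) and returns the block's value. A sketch of the same pattern; Reader and open_reader are hypothetical stand-ins for MsCsv and MsCsv.open:

```ruby
require "tempfile"

# Hypothetical stand-ins for MsCsv and MsCsv.open.
Reader = Struct.new(:file)

def open_reader(name)
  # The block form closes the file even if the caller's block raises.
  File.open(name) { |f| yield Reader.new(f) }
end

tmp = Tempfile.new("mscsv-demo")
tmp.write("a;b\r\n")
tmp.close

saved = nil
first = open_reader(tmp.path) do |r|
  saved = r
  r.file.readline
end

puts first.chomp          # a;b
puts saved.file.closed?   # true
tmp.unlink
```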

Private Instance Methods

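read_line_utf8() → str or nil

Reads the next line from the file and converts it to UTF-8, replacing non-breaking spaces with plain spaces. Returns nil at end of file.
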
# File lib/mscsv.rb, line 265
def read_line_utf8
  l = @file.readline
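  if @iconv then
    # Ruby 1.8: transcode with Iconv, then replace non-breaking spaces.
    l = @iconv.iconv l
    l.gsub!(/\xc2\xa0/, " ")
  else
    # Ruby 1.9+: label the bytes (Windows-1252 unless @encoding is set),
    # transcode them to UTF-8, then replace non-breaking spaces.
    l.force_encoding @encoding||Encoding::Windows_1252
    l.encode! Encoding::UTF_8
    l.gsub! "\u00a0", " "
  end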
  l
rescue EOFError  # end of file: return nil
end
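On Ruby 1.9 and later, read_line_utf8 first labels the raw bytes with the source encoding and only then transcodes them. The same steps on a made-up byte string, using a hypothetical helper to_utf8:

```ruby
# Hypothetical helper mirroring the Ruby 1.9+ branch of read_line_utf8.
def to_utf8(line, encoding = Encoding::Windows_1252)
  l = line.dup.force_encoding(encoding)  # label the bytes, no conversion yet
  l.encode!(Encoding::UTF_8)             # actually transcode
  l.gsub!("\u00a0", " ")                 # non-breaking space -> plain space
  l
end

# Windows-1252 bytes: 0xE9 is "é", 0xA0 a non-breaking space, 0x80 "€".
raw = "caf\xE9\xA0\x80\r\n".b
puts to_utf8(raw) == "caf\u00e9 \u20ac\r\n"  # true
```

Bytes that Windows-1252 leaves undefined, such as 0x81, make encode! raise Encoding::UndefinedConversionError, so a file in a different code page should be read with encoding set accordingly.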