class MsCsv
When Microsoft Excel exports a file as CSV, there is hardly a convention
that it does not violate. Fields that contain newlines are exported in
quotes, and the newlines are not escaped; each one is written as a single
"\x0a". Yet the records themselves are still separated by MS newlines,
that is, "\x0d\x0a".
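To make the format concrete, here is a small sketch of the bytes such an export might contain. The sample data is invented for illustration; only the ";" separator and the two kinds of newline come from the description above.

```ruby
# Hypothetical sample of what an Excel CSV export may look like on disk.
# Records end in "\r\n"; the newline inside the quoted field is a bare "\n".
sample = "id;note\r\n1;\"first line\nsecond line\"\r\n2;plain\r\n"

# Splitting on the record terminator keeps the embedded "\n" intact.
records = sample.split("\r\n")
records.each { |r| puts r.inspect }

# A naive split on "\n" alone cuts the quoted field in two.
naive = sample.lines
puts "records: #{records.length}, naive lines: #{naive.length}"
```

This is why the file cannot simply be read line by line: a quoted field may span several physical lines.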
Constants
- QUOTE
- SEP
- VERSION
Attributes
Public Class Methods
The sep and quote parameters default to ";" and "\"". More than one
character may be specified; each character then counts as an alternative
separator or quote character.
    # File lib/mscsv.rb, line 50
    def initialize file, sep = nil, quote = nil
      @file = file
      @sep, @quote = sep||SEP, quote||QUOTE
    end
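A minimal sketch of what specifying more than one separator character appears to mean, judging from the `@sep.include? c` test in each_record: every character of the string is checked individually. The value ";\t" and the sample line are assumptions chosen for illustration.

```ruby
# Sketch only: mimics the per-character `@sep.include? c` test.
# The separator string ";\t" is an example, not a library default.
sep = ";\t"

line = "a;b\tc"
fields = [+""]
line.each_char do |c|
  if sep.include? c
    fields.push(+"")      # any single character of sep ends the field
  else
    fields.last << c
  end
end
p fields
```

Under this reading, a two-character sep is a set of two possible separators, not one two-character delimiter.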
Public Instance Methods
Return each record with its fields converted to the requested types. If
any field in a record cannot be converted, the err variable will contain
the field number and the exception message, and the fields will be
returned as strings. At present, the following types may be requested:

    s  => string
    S  => stripped string, nil if empty
    n  => integer
    c  => currency   # class +Currency+ must respond to +parse+ (*)
    d  => date       # class +Date+ must respond to +strptime+ (*)
    t  => time       # class +Time+ must respond to +parse+ (*)
    b  => boolean

    (*) choose the appropriate +require+ yourself.
Example:
    MsCsv.open "somefile.csv" do |f|
      f.each_as "ndsscb" do |r,d|
        puts r.length.inspect + " " + r.inspect unless d
      end
    end
    # File lib/mscsv.rb, line 240
    def each_as recdef
      case recdef
        when String then recdef = recdef.scan /\S/
      end
      d = recdef.map { |f| FORMATS[ f] }
      each_notempty do |r|
        begin
          i = 0
          r = (d.zip r).map do |(fmt,fld)|
            i += 1
            if fld then
              f = fmt.new fld
              f.val
            end
          end
        rescue
          r = r.map { |x| x.notempty? }
          err = "#{i}: #$!"
        end
        yield r, err
      end
    end
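The convert-or-fall-back behaviour can be sketched without the library. The converter table below is hypothetical: the FORMATS constant is not shown on this page, and the real classes may parse differently. Only the codes s, S, n and d are covered, and the date format "%Y-%m-%d" is an assumption.

```ruby
require "date"

# Hypothetical stand-in for FORMATS; covers only a subset of the codes.
converters = {
  "s" => ->(x) { x },
  "S" => ->(x) { y = x.strip; y.empty? ? nil : y },
  "n" => ->(x) { Integer(x) },                    # raises on bad input
  "d" => ->(x) { Date.strptime(x, "%Y-%m-%d") },  # assumed date format
}

# Convert one record. On any failure, hand back the raw strings together
# with a message naming the 1-based field that failed.
convert_record = lambda do |fields, recdef|
  i = 0
  begin
    converted = recdef.scan(/\S/).zip(fields).map do |code, fld|
      i += 1
      converters.fetch(code).call(fld) if fld
    end
    [converted, nil]
  rescue StandardError => e
    [fields, "#{i}: #{e.message}"]
  end
end

ok  = convert_record.call(["42", " x ", "2020-01-31"], "nSd")
bad = convert_record.call(["4x2", " x ", "2020-01-31"], "nSd")
p ok
p bad
```

As in each_as, the caller can tell a failed record from a converted one by checking whether the second value is nil.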
Same as each_record, except that records containing only nil fields will
be skipped.
    # File lib/mscsv.rb, line 105
    def each_notempty
      each_record { |r|
        yield r unless r.compact.empty?
      }
    end
Iterate through the CSV file. The fields will be returned as strings or
nil. A line consists of at least one field, so an empty line will yield a
one-element array containing nil.
    # File lib/mscsv.rb, line 62
    def each_record
      unless defined? Encoding then
        require "iconv"
        @iconv = Iconv.new "utf-8", @encoding||"ms-ansi"
      end
      while l = read_line_utf8 do
        record, field = [], ""
        until l =~ /^$/ do
          c = l.eat 1
          if @sep.include? c then
            record.push field
            field = ""
          elsif @quote.include? c then
            q = c
            while l.notempty? or (l = read_line_utf8) do
              c = l.eat 1
              if c == q then
                d = l.head 1
                if q == d then
                  field << q
                  l.eat 1
                else
                  break
                end
              else
                field << c
              end
            end
          else
            field << c
          end
        end
        record.push field
        yield record
      end
    end
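The loop above relies on the String helpers eat, head and notempty?, which are not documented on this page. The following is a simplified sketch of the same scanning idea using only core String methods; it is not the library's implementation. Encoding conversion is left out, empty fields stay empty strings, and the name parse_records is made up for this example.

```ruby
require "stringio"

# Sketch: character-wise scan with slice!, l[0] and empty? standing in
# for the library's eat, head and notempty? helpers.
parse_records = lambda do |io, sep = ";", quote = "\""|
  records = []
  while (l = io.gets)
    record, field = [], +""
    # Stop when only the record terminator (or nothing) is left.
    until l.match?(/\A\r?\n?\z/)
      c = l.slice!(0)
      if sep.include? c
        record.push field
        field = +""
      elsif quote.include? c
        q = c
        # A quoted field may continue on the next physical line.
        while !l.empty? || !(l = io.gets.to_s).empty?
          c = l.slice!(0)
          if c == q
            break unless l[0] == q   # a lone quote closes the field
            field << q               # a doubled quote is a literal quote
            l.slice!(0)
          else
            field << c
          end
        end
      else
        field << c
      end
    end
    record.push field
    records.push record
  end
  records
end

sample = "a;b\r\n1;\"x\ny\"\r\n2;\"say \"\"hi\"\"\"\r\n"
parsed = parse_records.call(StringIO.new(sample))
parsed.each { |r| p r }
```

The inner while loop is the part that deals with Excel's unescaped newlines: when a line runs out inside quotes, the next line is read and scanning continues in the same field.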
Open a file. The file can only be read, as I refuse to write such weird
formats. The sep and quote parameters mean what you would guess: the field
separator and the quote character.
    # File lib/mscsv.rb, line 29
    def open name, sep = nil, quote = nil
      File.open name do |f|
        i = new f, sep, quote
        yield i
      end
    end
Private Instance Methods
    # File lib/mscsv.rb, line 265
    def read_line_utf8
      l = @file.readline
      if @iconv then
        l = @iconv.iconv l
        l.gsub! /\xc2\xa0/, " "
      else
        l.force_encoding @encoding||Encoding::Windows_1252
        l.encode! Encoding::UTF_8
        l.gsub! "\u00a0", " "
      end
      l
    rescue EOFError
    end
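The non-iconv branch can be tried in isolation. This sketch assumes a Ruby with the Encoding class (1.9 or later) and uses invented sample bytes; it shows the same three steps: tag the raw bytes as Windows-1252, transcode to UTF-8, then replace non-breaking spaces.

```ruby
# Invented sample: "price", a non-breaking space (0xA0), "5" and the
# euro sign, which is byte 0x80 in Windows-1252.
raw = "price\xA05\x80".b

line = raw.force_encoding(Encoding::Windows_1252)  # label the bytes
line = line.encode(Encoding::UTF_8)                # transcode to UTF-8
line.gsub!("\u00a0", " ")                          # NBSP becomes a space

p line
p line.encoding
```

Note that force_encoding only relabels the bytes, while encode actually converts them, which is why both calls are needed.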