Canadian Counties

Man – I had a hell of a time finding data for one of my project’s “Let’s add Canada” requests. Our whole system is set up on State, County, City. Canada doesn’t really seem to have a solid concept of “County” across all Provinces. Some Provinces have districts, some have districts and counties, some have neither. It’s pretty easy to find the federal voting districts, but I couldn’t find a mapping of those districts to cities to save my life. Finally Kat found http://listingsca.com/, which I used for the data.

I almost got lazy and used .Net for the scraping, but decided to do it in Ruby.

Here’s the Ruby code (this is just a command line Ruby file):

# scraper for http://listingsca.com/ - gets Canadian province, county, city data

require 'net/http'
require 'uri'
 
# each value is [province code, number of community files the province has] - you could
# probably be all elegant and work out the file count with a regex - but, I'm lazy
@provinces = {
  'Alberta' => ['AB',3],
  'British-Columbia' => ['BC',3],
  'Manitoba' => ['MB',1],
  'New-Brunswick' => ['NB',1],
  'Newfoundland' => ['NF',1],
  'North-West-Territories' => ['NT',1],
  'Nova-Scotia' => ['NS',3],
  'Nunavut' => ['NU',1],
  'Ontario' => ['ON',5],
  'Prince-Edward-Island' => ['PE',1],
  'Quebec' => ['QC',4],
  'Saskatchewan' => ['SK',1],
  'Yukon' => ['YN',1]
}

  def get_province_files()
	Net::HTTP.start('listingsca.com') do |http|
		@provinces.keys.each do |province|
			File.open(province + ".txt","w") do |file|
				1.upto(@provinces[province][1]) do |i|
					page = i.to_s
					page = "" unless i > 1
					puts province + " " + page
					response = http.get('/' + province + '/communities' + page + '.asp')
					file.puts response.body
				end
			end
		end
	end
  end
  
class MultiRegexp < Regexp
    def matches(str)
	str.scan(self) do
	  yield Regexp.last_match
	end
    end
end  

# gets all the data
def generate_csv_all()
	File.open("canada-csv-all.txt","w") do |file|
		@provinces.keys.each do |province|
			f = IO.read(province + ".txt")
			re = MultiRegexp.new(']+)>(.*?)', true)
			re.matches(f) { |i|
			    capture = i.captures[0].split(" ")[0]
			    if !capture.include?("communities") && !capture.include?("regions") && !capture.include?("district")
				file.puts @provinces[province][0].to_s + "," + province.gsub("-"," ") + "," + capture.gsub("-"," ").split("/").join(",")
			    end
			}
		end
	end
 end
  
 #ignores everything but the last "category" - some provinces seem to have district, region, city, some have region, city, some have just city
 def generate_csv()
	File.open("canada-csv.txt","w") do |file|
		@provinces.keys.each do |province|
			f = IO.read(province + ".txt")
			re = MultiRegexp.new(']+)>(.*?)', true)
			re.matches(f) { |i|
			    capture = i.captures[0].split(" ")[0]
			    arr = capture.gsub("-"," ").split("/")
			    if arr.size == 1
				county = "Whole Province"
			    else
				county = arr[arr.size-2]
			    end
			    if !capture.include?("communities") && !capture.include?("regions") && !capture.include?("district")
				file.puts @provinces[province][0].to_s + "," + province.gsub("-"," ") + "," + county + "," + arr.last
			    end
			    #capture = i.captures[0]
			    #start,stop = i.offset(0)
			    #puts "\"#{capture}\" starts at #{start}, ends at #{stop}"
			}
		end
	end
  end
  
  #comment out as needed
  get_province_files
  generate_csv_all
  generate_csv
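
If the county logic in generate_csv is hard to follow inline, here is a rough sketch of just that step pulled out into a plain function. The name csv_row and the sample paths are made up for illustration; they are not taken from the site or from the script above.

```ruby
# Sketch of the path-splitting step from generate_csv as a standalone function.
# csv_row is a hypothetical name; the sample paths below are invented.
def csv_row(code, province, path)
  # Skip index pages, the same way the script filters them out
  return nil if %w[communities regions district].any? { |word| path.include?(word) }

  parts = path.gsub("-", " ").split("/")
  # One segment means there is no region above the city
  county = parts.size == 1 ? "Whole Province" : parts[-2]
  [code, province.gsub("-", " "), county, parts.last].join(",")
end

puts csv_row("ON", "Ontario", "Northern/Cochrane/Timmins")
puts csv_row("PE", "Prince-Edward-Island", "Charlottetown")
```

Whatever sits right above the city becomes the "county", and anything higher up is dropped.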

And here’s a zip file with the output and ruby code for this – hopefully this saves some poor bastard some time :)

Yah – yah – I know – this could be done more efficiently – but I just did what was fastest for me.
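
One small simplification, for what it's worth: String#scan with capture groups already hands back every match, so the Regexp subclass isn't strictly needed. A minimal sketch follows. The pattern and the HTML snippet are assumptions for illustration, not the site's real markup, and link_paths is a made-up name.

```ruby
# Sketch: collect link targets with String#scan instead of a Regexp subclass.
# LINK_PATTERN and the sample HTML are assumptions, not the site's actual markup.
LINK_PATTERN = /<a href="\/([^"]+)\.asp">(.*?)<\/a>/i

def link_paths(html)
  # With capture groups, scan returns one [path, text] pair per match
  html.scan(LINK_PATTERN).map { |path, _text| path }
end

sample = '<a href="/Ontario/Durham/Oshawa.asp">Oshawa</a> ' \
         '<A HREF="/Ontario/communities2.asp">More</A>'
p link_paths(sample)
```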

And for you Rails guys… here’s a migration to load it into a table:

require 'csv'

class Canada < ActiveRecord::Migration
  def self.up

    sql = "drop table canada"
    begin
      execute(sql)
    rescue
    end
    
    create_table :canada do |t|
      t.column :code, :string, :limit => 2
      t.column :province, :string
      t.column :county, :string
      t.column :city, :string
    end

    sql = "delete from canada"
    execute(sql)
    CSV.open("#{RAILS_ROOT}/lib/dbfiles/canada.csv","r")   do |row|
      # quote() escapes apostrophes (think St. John's) and adds the surrounding quotes
      values = row[0..3].map { |v| ActiveRecord::Base.connection.quote(v) }.join(",")
      sql = "insert into canada (code, province, county, city) " +
        "values (" + values + ")"
      execute(sql)
    end
  end

  def self.down
  end
end
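
If you build insert statements by hand, a name with an apostrophe (St. John's, say) needs escaping or the statement breaks. Here is a plain Ruby sketch of the idea. sql_quote and insert_sql are hypothetical helpers, and doubling the single quote is the standard SQL escape; some databases have extra rules, so the adapter's own quoting is the safer bet in a real migration.

```ruby
# Sketch: escape values before building an insert statement by hand.
# sql_quote and insert_sql are hypothetical helpers, not part of Rails.
def sql_quote(value)
  # Standard SQL escapes a single quote by doubling it
  "'" + value.to_s.gsub("'", "''") + "'"
end

def insert_sql(row)
  "insert into canada (code, province, county, city) values (" +
    row[0, 4].map { |v| sql_quote(v) }.join(",") + ")"
end

puts insert_sql(["NF", "Newfoundland", "Whole Province", "St. John's"])
```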

Updated 7/31/2006

I think you could probably get better data on this from MLS.ca – at least data that real-estate people would better understand. Haven’t gotten around to scraping that yet though.