Wednesday, June 25, 2008

It Should Be Obvious

It should be obvious, but in case anyone is out there and can't figure it out, Toxic Rochester 2008 ended back in February. I simply didn't have time to keep it going. There was some good feedback, but not enough to convince me that people were interested in what I was trying to do this year.

Oh, and then there is the data. There are a lot of really interesting data sets available about Rochester and the surrounding area, and most of them are public and funded by tax dollars. But ask for them, and it's like pulling teeth, if anyone even responds at all.

Toxic Rochester may return later this year with a new idea I've been kicking around. Basically, the idea is that I'm going to collect "ambient information" about Rochester. For example, I might sit in a place with public WiFi for a day and collect whatever information leaks out with tools like NetStumbler and Wireshark. Or, I might listen to the police scanner for a day and plot on a map all addresses heard or other aspects of what is transmitted.

The point? As usual, I'm not sure. But the underlying goal of Toxic Rochester has always been to come up with different perspectives on our city.

See you then.

Sunday, January 20, 2008

The Bridges of Monroe County

Infrastructure isn't sexy. People tend to take for granted the ability to get from point A to B and don't think about everything that makes that possible. Well, at least until things go horribly wrong, like with the collapse of the I-35W Mississippi River bridge. Something dramatic and tragic happens, and suddenly, the public starts to care about infrastructure. Of course, as soon as the news shifts to the antics of some Hollywood celebritard, people lose interest. But at least they cared for a little while.

The New York State Department of Transportation (NYSDOT) inspects the condition of New York's bridges. And now, thanks in part to the public's concern after the collapse of the I-35W bridge, the inspection data is available online.

I took the report on Monroe County and transformed it into a spreadsheet. From there, I did some sorting and grouping and graphing and came up with the following:
  • There are 610 bridges being tracked, with 607 having a rating (3 bridges are new and apparently haven't been rated yet). The rating is a weighted average of up to 47 measures which results in a scale that goes from 1 to 7. A rating of 5 or greater represents a bridge in good condition. A score below 5 means a bridge is "deficient" and needs some maintenance or rehabilitation. NYSDOT is careful to point out that this doesn't mean the bridge is unsafe.

    Count of bridges rated 5 or above: 429 (71%)
    Count of bridges rated below 5: 178 (29%)
  • Bridges can further be classified as "structurally deficient" (SD) or "functionally obsolete" (FO). Basically, a "structurally deficient" bridge is one that suffers from deterioration, damage, or inadequate load capacity, or that causes traffic delays from repeated flooding. And again, NYSDOT carefully reminds you that a bridge classed as SD isn't necessarily unsafe or likely to collapse. A "functionally obsolete" bridge is one that can't meet current standards for the volume of traffic it carries.

    "SD" bridges:   65 (11%)
    "FO" bridges: 206 (34%)
    Other bridges: 337 (55%)
    According to data on the NYSDOT site giving percentage of each class of bridge across the whole state, Monroe County is doing a bit worse in terms of bridges that are "functionally obsolete."
  • I created a scatter graph that relates bridge rating to the year the bridge was built. An attempt to fit the data to linear, power, and exponential curves resulted in linear winning, but with an R² value of 0.3898, it's not a very good fit. That in itself suggests there isn't a strong relationship between how old a bridge is and its rating. Newer bridges rate higher, but as you look at older bridges, the ratings spread out across the scale.
  • The second graph looks at the number of "structurally deficient" and "functionally obsolete" bridges for each year, sums them, and takes the top 20. The big winner here is 1963. Part of that is because more bridges were built in that year than any other, but look at the number of "functionally obsolete" bridges relative to the other years listed. If you go back to the data, you see that the majority of these bridges were built in Gates. What happened? Did someone simply not predict the future well? Or did someone cheap out and build only what they needed at the time?
  • I then decided to find the top five areas with the best average bridge ratings and the worst.


    Best Average Rating                 Worst Average Rating
    Area                      Rating    Area                       Rating
    --------------------------------    ---------------------------------
    Hilton (Village)          5.98      Clarkson (Town)            5.08
    Churchville (Village)     5.82      Wheatland (Town)           5.06
    Honeoye Falls (Village)   5.82      Fairport (Village)         4.94
    Sweden (Town)             5.82      Brockport (Village)        4.55
    Penfield (Town)           5.77      East Rochester (Village)   3.93
  • So where are these bridges, anyway?

    Area                      Bridge Count
    --------------------------------------
    Rochester (City)          110
    Greece (Town)             73
    Brighton (Town)           52
    Gates (Town)              49
    Henrietta (Town)          32
    Chili (Town)              30
    Webster (Town)            25
    Perinton (Town)           23
    Hamlin (Town)             23
    Irondequoit (Town)        21
    Parma (Town)              20
    Pittsford (Town)          17
    Penfield (Town)           16
    Rush (Town)               15
    Wheatland (Town)          14
    Mendon (Town)             14
    Ogden (Town)              13
    Clarkson (Town)           13
    Riga (Town)               13
    Fairport (Village)        8
    Honeoye Falls (Village)   4
    Hilton (Village)          4
    Spencerport (Village)     4
    Pittsford (Village)       3
    Brockport (Village)       3
    Webster (Village)         3
    Sweden (Town)             3
    Scottsville (Village)     2
    Churchville (Village)     2
    East Rochester (Village)  1
I was thinking about attempting to plot these bridges on a map. The only problem is that the bridge data gives the location of bridges as the intersection of "feature carried" and "feature crossed." NYSDOT presumably has latitude and longitude information about each bridge, as well as lots of other delicious data. But these days, asking for that kind of data is likely to put me in a database of potential terrorists.
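
For anyone who wants to redo the scatter-graph fit outside a spreadsheet, here is a minimal sketch of an ordinary least-squares line and its R² value. The (year, rating) pairs are made up for illustration and are not the NYSDOT figures, and the function names are my own, not anything from the original spreadsheet.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up (year built, rating) pairs, for illustration only.
years = [1931, 1948, 1963, 1963, 1975, 1988, 1999, 2005]
ratings = [4.1, 5.6, 4.4, 5.2, 4.9, 5.8, 6.3, 6.7]

slope, intercept = linear_fit(years, ratings)
r2 = r_squared(years, ratings, slope, intercept)
print(f"rating = {slope:.4f} * year + {intercept:.2f}, R^2 = {r2:.4f}")
```

An R² near 1 means the line explains most of the spread in ratings; a value like the 0.3898 reported above means most of the spread is left unexplained by age alone.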

Sunday, January 13, 2008

School District Comparison, 2005

You get what you pay for? Maybe not. I was curious how much each of the area school districts spends per pupil and how that money relates to the average performance of the pupils in each district. It's a complex question, but let's try to take a gross stab at it and see what happens.

I start with some raw statistics. Googling around, I came across the Greater Rochester Association of REALTORS® web site which had statistics from 2005 for all the school districts within 25 miles of Rochester. The statistics are unhelpfully in the form of a PDF file, but with some massaging of the data, I created an equivalent spreadsheet.

Four metrics related to performance are given in the spreadsheet (these are percentages of pupils passing testing thresholds and the percentages of graduates who went on to college). I needed a single number that combined these together, and for lack of a better method, did a simple average. The result is the columns under "Aggregate Performance."
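
The "simple average" can be sketched like this. The district names, metric names, and percentages are all made up for illustration; the real PDF's column labels may differ.

```python
from statistics import mean

# Hypothetical metric names and made-up percentages, for illustration only.
districts = {
    "District A": {"test_1": 88, "test_2": 91, "test_3": 85, "college_bound": 92},
    "District B": {"test_1": 45, "test_2": 40, "test_3": 38, "college_bound": 81},
}


def aggregate_performance(metrics):
    """Unweighted mean of one district's performance percentages."""
    return mean(metrics.values())


for name, metrics in districts.items():
    print(f"{name}: {aggregate_performance(metrics):.1f}%")
```

An unweighted mean treats the college-bound percentage as exactly as important as each test threshold, which is a convenience rather than a principled choice.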

(Note: East Irondequoit and Fairport are excluded because they have missing or incorrect data. This is noted in the original PDF. Since I couldn't come up with a fair method to incorporate these two school districts' data, I excluded them.)

Some numbers to consider:
  • The difference between the school district that spent the least per pupil (Gananda) and the one that spent the most (Wheatland-Chili) is $4,352.
  • It's questionable what the aggregate performance means, but comparing the worst school district (Rochester City at 51%) and the best (Pittsford at 90%) shows a range of 39%.
  • Testing scores place the Rochester City school district at the bottom of the bunch, but surprisingly, the percentage of pupils going on to college is actually close to average.
  • A high pupil-to-teacher ratio doesn't seem to have much correlation to performance.
I next created an X/Y graph with X being the aggregate performance and Y being the dollars spent per pupil. The X and Y axes are centered on the median values. I then labeled most of the points on the graph.

Immediately, we see some interesting facts:
  • Pittsford and Brighton are the top two school districts, and they spend a lot on each pupil. But they don't spend the most.
  • The Rochester City school district spends a bit more than Pittsford or Brighton, but has the worst performance of all the school districts by a large margin.
  • Gananda spends the least, but does better than most.
  • Wheatland-Chili and Rush-Henrietta spend the most, but both perform a bit below average.
So what does all this mean? The amount of money spent on students doesn't necessarily have much to do with performance. And if you start plotting performance against the other statistics in the spreadsheet (pupil/teacher ratio, the "wealth ratio"), you find very little correlation.
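
One way to put a number on "very little correlation" is the Pearson correlation coefficient. This is a sketch under assumptions: the spending and performance values below are made up, and I am not claiming this is how the spreadsheet comparison was originally done.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up dollars-per-pupil and aggregate-performance values, for illustration only.
spending = [9800, 11200, 12500, 13100, 14000, 10400]
performance = [82, 74, 90, 51, 70, 78]

print(f"r = {pearson_r(spending, performance):.3f}")
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little or none.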

What do these numbers suggest to you, if anything?

Friday, January 4, 2008

New Businesses in Rochester

The Democrat and Chronicle maintains a database of new businesses (or more precisely, new DBAs). At the time I sampled it, the database tracked DBAs between 4/11/2007 and 12/12/2007: eight months and 2,576 DBAs. Their data comes from the Monroe County Clerk's office, and from some of the errors I detected, I'm guessing someone types in this information manually.

I was curious where these DBAs were registered. When plotted, would they cluster in areas, or would they be fairly distributed over the city? So I massaged the data and brought it into MapPoint to mark each location on a map. The result isn't unexpected: the center of the city has clusters of new businesses, and as you move outward, other area cities and major roads spawn new clusters.

Some of this clustering is probably just a matter of population. So, I've made a second map that divides the count of new businesses by the population, for each reported Zip Code. For those that prefer a numerical breakdown, you can play with the data yourself.
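
The per-population calculation behind the second map can be sketched as follows. The Zip Code labels, filings, and populations are placeholders I made up, and the per-1,000 scaling is my own choice for readability.

```python
from collections import Counter

# Placeholder data, for illustration only: one Zip Code label per DBA filing,
# plus a made-up population for each Zip Code.
dba_zip_codes = ["zip_a", "zip_a", "zip_b", "zip_c", "zip_b", "zip_a"]
population = {"zip_a": 1500, "zip_b": 16000, "zip_c": 25000}


def filings_per_1000(zip_codes, population):
    """New DBA filings per 1,000 residents for each Zip Code with a known population."""
    counts = Counter(zip_codes)
    return {
        zip_code: 1000 * counts[zip_code] / residents
        for zip_code, residents in population.items()
        if residents > 0
    }


rates = filings_per_1000(dba_zip_codes, population)
for zip_code, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{zip_code}: {rate:.2f} new DBAs per 1,000 residents")
```

Zip Codes with very small populations can produce large rates from only a handful of filings, so the extremes deserve a second look.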

Wednesday, January 2, 2008

Bear With Me For a Moment...

From xkcd:

Tuesday, January 1, 2008

Billable Hours You Can Dance To

Toxic Rochester isn't just interested in public datasets. Private ones that describe aspects of people's lives in Rochester are also fair game. So as an example, here's a dataset of the billable hours I worked in 2007. The company I work for bills clients by the hour, which we record using an application that tracks time in a database. So it was a simple matter of extracting and graphing the data.

I thought about other ways I could render this data, and came up with sound. For each of the 52 weeks of 2007, I took the hours worked and treated it as a MIDI note value. Each week's data is now an eighth-note in 4/4 time at 125 BPM. Here's what it sounds like:

Just the notes played with a cheesy xylophone sound.

Ridiculous techno/hip-hop/dance/whatever version.
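
The hours-to-notes mapping can be sketched roughly like this. It assumes the hours are used directly as MIDI note numbers (clamped to the valid 0 to 127 range) and that a quarter note gets the beat; the weekly hours are made up, and the sketch only computes timing and pitch rather than writing an actual MIDI file.

```python
def eighth_note_seconds(bpm):
    """Length of an eighth note when a quarter note gets one beat."""
    return 60.0 / bpm / 2


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Made-up weekly billable hours, for illustration only.
weekly_hours = [40, 45, 38, 52, 0, 41]

step = eighth_note_seconds(125)
for week, hours in enumerate(weekly_hours, start=1):
    note = max(0, min(127, round(hours)))  # MIDI note numbers run from 0 to 127
    start = (week - 1) * step
    print(f"week {week}: note {note}, {midi_to_hz(note):.1f} Hz, starts at {start:.2f} s")
```

At 125 BPM an eighth note lasts 0.24 seconds, so 52 weeks play back in about 12.5 seconds.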

Monday, December 31, 2007

Crazy People By Zip Code

Where I used to live, there was a woman next door who was mentally unstable. She liked to set things on fire, throw furniture out the window, and once ran around naked in front of her townhouse until cops showed up, tightly wrapped her in a blanket, and carted her away.

Not all people who go to the hospital for psychiatric treatment are as crazy as my former neighbor. There is a wide range of disorders that are broadly classed as psychiatric, so it is unfair to lump all these cases in as being "crazy." To do so shows a lack of sensitivity and only contributes to the stigmatization of people with mental disorders.

So, how many crazy people live near you?

Over on infoshare.org, they have a lot of fun datasets available for New York. One categorizes hospital patients by type and, helpfully, by Zip Code. So I selected just the Monroe County Zip Codes and brought the data into MapPoint to visualize where these people were. But that seemed unfair, since some Zip Codes are more population-dense than others. So I divided the number of crazies by the population of that Zip Code and colored them by 6-quantiles. Here's the raw data for that if you want to see where your Zip Code ranks.

The big winner is 14604 (that's inside the Inner Loop) with a whopping 5.1% of the population receiving psychiatric care. That's 2.8 times more than the next-most crazy Zip Code.
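
The per-capita coloring can be sketched as below: divide patients by population, then sort each Zip Code into one of six quantile bins. The labels and figures are placeholders I made up, and MapPoint may compute its own cut points differently from Python's `statistics.quantiles`.

```python
from statistics import quantiles

# Placeholder (patients, population) pairs per Zip Code, for illustration only.
by_zip = {
    "zip_a": (51, 1000), "zip_b": (90, 5000), "zip_c": (120, 8000),
    "zip_d": (60, 6000), "zip_e": (30, 4000), "zip_f": (45, 9000),
    "zip_g": (20, 5000), "zip_h": (66, 11000), "zip_i": (12, 3000),
    "zip_j": (25, 10000), "zip_k": (16, 8000), "zip_l": (9, 9000),
}


def per_capita(by_zip):
    """Share of each Zip Code's population counted as patients."""
    return {z: patients / pop for z, (patients, pop) in by_zip.items() if pop > 0}


def sextile_bins(rates):
    """Assign each Zip Code a bin from 1 (lowest rates) to 6 (highest rates)."""
    cuts = quantiles(rates.values(), n=6)  # five cut points split the data six ways
    return {z: 1 + sum(rate > cut for cut in cuts) for z, rate in rates.items()}


rates = per_capita(by_zip)
bins = sextile_bins(rates)
for z in sorted(rates, key=rates.get, reverse=True):
    print(f"{z}: {rates[z]:.1%} (bin {bins[z]})")
```

Quantile bins put roughly the same number of Zip Codes in each color, so a single extreme value like the top Zip Code here shares a bin with much less extreme neighbors.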