Mining Tuition Data for US Colleges and Universities, and a Tangent

I wrote this script for the UCLA Statistical Consulting Center. I don’t know all of the specifics, but one of our faculty members has the idea that we can help our campus paper, The Daily Bruin, with its data graphics, or something to that effect. I don’t quite understand, because our paper has never really been big on data graphics, but apparently some undergraduates are going to work on this.

Anyway, we need datasets that are of interest to UCLA students so that our undergraduates can create cool graphics that will stun the readers. Some of the data we were considering:

  • parking data for one week (gate entries), to correlate with some other variable (weather was mentioned; ugh)
  • Registrar study list/class schedule information for every student (anonymized, of course) from Fall 2008, at $50 for programmer time. (I could have done it quickly, for free! …if I worked in their office and it was legal, I mean.)
  • 9/11 pager intercepts.
  • tuition data for US colleges and universities over ten years.

The tuition data was presented in a bunch of tables spread over several pages. Unfortunately, the tables do not report the type of school, so I had to execute separate queries for each year of data and each type of school. This script is the result of my labor. It is always a lot of fun, and it is an awesome feeling to be able to extract bulky data! This was also one of my first experiences with pylint. As much as I love Python, it is easy to write ugly code in it. pylint checks code for violations of Python style conventions such as tabs vs. spaces, spacing around binary operators, function naming conventions, line length and commenting conventions. It also catches most (if not all) syntax errors, and some logic errors.
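To make the "one query per year and per school type" idea concrete, here is a minimal sketch. It is not the original script: the URL template, query parameters and school-type labels are hypothetical placeholders, since the post does not give the real site's query format. It uses only the standard library (`itertools.product` to enumerate the queries, `html.parser.HTMLParser` to pull cell text out of each results table) and tags every row with the query that produced it, which is how the missing school-type column gets restored.

```python
"""Sketch: one query per (year, school type), then pull rows from each HTML table.

URL_TEMPLATE, YEARS and SCHOOL_TYPES are hypothetical placeholders, not the
real site's query format.
"""
import urllib.request
from html.parser import HTMLParser
from itertools import product

# Hypothetical query template and categories (assumptions).
URL_TEMPLATE = "https://example.org/tuition?year={year}&type={school_type}"
YEARS = ["1999-00", "2000-01"]  # extend to every reported school year
SCHOOL_TYPES = ["public-4yr", "private-nonprofit-4yr"]


def build_queries(years, school_types):
    """Return (year, school_type, url) for every combination."""
    return [(y, t, URL_TEMPLATE.format(year=y, school_type=t))
            for y, t in product(years, school_types)]


def fetch(url):
    """Download one results page as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.rows.append(self._row)
            self._row = None


def parse_rows(html, year, school_type):
    """Parse one results page and prefix each row with its query labels."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [[year, school_type] + row for row in parser.rows]
```

A full run would loop over `build_queries(YEARS, SCHOOL_TYPES)`, call `fetch` on each URL and concatenate the `parse_rows` results; running pylint over a file like this is a quick way to see the style checks described above in action.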

I provide this code for educational purposes only. Some may be tempted to ask for the dataset, but granting such a request would violate the data owner’s copyright.

Although extracting the data was the fun part, it would feel abrupt to end the post here, so I will take a quick look at the data to show why mining messy data is worth the effort. There has recently been an uproar in California over increasing fees at the University of California (UC) and California State University (CSU) systems. The University of California is the “research” university system in California, whereas California State University does not emphasize research as much (sorry, that’s the best way I can explain it) and does not offer a PhD degree. It is generally harder to be admitted to a UC, although some of the better CSUs (such as Cal Poly) can be more highly regarded than some of the lower-tier UCs.

Anyway, I wanted to take a look at how fees at UCLA, the UC system and the California education system have changed over time. I also want to dispel some myths that students have propagated on campus about the current state of fees in the UC system. Note that next year these fees are expected to increase by 30%…

Myth #1: UC and CSU are in this together, and equally.

One would hope that the burden of higher in-state fees would be shared equally between UC and CSU. The figures below indicate that this is sadly not the case. Starting in 2002, and even more so in 2006, UC fees began to skyrocket compared to those of CSU, so the difference between the two systems has been growing ever since. This may suggest that the services offered only at UC have grown more expensive over time (more research centers, more specialized staff?). There also seems to be some non-statistical evidence that the State of California has disproportionately raised fees for UC students, mainly due to differences in philosophy and demographics between the two university systems. MYTH.
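The "growing difference" can be checked directly from the scraped numbers: compute the UC-minus-CSU gap for each year, then how much that gap changes from one year to the next. A minimal sketch, assuming the fees have already been loaded into hypothetical `{school_year: fee}` dictionaries (the names are illustrative, not from the original script):

```python
def fee_gap(uc_fees, csu_fees):
    """Return {year: UC fee minus CSU fee} for years present in both series."""
    return {year: uc_fees[year] - csu_fees[year]
            for year in sorted(uc_fees) if year in csu_fees}


def yearly_change(series):
    """Return {year: change from the previous year} for a {year: value} dict."""
    years = sorted(series)
    return {cur: series[cur] - series[prev]
            for prev, cur in zip(years, years[1:])}
```

Calling `yearly_change(fee_gap(uc, csu))` gives the year-over-year growth of the gap; consistently positive values are what "the gap is growing" means numerically, and unusually large values would flag jumps like the ones around 2002 and 2006.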


Myth #2: “We went from being one of the cheapest public school systems to one of the most expensive!”

While we are hurting here in California, a fairer way to judge how our fees have fared over time is to compare them with fees across the nation over the same period. One argument is that UC fees are now some of the most expensive in the nation. My guess is that in-state fees are approximately normally distributed, but I chose to use the median to point out how important it is to understand what the median is! The plot below compares UC in-state fees to those of other public 4-year schools across the country. Our fees have been above the national median since data reporting began in 1999, but since about 2003 they have gone from about $2000 above the median to about $3000 above it. This means that our fees are in the top 50% of school systems, but it does not show how far into the top 50% the UC in-state fees fall. The median is easy to interpret, but splitting schools into a top half and a bottom half hides a lot of detail. Instead, let’s look at the percentile rank of UC’s fees over time, which is displayed in the second plot below. This gives us a way to quantitatively compare UC fees to fees nationwide without a “shock” factor.
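Both summaries used here, dollars above the national median and percentile rank, take only a few lines each. A minimal sketch, assuming `all_fees` is a hypothetical list of in-state fees for every public 4-year school in a single year, with UC's own fee included. Ties are counted as half, which is the same convention as `scipy.stats.percentileofscore(..., kind="mean")`; other conventions count ties fully or not at all and shift the rank slightly.

```python
from statistics import median


def percentile_rank(fee, all_fees):
    """Percent of schools charging less than `fee`, counting ties as half."""
    below = sum(1 for f in all_fees if f < fee)
    ties = sum(1 for f in all_fees if f == fee)
    return 100.0 * (below + 0.5 * ties) / len(all_fees)


def compare_to_nation(uc_fee, all_fees):
    """Return (dollars above the national median, percentile rank)."""
    return uc_fee - median(all_fees), percentile_rank(uc_fee, all_fees)
```

Running `compare_to_nation` once per school year produces the two series plotted in this section; the percentile rank says how far into the top half UC sits, which the median alone cannot.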


From the 1999-00 school year to the 2009-10 school year, UC fees have consistently been above the national median for 4-year public schools. It appears that UC fees rose from about the 70th percentile to about the 85th percentile over those 10 years. I do not have data from before 1999, so UC may once have been one of the cheapest systems, but within this data it never was. First part of the myth: not enough data to conclude; second part: TRUE.

Myth #3: “UC(LA) fees have gotten so ridiculous, it is practically becoming a private school!” or “I might as well just go to a private school!”

Um, think again. This one should be simple to dispel. Let’s take the private school across town that Bruins love to hate: USC. We see that USC’s in-state fees have risen linearly since 1999, and the gap between USC and UCLA is growing, not shrinking! UCLA is not anywhere close even to the national median for private 4-year not-for-profit colleges. It may be true, though, that UC is more expensive than some cheaper private schools. So put away the USC gear and the notion that you won’t pay much more at USC than you would at UCLA. MYTH.



So, yes, the California education system is a mess, but it is not the apocalypse that many students at UC schools have made it out to be…yet. California is in a deeper mess than the nation as a whole (relatively speaking, of course), so it is to be expected that our fees rise faster than those of other public school systems, especially given the cost of living in the state. Based on this data, I predict that even with an improving national economy, the national median in-state fee will move closer to UC’s fees.

Of course, all of my analysis considers data before the 30% increase that takes effect in Fall 2010. So stay tuned.

2 comments to Mining Tuition Data for US Colleges and Universities, and a Tangent

  • Boris

    Dear Ryan
    Can you at least suggest how I can get the tuition dataset?

    warm wishes

    • Hi Boris,

      I was nervous to post the data in a raw form since I do not own it. If you have Python installed, you can run the script and scrape the data from their website. If I find some time this week, I will see if I can find the raw data on my hard disk.

