Some Python Nooks and Crannies

I spent this weekend reading Learning Python (Second Edition for Python 2.3!) by Mark Lutz. Python is my favorite programming language, but my experience with it has been mostly anecdotal; I come up with my own solutions and functions and I Google whatever I do not know. I decided to spend a couple of days with this incredibly out-of-date book to formalize my knowledge of the base Python language. It was fairly easy reading because I already had experience with about 80% of the constructs discussed. But it was fun to learn some things that I have not used, and some things that I did not even know existed. I want to share some of these gems here. Pardon me if all of this stuff is obvious to you.

Populating a String with a Dictionary

>>> data = {}
>>> data['first_name'] = "Ryan"
>>> data['age'] = 21  # some programming humor
>>> print("Hello, my name is %(first_name)s and I am %(age)d years old." % data)
Hello, my name is Ryan and I am 21 years old.
Notice that we can also put a function call after the last % sign above, as long as the function returns a dictionary.
The s and the d after the dictionary [...]
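The remark about using a function after the % sign can be sketched as follows. This is only a minimal illustration: get_person is a hypothetical helper invented for this example, and any callable that returns a dictionary with the right keys should work the same way.

```python
def get_person():
    # Hypothetical helper (not from the original post): returns the
    # dictionary whose keys the format string refers to.
    return {"first_name": "Ryan", "age": 21}


template = "Hello, my name is %(first_name)s and I am %(age)d years old."

# The function is called first; its returned dictionary is then used
# to fill in the %(name)s and %(name)d placeholders.
message = template % get_person()
print(message)
```

The parenthesized print works in both Python 2 and Python 3, so the sketch does not depend on the version the book covers.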

Mining Tuition Data for US Colleges and Universities, and a Tangent

I wrote this script for the UCLA Statistical Consulting Center. I don’t know all of the specifics, but one of our faculty members has this idea that we can help our paper, The Daily Bruin, with their graphics or something to that effect. I don’t quite understand because our paper has never really been big on graphics for data, but apparently some undergraduates are going to work on this.

Anyway, we need datasets that are of interest to UCLA students so that our undergraduates can create cool graphics that will stun the readers. Some of the data we were considering:

parking data for one week; gate entries, to correlate with some other variable (weather was mentioned. ugh)
Registrar study list/class schedule information for every student (anonymized of course) from Fall 2008. $50 for programmer time. (I could have done it quickly, for free! …if I worked in their office and it was legal, I mean.)
9/11 pager intercepts.
tuition data for US colleges and universities over ten years.

The tuition data was presented in a bunch of tables spread across several pages. Unfortunately, the type of school is not reported. Due to this limitation, I had to execute separate queries to access each year of data, [...]

Advanced Graphics in R

Each quarter the UCLA Statistical Consulting Center hosts minicourses twice per week in R and LaTeX. Tonight was my turn to present.

I presented Advanced Graphics in R. This was the same presentation I gave at the LA R Users’ Group in August with a fellow consultant. She and I had trouble coming together to make one presentation, so we shared our outlines; hers was deemed “Intermediate Graphics in R” with some ggplot, and mine was deemed “Advanced.” It seems to work.

My slides are here, and the handout version is here. The corresponding code is here.

Topics include:

Customizing graphics with par parameters
Using attributes of graphic objects
Basic graphics devices
Math typesetting for R graphics
An example of a movie (here, but there is some funkiness with it)

Many think that “advanced” graphics would be lattice or ggplot. We chose to address those packages in their own minicourses.

My advisor gave me some good advice on writing R code that fits well in Beamer slides and lstlisting:

use local variables and introduce them.
don’t use function names as variable names (I violated this one here).

What to Expect?

In 2007, I was introduced to Twitter via the written qualifying exam towards my Ph.D. At first, I did not know what to do with it. After a good year or so (maybe even sooner), I began to follow some very interesting people who share my interests. It has transformed my academic experience. It is great to run across tweets promoting conferences and newly released papers in my field. One of my favorite parts about Twitter, aside from interacting with tweeps, is the ability to quickly post a status update on what I am doing and refer to it later. I consider it a platform for collaboration because I see what others are doing via tweets as well as linked blogs, whether they come from a Twitter user or some offline user. I quickly realized that 140 characters were not enough to solidify my thoughts and participate in the community. Thus, I decided to start this blog so I can share cool things I have found in my research/work with others anywhere on the web and communicate in more than 140 characters.

Here are some things that I am very interested in and [...]

Welcome!

Welcome to my new blog, Byte Mining! Data is all around us, all the time. It flows in from places you would least expect it, and more often than not, it remains in its original form untouched by human and machine. When data simply flows in and out of our lives, we miss out on the story that it tells us, and the clues that it provides to help solve our mysteries.

We humans are becoming more and more attuned to the data that we exude and how we release it. There are two side effects to this. Concern for privacy is one. The second is that people are aware that their data is out there for consumption. In other words, people are no longer astonished when they realize how much of their data is readily available either publicly or to particular entities. This is a weight off of my back because I no longer get weird looks when I mention to someone some tidbit of information I learned about them on Facebook or MySpace. No, I am not a stalker, I was just more appreciative, or aware, of data ubiquity much sooner than most people my age, and was [...]