Archive for March, 2011

Regressive Aspects of the ASA Leadership’s T&T Policy

| Gabriel |

There are two big ironies about the ASA leadership’s preferred framing of the monotonic dues hike as being about a progressive fee structure rather than an increase in aggregate revenues. The obvious irony is that nobody has challenged the basic idea of a progressive fee structure or even a revenue-neutral bracket adjustment. What we have challenged is the aggregate increase in revenues and the lack of transparency in explaining why the ASA needs more money despite already being much more expensive than AEA.

The less obvious irony is that we aggrieved blogger types have proposed other reforms to make ASA more progressive and heard nothing back from the leadership. The last time we bloggers got angry at ASA, it was about treating the job bank as a profit center. ASA charges departments $200/month to be listed in the job bank, even if it’s a cross-disciplinary search. Not only that, but it discouraged the use of section listservs to circulate job announcements. That is, the ASA seems to view the job bank less as a service to the membership than as a fief bringing in rents, and on this understanding alternative flows of job market information constitute something like tax evasion.

One obvious consequence of this set of policies is that while soc departments will just suck it up and pay hundreds of dollars for the ASA’s version of Craigslist, cross-disciplinary searches (area studies, various ethnic studies programs, comm studies, b-schools, etc.) that might be open to hiring sociologists can hardly be expected to pony up $200-$800 over the course of a search to attract PhDs from just one of the several disciplines they are interested in. This means sociologists don’t get these jobs. We can be even more specific and say that it is the younger and/or poorer sociologists who would be most interested in these openings and who are most hurt by ASA’s high job bank fees. That is, the current ASA policy of nickel-and-diming departments has a regressive incidence on the membership. The bloggers’ interest in revoking or relaxing this policy would mean relatively greater reliance on dues (which have always been progressive) and relatively less reliance on job bank fees that have an indirect regressive incidence on our weakest members. The net result is that the total “tax and transfer” system of ASA is less progressive than it appears, and less progressive than we proposed to make it through reforming the job bank.

When ASA eliminates its regressive job bank policy, then I’ll take the leadership seriously about how it would be progressive to adjust the dues structure upwards for the top-earning members but downwards for nobody. Until then I’ll assume that it’s just an organization that doesn’t know that there’s no shame in being a humbly efficient membership service organization and so it seeks all possible sources of revenue — whether they be progressive dues or (indirectly) regressive job bank fees — to finance its K-Street fantasies.

March 30, 2011 at 11:08 pm 12 comments

asa dues petition: call for participation

| Gabriel |

Reproduced from Scatterplot:

Anyone who is interested in crafting the wording of the petition or making decisions about whether anonymous signatures should be allowed, etc., should email Ezra at ewzucker [a t] mit [d o t] edu by 8am Thursday. When all decisions are made, we will circulate a petition and everyone can decide whether they would like to sign it.

Here is the current draft of the petition as of this writing. All ASA members, regardless of subfield, institution, or rank, are welcome to participate in drafting the petition. We are hoping to have a final wording of the petition to circulate soon.

March 29, 2011 at 4:47 pm

ASA’s K-Street Condo Delenda Est

| Gabriel |

I just renewed my ASA membership, mostly so I could vote no on the “progressive fee structure,” which is essentially a monotonic fee increase above and beyond recent COLA increases. This is a matter of principle to me — I’m not presenting at ASA this year, so (even over the long run) it would have been cheaper for me to just let my membership lapse this year. In the short run we need to resist this fee increase as well as counter-productive nickel-and-diming like charging departments $200 a month to place job listings (including multi-disciplinary positions) in the job bank.

Obviously, though, these revenues are being spent on something, so in the long run we need to pare back the ASA into a less quixotically ambitious organization. Economists seem to be pretty happy to pay less than us and get more journals in the bargain. The most important step is to get the ASA out of downtown DC, home to some of the country’s most expensive commercial real estate and a high cost of living for skilled labor. I hear there’s a vacancy in the Nashville office building where AEA is based.

See also Kieran, Jenn, Jeremy, and most of all The Disgruntled Sociologist.

On a tangent, every time I renew my membership it agitates me that I have to check off my assent to the ASA Code of Ethics. It strikes me as a violation of academic freedom and intellectual honesty that my association considers it unethical to have unpleasant things to say about ascriptive groups. (Or are we allowed to say unpleasant things so long as we don’t, as is otherwise encouraged, draw policy conclusions from them?) As someone who mostly studies organizations rather than individuals, this doesn’t directly affect me, and I certainly hope that social reality conforms to the high egalitarian standards of Part D of the Code of Ethics, but I think we should allow for the theoretical possibility that a researcher acting in good faith could have research findings that paint some ascriptive group or other in a negative light.

March 24, 2011 at 3:56 pm 5 comments

Misc Links

| Gabriel |

  • Anything But Justin Bieber: Symbolic Exclusion and NPR Story Selection Dislikes
  • Glee breaks Elvis Presley’s record. Note that this is more than a little misleading since Glee singles have brief but intense bursts of popularity whereas Elvis (or the Beatles for that matter) had much more sustained popularity.
  • WalMart is pushing CSR on its supply chain. WalMart says that its customers are demanding these practices, but I call bullshit on that, as CSR is a superior good and WalMart’s niche is down-market (though it’s possible that it thinks CSR is a way to draw in wealthier customers). Rather, I think it likely that WalMart knows it has saturated the exurbs and small cities and can only expand by going into big cities, but this ambition is thwarted by a left-wing political coalition. WalMart is never going to get the unions to drop their opposition (here in LA the grocers’ unions have led the fight against WalMart), but several of its recent actions suggest a strategy of peeling off enough of the “food desert” and “sustainability” greens to squeeze through the zoning process. In other words, it’s coercive isomorphism all the way down.

March 7, 2011 at 1:23 pm

OS X memory clean up w/o reboot

| Gabriel |

Windows XP had a notorious bug (which I think has been fixed in Vista/7) where it didn’t allocate memory very well and you had to restart every once in a while. It turns out OS X has a comparable problem. Via the Atlantic, I see a trick for releasing the memory.

Basically, you do a “du” query and this tricks the computer into tidying things up. This takes about fifteen minutes and implies a noticeable performance hit while it runs, but after that the machine works much better. Ideally you run it as root, and the Atlantic suggests a one-liner that uses sudo, but keeping your root password in plain text strikes me as a really bad idea. Rather, I see three ways to do it:

  1. Run it as a regular user and accept that it won’t work as well. The command is just
    du -sx /

    and you can enter it from the Terminal, cron it, or save it as an Automator service.

  2. Do it interactively as root, which requires this code (enter your password when prompted).
    sudo du -sx /
  3. Cron it as root. Here’s how to get to root’s cron table:
    sudo -i
    crontab -e

    Once in the cron table just add the same

    du -sx /

    command as before, preferably scheduled for a time when your computer is likely to be turned on but not doing anything intensive (maybe lunch time); a sample entry is sketched below.
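
For instance, a root crontab entry along these lines would run the sweep daily at noon (the noon schedule is just an assumption, so adjust to taste; the redirect keeps cron from mailing you the du output):

0 12 * * * du -sx / > /dev/null 2>&1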

March 3, 2011 at 2:49 pm

Fashion is danger

| Gabriel |

Dear Senator Feinstein,

I am writing to you both as a constituent and as an expert on creative industries. I want to thank you for your opposition to the IDPPPA (S. 3728) bill that would extend formal (and actionable) intellectual property rights to the fashion industry.

As you know, the Constitution authorizes Congress “to promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.” That is to say, intellectual property rights are meant to correct a market failure of insufficient creativity. The idea that an industry that releases hundreds of creative designs every spring and fall has a shortage of creativity is frankly absurd.

Not only is it implausible to imagine that we would see more fashion in a world with the IDPPPA, but to the contrary we should expect to see less. Fashion is currently the only creative realm where one can create freely without worrying about concepts like infringement, clearance, or licensing. The IDPPPA would end this and create a world where (much like biochemists or musicians) fashion designers would spend less of their time at the drafting table and more of their time with lawyers. This would both imply a deadweight loss and raise barriers to entry. One need only look at the decline in creativity in hip hop music following Grand Upright and Bridgeport to see how the extension of IP rights to fashion would create gridlock effects that would completely swamp whatever marginal incentive effects IP would provide.

Please have your staff contact me if I can be of any assistance on this issue.

Gabriel Rossman
Assistant Professor
Sociology, UCLA

March 2, 2011 at 3:06 pm 2 comments

Scraping Twitter

| Gabriel |

I recently got interested in a set of communities that have a big Twitter presence and so I wrote some code to collect it. I started out creating OPML bookmark files which I can give to RSSOwl,* but ultimately decided to do everything in wget so I can cron it on my server. Nonetheless, I’m including the code for creating OPML in case people are interested.

Anyway, here’s the directory structure for the project:

project/
  a/
  b/
  c/
  lists/
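
If you’re starting from scratch, a one-liner along these lines creates the layout (a sketch assuming bash brace expansion, with the community names a, b, and c from the tree above):

mkdir -p project/{a,b,c,lists}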

I have a list of Twitter web URLs for each community that I store in “project/lists” and call “twitterlist_a.txt”, “twitterlist_b.txt”, etc. I collected these lists by hand, but if there’s a directory listing on the web you could parse it to get the URLs. Each of these files is just a list and looks like this:

http://twitter.com/adage
http://twitter.com/science

I then run these lists through a Bash script called “twitterscrape.sh”, which is run from “project/” and takes the name of the community as an argument. It collects the Twitter page from each URL and extracts the RSS feed and “Web” link for each. It writes the RSS feeds both to an OPML file (an XML file that RSS readers treat as a bookmark list) and to a plain text file. The script also finds the “Web” link in each Twitter feed and saves these links as a file suitable for later use with wget.

#!/bin/bash
#twitterscrape.sh
#take list of twitter feeds
#extract rss feed links and convert to OPML (XML feed list) format
#extract weblinks 

#get current first page of twitter feeds
cd $1
wget -N --input-file=../lists/twitterlist_$1.txt
cd ..

#parse feeds for RSS, gen opml file (for use with RSS readers like RSSOwl)
echo -e "<?xml version\"1.0\" encoding=\"UTF-8\"?>\n<opml version=\"1.0\">\n\t<head>\n\t<body>" > lists/twitrss_$1.opml
grep -r 'xref rss favorites' ./$1 | perl -pe 's/.+\/(.+):     .+href="\/favorites\/(.+)" class.+\n/\t\t<outline text="$1" type="rss" xmlUrl="http:\/\/twitter.com\/statuses\/user_timeline\/$2"\/>\n/' >> lists/twitrss_$1.opml
echo -e "\t</body>\n</opml>\n" >> lists/twitrss_$1.opml

#make simple text list out of OPML (for use w wget)
grep 'http\:\/' lists/twitrss_$1.opml | perl -pe 's/\s+\<outline .+(http.+\.rss).+$/$1/' > lists/twitrss_$1.txt

#parse Twitter feeds for link to real websites (for use w wget)
grep -h -r '>Web</span>' ./$1 | perl -pe 's/.+href="(.+)" class.+\n/$1\n/' > lists/web_$1.txt

echo -e "\nIf using GUI RSS, please remember to import the OPML feed into RSSOwl or Thunderbird\nIf cronning, set up twitterscrape_daily.sh\n"

#have a nice day
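
As a usage sketch, assuming you’ve saved the script in project/ and created lists/twitterlist_a.txt, a run for community “a” would look something like this:

chmod +x twitterscrape.sh  #first run only
./twitterscrape.sh a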

This basically gives you a list of RSS feeds (in both OPML and TXT), but you still need to scrape them daily (or however often). If you’re using RSSOwl, “import” the OPML file. I started by doing this, but decided to cron it instead with two scripts.

The script twitterscrape_daily.sh collects the RSS files, calls the perl script to do some of the cleaning, combines the cleaned information into a cumulative file, and then deletes the temp files. Note that Twitter only lets you get 150 RSS feeds within a short amount of time — any more and it cuts you off. As such you’ll want to stagger the cron jobs. To see whether you’re running into trouble, the file project/masterlog.txt counts how many “400 Error” messages turn up per run. Usually these are Twitter turning you down because you’ve already collected a lot of data in a short amount of time. If you get this a lot, try splitting a large community in half and/or spacing out your crons a bit more and/or changing your IP address.
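
As a sketch, crontab entries along these lines would stagger three communities two hours apart (the times are arbitrary, and the path assumes the scripts live in ~/project, matching PARENTPATH in the script below):

0 3 * * * ~/project/twitterscrape_daily.sh a
0 5 * * * ~/project/twitterscrape_daily.sh b
0 7 * * * ~/project/twitterscrape_daily.sh c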

#!/bin/bash
#twitterscrape_daily.sh
#collect twitter feeds, reshape from individual rss/html files into single tab-delimited text file

DATESTAMP=`date '+%Y%m%d'`
PARENTPATH=~/project
TEMPDIR=$PARENTPATH/$1/$DATESTAMP

#get current first page of twitter feeds
mkdir $TEMPDIR
cd $TEMPDIR
wget -N --random-wait --output-file=log.txt --input-file=$PARENTPATH/lists/twitrss_$1.txt

#count "400" errors (ie, server refusals) in log.txt, report to master log file
echo "$1  $DATESTAMP" >> $PARENTPATH/masterlog.txt
grep 'ERROR 400\: Bad Request' log.txt | wc -l >> $PARENTPATH/masterlog.txt

#(re)create simple list of files
sed -e 's/http:\/\/twitter.com\/statuses\/user_timeline\///' $PARENTPATH/lists/twitrss_$1.txt > $PARENTPATH/lists/twitrssfilesonly_$1.txt

for i in $(cat $PARENTPATH/lists/twitrssfilesonly_$1.txt); do perl $PARENTPATH/twitterparse.pl $i; done
for i in $(cat $PARENTPATH/lists/twitrssfilesonly_$1.txt); do cat $TEMPDIR/$i.txt >> $PARENTPATH/$1/cumulativefile_$1.txt ; done

#delete the individual feeds (keep only "cumulativefile") to save disk space
#alternately, could save as tgz
rm -r $TEMPDIR

#delete duplicate lines
sort $PARENTPATH/$1/cumulativefile_$1.txt | uniq > $PARENTPATH/$1/tmp 
mv $PARENTPATH/$1/tmp $PARENTPATH/$1/cumulativefile_$1.txt

#have a nice day

Most of the cleaning is accomplished by twitterparse.pl. It’s unnecessary to cron this script as it’s called by twitterscrape_daily.sh, but it should be in the same directory.

#!/usr/bin/perl
#twitterparse.pl by ghr
#this script cleans RSS files scraped by WGET 
#usually run automatically by twitterscrape_daily.sh

use warnings; use strict;
die "usage: twitter_rss_parse.pl <foo.rss>\n" unless @ARGV==1;

my $rawdata = shift(@ARGV);

my $channelheader = 1 ; #flag for in the <channel> (as opposed to <item>)
my $feed = "" ;   #name of the twitter feed <channel><title>
my $title = "" ;  #item title/content <item><title> (or <item><description> for Twitter)
my $date = "" ;   #item date <item><pubDate>
my $source = "" ; #item source (aka, iphone, blackberry, web, etc) <item><twitter:source>

print "starting to read $rawdata\n";

open(IN, "<$rawdata") or die "error opening $rawdata for reading\n";
open(OUT, ">$rawdata.txt") or die "error creating $rawdata.txt\n";
while (<IN>) {
	#find if in <item> (ie, have left <channel>)
	if($_ =~ m/^\s+\<item\>/) {
		$channelheader = 0;
	}
		
	#find title of channel
	if($channelheader==1) {	
		if($_ =~ m/\<title\>/) {
			$feed = $_;
			$feed =~ s/\s+\<title\>(.+)\<\/title\>\n/$1/; #drop tags and EOL
			print "feed identifed as: $feed\n";
		}
	}

	#find all <item> info and write out at </item>
	if($channelheader==0) {	
		#note: cannot handle internal LF characters;
		#doesn't crash, but it leaves in the leading tag, and
		#this is only an issue for title/description.
		#ignore for now
		if($_ =~ m/\<title\>/) {
			$title = $_;
			$title =~ s/\015?\012?//g; #manual chomp, global to allow internal \n
			$title =~ s/\s+\<title\>//; #drop leading tag
			$title =~ s/\<\/title\>//; #drop closing tag
		}
		if($_ =~ m/\<pubDate\>/) {
			$date = $_;
			$date =~ s/\s+\<pubDate\>(.+)\<\/pubDate\>\n/$1/; #drop tags and EOL
		}
		if($_ =~ m/\<twitter\:source\>/) {
			$source = $_;
			$source =~ s/\s+\<twitter\:source\>(.+)\<\/twitter\:source\>\n/$1/; #drop tags and CRLF
			$source =~ s/&lt;a href=&quot;http:\/\/twitter\.com\/&quot; rel=&quot;nofollow&quot;&gt;(.+)&lt;\/a&gt;/$1/; #cleanup long sources
		}
		#when item close tag is reached, write out then clear memory
		if($_ =~ m/\<\/item\>/) {
			print OUT "\"$feed\"\t\"$date\"\t\"$title\"\t\"$source\"\n";
			#clear memory (for <item> fields) 
			$title = "" ;
			$date = "" ;
			$source = "" ;
		}
	}
}
close IN;
close OUT;
print "done writing $rawdata.txt \n";

*In principle you could use Thunderbird instead of RSSOwl, but its RSS has some annoying bugs. If you do use Thunderbird, you’ll need to use this plug-in to export the mailboxes. Also note that by default, RSSOwl only keeps the 200 most recent posts. You want to disable this setting, either globally in the “preferences” or specifically in the “properties” of the particular feed.

March 1, 2011 at 4:51 am 4 comments

