Undergraduate Capstone Open Source Projects

Archive for the ‘ElmCity’ Category

elmcity wrapup

Posted by Diane Tam on 2009/12/16

First off, I just want to say it has been a pleasure working with everyone.  I certainly learned a lot this term, both about the project and about working in a team environment.  A few final blog posts about the generic plugin parser for the elmcity project are listed below, and the latest versions of the code can be found in the code repository here:


I hope this is a good start to recreating FuseCal!

The generic plugin can successfully parse the curator sites listed here: http://dianetam.com/dev/blog/2009/12/12/elmcity-design-wrapup-and-next-steps/ as well as a few of the sites from the previously parsed FuseCal pages listed here: http://blog.jonudell.net/2009/07/07/strategic-choices-for-calendar-publishers/

Generic Plugin API

The generic plugin implementation API can be found here:


Class File: src/plugins/patterns/genericplugin.py

Test File: tests/genericpluginTests.py

Generic Plugin Design and Patterns

The generic plugin design and patterns are detailed here:


Elmcity Design Wrapup and Next Steps

A general overview of the system and the next steps to be taken can be found here:


If there are any questions about the project or the generic plugin details, feel free to let me know.  Even though the course is over, I would still like to stay involved with the project, follow where it goes through FriendFeed, and even help take it further if possible 🙂  I’m constantly looking for projects to expand my experience and interests, so if there are any opportunities for that I’d be more than willing to jump on board.

I can be contacted at diane.tam@gmail.com

Thanks everyone!

Posted in ElmCity

what worked and what needs work

Posted by Diane Tam on 2009/12/03

First off, thank you to Greg and all the profs and TAs that made this project course possible.  It was definitely a unique and invaluable experience that I believe all students should consider in their undergraduate years.

What worked:

I’ve been working full-time as a programmer for 3 years now, and I wish I had taken this course during my undergrad, as it is extremely applicable to real-life work practices.  It teaches you about the importance of communication and coordination with teams across the world when solving a problem.  This is a very common scenario in the workplace, whether it be communication, design, integrating code, code reviews, etc., and this course helps put you in that mindset and find ways to work with it.  Overcoming obstacles like timezone differences can be difficult but not impossible.  With good planning and communication, teams can work very well together and produce results.  This course definitely gives you that perspective and forces you to realize the importance of planning and communication.  Having this experience before setting out full-time as a programmer, or in any profession in fact, gives you a great additional skill set!

What needs work:

Having said that, working ‘virtually’ with teammates really posed a challenge with respect to communication.  I know this is something many teams posted about, but I’m going to emphasize the planning aspect of communication.  A good plan could lead to a good design, which in turn could lead to good direction.  The code sprint was a great chance for teams to spend time planning and designing their systems, but after that it was difficult to do any more planning.  As a result, no one actually ‘fully’ understood the entire system.  It’s difficult to design the whole system up front, and many requirements can change along the way.  I think an iterative design and development process could have worked here.  Rather than looking at milestones, look at shorter two-week sprints.  Dedicate a couple of days before each two-week sprint to plan what needs to be done in that period, address design issues for only those tasks, and work on those specific items.  After a sprint, address outstanding issues and move on to the next set of tasks.  Taking on the system in appropriate chunks not only allows you to focus on a specific area, but also lets you understand how one part integrates with the next and realize early on what works and doesn’t work with the design.  It also gives you more direction because you are not overwhelmed by the big picture.  Communication within these smaller iterative periods could improve as well, since everyone understands what is going on and what needs to be done in that two-week cycle.  Perhaps this is something teams could consider to ease the struggle with deadlines, communication and direction.

Thanks again for the great experience everyone!

Posted in ElmCity

Minutes for the 2009-11-20 Elmcity Meeting

Posted by jorygraham on 2009/11/23

The minutes themselves are available here.

This week we tried to figure out what we’ll aim to have done for our deadline next week. The list we settled on by the end of the meeting was as follows:

  • Hooking up the dev database.
  • Having all the external libraries situated under a single directory.
  • Supporting more TABULAR sites.
  • Supporting a few CALENDAR sites.
  • Hooking up pluginFinder to the generic plugins.
  • Supporting a query on a single url that tells you two things: if it’s parsable, and which plugin handles the parsing.
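
The last item, a single-URL query that reports both whether a page is parsable and which plugin would handle it, could look roughly like the sketch below. This is only an illustration: the `Plugin` record, the `PLUGINS` registry and the `query` function are hypothetical names, not the actual pluginFinder interface, and matching on hostname is an assumed stand-in for whatever checks the real plugins perform.

```python
from typing import Callable, NamedTuple, Optional, Tuple
from urllib.parse import urlparse


class Plugin(NamedTuple):
    """Hypothetical plugin record: a name plus a 'can I parse this URL?' check."""
    name: str
    can_parse: Callable[[str], bool]


def host_is(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


# Assumed registry: site-specific plugins are tried in order.
PLUGINS = [
    Plugin("librarything", lambda url: host_is(url, "librarything.com")),
    Plugin("myspace", lambda url: host_is(url, "myspace.com")),
]


def query(url: str) -> Tuple[bool, Optional[str]]:
    """Answer both questions at once: is the URL parsable, and by which plugin?"""
    for plugin in PLUGINS:
        if plugin.can_parse(url):
            return True, plugin.name
    return False, None
```

A generic plugin would simply be one more entry at the end of the registry, so that site-specific plugins get the first chance at a URL.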

There’s lots to be done as the project wraps up, but we’re still optimistic about our outcome.

Posted in ElmCity

ElmCity Status Report for Nov 4th

Posted by Diane Tam on 2009/11/05


+ documenting and testing remote.py, batch.py (this leaves us with no “big” modules untested; rotate.py and event.py can be done quickly later on).
+ timezone had to be revisited and changed a bit (timezone_source, timezone_dest), and together with other little things we officially announced the new features to the community

– I’ve been sick for the past couple of days, so things are moving a bit slower for me than usual. It’s not a real roadblock, just letting everyone know why I was semi-silent since Friday.
– we need to adopt a development/stable scheme on the web server. Jory, what’s the status on that one?
– tests are not working for one of the plugins and for the integration testing. Meghan, Jack, what’s the status on those? It’s not a major roadblock per se (the problems are with the test code itself, not with the project’s code), but it’d be nice if they worked.

Next Steps:
I want to concentrate on the patterns and generalized plugin from now on, spending much less time on the service as a whole; after all, we have a stable version in the repos right now and the next big step is the generalized plugin.


– Refined a generic plugin that can parse some pages successfully.  Latest plugins have been delivered to the Google Code repository.
– Continually working towards generalizing the scraper to scrape multiple sites.
– Documenting patterns through blog and tester.  Latest post regarding the plugin and patterns can be found here: http://dianetam.com/dev/blog/2009/11/05/patterns-and-thoughts-continued/

– Work, classes.

Next Steps:
– Continue to scrape and document patterns found from event sites.
– Establish a more complete analysis of the patterns found from the FuseCal-parsed pages as well as the sample pages curators have pointed to.
– Divide the list according to the patterns found and manage the lists (i.e., through Delicious tags).
– Populate that dictionary for all our targets plus the ones the curators have added to our list.
– Build a tester that will accept the various patterns and extract appropriate information from them

Posted in ElmCity, Status

Elmcity Meeting Minutes for Oct 30 2009

Posted by jorygraham on 2009/11/01

On Friday, the Elmcity team had its weekly meeting, the minutes of which can be found here.

A summary of the things we covered:

We went over and finalized our grading scheme, including exactly what functionality we hope to have achieved by the 29th of November.

In that vein, we’ve decided that our functionality will primarily be dictated by the ability to parse the pages brought up by our users here, and secondarily by this list. There’s a lot of overlap between the two, but our focus is on the sites the current curators are interested in.

We’ve all been looking over the patterns that should be recognized by our general parser, and will report on them this week.

Finally, we discussed which approach should be taken for parsing, based on two libraries available for Python: parsedatetime and dateutil. dateutil seems to err on the side of caution, choking on any non-datetime text, whereas parsedatetime can handle a really wide range of input but will also return false positives. We’ll likely have to take a middle-ground approach to our parsing, since human-generated text is so unpredictable.
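
One way to picture such a middle ground is a two-stage parse: a lenient stage that picks date-like substrings out of free text, followed by a strict stage that rejects anything that is not a real date. The sketch below uses only the standard library (a regular expression plus `datetime.strptime`) as a stand-in for both libraries; the pattern, the format list and the `find_dates` name are assumptions made for illustration, not what the team implemented.

```python
import re
from datetime import datetime

# Lenient stage: pull out substrings that merely look like dates.
DATE_RE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2},? \d{4}\b"
    r"|\b\d{4}-\d{2}-\d{2}\b"
)

# Strict stage: a candidate only counts if one of these formats accepts it.
# Month names are matched using the current locale (English is assumed here).
FORMATS = ("%Y-%m-%d", "%B %d %Y", "%b %d %Y")


def find_dates(text):
    """Return the dates found in free text, dropping look-alikes that fail strict parsing."""
    found = []
    for match in DATE_RE.finditer(text):
        candidate = match.group(0).replace(",", "").replace(".", "")
        for fmt in FORMATS:
            try:
                found.append(datetime.strptime(candidate, fmt).date())
                break
            except ValueError:
                continue  # try the next format; give up silently if none fits
    return found
```

With this shape, surrounding prose no longer makes the parser choke, and a look-alike such as "May 32, 2009" is matched by the lenient stage but then discarded by the strict one.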

Posted in ElmCity

Marking Scheme for ElmCity project

Posted by Nikita Pchelin on 2009/10/30

Mark Breakdown

1. Individual (50%)

a) 40% – participation:

  • 20% code contributions
  • 10% discussions on friendfeed and google code wiki throughout the term
  • 10% weekly punchlines and how they were met (trackable through the “status” portion)

b) 10% – written component

2. Team (50%)

a) 30% Features and Functionality:

  • all the existing functionality up-to-date (plugins, events filtering, timezones management) still works as expected (tests don’t fail), existing bookmarks on Delicious produce valid iCal feeds
  • generalized parser that can at least recognize a minimum set of information for the event (date & time, title, and a link) on each of the pages listed here. If for some reason some of the page(s) are odd and cannot be parsed by the generalized plug-in, we must fall back to completing a site-specific plug-in. If time permits, (outside 30%) we are targeting pages on here and too
  • product has to be ready to be shipped. That is, we have to have a document that describes all the third-party libraries (and software) one may need to yank a fresh copy from Google Code and start working with the code. We can include general steps (i.e. have MySQL, phpMyAdmin installed and working), but whatever concerns the project itself must be specific (i.e. what file do I edit to change the location of the calendar folder?)

b) 10% the system has clear documentation (comments inside the files, high-level description of the system) that could potentially allow other people to contribute to the project. Tests have been implemented (unit and integration) and run smoothly.

c) 10% written report + YouTube presentation of the service that describes what work has been done throughout the term, demonstrates the abilities of the project, and touches on the future of the project (i.e. if we were to continue, what things are missing that we’d want to implement in the future)

Posted in ElmCity

ElmCity Status Report for Oct 28th 2009

Posted by Diane Tam on 2009/10/29


– AM/PM problem for librarything fixed, tests added
– timezone, we have a solution and it has been implemented and is out there for testing, calling remote like this:
generates a VTIMEZONE in the calendar file. We support standard tzids from the Olson database, and we map all the Windows timezones that elmcity provides us with to the existing tzids.
– pluginFinder.py, icwWriter – written documentation, unit tests
– moved views.py code into a separate file, and made views.py call it from there (this is to enforce independence from Django)
– extending web-interface – I have not done much, just changed how AJAX communicates with Python (XML now), so that if we eventually look at extending it, it will be easier; plus, that killed the last hard-coded reference in the JavaScript code (domain name)
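
The timezone item in the list above maps Windows timezone names onto Olson tzids. A minimal sketch of that lookup is shown below, under assumptions: the table holds only a handful of entries, which follow the Unicode CLDR Windows-to-Olson mapping rather than the project’s actual table, and `to_olson_tzid` is a hypothetical name. Generating the VTIMEZONE block itself is not shown.

```python
# A few Windows timezone names and the Olson tzids they correspond to.
# Entries follow the Unicode CLDR mapping; the project's real table is not shown here.
WINDOWS_TO_OLSON = {
    "Eastern Standard Time": "America/New_York",
    "Pacific Standard Time": "America/Los_Angeles",
    "GMT Standard Time": "Europe/London",
    "Romance Standard Time": "Europe/Paris",
    "W. Europe Standard Time": "Europe/Berlin",
}


def to_olson_tzid(name):
    """Return an Olson tzid for either a Windows timezone name or an existing tzid."""
    name = name.strip()
    if name in WINDOWS_TO_OLSON:
        return WINDOWS_TO_OLSON[name]
    if "/" in name:
        # Assumption: anything shaped like Area/Location is already an Olson tzid.
        return name
    raise ValueError("unknown timezone name: %r" % name)
```

Raising on unknown names, rather than guessing, keeps a bad timezone from silently shifting every event in a feed.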


Next Steps:
– I have not seen any work done on #2 as was agreed at Friday’s meeting. This is an important issue that is kinda holding us back. If there is no change by tomorrow afternoon, I am reassigning this ticket to myself and most probably closing it by tomorrow evening.
– documenting and testing remote.py, batch.py (this leaves us with no “big” modules untested; rotate.py and event.py can be done quickly later on).
– I’ve been thinking of coming up with an approach for looking at HTML pages and finding events (here: http://nikitapchelin.wordpress.com/2009/10/25/patterns/). My idea is to actually try and look for dates first, and then look at and compare where, structure-wise, those dates are located. From that information it might be possible to figure out which structured blocks represent events. I actually started working on a little script that strips useless things from HTML and tries to find events. I want some sort of discussion, mainly to see if this direction is worth moving in. If so, I would like to continue exploring it throughout the week and come up with a script that will attempt to analyze HTML pages this way.
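
To make the dates-first idea concrete, here is a rough sketch, not the script mentioned above: it walks the HTML, records the tag path of every text node that contains something date-like, and treats the path that recurs most often as the likely event structure. The date pattern, the `DatePathFinder` class and `likely_event_path` are all assumptions made for this illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Deliberately small date pattern; a real one would need many more forms.
DATE_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"
    r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}\b"
)
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never get a closing tag


class DatePathFinder(HTMLParser):
    """Counts how often a date appears under each tag path (e.g. 'table/tr/td')."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if DATE_RE.search(data):
            self.paths["/".join(self.stack)] += 1


def likely_event_path(html):
    """Return the tag path where dates repeat, or None if no path holds 2+ dates."""
    finder = DatePathFinder()
    finder.feed(html)
    finder.close()
    if not finder.paths:
        return None
    path, count = finder.paths.most_common(1)[0]
    return path if count > 1 else None
```

A date that appears once (say, a "last updated" line) is ignored, while dates repeating under the same path, such as table cells or list items, point at the blocks that probably hold events.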


– Completed some site-specific scrapers for FuseCal-parsed pages, specifically for <table>-structured event sites.
– Continually working towards generalizing the scraper to scrape multiple sites.
– Documenting patterns through blog and tester.

– Work, classes.

Next Steps:
– Continue to scrape and document patterns found from event sites.
– Establish a more complete analysis of the patterns found from the FuseCal-parsed pages as well as the sample pages curators have pointed to.
– Divide the lists according to the patterns found and manage the lists through Delicious tags.
– Build a tester that will accept the various patterns and extract appropriate information from them.


– Assignments and midterms.

Next Steps:
– Go through all the communication channels from last week, and try to summarize the state of the project in a blog post. I will combine that with the feature requests put forth by the users on FriendFeed in an attempt to figure out both where we are at and what our endgame looks like. I’ll also be adding the features requested by the users as items in our issue tracking system.


– I completed the librarything tests and committed them along with corresponding files earlier this week.  Nikita has already pointed out the page reformatting check fails on his computer so I need to look into those two functions more this week and come up with a better solution.

– Other classes, projects and tests.
– Ran into some hg issues.  I commented out the reformatting tests earlier today so test_all.py wouldn’t fail, but when I tried to commit them I kept getting a 403 error even though my username and password were correct.  I need to look into this further.


– I got the zombies issue resolved, couple more things I’d like to do on that front though.  Nikita mentioned that one of the integration tests is broken from this update, so I’m going to resolve that tonight or tomorrow.  I’d like to do a little more testing of everything tomorrow if I have time as well.

– Classes/other projects due, job interview.

Posted in ElmCity, Status

My Tools of the Trade

Posted by jerboaa on 2009/10/22

Here’s a list of tools I consider essential for carrying out my day-to-day business or otherwise find useful.

Hardware (in no particular order)

  • My primary workstation is an HP Pavilion Slimline, a pretty much standard, off-the-shelf PC on which I installed Ubuntu Linux. The hardware works reasonably well under Linux, but I wish hardware vendors would at some point just offer some models that work seamlessly with any major OS. Anyhow, the desktop has 3 GB of RAM and a 500 GB hard disk. Nothing really special; it does its job and is about a year old now. It is complemented by a 21” Dell LCD monitor and the other usual peripherals. By the way, once you’ve used laptops for your daily work for so long that you don’t even realize how small your screen actually is, you will appreciate the decently sized monitor of your new desktop.
  • Other than my desktop computer, I use an Asus eeePC netbook when traveling or at my office at work or at university, and sometimes my old Toshiba laptop, which I run Debian on. The netbook has very decent Linux-compatible hardware, mainly because it came with some Linux preinstalled – I guess it was some weird Xandros. That wasn’t quite it for my netbook – I found it very limiting – so I installed Ubuntu on it using the array.org custom kernel and the netbook remix software package. This works reasonably well for me.

Software (in no particular order)

  • Linux (and the various standard *NIX tools): I like open-source and I like to have the opportunity to debug my OS, so I’m using Linux (Debian and Ubuntu) for the most part. I found it frustrating at times when I was a Windows user and something stopped working from one day to the next. Although it was mostly my own fault that things broke, I still had no reasonable way of undoing/fixing things. Well, at least not nearly as nicely as I can fix and analyze things on my Linux boxes. My mind wanders…
  • Gnome Terminal: Bash to be precise. I’m using Bash on a daily basis and I don’t want to live without it anymore. It just helps getting your work done.
  • Screen: A quite handy tool for multiplexing your terminal when working remotely on a machine via SSH.
  • Mozilla Firefox: The first thing I start once logged on to my computer is a Web browser. No matter if I’m debugging some CSS or asynchronous HTTP requests or just reading my favorite paper, Firefox is the tool of choice.
  • Firebug: Number 1 Web developer tool. I haven’t seen a better tool, yet.
  • Ad-block Plus: The online world is just not bearable without it.
  • It’s all text!: This one is also a quite handy add-on if you were to write text/code in HTML textareas a lot. By using It’s all text! you can load the content of any textarea into your favorite text editor (GVIM in my case), edit it and save it back into the appropriate textarea of the Web page.
  • Vmplayer and qemu: These tools are just nice for the occasional boot into a clean Linux sandbox or testing some IE stuff on Windows. I use qemu to create the bare vmdk disks and use vmplayer to play them. VirtualBox is also a nice alternative.
  • Eclipse (with RadRails, and other plugins): When doing some programming in Java, Python, Rails or C, I use Eclipse for the most part, augmented with quite a bit of VIM here and there.
  • VIM: For writing LaTeX, Bash scripts, code, or any other plain-text processing, VIM is my tool of choice.
  • XChat: My preferred X IRC client
  • Evolution: A quite reasonable choice for doing all my email work. I chose Evolution, since it has a calendar integrated, but I’m not sure if Mozilla Thunderbird with Google Calendar wouldn’t work quite as well.
  • LaTeX: For writing articles, assignments, and theses. It simply produces a nice layout.
  • Inkscape: For when there are some vector drawings to create (such as the MarkUs logo :-))
  • GIMP: For my very basic image manipulation needs.
  • OpenOffice.org: For the occasional word processing or spreadsheeting.

I think these are (most of) my all-time-favorite tools. What are you using? What do you find helpful?

Here are some links to what other people wrote about this topic: Mike Gunderloy and 1, 2, 3, 4, 5, 6, 7, 8, 9, and counting…

Posted in Basie, Eclipse4Edu, Education, ElmCity, Ingres, MarkUs, RoboCup, Thunderbird, WikiDev

Punchlines Oct 19-25 ElmCity team

Posted by Nikita Pchelin on 2009/10/21


– The project I was working on got canned, so I’m trying to get reacquainted with the core codebase
– Also trying to understand the intended goals and power structure of this project better so I don’t waste more work. This entails some discussion, new use of the bug tracking system for laying out goals, and group formulation of user stories (to be finished at this Friday’s meeting)

– Midterms
– If we write user stories as a group on Friday without Jon, how do we make sure he clears each one, and what do we do if he rejects many of them? I’m concerned this could make Friday’s meeting less useful than it otherwise would be. Alternatives: Solicit a comprehensive description of what Jon thinks are acceptable high-level goals before Friday and refine at the meeting, adding no additional ideas so we don’t stray from spec, or, better, have Jon come to the meeting if possible.

Next Steps:
– Write documentation
– Write tests, and add any bugs I discover to bug tracker
– Choose all future tasks from the bug tracker for better directed work
– Turn agreed-upon and authorised user stories into a series of milestones specified in the bug tracker, choose one as a milestone for one or two weeks from now
– Use user stories generated plus group discussion to formulate marking scheme

http://elmcity.cloudapp.net/services/frenchThings/html (different types of feeds showing up!)
– removed all relative links in the code, which now allows us to check out code and develop locally much more easily
– added filtering ability to the service (url=…&filter=Montreal)
– rewrote librarything plugin to use RSS instead of HTML when parsing events
– myspace (fixed issue #36, choking on certain pages with missing location information)
– tests: wrote tests and documented myspace, librarything, database.py, added test_all.py (that creates a test suite from available test modules and runs them all)
– wrote initial batch module (batch.py)
– started two wiki pages (http://code.google.com/p/elmcity/w/list) one for “how to write a plugin”, the other one describing the general look of the system.
– other little things
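
As an illustration of the filtering item above (url=…&filter=Montreal), the sketch below reads the `filter` parameter from a query string and keeps only the events whose title or location mentions it. The event dictionaries, their field names and both function names are hypothetical; the service’s real event objects and matching rules are not shown in this post.

```python
from urllib.parse import parse_qs


def filter_events(events, keyword):
    """Keep events whose title or location contains the keyword, ignoring case."""
    if not keyword:
        return list(events)
    needle = keyword.casefold()
    return [
        event for event in events
        if any(needle in event.get(field, "").casefold()
               for field in ("title", "location"))
    ]


def handle_query(query_string, events):
    """Apply the optional 'filter' parameter of a request's query string."""
    params = parse_qs(query_string)
    keyword = params.get("filter", [""])[0]
    return filter_events(events, keyword)


# Made-up events, for demonstration only.
EVENTS = [
    {"title": "Jazz night", "location": "Montreal, QC"},
    {"title": "Book reading", "location": "Toronto, ON"},
]
```

A request without a `filter` parameter returns every event unchanged, so existing bookmarks would keep behaving as before.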

– none
Next Steps:
– looking into and coming up with a solution for the timezone problem (as per http://friendfeed.com/elmcity-development/735c49eb/problem-with-librarything-approach) and the AM/PM problem (as per http://friendfeed.com/elmcity-development/abbbced4/paris-est-belle)
– pluginFinder.py, icwWriter – documentation, unit tests

– move views.py code into a separate file, and make views.py call it from there (this is to enforce independence from django)
– extending web-interface
– more things if time permits!

more to come

Posted in ElmCity

Elmcity Status Reports for 2009-10-14

Posted by jorygraham on 2009/10/14

I haven’t heard back from the whole team, but here are the status reports I did receive:



  • Implemented the SQL layer in database.py by…
  • Restructuring and refactoring icsWriter.py and remote.py


  • We don’t have any unit or integration tests, so it’s impossible to be completely reassured with code changes.

Next Steps:

  • Write integration tests for database.py?
  • Write unit tests for remote.py
  • Further restructuring of remote.py



  • Still working on cataloguing what sorts of general patterns we should support.  Expect to have results by the end of the week.
  • Setting up Cathy Levinson’s elmcity wordpress plugin (Calendar Display Mechanisms) and investigating how to divide the work between the elmcity service and the plugin.


  • None.

Next Steps:

  • Continue working on the cataloguing of general patterns found within existing FuseCal-parsed pages.  Expect to have results published by the end of the week.
  • Look into how the elmcity wordpress plugin is sourcing JSON and how client-side rendering directly in the elmcity service might be more useful and how it may not be.



  • Wrote remote.py
  • Implemented bad_urls table
  • Wrote up about the project page, with simple instructions on how to use the service
  • Restructured the hg repository so that it now mirrors our server structure
  • Updated ajax interface & frontend so it doesn’t choke on malformed URLs
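
For the malformed-URL handling and the bad_urls table mentioned above, a minimal sketch of the idea follows. It is an assumption-laden illustration: `is_well_formed` and `submit` are hypothetical names, the in-memory set merely stands in for the bad_urls database table, and the real service may apply different rules.

```python
from urllib.parse import urlparse

bad_urls = set()  # stand-in for the bad_urls table; the real one lives in the database


def is_well_formed(url):
    """Accept only http(s) URLs that actually name a host."""
    try:
        parts = urlparse(url.strip())
        return parts.scheme in ("http", "https") and bool(parts.hostname)
    except ValueError:
        # urlparse raises ValueError on some broken inputs, e.g. bad IPv6 brackets.
        return False


def submit(url):
    """Return True if the URL may be passed on to a plugin, remembering rejects."""
    if url in bad_urls:
        return False
    if not is_well_formed(url):
        bad_urls.add(url)
        return False
    return True
```

Checking the reject list first means a URL that already failed once is turned away without being parsed again.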


  • None

Next Steps:

  • Add filter capability to the service
  • Write up “how to write a plugin” guidelines
  • Write batch processing module



  • I’m looking into Autopager, a Firefox plugin that does a very good job of allowing users to generate rules to select different parts of a page.
  • I’m reading through the source code and sketching out some ideas as to how a plugin generator for us would look, how to make the interface more usable, and what extra features we’d need.


  • Waiting for an answer from the developer on whether Autopager is free for reuse with modification, or just open source
  • I’m not entirely sure whether this side project is a good use of team resources: Would a complementary generator to Diane’s project be worthwhile?
  • I was trying to wait until I had something more concrete to blog about it, but I could really use some input from the rest of the team

Next Steps:

  • I’ll put together a UCOSP Elmcity ics with the meetings, this weekly post, and other events – if nothing else, it’d help me stay on top of things.
  • I’ll make a blog post about this idea, and solicit opinions from the rest of the group.
  • Search for and blog about the question: what is fair use for open source but not free software? Can I dissect Autopager’s approach in detail, throw away their code, and write from a guideline I got from their source? Can I look through Autopager’s source when I hit a design problem, and implement their solution “in my own words”? Or am I not really allowed to make what would be conceptually a derivative work if I’ve read their code? I’m sure a lot of people working on UCOSP could help me with this.

Posted in ElmCity, Status