Class search engine for UC Berkeley

The modernized search engine indexes data from at least 7 API sources and serves hundreds of thousands of page requests each month with up-to-the-minute enrollment data.

A note regarding our involvement in this project: Philip Van Drunen was the lead developer on this project, contracted through Project Ricochet from 2016 to 2018 and doing business as Journey Multimedia, Inc. Peter Romero was also a prominent developer on this project from 2016 to 2018.

 

Making something easy and intuitive to use is very difficult.

The website, classes.berkeley.edu, was designed for students and faculty to search course and class data in a familiar way, similar to Amazon.com. Each semester, 6,000 to 16,000 new classes are added to the system, a significant undertaking for a server to parse, let alone for a student to search through.

The site was built on the Drupal Content Management System and utilized 80% open-source modules. Apache Solr is the backbone of the search index, chosen for its speed and reliability. Custom code was written to crawl data from at least 7 unique APIs to provide accurate, up-to-the-minute class and course schedule information.

Technical challenges

The most significant technical hurdle was keeping all data up to date. While class records would only change every 8 hours, enrollment numbers needed to be accurate within 3 minutes. This had to be done efficiently during active enrollment, when traffic to the site can jump by 10x.
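The two refresh cadences described above could be planned along the following lines. This is a minimal Python sketch for illustration only, not the site's actual implementation (which was built on Drupal). The `ClassEntry` type, its field names, and the `plan_refresh` helper are hypothetical, and it assumes a full record sync also refreshes enrollment numbers.

```python
from dataclasses import dataclass

# Intervals taken from the text: class records change every 8 hours,
# enrollment numbers must be accurate within 3 minutes.
CLASS_RECORD_INTERVAL_S = 8 * 60 * 60
ENROLLMENT_INTERVAL_S = 3 * 60


@dataclass
class ClassEntry:
    """Hypothetical index entry; field names are assumptions."""
    class_id: str
    record_synced_at: float      # Unix time of the last full record sync
    enrollment_synced_at: float  # Unix time of the last enrollment sync


def plan_refresh(entries, now):
    """Split stale entries into (full sync ids, enrollment-only sync ids).

    Assumption: a full sync also refreshes enrollment, so an entry never
    appears in both lists.
    """
    full, enrollment_only = [], []
    for entry in entries:
        if now - entry.record_synced_at >= CLASS_RECORD_INTERVAL_S:
            full.append(entry.class_id)
        elif now - entry.enrollment_synced_at >= ENROLLMENT_INTERVAL_S:
            enrollment_only.append(entry.class_id)
    return full, enrollment_only
```

Keeping the cheap enrollment-only refresh separate from the expensive full sync is one way to stay within the 3-minute window without re-crawling every source during traffic spikes.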

Class records are supplemented by data from several sources and must also display related records, such as the same class offered by other departments, classes offered at different times, classes by other instructors, class location (building) and department information.

 

Search that exposes more choices

UC Berkeley has over 70 departments and 6,000 classes offered each semester, not to mention the individual lecture times, locations and related labs. Each new semester adds up to 16,000 new records to the keyword search index and at least 12 customized facets (categories to filter by). With this number of options, finding the right class at the right time of day can be daunting and incredibly frustrating.

The enrollment department had a vision for an experience that was much more like searching for shoes on Amazon: start with a keyword, then quickly scan and filter the search results to narrow down the options. The interface needed to be familiar, easy to use and provide appropriate guidance for the user, instead of frustrating and confusing them.

When a user searches by keyword, results are ordered by relevance to the keyword. When a user searches by department category, results are ordered by class code, the order a student would expect.
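The ordering rule above could be expressed as Solr request parameters, roughly as sketched here in Python. The `q`, `fq`, `sort` and `rows` parameters are standard Solr query parameters, but the field names `department` and `class_code` and the `build_search_params` helper are assumptions made for illustration. The sketch also assumes that a keyword takes priority when both a keyword and a department are given.

```python
def build_search_params(keyword=None, department=None, rows=20):
    """Build Solr query parameters for a class search (illustrative only)."""
    params = {"rows": rows}
    if keyword:
        # Keyword search: rank by Solr's relevance score.
        params["q"] = keyword
        params["sort"] = "score desc"
    else:
        # Browsing without a keyword: match everything, order by class code.
        params["q"] = "*:*"
        params["sort"] = "class_code asc"  # hypothetical field name
    if department:
        # Escape backslashes and quotes so the value stays one quoted phrase.
        safe = department.replace("\\", "\\\\").replace('"', '\\"')
        params["fq"] = f'department:"{safe}"'  # hypothetical field name
    return params
```

For example, `build_search_params(department="History")` yields a match-all query filtered to one department and sorted by class code, while `build_search_params(keyword="biology")` sorts by relevance.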

Performance

Internet users have come to expect very fast search results, which is why site performance is key to success. Throughout development I stayed focused on ensuring that each new feature was optimized to load quickly.

I optimized the site in several ways: loading unseen portions of a page after the initial page load, rendering the page HTML on the client side so the layout could change without reloading the page, and rendering autocomplete suggestions without making a server request.
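Autocomplete without a server request generally means the suggestion terms are already available locally and are matched by prefix. The sketch below shows that lookup logic in Python for illustration; on the real site this would run in the browser, and the `LocalAutocomplete` class and its term list are hypothetical rather than taken from the project.

```python
import bisect


class LocalAutocomplete:
    """Prefix lookup over a preloaded term list (illustrative sketch)."""

    def __init__(self, terms):
        # Lowercase, de-duplicate and sort once so lookups can binary search.
        self._terms = sorted({term.lower() for term in terms})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.lower()
        if not prefix:
            return []
        # In a sorted list, all terms sharing a prefix are contiguous and
        # begin at the first position where the prefix could be inserted.
        start = bisect.bisect_left(self._terms, prefix)
        results = []
        for term in self._terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            results.append(term)
        return results
```

With terms such as "Biology", "Biochemistry" and "Business" preloaded, `suggest("bi")` returns the two matching terms without any network round trip.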

As of 2020, the site averages about 15,000 page views on any given day. During the month of active enrollment, the site averages 60,000 page views per day and can jump as high as 200,000 per day. The site averages 160k users per month, and the average time on site is 8 minutes.

 