Under the Hood of JSTOR Snap

Ron Snyder

In this installment of the JSTOR Labs blog we take a long-overdue look under the hood of the JSTOR Snap photo app.

We developed and launched Snap way back in February. If you’ve been following the blog you’ll remember that Snap is a proof-of-concept mobile app developed to test a core hypothesis: that a camera-based content discovery tool would provide value to users and was something they would actually use. Our findings on that question were somewhat mixed back in February. Users were definitely intrigued by the concept, but the app hasn’t received a ton of use since. However, user feedback and reactions in demos we’ve conducted since February continue to suggest there is some real potential here. Where we ultimately go with this is hard to say right now, but the possibilities continue to intrigue both users and us. Additional background on the user testing approach can be found here, including a short “how we did it” video.

In addition to testing the core user hypothesis we also wanted to see what it would take to actually build this thing. While doing this quickly was important (because that’s what we do) we also wanted to see if a solution could be developed that produced quality recommendations with reasonable response times. So it wasn’t just a question of technology plumbing to support user testing. We were really interested in seeing if our topic modeling, key term extraction, and recommender capabilities were up to the task of generating useful recommendations from a phone camera-generated text image.

The technical solution involved three main areas of work: the development of the mobile app itself, the extraction of text from photos, and the generation of recommendations based on concepts inferred from the extracted text. I’ll describe the technology approach employed in each of these three areas and share some general findings and impressions of each.

First, the mobile app: this was Labs’ first project involving mobile development, so we took a fairly simple approach. No native app development for us this time (although we’d briefly considered it). We decided to go with a basic mobile web application, but to build it in such a way that it could be transitioned into a hybrid mobile app capable of running natively on the major phone operating systems, if needed. For the web app framework we decided on jQuery Mobile after conducting a quick survey of possible approaches. There were many good candidates to choose from, but we ultimately selected jQuery Mobile based on its general popularity (figuring it would be easy to find plugins and get advice) and perceived ease of learning. All in all we were satisfied with the jQuery Mobile approach. As we’d hoped, the learning curve was rather modest and the near ubiquity of jQuery made this a good choice for our initial mobile project.

We also performed a couple of quick tests with our jQuery Mobile code and PhoneGap to see what it would take to convert our web app into a hybrid app. PhoneGap is a tool for bundling regular HTML, CSS and JavaScript with platform-specific wrappers, enabling a web app to run natively in an iOS, Android or Windows mobile environment. I won’t go into detail on this, but it was surprisingly easy and the resulting app was actually pretty snappy, if you’ll pardon the pun, on my iPhone 6.

One of the good news / bad news stories in developing for modern smartphones is that the phone cameras today are quite good. Having high-quality images to start with was definitely a plus for this app. The downside is that the raw images are huge and can have a significant impact on overall response times, since these large images take some time to be transmitted to our back-end server for optical character recognition (OCR) processing. Since we were doing this as a web app we didn’t have access to some of the native phone features that would have enabled us to easily get smaller, lower-res versions. We ultimately ended up doing some down-sampling in JavaScript before image transmission. After some trial and error we found a sweet spot that balanced image size against OCR accuracy. The reduction in image size and transmission time was significant.
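
To make the sizing step concrete, here is a minimal sketch of the arithmetic behind that kind of down-sampling. It is written in Python purely for illustration (the app did this in JavaScript in the browser), and the `downsample_size` name and the 1600-pixel limit are assumptions, not the values Snap actually used.

```python
def downsample_size(width, height, max_edge=1600):
    """Return (new_width, new_height) with the longer edge capped at max_edge.

    The aspect ratio is preserved. max_edge=1600 is a hypothetical default,
    not the "sweet spot" found for Snap.
    """
    if width <= 0 or height <= 0 or max_edge <= 0:
        raise ValueError("width, height and max_edge must be positive")

    longest = max(width, height)
    if longest <= max_edge:
        # Already small enough: never scale an image up.
        return width, height

    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_edge / longest))
    new_height = max(1, round(height * max_edge / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(downsample_size(3264, 2448))
```

For a 3264 x 2448 photo this gives 1600 x 1200, which is roughly a quarter of the original pixel count. Where the right limit sits depends on how small the text in the photo can get before OCR accuracy starts to drop.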

Going into the project my single biggest worry was whether we’d be able to do on-the-fly OCR processing with acceptable quality and response times. I’d initially considered developing a custom back-end OCR service based on the Tesseract or OCRopus OCR engines. After some initial prototyping it quickly became apparent that this approach, while technically feasible, would take more time and effort to get right than we could afford on this short project. Based on that we decided to go with an online service. There were a number of options to choose from here, but we ended up going with OCR Web Service. We’ve been very happy with this choice. The OCR accuracy is excellent, response times are relatively good, and the price is quite reasonable. The only real work involved here was the development of a SOAP client for our Python/Django-based back-end service to use.
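
For readers who haven’t worked with SOAP, the sketch below shows roughly what the request side of such a client involves: wrapping the image, base64-encoded, in an XML envelope. Everything service-specific here is hypothetical: the `http://example.com/hypothetical-ocr` namespace and the `RecognizeImage`, `Language` and `ImageData` names are placeholders, not the OCR Web Service API, whose actual operations are defined by the service’s own WSDL.

```python
import base64
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "http://example.com/hypothetical-ocr"


def build_ocr_request(image_bytes, language="english"):
    """Build a SOAP 1.1 request body (as UTF-8 bytes) carrying one image.

    The operation and element names are placeholders; a real client would
    take them from the service's WSDL.
    """
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV_NS)

    request = ET.SubElement(body, "{%s}RecognizeImage" % SERVICE_NS)
    ET.SubElement(request, "{%s}Language" % SERVICE_NS).text = language
    # Binary data has to be text-encoded to travel inside XML.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    ET.SubElement(request, "{%s}ImageData" % SERVICE_NS).text = encoded

    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(build_ocr_request(b"not-a-real-image").decode("utf-8"))
```

The resulting bytes would be sent as the body of an HTTP POST to the service endpoint, and the response envelope parsed the same way in reverse.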

Our last challenge involved the generation of recommendations from the OCR text. For this we first needed to identify the key terms and concepts in the extracted text. This is a two-part process. The first part is key term identification, using a rule-based tagger that identifies terms from a controlled vocabulary that JSTOR has been developing for a couple of years now (more on that can be found here). The second part is topic inference, using LDA topic models generated from the full JSTOR corpus. The key terms and topics associated with the OCR text were then used to find other documents in JSTOR with similar characteristics.
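
The general shape of that pipeline can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the JSTOR implementation: the tagger is a plain phrase matcher, the similarity measure is cosine similarity over term counts (the post doesn’t say which measure was used), the LDA topic step is left out, and the vocabulary, document ids and function names are all made up.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def tag_key_terms(text, vocabulary):
    """Count whole-phrase occurrences of each controlled-vocabulary term in text."""
    haystack = normalize(text)
    counts = Counter()
    for term in vocabulary:
        needle = normalize(term)
        if not needle:
            continue
        hits = len(re.findall(r"\b" + re.escape(needle) + r"\b", haystack))
        if hits:
            counts[needle] = hits
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_features, corpus, top_n=3):
    """Return ids of the top_n documents most similar to the query features."""
    scored = [(cosine(query_features, feats), doc_id) for doc_id, feats in corpus.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


if __name__ == "__main__":
    vocabulary = ["climate change", "sea level", "glacier"]  # made-up vocabulary
    ocr_text = ("Rising sea level and climate change threaten coasts. "
                "Climate change also affects each glacier.")
    corpus = {  # made-up documents with pre-computed term counts
        "doc-a": {"climate change": 3, "glacier": 1},
        "doc-b": {"medieval poetry": 4},
        "doc-c": {"sea level": 2},
    }
    print(recommend(tag_key_terms(ocr_text, vocabulary), corpus))
```

Inferred topic weights could be folded into the same feature dictionaries (for example under keys such as `topic:12`), so that both key terms and topics contribute to the similarity score.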

We haven’t performed any formal testing of the generated recommendations yet, but feedback from users has been pretty good, at least in cases where the input is good. This is a situation where the expression “garbage in, garbage out” really applies. If a dark or blurry image is used (or even one with sparse text), the recommendations produced are much less targeted than when we have sharp images with text rich in relevant terms and concepts. I’d encourage you to give it a try for yourself. Go to http://labs.jstor.org/snap using your smartphone and try this on some text you’re familiar with. We’d love to hear what you think about the app and the recommendations.