- Use Case: Social Media Platform
- Segment: Consumer Reviews
- Products: Morphological Analysis
- Availability: API and SDK
Road to Japan: How to Yelp Like a Native
Whether you live in San Francisco, Boston, Dublin, Vienna, or Tokyo, Yelp has reviews of local businesses in your neighborhood. Yelp’s search engine sorts through various signals and criteria looking for the most helpful and reliable reviews to highlight among the millions of reviews that have been posted to the site in more than a dozen different languages.
When people visit Yelp, seeing reviews that are relevant and informative in turn encourages user engagement. If consumers like what they see (or disagree with what they see), it might spur them to contribute their own thoughts or add missing information.
The Goal: Succeeding in the Japanese market
Leveraging its success in the U.S. and Europe, Yelp looked to launch a site for Japan, another large market with consumers who are passionate about local businesses, from sushi bars to nail salons and from contractors to dentists. Up until that point, Yelp had only dealt with European languages. Japanese (like Chinese and Korean) is technically challenging because it is written without spaces between words, and finding the words is fundamental to search. Tokenization (word-breaking) for Japanese was therefore a first-level requirement.
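To make the word-breaking problem concrete, here is a minimal sketch of one classic approach, greedy longest-match segmentation against a word list. It is an illustration only: the function name `tokenize_longest_match` and the five-word `TOY_LEXICON` are assumptions invented for this example, and this is not how Rosette is implemented. Production analyzers use far larger dictionaries and statistical models to resolve ambiguous splits.

```python
def tokenize_longest_match(text, lexicon):
    """Greedy longest-match segmentation (illustrative sketch only).

    At each position, take the longest substring found in the lexicon;
    if nothing matches, emit a single character and move on.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real system would load a full dictionary.
TOY_LEXICON = {"東京", "の", "美味しい", "寿司", "屋"}

# "Delicious sushi shops in Tokyo", written without spaces.
tokens = tokenize_longest_match("東京の美味しい寿司屋", TOY_LEXICON)
print(tokens)
```

Greedy matching is simple and fast, but it can choose the wrong split when a long dictionary word overlaps the correct boundary, which is one reason a search team might evaluate dedicated linguistic packages rather than rely on a sketch like this.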
Travis B., Group Product Manager of Search and Data Science, and his team were at the beginning of a learning curve, grappling with basic questions like “How do we make search work in Japanese?” Their goal, however, was clear from the outset: to deliver polished, high-quality products to local Japanese users and so build a community of users from the start.
The Challenges: Search and Review Highlights
The team at Yelp spent a lot of time evaluating several linguistic packages, including Basis Technology’s Rosette text analytics platform, to find something that met their requirements.
- Performance: In new markets, cloud services like Amazon Web Services or data centers might not be close by, which affects performance. Adding linguistic processing to every search query is another hit to performance, and at that point users will start to “feel like a second class citizen.” A high-performing linguistics package was therefore a must.
- Expansion Plan: Any package chosen had to fit Yelp’s international expansion plan.
- Improving Search Features in All Languages: Although the initial target was Japanese, once Travis’ team began its research, they asked themselves if other languages they already supported might benefit from more linguistic processing.
In addition to search, Yelp sought to bring its popular “review highlighting” function to the Japanese Yelp site. This function highlights frequently mentioned ideas (e.g., “Delicious scallion pancakes”) from among many reviews for a business, so that at a glance, a user can grasp the main ideas within a body of reviews. This Yelp feature had been available in English for many years. In user interviews, the feature had been mentioned as particularly useful.
Writing the algorithm that finds these highlights is different from a traditional search problem, but it uses a number of similar techniques. In addition to word splitting and sentence splitting, one of the most important features is part-of-speech tagging. For instance, one typically wants to surface noun phrases like “maple bacon donuts” as entities, rather than words like “be” or “eat,” which are mentioned a lot in reviews but are not particularly helpful to highlight.
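The role of part-of-speech tags can be sketched in a few lines. The example below assumes the reviews have already been tokenized and tagged by some analyzer; the tag names (`ADJ`, `NOUN`, and so on), the function names, and the sample reviews are all hypothetical, and this is not Yelp's actual highlighting algorithm. It keeps runs of adjectives and nouns that end in a noun, then counts how often each phrase recurs across reviews.

```python
from collections import Counter


def noun_phrases(tagged_tokens):
    """Collect maximal runs of ADJ/NOUN tokens that end in a NOUN."""
    phrases = []
    run = []
    # A sentinel token flushes any run still open at the end of the input.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends on a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def highlight_candidates(tagged_reviews):
    """Count noun phrases across all reviews for one business."""
    counts = Counter()
    for review in tagged_reviews:
        counts.update(noun_phrases(review))
    return counts


# Hypothetical, pre-tagged sample reviews.
reviews = [
    [("the", "DET"), ("maple", "NOUN"), ("bacon", "NOUN"),
     ("donuts", "NOUN"), ("are", "VERB"), ("great", "ADJ")],
    [("we", "PRON"), ("eat", "VERB"), ("maple", "NOUN"),
     ("bacon", "NOUN"), ("donuts", "NOUN"), ("here", "ADV")],
    [("delicious", "ADJ"), ("scallion", "NOUN"), ("pancakes", "NOUN")],
]

candidates = highlight_candidates(reviews)
print(candidates.most_common(1))
```

Frequent verbs such as “eat” and “are” never enter a run, so they cannot crowd out the phrases worth highlighting, however often they appear.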
Two factors finally led Yelp to choose Rosette.
- Number of Languages Supported: Yelp didn’t see another uniformly high-performing package that offered the functionality they needed across a large number of languages (40). With one package providing the linguistic processing for every language Yelp needed, their engineers would not have to integrate a new package into the code base for each new language, which shortened time-to-market.
- Product Support: The Yelp team also considered linguistics packages available as part of Lucene or as other open-source packages.
“These [other] packages had pretty good quality compared to Rosette for certain languages, but we would have had to stitch them together, and we wouldn’t have had someone to turn to when there was a problem,” Travis said. “Basis Technology’s support and guidance was very helpful. There were definitely a couple of instances early on regarding the best way to implement Rosette to optimize performance.
“But we got lots of help from Basis to figure out why the way we’d integrated Rosette wasn’t optimal and their support team helped us get Rosette performing the way we wanted. If we had gone with multiple, different packages, it would have been a lot of debugging work for our engineering team.”
After adding Rosette to Yelp’s search, Travis reported, “We were able to get our internal assessment of search quality to a point where we felt comfortable shipping the product. Metrics measure how users engage with search results, and from human evaluation, the improvement is due to fewer misinterpreted search queries.”
To internationalize the review highlighting feature for many of Yelp’s supported languages, Yelp tried several approaches and eventually settled on one that built on what they had learned from the English feature and used Rosette to help find likely phrases and to bring together disparate words with the same meaning. Yelp now even uses some features from Rosette in the English version of highlights.
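As a rough illustration of what “bringing together disparate words with the same meaning” can involve, the sketch below folds surface variants of a phrase into one normalized form before counting. The lookup table `TOY_LEMMAS`, the function names, and the sample counts are assumptions made up for this example; a linguistics package would supply lemmas from its own analysis rather than from a hand-written table.

```python
from collections import Counter

# Hypothetical variant-to-base-form table, for illustration only.
TOY_LEMMAS = {
    "donuts": "donut",
    "doughnut": "donut",
    "doughnuts": "donut",
    "pancakes": "pancake",
}


def normalize(phrase, lemmas):
    """Lowercase a phrase and map each word to its base form if known."""
    return " ".join(lemmas.get(word, word) for word in phrase.lower().split())


def merge_variants(phrase_counts, lemmas):
    """Sum the counts of phrases that share a normalized form."""
    merged = Counter()
    for phrase, count in phrase_counts.items():
        merged[normalize(phrase, lemmas)] += count
    return merged


# Hypothetical phrase counts gathered from reviews of one business.
counts = {
    "Maple bacon donuts": 3,
    "maple bacon doughnut": 2,
    "scallion pancakes": 4,
}

merged = merge_variants(counts, TOY_LEMMAS)
print(merged.most_common())
```

Without this step, the two donut spellings would compete as separate, weaker candidates instead of reinforcing a single highlight.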
“The highlights feature is one that is very easy to make mistakes because you don’t know the language, so we would have been very reticent to ship internationally if we did not have Basis Technology,” Travis said. “If you look at the two areas we were focusing on—search (a core function) and highlights—both are helping people to find the really insightful and important info from reviews. Highlights is incredibly useful to users and helps them make a quick decision and get connected to a business.
“In both cases, if we misunderstand the language, and ship core functionality that reads as broken in a local language—it makes us look like a company that doesn’t understand the local customs and languages and makes it hard to build a community of local users. We rely on local people who are passionate about their local businesses, so it’s doubly important to get these right.”
Yelp Japan has been well received by locals since it launched in April 2014 and has been one of Yelp’s fastest-growing international markets. Since then, Yelp has used Rosette to launch its search and highlights functionality in many more languages.