I spent my first 9 months at Hebbia working with business leaders, tech leaders, and junior users to understand search patterns, and how to build a productivity tool with the most utility.
Over time, it became clear that semantic search alone fails in 5 of the 6 cases below.
[And at Hebbia, we don’t just build semantic search. We’ve come up with different tools for all of the times where semantic search fails…]
Case 1: Users are looking for proper nouns or specific terms
- If I am looking to understand “what does Hebbia do”, I am not interested in results about Joe Gebbia, the internet entrepreneur. Those are the results that a semantic search engine alone would give me.
- When a user is looking for a proper noun or specific term, they actually want to run both a keyword search and a semantic search: the keyword search pins results to “Hebbia”, and the semantic search understands that “what they do” relates to the company's core products or services.
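To make the keyword-plus-semantic idea concrete, here is a minimal sketch of blending the two signals. It is an illustration under stated assumptions, not a description of Hebbia's implementation: the `hybrid_rank` and `keyword_score` names, the `alpha` weight, the toy documents, and the made-up semantic scores are all hypothetical. In practice the semantic scores would come from an embedding model.

```python
import re

def keyword_score(required_terms, text):
    """Fraction of required terms that appear verbatim (case-insensitive) in text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = sum(1 for term in required_terms if term.lower() in tokens)
    return hits / len(required_terms) if required_terms else 0.0

def hybrid_rank(docs, semantic_scores, required_terms, alpha=0.5):
    """Rank doc ids by a weighted blend of semantic and exact-keyword scores.

    alpha is a hypothetical tuning weight: 1.0 means semantic only, 0.0 keyword only.
    """
    scored = []
    for doc_id, text in docs.items():
        kw = keyword_score(required_terms, text)
        sem = semantic_scores.get(doc_id, 0.0)
        scored.append((alpha * sem + (1 - alpha) * kw, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# Toy data: two passages with similar-looking names.
docs = {
    "a": "Joe Gebbia is an internet entrepreneur.",
    "b": "Hebbia builds AI productivity tools.",
}
# Made-up scores standing in for an embedding model that confuses the two names.
semantic_scores = {"a": 0.82, "b": 0.80}

print(hybrid_rank(docs, semantic_scores, ["Hebbia"]))  # ['b', 'a']
```

With semantic scores alone, passage "a" would rank first; requiring the exact term "Hebbia" moves passage "b" to the top.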
Case 2: Users are looking for a filtered list (using metadata), not a search result at all
- If a user is looking for a list of companies in their portfolio with exposure to SVB, they don’t want to run a semantic search; they want to filter to all documents where SVB is the loan issuer, or where cash deposits are held with SVB.
- Users want to dynamically extract metadata and then search over that metadata.
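As a rough sketch of what filtering on extracted metadata could look like, the example below assumes the metadata has already been extracted upstream into plain dictionaries. The field names (`loan_issuer`, `deposit_banks`), the `Document` class, and the company names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    company: str
    # Metadata assumed to be extracted from the document by an upstream step.
    metadata: dict = field(default_factory=dict)

def filter_by_metadata(docs, predicate):
    """Return the sorted, de-duplicated companies whose metadata satisfies predicate."""
    return sorted({d.company for d in docs if predicate(d.metadata)})

def has_svb_exposure(meta):
    """True if SVB is the loan issuer or holds any of the cash deposits."""
    return meta.get("loan_issuer") == "SVB" or "SVB" in meta.get("deposit_banks", [])

# Toy portfolio with made-up companies.
docs = [
    Document("Acme", {"loan_issuer": "SVB"}),
    Document("Globex", {"loan_issuer": "Other Bank", "deposit_banks": ["SVB"]}),
    Document("Initech", {"loan_issuer": "Other Bank", "deposit_banks": ["Other Bank"]}),
]

print(filter_by_metadata(docs, has_svb_exposure))  # ['Acme', 'Globex']
```

The result is a complete list rather than a ranked set of "similar" passages, which is the point of this case.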
Case 3: Users are looking for some kind of inference or multi-hop reasoning
- “Customer concentration” is one of the most common cases of this. Very rarely are users looking for a passage that contains something semantically similar to “customer concentration”.
- Users are actually looking for all customer contracts, the ability to view them by size, and then an analysis of what % of revenue the top customers comprise.
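The arithmetic behind the customer concentration example can be sketched as follows, assuming the contracts have already been found and reduced to (customer, annual value) pairs. The `customer_concentration` function, the customer names, and the figures are hypothetical.

```python
def customer_concentration(contracts, top_n=3):
    """Return the top_n customers by revenue and their share of total revenue in %.

    contracts: iterable of (customer_name, contract_value) pairs; a customer
    may appear more than once, in which case values are summed.
    """
    totals = {}
    for customer, value in contracts:
        totals[customer] = totals.get(customer, 0.0) + value

    total_revenue = sum(totals.values())
    if total_revenue == 0:
        return [], 0.0

    # View customers by size, largest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:top_n]
    share = 100.0 * sum(value for _, value in top) / total_revenue
    return top, share

# Made-up contract values for illustration.
contracts = [
    ("Customer A", 400_000),
    ("Customer B", 250_000),
    ("Customer C", 150_000),
    ("Customer D", 100_000),
    ("Customer A", 100_000),
]

top, share = customer_concentration(contracts, top_n=2)
print([name for name, _ in top], share)  # ['Customer A', 'Customer B'] 75.0
```

No single passage contains the answer here; it only appears after collecting, sorting, and aggregating across documents.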
Case 4: Users are looking for financial or numerical information
- When looking for a specific financial number or figure, users are typically not looking for similar figures; they are looking for that exact figure.
- Again, this will require some combination of semantic search, exact match, and sometimes additional metadata filtering
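A small sketch of the exact-match piece for figures, assuming passages are plain strings and that normalizing away thousands separators is enough for the comparison. The regular expression, the function names, and the sample passages are assumptions for illustration; real filings need more careful handling of units and scales such as "millions".

```python
import re
from decimal import Decimal

# Matches digit runs with optional thousands separators and a decimal part.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def extract_numbers(text):
    """Return every number in text as a Decimal, with commas removed."""
    return [Decimal(match.replace(",", "")) for match in NUMBER_RE.findall(text)]

def find_exact_figure(passages, target):
    """Return indices of passages that contain exactly the target figure."""
    target = Decimal(str(target))
    return [i for i, passage in enumerate(passages) if target in extract_numbers(passage)]

# Made-up passages for illustration.
passages = [
    "Revenue was $12,500,000 in FY2022.",
    "Revenue was $12,400,000 in FY2021.",
    "EBITDA margin was 12.5% in FY2022.",
]

print(find_exact_figure(passages, "12500000"))  # [0]
print(find_exact_figure(passages, "12.5"))      # [2]
```

A similarity-based search would likely treat the first two passages as near-duplicates; the exact match keeps only the one with the requested figure.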
Case 5: Users are looking for an answer that is not contained in the document set
- If the answer doesn’t live in the documents, semantic search will still show you something it deems “similar”
- Detecting that an answer is not present typically requires some combination of search and generative AI for reasoning and inference.
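One way to picture combining search with a reasoning step is a retrieve-then-verify loop that abstains when nothing checks out. This is only a sketch: `answer_or_abstain` is a hypothetical helper, and `toy_supports` is a deliberately crude stand-in for a generative model call that would judge whether a passage actually answers the question.

```python
from typing import Callable, List, Optional

def answer_or_abstain(
    question: str,
    ranked_passages: List[str],
    supports: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first retrieved passage the verifier accepts, or None to abstain.

    ranked_passages: search results, best first (they may all be irrelevant).
    supports: verifier deciding whether a passage answers the question; in a
    real system this would be a model call, which is an assumption here.
    """
    for passage in ranked_passages:
        if supports(question, passage):
            return passage
    return None  # Nothing in the document set answers the question.

def toy_supports(question: str, passage: str) -> bool:
    """Crude stand-in verifier: accepts only passages that mention a CEO."""
    return "ceo" in question.lower() and "ceo" in passage.lower()

# Search always returns its "most similar" passages, relevant or not.
retrieved = [
    "The company was founded in 2015.",
    "Headquarters are in New York.",
]

print(answer_or_abstain("Who is the CEO?", retrieved, toy_supports))  # None
```

The search step still returns something; the verification step is what allows the system to say that the answer is not in the documents.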
Case 6: Users are looking to do a true semantic search
- In this case, the one in six where it fits, semantic search alone gets users to exactly what they need. For example, when they are looking for key risks, it helps that the search can surface threats or related terms. This is especially effective in broad thematic research.
Succeeding on any of the above cases in a way that drives meaningful value relies on parsing complex and lengthy documents, including charts and tables, from many sources and integrations.
And for all of the above reasons and more, we aren’t just building “semantic search” at Hebbia. Semantic search is a piece of what we do, but the other 80% is building the future of AI-native productivity tools.
We’ll talk about this more in another blog.
Until next time,