IBM Releases Deep Search For Scientific Discovery
Written by Nikos Vaggalis   
Tuesday, 16 August 2022

IBM's Deep Search for Scientific Discovery (DS4SD) toolkit has been made available to the public. It comes out of IBM's research labs and uses NLP to analyze massive amounts of data.

Deep Search is a cloud-based AI research service, offered as SaaS, that lets researchers load large amounts of structured or unstructured data and quickly find useful connections within it. The sources Deep Search can consume range from journal articles and patents to technical reports and more. Using AI and NLP, it can ingest 20 pages per second, whereas a typical human expert takes 1–2 minutes per page just to read, and it automatically extracts the semantic units and their relationships. It then builds a searchable knowledge graph which enables its users to:

robustly explore information extracted from tens of thousands of documents without having to read a single paper.
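To put the quoted throughput figures in perspective, here is a small back-of-the-envelope sketch in Python. The rates come from the text above; the corpus size (10,000 documents of 10 pages each) and the function name are assumptions made purely for illustration.

```python
# Figures quoted in the article.
MACHINE_PAGES_PER_SECOND = 20
HUMAN_SECONDS_PER_PAGE = (60, 120)  # 1-2 minutes per page


def reading_time_seconds(pages: int) -> dict:
    """Estimated time, in seconds, to get through `pages` pages."""
    fast, slow = HUMAN_SECONDS_PER_PAGE
    return {
        "machine": pages / MACHINE_PAGES_PER_SECOND,
        "human_fast": pages * fast,
        "human_slow": pages * slow,
    }


# Assumed corpus: 10,000 documents of 10 pages each (illustrative only).
times = reading_time_seconds(10_000 * 10)
speedup_low = times["human_fast"] / times["machine"]
speedup_high = times["human_slow"] / times["machine"]

print(f"machine:  {times['machine'] / 3600:.1f} hours")
print(f"human:    {times['human_fast'] / 3600:.0f} to {times['human_slow'] / 3600:.0f} hours")
print(f"speed-up: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Whatever corpus size is assumed, the ratio stays the same: 20 pages per second is 1,200 pages per minute, against half a page to one page per minute for a human reader, so the quoted rates imply a speed-up of roughly 1,200 to 2,400 times.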

It has been widely adopted in the scientific field, for instance in Covid research or the search for alternative cancer treatments, by working out the connections between individual research papers, or in discovering new molecules. Of course, the use cases are not confined to medical research; the service can be applied anywhere there is document data: legal briefs, financial statements, technical specifications, research papers, slide decks, you name it.


IBM has made part of the service available in the form of a toolbox, calling it Deep Search for Scientific Discovery (DS4SD). The toolbox is split into two parts: the Deep Search Experience and the Deep Search Toolkit.

The Deep Search Experience is the automatic document conversion service. It lets users upload documents and inspect the conversion quality through a simple drag-and-drop interface that makes it easy for non-experts to use. This part is not open source, but it has been made publicly available online for anyone to use. To work with the Deep Search Experience service, you upload your document and then let it work its magic:

  • Inspects the data that can be extracted from one of your documents. Your document is decomposed on the spot, cut into pieces of text, images, and tables. Numeric data, entities, and their relationships are then inferred from these pieces.
  • Searches and collects data from preprocessed document collections. These data include structured text, numerics, entities, and their relationships.
  • Processes data into usable information in your workspace, where you connect documents with curated knowledge from databases. The resulting knowledge graphs enable queries and analyses that span the entities and relationships that are described in both your documents and domain-specific databases.
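To make the idea of a knowledge graph that spans both documents and curated databases more concrete, here is a toy sketch in plain Python. The triples, the entity names and the helper functions are all hypothetical and invented for illustration; this is not Deep Search's data model or API, only the general technique of indexing (subject, relation, object) triples and querying across them.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, invented for illustration.
document_triples = [
    ("paper_A", "mentions", "compound_X"),
    ("paper_B", "mentions", "compound_X"),
    ("paper_B", "mentions", "protein_Y"),
]
database_triples = [
    ("compound_X", "inhibits", "protein_Y"),
]


def build_graph(triples):
    """Index triples as subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def documents_mentioning(graph, entity):
    """Return the sorted subjects that have a 'mentions' edge to `entity`."""
    return sorted(
        subject
        for subject, edges in graph.items()
        if ("mentions", entity) in edges
    )


# Merge what was extracted from documents with curated database knowledge.
graph = build_graph(document_triples + database_triples)

# Query spanning both sources: which papers mention something that
# the database says inhibits protein_Y?
inhibitors = [s for s, edges in graph.items() if ("inhibits", "protein_Y") in edges]
for compound in inhibitors:
    print(compound, "->", documents_mentioning(graph, compound))
```

The query at the end is only answerable because the two sources share one graph: the "inhibits" fact comes from the database triples, while the "mentions" facts come from the documents.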

The Deep Search Toolkit, on the other hand, is an open source Python package that lets users interact with the Deep Search platform programmatically, uploading and converting documents in bulk. They can point to a folder and direct the toolkit to upload the documents, convert them, and ultimately analyze the contents of the text, tables, and figures. The Deep Search Toolkit is available as a PyPI package and can be installed using standard Python package managers such as pip, poetry, etc.
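The "point to a folder" workflow can be sketched with the Python standard library alone. The example below deliberately does not call the toolkit: the `submit` callback is a hypothetical stand-in for whatever upload-and-convert call the real package provides, and the function names, the `.pdf` filter and the batch size are assumptions for illustration. At the time of writing the package appears on PyPI under the name deepsearch-toolkit; consult its documentation for the actual API.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List


def find_documents(folder: Path, suffixes=(".pdf",)) -> List[Path]:
    """Recursively collect files under `folder` whose suffix matches."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def batches(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_folder(folder: Path,
                   submit: Callable[[List[Path]], None],
                   batch_size: int = 50) -> int:
    """Pass every batch to `submit`, a stand-in for the real upload step."""
    documents = find_documents(folder)
    for batch in batches(documents, batch_size):
        submit(batch)
    return len(documents)


# Demo on a throwaway folder; nothing is uploaded anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.PDF", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")
    uploaded = []  # each submitted batch is simply recorded here
    count = process_folder(Path(tmp), uploaded.append, batch_size=1)
    print(count, "documents in", len(uploaded), "batches")
```

Keeping the upload step behind a callback means the folder scanning and batching can be tested offline, and the real toolkit call can be dropped in later without touching the rest.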

The Deep Search Experience is available online, while you can find the Python Deep Search Toolkit on its repo.

The wider context is that we are entering an era in which AI and advances in computer science will play a crucial role in moving society forward. That is one ingredient necessary for success; the other is democratization, open sourcing these tools to make them available to as many brains as possible, which multiplies the chances of making a groundbreaking discovery and so changing the world for the better.



