BERN2 Documentation

In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g., diseases and chemicals) from the ever-growing biomedical literature. We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves on the previous neural network-based NER tool (Kim et al., 2019) by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. See our paper for more details.

Plain Text as Input

HTTP method

POST

Request URL

http://bern2.korea.ac.kr/plain

Request Body

{
    "text":"Autophagy maintains tumour growth through circulating arginine."
}

Response

If the input text is annotated successfully, the response is a 200 OK status code. The response body contains a JSON representation of the annotations.
{
    "annotations": [
        {
            "id": [
                "MESH:D009369"
            ],
            "is_neural_normalized": false,
            "prob": 0.9999922513961792,
            "mention": "tumour",
            "obj": "disease",
            "span": {
                "begin": 20,
                "end": 26
            }
        },
        {
            "id": [
                "MESH:D001120"
            ],
            "is_neural_normalized": false,
            "prob": 0.9819278717041016,
            "mention": "arginine",
            "obj": "drug",
            "span": {
                "begin": 54,
                "end": 62
            }
        }
    ],
    "text": "Autophagy maintains tumour growth through circulating arginine.",
    "timestamp": "Thu Dec 23 04:12:28 +0000 2021"
}
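
In the sample response above, each "span" gives character offsets into "text": "begin" is 0-based and "end" is exclusive, so the offsets can be used directly as a Python slice. The following is a minimal sketch that checks this on a trimmed copy of the sample response instead of a live call. The helper names spans_match and mentions_by_type are hypothetical and not part of BERN2.

```python
# Trimmed copy of the sample response shown above (only the fields used here).
response = {
    "annotations": [
        {"id": ["MESH:D009369"], "mention": "tumour", "obj": "disease",
         "span": {"begin": 20, "end": 26}},
        {"id": ["MESH:D001120"], "mention": "arginine", "obj": "drug",
         "span": {"begin": 54, "end": 62}},
    ],
    "text": "Autophagy maintains tumour growth through circulating arginine.",
}


def spans_match(response):
    """Return True if every span slices its own mention out of the text."""
    text = response["text"]
    return all(
        text[a["span"]["begin"]:a["span"]["end"]] == a["mention"]
        for a in response["annotations"]
    )


def mentions_by_type(response):
    """Group mention strings by entity type (the "obj" field)."""
    grouped = {}
    for a in response["annotations"]:
        grouped.setdefault(a["obj"], []).append(a["mention"])
    return grouped


print(spans_match(response))
print(mentions_by_type(response))
```

The same two helpers should work on a live response, assuming it has the same shape as the sample.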

Curl command

$ curl -d '{"text":"Autophagy maintains tumour growth through circulating arginine."}' \
-H "Content-Type: application/json" \
-X POST http://bern2.korea.ac.kr/plain

Python example

import requests

def query_plain(text, url="http://bern2.korea.ac.kr/plain"):
    return requests.post(url, json={'text': text}).json()

if __name__ == '__main__':
    text = "Autophagy maintains tumour growth through circulating arginine."
    print(query_plain(text))
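
If the requests package is not available, the same POST request can be sketched with the Python standard library alone. This is an illustrative variant and not an official client: the function names build_plain_request and query_plain_stdlib and the 60-second timeout are assumptions made for this example.

```python
import json
import urllib.request

PLAIN_URL = "http://bern2.korea.ac.kr/plain"  # endpoint documented above


def build_plain_request(text, url=PLAIN_URL):
    """Build (but do not send) the same POST request as the curl command."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_plain_stdlib(text, url=PLAIN_URL, timeout=60):
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for error statuses such as
    4xx/5xx, so a normal return means the request succeeded.
    The timeout value is an assumption, not a documented limit.
    """
    request = build_plain_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_plain_request(
        "Autophagy maintains tumour growth through circulating arginine."
    )
    # Inspect the request without contacting the server.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the payload offline before calling the service.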

PubMed ID (PMID) as Input

HTTP method

GET

Request URL

http://bern2.korea.ac.kr/pubmed/30429607,29446767

Request Body

None

Response

If the PubMed articles are annotated successfully, the response is a 200 OK status code. The response body contains a JSON representation of the annotations.
[
    {
        "pmid": "30429607",
        "annotations": [
            {
                "id": [
                    "MESH:D009369"
                ],
                "is_neural_normalized": false,
                "prob": 0.9999922513961792,
                "mention": "tumour",
                "obj": "disease",
                "span": {
                    "begin": 20,
                    "end": 26
                }
            },
            ...
        ],
        "text": "Autophagy maintains tumour growth through circulating arginine. ...",
        "timestamp": "Thu Dec 23 05:17:50 +0000 2021"
    },
    {
        "pmid": "29446767",
        "annotations": [
            {
                "id": [
                    "MESH:C567763"
                ],
                "is_neural_normalized": false,
                "prob": 0.9999992847442627,
                "mention": "CLAPO syndrome",
                "obj": "disease",
                "span": {
                    "begin": 0,
                    "end": 14
                }
            },
            ...
        ],
        "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. ...",
        "timestamp": "Thu Dec 23 05:17:51 +0000 2021"
    }
]
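
The PMID response is a list with one object per article, so a typical next step is to index the annotations by "pmid". The sketch below does this on a trimmed copy of the sample response, leaving out the elided "..." entries. The helper name ids_by_pmid is hypothetical and not part of BERN2.

```python
# Trimmed copy of the sample response above; elided entries are left out.
results = [
    {
        "pmid": "30429607",
        "annotations": [
            {"id": ["MESH:D009369"], "mention": "tumour", "obj": "disease"},
        ],
    },
    {
        "pmid": "29446767",
        "annotations": [
            {"id": ["MESH:C567763"], "mention": "CLAPO syndrome", "obj": "disease"},
        ],
    },
]


def ids_by_pmid(results):
    """Map each PMID to the sorted, de-duplicated entity IDs of that article."""
    return {
        article["pmid"]: sorted(
            {entity_id for a in article["annotations"] for entity_id in a["id"]}
        )
        for article in results
    }


print(ids_by_pmid(results))
```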

Curl command

$ curl -H "Content-Type: application/json" \
-X GET http://bern2.korea.ac.kr/pubmed/30429607,29446767

Python example

import requests

def query_pmid(pmids, url="http://bern2.korea.ac.kr/pubmed"):
    return requests.get(url + "/" + ",".join(pmids)).json()

if __name__ == '__main__':
    pmids = ["30429607", "29446767"]
    print(query_pmid(pmids))

Installing BERN2

You first need to install BERN2 and its dependencies.
# Install torch with conda (please check your CUDA version)
conda create -n bern2 python=3.7
conda activate bern2
conda install pytorch==1.9.0 cudatoolkit=10.2 -c pytorch
conda install faiss-gpu libfaiss-avx2 -c conda-forge

# Check if cuda is available
python -c "import torch;print(torch.cuda.is_available())"

# Install BERN2
git clone git@github.com:mjeensung/bern2.git
cd bern2
pip install -r requirements.txt
(Optional) If you want to use MongoDB as a caching database, you need to install and run it.
# https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/#install-mongodb-community-edition-using-deb-packages
sudo systemctl start mongod
sudo systemctl status mongod
Then, you need to download the resources (e.g., external modules and dictionaries) required to run BERN2. Note that you will need 70 GB of free disk space.
wget http://nlp.dmis.korea.edu/projects/bern2-sung-et-al-2022/resources_v1.1.b.tar.gz
tar -zxvf resources_v1.1.b.tar.gz
rm -rf resources_v1.1.b.tar.gz

# (For Linux/macOS users) install CRF
cd resources/GNormPlusJava
tar -zxvf CRF++-0.58.tar.gz
mv CRF++-0.58 CRF
cd CRF
./configure --prefix="$HOME"
make
make install
cd ../../..

# (For Windows users) install CRF
cd resources/GNormPlusJava
# unzip does not accept tar-style flags such as -zxvf
unzip CRF++-0.58.zip
mv CRF++-0.58 CRF
cd ../..

Running BERN2

The following command runs BERN2.
export CUDA_VISIBLE_DEVICES=0
cd scripts
bash run_bern2.sh
(Optional) To restart BERN2, you need to run the following commands.
export CUDA_VISIBLE_DEVICES=0
cd scripts
bash stop_bern2.sh
bash start_bern2.sh

Using BERN2

After BERN2 is running in your local environment, you can access it through its RESTful API. Apart from the base URL (use http://localhost:8888 instead of http://bern2.korea.ac.kr), the local installation is used in exactly the same way as the web service.
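
Since only the base URL differs between the hosted service and a local installation, one option is to keep the base URL in a single place and derive the endpoint URLs from it. This is a minimal sketch: the constant and function names (HOSTED, LOCAL, plain_url, pubmed_url) and the digits-only PMID check are assumptions made for this example, not part of BERN2.

```python
HOSTED = "http://bern2.korea.ac.kr"  # web service
LOCAL = "http://localhost:8888"      # local installation


def plain_url(base):
    """Return the plain-text endpoint for the given base URL."""
    return base.rstrip("/") + "/plain"


def pubmed_url(base, pmids):
    """Return the PMID endpoint for a list of PubMed IDs.

    Rejecting non-numeric IDs is a precaution added for this sketch.
    """
    pmids = [str(p) for p in pmids]
    for pmid in pmids:
        if not pmid.isdigit():
            raise ValueError(f"not a valid PMID: {pmid!r}")
    return base.rstrip("/") + "/pubmed/" + ",".join(pmids)


print(plain_url(LOCAL))
print(pubmed_url(HOSTED, ["30429607", "29446767"]))
```

The resulting URLs can be passed as the url argument of the query functions in the examples on this page.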

Plain Text as Input

import requests

def query_plain(text, url="http://localhost:8888/plain"):
    return requests.post(url, json={'text': text}).json()

if __name__ == '__main__':
    text = "Autophagy maintains tumour growth through circulating arginine."
    print(query_plain(text))

PubMed ID (PMID) as Input

import requests

def query_pmid(pmids, url="http://localhost:8888/pubmed"):
    return requests.get(url + "/" + ",".join(pmids)).json()

if __name__ == '__main__':
    pmids = ["30429607", "29446767"]
    print(query_pmid(pmids))

If you have any questions or have found a bug, please contact mujeensung@korea.ac.kr or minbyuljeong@korea.ac.kr.