It was estimated that one-third of biological terms are variants [9]. 10.1186/gb-2008-9-s2-s3. Piao S: A Highly Accurate Sentence and Paragraph Breaker. "CA 5 gene" is transformed into "CA gene"); Same as 5 but reversing both the SF and LF (e.g. A.W. 10.1197/jamia.M1139, Chang JT: Using machine learning to extract drug and gene relationships from text. Online services include (1) SF-LF Search Service and (2) SF-LF Identification Service, whereas off-line tools include (1) an off-line abbreviation recognition tool and (2) an abstract-fetching script. On our corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall. Sarah Vinz. The range of oscillation also decreases as the size of the training data increases. In addition, unlike the systems of [26, 28], our system can identify pairs with unused characters in the SF. The abbreviation of the journal title "Scientific study of literature" is "Sci. May 6, 2022. Third, to prune off other characters. The results are shown in Table 5. Because the number of biomedical articles is growing rapidly, the throughput of an AR system matters when processing large quantities of articles. Potential SFs that contain no alphabetic character, or that contain certain symbols ("=", "%", ">" and "<"), were excluded. 10.1038/nrg1768. Hence, we refer to this annotated corpus as the "BIOADI corpus." However, we kept them, and the model prediction identified them correctly. Although it is not possible to draw any conclusions with regard to research trends from an analysis of common abbreviations, a number of interesting observations can be made. Proux D, Rechenmann F, Julliard L, Pillet V, Jacq B: Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction. 
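The exclusion rule for potential SFs stated above (no alphabetic character, or one of the symbols "=", "%", ">" and "<") can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name `is_plausible_short_form` and the sample candidates are hypothetical, and the original system may apply further checks that are not described here.

```python
# Symbols that, per the text, disqualify a potential short form (SF).
EXCLUDED_SYMBOLS = set("=%><")


def is_plausible_short_form(candidate: str) -> bool:
    """Hypothetical helper: keep a candidate SF only if it has at least one
    letter and none of the excluded symbols."""
    has_letter = any(ch.isalpha() for ch in candidate)
    has_excluded_symbol = any(ch in EXCLUDED_SYMBOLS for ch in candidate)
    return has_letter and not has_excluded_symbol


# Illustrative candidates (made up for this sketch, not taken from the corpus).
candidates = ["HSP", "p < 0.05", "95%", "2008", "PKC", "n=12"]
kept = [c for c in candidates if is_plausible_short_form(c)]
print(kept)  # ['HSP', 'PKC']
```

Bracketed statistics such as "(p < 0.05)" are rejected by the symbol check, and purely numeric strings such as "2008" by the letter check.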
For example, our system can identify: To construct a comprehensive biological abbreviation dictionary, we identified SF-LF pairs from 17,551,165 PubMed abstracts. 10.1197/jamia.M0913. Military Abbreviations and Acronyms of the US Armed Forces. After the inputs are submitted, the system returns the identified SF-LF pairs and a score for each pair. This suggests that a machine learning approach to abbreviation recognition not only performs as well as a rule-based system but also executes at a satisfactory speed. Proceedings of 8th Canadian Conference on Artificial Intelligence (AI'2005), Volume LNCS 2005, 3501: 319–329. Does the LF share the same numbers as the SF? Abbreviation recognition (AR) is related to NR and can be considered the task of recognizing, in free text, pairs consisting of a term (which may be a phrase or an entity) and its corresponding abbreviation. BMC Bioinformatics 10, S7 (2009). is written as BA. Examples: = I'm grinning, IMHO = in my humble opinion, FYI = for your information, FWIW = for what it's worth, ROTFL = rolling on the floor laughing, WTG = way to go, Emotions: or = winking, or = laughing. Tested on the AB3P corpus, our system demonstrated an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the best-performing existing AR system. acad., bib., misc. Table 3 presents the performance of four trials on different combinations of four sets of features on both corpora. 2002. 
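The reported figures can be checked against one another with a short calculation. The sketch below assumes that "F-score" means the balanced F1 measure (the harmonic mean of precision and recall). The text does not state this definition, but both reported values are consistent with it.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text, in percent.
bioadi = f_score(93.52, 79.95)  # results on the BIOADI corpus
ab3p = f_score(95.86, 84.64)    # results on the AB3P corpus
print(f"{bioadi:.2f} {ab3p:.2f}")  # 86.20 89.90
```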
Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Republic of China, Cheng-Ju Kuo, Kuan-Ting Lin & Chun-Nan Hsu, School of Chemical and Life Sciences, Singapore Polytechnic, Republic of Singapore, Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan, Republic of China, Department of Zoology, The University of Melbourne, Parkville, Victoria, Australia. A corpus of 1200 abstracts from the BioCreative II gene normalization dataset [32] was annotated by a single person for consistency, and we used it to develop an AR system. Abbreviations (including acronyms) are heavily used in legal writing. Because logistic regression achieved higher precision than the SVM with an RBF kernel, the logistic regression algorithm was used to develop our AR system. The number of features in each set and the total number of features are listed in Table 1. Both positive and negative instances were required for model training. Solving the problem of NR will allow for more complex text mining tasks to be addressed [4] as it is a prerequisite for information extraction and advanced text mining [3, 5, 6]. This is because while Dr. and Oct. are general abbreviations, who's and can't are contractions and DNA, WHO, and US are acronyms. Schwartz's system was the fastest at every corpus size. This is followed by matching the converted string to a specified pattern. New York, NY, USA: The MIT Press; 2001., 10: [http://dx.doi.org/10.1017/S1351324904213432], Chang JT, Schütze H, Altman RB: GAPSCORE: finding gene and protein names one word at a time. Genome Inform Ser Workshop Genome Inform 1998, 9: 72–80. One of the main reasons for the difficulty is the high variation of terms that are not explicitly reflected in biomedical ontologies [7]. Many species of primates, such as orangutans, are endangered. 
Cheng-Ju Kuo and Maurice HT Ling developed the methods, annotated the corpus, implemented the offline software, and drafted the manuscript. We used spaces and punctuation marks as delimiters to split each potential LF into tokens. IEEE Computer Society Conference on Bioinformatics 2002. By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI, the most comprehensive dictionary of biological abbreviations online. Jensen LJ, Saric J, Bork P: Literature mining for the biologist: from information retrieval to biological discovery. Hearst MA, Altman RB, Schwartz AS, Bhalotia G, Oliver DE: Tools for loading MEDLINE into a local relational database. Common Abbreviations Used in International Narcotics Control Strategy Report. More recently, Sohn et al. Each system was trained on the AB3P corpus and then tested on the BIOADI corpus, and vice versa.
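The tokenization step mentioned above, splitting each potential LF on spaces and punctuation, might look like the following minimal sketch. The exact delimiter set of the original system is not given, so using Python's `string.punctuation` is an assumption, and `tokenize_long_form` is a hypothetical name.

```python
import re
import string
from typing import List

# Assumed delimiter set: any run of whitespace or ASCII punctuation.
_DELIMITERS = re.compile(r"[\s" + re.escape(string.punctuation) + r"]+")


def tokenize_long_form(long_form: str) -> List[str]:
    """Split a potential long form (LF) into tokens, dropping empty strings."""
    return [token for token in _DELIMITERS.split(long_form) if token]


print(tokenize_long_form("heat shock protein"))
# ['heat', 'shock', 'protein']
print(tokenize_long_form("protein-tyrosine kinase, type 2"))
# ['protein', 'tyrosine', 'kinase', 'type', '2']
```

Under this assumption a hyphenated word yields two tokens, which may or may not match how the original system treated hyphens.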
The null hypothesis is that our system and another system perform equally well. BMC Bioinformatics 2004, 5: 146+. The synonym pairs were not considered valid SF-LF pairs and were ignored in the following experiments. If the input is a PubMed ID, the result table will also show a hyperlink to PubMed at the bottom. 10.1093/bioinformatics/btg010, Finkel J, Dingare S, Manning C, Nissim M, Alex B: Exploring the Boundaries: Gene and Protein Identification in Biomedical Text. To test the performance of different learning algorithms on our feature set, we implemented four learning algorithms: Support Vector Machine, Naïve Bayes, Logistic Regression and Monte-Carlo Sampling Logistic Regression. Schwartz A, Hearst M: A simple algorithm for identifying abbreviation definitions in biomedical texts. 10.1038/88213, Krallinger M, Leitner F, Penagos CR, Valencia A: Overview of the protein-protein interaction annotation extraction task of BioCreative II. The successful identification of an abbreviation and its corresponding definition is not only a prerequisite to index terms of text databases to produce articles of related interests, but also a building block to improve existing gene mention tagging and gene normalization tools. orangutans, are endangered. By using the search service, users can find different subtypes of SFs or LFs and thereby come upon extra PubMed IDs that they cannot find through a regular literature search. DOI: https://doi.org/10.1186/1471-2105-10-S15-S7. Each of those tokens acted as a binary feature. Secondly, the "SF-LF Identification Service" provides a real-time AR service. BMC Bioinformatics 2005, 6(Suppl 1):S7. The Medstract Gold Standard Evaluation Corpus [30] was not used for evaluation because the results previously reported on it are all based on different modified versions annotated by each team [29]. 
10.1093/bioinformatics/btn183. We used this set of features to describe the mapping of SF letters to LF letters and to calculate the character usage between the SF and the LF. In this context, always write out the full words instead. Do not introduce an acronym unless you will use it a minimum of three or four times. Hatzivassiloglou V, Duboue PA, Rzhetsky A: Disambiguating proteins, genes, and RNA in text: a machine learning approach. We also used some features to capture the position and number of stop words in LFs. 10.1186/1471-2105-8-S9-S5, Morgan A, Lu Z, Wang X, Cohen A, Fluck J, Ruch P, Divoli A, Fundel K, Leaman R, Hakenberg J, Sun C, Liu HH, Torres R, Krauthammer M, Lau W, Liu H, Hsu CN, Schuemie M, Cohen BK, Hirschman L: Overview of BioCreative II gene normalization. Instead, put them inside parentheses followed by a comma, or write out full words. The highest precision on both corpora was achieved by Sohn's system, but the highest F-score and the highest recall on both corpora were achieved by our system. The F-score difference among the learning algorithms was larger when they were trained using the AB3P corpus than when they were trained using the BIOADI corpus, suggesting that SF-LF pairs were more irregular in the AB3P corpus (containing synonyms) than in the BIOADI corpus. For example, building a term index of a text database to retrieve articles of related interests [17] or to link text-mined protein interaction networks [18–20]. 10.1016/j.jbi.2004.08.004. Some of them are synonym pairs, however. 10.1186/1471-2105-6-S1-S7, Chang JT, Schütze H, Altman RB: Creating an online dictionary of abbreviations from MEDLINE. BMC Bioinformatics 2008, 9: 402+. For example, B.A. This work is supported in part by the National Research Program in Genomic Medicine (NRPGM), NSC, Taiwan, under Grant No. Krauthammer M, Nenadic G: Term identification in the biomedical literature. 
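The "character usage" features mentioned above are described elsewhere in this text as two ratios: the size of the character set shared by the SF and the LF divided by the size of the SF character set, and the size of the SF character set divided by the SF length. A minimal sketch follows. Case-insensitive comparison is an assumption, and both function names are hypothetical.

```python
def shared_charset_ratio(sf: str, lf: str) -> float:
    """|chars(SF) & chars(LF)| / |chars(SF)|, compared case-insensitively (assumed)."""
    sf_chars = set(sf.lower())
    lf_chars = set(lf.lower())
    if not sf_chars:
        return 0.0
    return len(sf_chars & lf_chars) / len(sf_chars)


def charset_to_length_ratio(sf: str) -> float:
    """|chars(SF)| / len(SF); lower values mean more repeated characters."""
    if not sf:
        return 0.0
    return len(set(sf.lower())) / len(sf)


print(shared_charset_ratio("PKC", "protein kinase C"))        # 1.0
print(shared_charset_ratio("TNF2", "tumor necrosis factor"))  # 0.75
print(charset_to_length_ratio("PKC"))                         # 1.0
```

A shared ratio below 1.0 signals SF characters with no counterpart in the LF, such as the digit in the made-up "TNF2" example.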
In addition, to test the influence of data size with a larger set of training data, we merged both corpora to form a new dataset containing 2450 unique abstracts. In this study, annotated SF-LF pairs were used as positive instances in training data, and negative instances were automatically extracted from text. Before training and testing the model, each pair must first be transformed into a feature vector. Study Lit.". Many species of primates (e.g., orangutans) are endangered. For example, PTEN and MMAC1 refer to the same entity [8]. Genome Biol 2008, 9(Suppl 2):S4. Periods should always be used with Latin abbreviations, but not with contractions or acronyms. Bioinformatics 2004, 20(2):216–225. Proceedings of the Pacific Symposium on Biocomputing 2003, 451–462. [http://citeseer.ist.psu.edu/article/pustejovsky01extraction.html]. Four learning algorithms, Logistic Regression, Monte-Carlo Sampling Maximum Entropy, Support Vector Machine and Naïve Bayes, were tested. If not, the pairs acted as negative instances in training data. We were interested in the influence of training data size on performance. This indicates that training with a consistently annotated corpus (i.e., the BIOADI corpus) helps improve AR performance. In addition, ELISA is a common serological technique for detecting anti-viral antibodies, which suggest viral infection, including that of HIV. 2008. Hence, our results suggested that our system is a statistically significant improvement over Schwartz's and Sohn's systems. Hence, it is not surprising to see a large occurrence of these terms in the literature. (HSP, heat shock protein)". The p-values were less than 0.001, so the null hypothesis was rejected. 
Performance on the AB3P corpus ranged from 85.03% to 89.90%. At first, performance did not differ much among the systems. These results suggested that our system outperformed Schwartz's and Sohn's systems. For example, APC can have many different meanings, as illustrated in Table 8. To see in which PubMed IDs the SF-LF pair can be found, users can click on the document picture under the "PubMed" column to generate a "PubMed ID box." https://creativecommons.org/licenses/by/2.0 To ensure the stability of our web site, all scripts and the layout of the web site have passed tests on different browsers, different platforms, and even mobile devices. The five tests used five different test sets, which consisted of 600 randomly selected abstracts. Therefore, we expect that its F-score can reach about 90%. Each abstract was split into sentences by the "sentence and paragraph breaker" [33] before the automatic AR process. However, the list is small. Many species of primates, e.g. also used brackets to initiate the process of AR but ignored a list of common bracket-delimited structures, such as "(p < 0.05)". Figure 1 shows the results of three AR systems tested on the BIOADI corpus. "CA 5 gene" is transformed into "eneg AC"); We generated contextual information for each potential SF-LF pair from the tokens that precede the pair, using at most two tokens. An abbreviation is a short form of a word or phrase that is usually made by deleting certain letters. The higher the score, the more the identification can be trusted. Other groups have attempted combinations of approaches to improve precision [12–16]. 
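The two string transformations quoted in this text for "CA 5 gene", dropping numbers to give "CA gene" and then reversing to give "eneg AC", can be sketched as follows. The function names are hypothetical. Removing digits character by character (rather than only whole numeric tokens) and collapsing the leftover whitespace are assumptions that happen to reproduce both quoted examples.

```python
import re


def strip_numbers(text: str) -> str:
    """Remove digit runs, then collapse any leftover whitespace (assumed behavior)."""
    without_digits = re.sub(r"\d+", "", text)
    return " ".join(without_digits.split())


def strip_numbers_and_reverse(text: str) -> str:
    """Apply strip_numbers, then reverse the resulting string."""
    return strip_numbers(text)[::-1]


print(strip_numbers("CA 5 gene"))              # CA gene
print(strip_numbers_and_reverse("CA 5 gene"))  # eneg AC
```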
Our approach to AR is based on machine learning and exploits a novel set of rich features to describe properties of a potential SF-LF pair. The test results are shown in Table 6. Figures 1, 2 and 3 show the results on the two corpora and the merged corpus (BIOADI corpus + AB3P corpus). In order to construct features from raw data (potential SF-LF pairs extracted from the previous step), we defined four sets of features. J. Gould and D. M. Lewis, W. M. Lindsay's second edn. We focus on the following forms of SF-LF pairs: LF is in front of SF, and SF is in brackets or square brackets, e.g. In the figure of recall versus training data size, the recall of our system is higher than that of the other systems even when the system was trained on a small amount of training data. The design of these features originated from [16] and was inspired by previous work [29, 34]; the features were carefully selected in our tests. To annotate each abstract, we followed the style and the annotation guidelines of the AB3P corpus [28], in which the SF and LF of a pair are separated by "|" (for example, "HSP" and "heat shock protein" form "HSP|heat shock protein"). Kuan-Ting Lin developed the online interface and revised the manuscript. Not all of the abbreviations used in this example have the same look and feel. "HSP (heat shock protein)"; SF is in front of LF, and LF is in brackets or square brackets, e.g. In general, it's best to avoid using these abbreviations in the main text, especially in US English. "protein kinase C" forms "PKC"); The size of the character set shared by the SF and the LF divided by the size of the character set of the SF; The size of the character set of the SF divided by the SF length (in characters); The shortest LF of the SF-LF pair extracted by Schwartz's AR system [26] that is equal to the LF; Same as 5 but ignoring numbers of the SF and LF (e.g. A. W. Gomme, A. Andrewes, and K. J. 
Dover, (On the Nature of the Child), see Herzog-Schmidt; see also Schanz-Hosius. HyperWar World War II on the WorldWideWeb Abbreviations, Acronyms, Codewords, Terms. Correspondence to Pacific Symposium on Biocomputing 2003, 403–414. volume 10, Article number: S7 (2009) Abbreviation recognition is related to NER and can be considered the task of recognizing, in free text, pairs consisting of a term and its corresponding abbreviation. It considered two cases: the LF is in the brackets, or the SF is in the brackets. Does the string contain certain punctuation symbols? That suggests our feature set is robust and reliable. Sohn et al. Query results are listed 20 records per page and ordered by the number of PubMed IDs for each pair, so that users can easily find the most popular ones.
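The two bracket forms this text focuses on, an LF followed by a bracketed SF and an SF followed by a bracketed LF, suggest a simple first step: find each bracketed span and pair it with the text before it. The sketch below is a simplification and not the system's actual extractor. The name `bracket_candidates` is hypothetical, and taking the whole sentence prefix as the outside text is an assumption. Deciding which side is the SF, and how far the LF extends, is left to later steps.

```python
import re
from typing import List, Tuple

# A span in round or square brackets that contains no nested brackets.
BRACKETED = re.compile(r"[(\[]([^()\[\]]+)[)\]]")


def bracket_candidates(sentence: str) -> List[Tuple[str, str]]:
    """Return (text_before_bracket, text_inside_bracket) pairs for one sentence."""
    pairs = []
    for match in BRACKETED.finditer(sentence):
        outside = sentence[:match.start()].strip()
        inside = match.group(1).strip()
        if outside and inside:
            pairs.append((outside, inside))
    return pairs


print(bracket_candidates("HSP (heat shock protein) was induced."))
# [('HSP', 'heat shock protein')]
print(bracket_candidates("Cells express heat shock protein [HSP]."))
# [('Cells express heat shock protein', 'HSP')]
```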
the name of an organization). Odds ratio (OR) is a commonly reported statistical parameter in epidemiology, and since medical conditions appear to dominate this list, it is not surprising to find OR here as well. Pairs that adhere to one of these forms are annotated as potential SF-LF pairs. The table below outlines in detail the ISO 4 rules and matches to the ISSN maintained list of title word abbreviations (TWA) to derive the abbreviation. Are all characters of the string lowercase? BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature, https://doi.org/10.1186/1471-2105-10-S15-S7, Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, http://bioagent.iis.sinica.edu.tw/BIOADI/, http://www.biomedcentral.com/1471-2164/10?issue=S3, http://dx.doi.org/10.1017/S1351324904213432, http://view.ncbi.nlm.nih.gov/pubmed/11604766, http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector, http://www.ibm.com/developerworks/java/library/j-seqalign/index.html, http://www.csie.ntu.edu.tw/~cjlin/libsvm/, http://citeseer.ist.psu.edu/article/pustejovsky01extraction.html, http://www.biomedcentral.com/1471-2105/10?issue=S15, https://creativecommons.org/licenses/by/2.0. Common Abbreviations from U.S. Department of State. ", "and") were filtered out. [Adviser-Russ Altman], Yu H, Hripcsak G, Friedman C: Mapping Abbreviations to Full Forms in Biomedical Articles. The authors declare that they have no competing interests. AbbRE [25] and the system by Schwartz et al. The F-score of our system was three percent higher than that of Schwartz's system on the AB3P corpus. The details are as follows: String morphological features of SF and LF. Hence, it seems plausible to use AR as a first pass in NER. Emotions: or = Smiling (Happy), or = Frowning (Sad), = Shouting, xxooxxoo = Love (or hugs) & kisses. 
In this manuscript, "LF" denotes "the long form of the term" and "SF" denotes "the abbreviation or the short form of the term". Despite the contextual complexity, the extracted LF-SF pairs may be used to support future research, such as the development of named entity recognition systems or abbreviation disambiguation. 10.1186/1471-2105-9-S3-S11, Zhou G, Shen D, Zhang J, Su J, Tan S: Recognition of protein/gene names from text using an ensemble of classifiers. Common Abbreviations and Acronyms from AllEarsNet.com Deb's Unofficial Walt Disney World Information Guide. B. Hainsworth. [28] used an LF-to-SF matching algorithm similar to that of Yu et al. N. G. L. Hammond and H. H. Scullard (eds. Activated protein C is an important component in the blood clotting pathway and may interact with warfarin, a widely used drug to manage deep-vein thrombosis and widely known for its extensive interactions with other medical drugs. Jenssen TK, Laegreid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene expression. At the same time, our system runs sufficiently fast to handle the entire set of PubMed abstracts. Our approach to abbreviation recognition (AR) is based on machine learning, which exploits a novel set of rich features to learn rules from training data. For example, Stanford University's Abbreviation Server [23, 24] demonstrated 97% precision at 22% recall and 95% precision at 75% recall. Chang CC, Lin CJ: LIBSVM: a library for support vector machines. If you are following the APA style guidelines, there are some specific guidelines for certain types of abbreviation. August 1, 2015. Deleted letters are replaced by an apostrophe. 
Compared with existing available AR systems [26, 28], our system outperformed them on both corpora and runs about 14 times faster than the best-performing AR system [28]. Bioinformatics 2003, 19(3):402–407. Web. ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Bioinformatics 2004, 20(7):527–533. BMC Bioinformatics 2005, 6(Suppl 1):S2. The test platform was a computer with a 2.4 GHz Intel Core Quad CPU, 5 gigabytes of RAM and a 32-bit Linux system. Antigen-presenting cells are widely known to be key to acquiring immunity. Thereafter, you can stick to using the acronym. Note that when introducing an acronym, the full term should only be capitalized if it is a proper noun (e.g. The number of characters of the longest common subsequence of the SF-LF pair divided by the SF length (in characters) [35]; Same as 1 but with the string consisting of the first character of all LF tokens (e.g. Comparing the trial with the highest F-score against the one with the lowest on both corpora, the trial with all features scored four to five percent higher than the trial with only the morphological set of features. PhD thesis, Stanford, CA, USA 2004. Meanwhile, we also used the AB3P corpus for performance evaluation. Note: When documenting sources using MLA style, the normal punctuation is omitted for degrees when used in parentheses, tables, works cited, footnotes, endnotes, etc. 10.1093/bioinformatics/btg393, Settles B: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Performance versus training data size tested on the BIOADI corpus.
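The longest-common-subsequence feature listed above (LCS length of the SF-LF pair divided by the SF length) and its variant over the first characters of the LF tokens can be sketched with a standard dynamic-programming LCS. This is a minimal illustration, not the implementation cited as [35]. The function names, case-insensitive matching and whitespace-only tokenization are all assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(sf: str, lf: str) -> float:
    """LCS length divided by SF length, compared case-insensitively (assumed)."""
    if not sf:
        return 0.0
    return lcs_length(sf.lower(), lf.lower()) / len(sf)


def initials(lf: str) -> str:
    """First character of each whitespace-separated LF token."""
    return "".join(token[0] for token in lf.split())


print(lcs_ratio("PKC", "protein kinase C"))            # 1.0
print(lcs_ratio("PKC", initials("protein kinase C")))  # 1.0
print(lcs_ratio("HSP70", "heat shock protein"))        # 0.6
```

In the last, made-up example only "H", "S" and "P" can be aligned, so three of the five SF characters are covered.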