{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T15:58:40Z","timestamp":1778255920530,"version":"3.51.4"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2018,3,7]],"date-time":"2018-03-07T00:00:00Z","timestamp":1520380800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61572139"],"award-info":[{"award-number":["61572139"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31601074"],"award-info":[{"award-number":["31601074"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"MEXT KAKENHI","award":["16H02868"],"award-info":[{"award-number":["16H02868"]}]},{"DOI":"10.13039\/501100001695","name":"JST","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001695","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009025","name":"ACCEL","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100009025","id-type":"DOI","asserted-by":"publisher"}]},{"name":"FiDiPro, Academy of Finland: AIPSE programme"},{"name":"Open Fund of Shanghai Key Laboratory of Intelligent Information Processing","award":["IIPL-2016-005"],"award-info":[{"award-number":["IIPL-2016-005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Gene Ontology (GO) has been widely 
used to annotate functions of proteins and understand their biological roles. Currently only &lt;1% of &gt;70 million proteins in UniProtKB have experimental GO annotations, implying the strong necessity of automated function prediction (AFP) of proteins, where AFP is a hard multilabel classification problem due to one protein with a diverse number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, while they do not necessarily work well for so-called difficult proteins, which have &lt;60% sequence identity to proteins with annotations already. Thus, the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>The key of this method is to extract not only homology information but also diverse, deep-rooted information\/evidence from sequence inputs and integrate them into a predictor in a both effective and efficient manner. 
We propose GOLabeler, which integrates five component classifiers, trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, etc., in the framework of learning to rank (LTR), a paradigm of machine learning, especially powerful for multilabel classification.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The empirical results obtained by examining GOLabeler extensively and thoroughly by using large-scale datasets revealed numerous favorable aspects of GOLabeler, including significant performance advantage over state-of-the-art AFP methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>http:\/\/datamining-iip.fudan.edu.cn\/golabeler.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty130","type":"journal-article","created":{"date-parts":[[2018,3,6]],"date-time":"2018-03-06T15:11:00Z","timestamp":1520349060000},"page":"2465-2473","source":"Crossref","is-referenced-by-count":189,"title":["GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank"],"prefix":"10.1093","volume":"34","author":[{"given":"Ronghui","family":"You","sequence":"first","affiliation":[{"name":"School of Computer Science and Shanghai Key Lab of Intelligent Information Processing"},{"name":"Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China"}]},{"given":"Zihan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer 
Science and Shanghai Key Lab of Intelligent Information Processing"},{"name":"Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China"}]},{"given":"Yi","family":"Xiong","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics and Biostatistics, Shanghai Jiaotong University, Shanghai, China"}]},{"given":"Fengzhu","family":"Sun","sequence":"additional","affiliation":[{"name":"Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China"},{"name":"Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA"}]},{"given":"Hiroshi","family":"Mamitsuka","sequence":"additional","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture, Japan"},{"name":"Department of Computer Science, Aalto University, Helsinki, Finland"}]},{"given":"Shanfeng","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Shanghai Key Lab of Intelligent Information Processing"},{"name":"Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China"}]}],"member":"286","published-online":{"date-parts":[[2018,3,7]]},"reference":[{"key":"2023012713015807300_bty130-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. 
Genet"},{"key":"2023012713015807300_bty130-B3","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/978-1-4939-3167-5_2","volume-title":"Plant Bioinformatics: Methods and Protocols","author":"Boutet","year":"2016"},{"key":"2023012713015807300_bty130-B4","author":"Chen","year":"2016"},{"key":"2023012713015807300_bty130-B5","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B6","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-14-S3-S1","article-title":"Protein function prediction by massive integration of evolutionary analyses and multiple data sources","volume":"14","author":"Cozzetto","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012713015807300_bty130-B7","doi-asserted-by":"crossref","first-page":"3460","DOI":"10.1093\/bioinformatics\/btv398","article-title":"Functional classification of CATH superfamilies: a domain-based approach for protein function annotation","volume":"31","author":"Das","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B8","first-page":"D427","article-title":"SUPERFAMILY 1.75 including a domain-centric gene ontology method","volume":"39 (Suppl. 
1)","author":"de Lima","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B9","doi-asserted-by":"crossref","first-page":"S15.","DOI":"10.1186\/1471-2105-14-S3-S15","article-title":"Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (cafa)","volume":"14","author":"Gillis","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012713015807300_bty130-B10","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.ymeth.2015.08.009","article-title":"GoFDR: a sequence alignment based method for predicting protein functions","volume":"93","author":"Gong","year":"2016","journal-title":"Methods"},{"key":"2023012713015807300_bty130-B11","doi-asserted-by":"crossref","first-page":"S7.","DOI":"10.1186\/1471-2105-14-S3-S7","article-title":"Homology-based inference sets the bar high for protein function prediction","volume":"14","author":"Hamp","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012713015807300_bty130-B12","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B13","doi-asserted-by":"crossref","first-page":"i609","DOI":"10.1093\/bioinformatics\/btu472","article-title":"The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective","volume":"30","author":"Jiang","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B14","doi-asserted-by":"crossref","first-page":"184.","DOI":"10.1186\/s13059-016-1037-6","article-title":"An expanded evaluation of protein function prediction methods shows an improvement in accuracy","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome 
Biol"},{"key":"2023012713015807300_bty130-B15","doi-asserted-by":"crossref","first-page":"43.","DOI":"10.1186\/s13742-015-0083-4","article-title":"The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches","volume":"4","author":"Khan","year":"2015","journal-title":"GigaScience"},{"key":"2023012713015807300_bty130-B16","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/1471-2105-14-S3-S8","article-title":"Ms-knn: protein function prediction by integrating multiple data sources","volume":"14 (Suppl. 3)","author":"Lan","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012713015807300_bty130-B17","doi-asserted-by":"crossref","first-page":"1854","DOI":"10.1587\/transinf.E94.D.1854","article-title":"A short introduction to learning to rank","volume":"E94-D","author":"Li","year":"2011","journal-title":"IEICE Trans"},{"key":"2023012713015807300_bty130-B18","doi-asserted-by":"crossref","first-page":"i339","DOI":"10.1093\/bioinformatics\/btv237","article-title":"MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B19","doi-asserted-by":"crossref","first-page":"685.","DOI":"10.1093\/bib\/bbt041","article-title":"Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks","volume":"15","author":"Ma","year":"2014","journal-title":"Brief. 
Bioinformatics"},{"key":"2023012713015807300_bty130-B20","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gku1221","article-title":"CDD: nCBI\u2019s conserved domain database","volume":"43","author":"Marchler-Bauer","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B21","doi-asserted-by":"crossref","first-page":"D213","DOI":"10.1093\/nar\/gku1243","article-title":"The InterPro protein families database: the classification resource after 15 years","volume":"43","author":"Mitchell","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B22","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023012713015807300_bty130-B23","doi-asserted-by":"crossref","first-page":"3429","DOI":"10.1093\/bioinformatics\/btv345","article-title":"ProFET: feature engineering captures high-level protein functions","volume":"31","author":"Ofer","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B24","doi-asserted-by":"crossref","first-page":"i70","DOI":"10.1093\/bioinformatics\/btw294","article-title":"DeepMeSH: deep semantic representation for improving large-scale MeSH indexing","volume":"32","author":"Peng","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B25","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. 
Methods"},{"key":"2023012713015807300_bty130-B26","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/978-3-319-41279-5_7","volume-title":"Big Data Analytics in Genomics","author":"Shehu","year":"2016","edition":"1st edn."},{"key":"2023012713015807300_bty130-B27","doi-asserted-by":"crossref","first-page":"D376","DOI":"10.1093\/nar\/gku947","article-title":"CATH: comprehensive structural and functional annotations for genome sequences","volume":"43","author":"Sillitoe","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012713015807300_bty130-B28","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: a comprehensive database of protein domain families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"Proteins"},{"key":"2023012713015807300_bty130-B29","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"Uniprot: a hub for protein information","volume":"43","author":"The UniProt Consortium","year":"2015","journal-title":"Nucl Acids Res"},{"key":"2023012713015807300_bty130-B30","doi-asserted-by":"crossref","first-page":"3645","DOI":"10.1093\/bioinformatics\/btw532","article-title":"Extensive complementarity between gene function prediction methods","volume":"32","author":"Vidulin","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B31","doi-asserted-by":"crossref","first-page":"1198","DOI":"10.1101\/gr.9.12.1198","article-title":"Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes","volume":"9","author":"Walker","year":"1999","journal-title":"Genome Res"},{"key":"2023012713015807300_bty130-B32","doi-asserted-by":"crossref","first-page":"i18","DOI":"10.1093\/bioinformatics\/btw244","article-title":"Druge-rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble 
learning to rank","volume":"32","author":"Yuan","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713015807300_bty130-B33","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1109\/TKDE.2013.39","article-title":"A review on multi-label learning algorithms","volume":"26","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Knowl Data Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/14\/2465\/48918139\/bioinformatics_34_14_2465.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/14\/2465\/48918139\/bioinformatics_34_14_2465.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T08:49:28Z","timestamp":1693558168000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/14\/2465\/4924212"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,3,7]]},"references-count":33,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2018,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty130","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/145763","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,15]]},"published":{"date-parts":[[2018,3,7]]}}}