IMGT®, the international ImMunoGeneTics information system® (CNRS and Montpellier University), is the global reference in immunogenetics and immunoinformatics. [...] the IMGT/mAb-DB interface for therapeutic antibodies and fusion proteins for immunological applications (FPIA). [...] and 868 genes and 1318 alleles for [...] in October 2014).

An interface, IMGT/mAb-DB [12], has been developed to provide easy access to therapeutic antibody amino acid sequences (links to IMGT/2Dstructure-DB) and structures (links to IMGT/3Dstructure-DB, if 3D structures are available). IMGT/mAb-DB data include monoclonal antibodies (mAb, INN suffix –mab; a –mab is defined by the presence of at least one IG variable domain) and fusion proteins for immune applications (FPIA, INN suffix –cept; a –cept is defined by a receptor fused to an Fc) from the WHO-INN programme [48,49]. The database also includes a few composite proteins for clinical applications (CPCA) (e.g., a protein or peptide fused to an Fc solely to increase its half-life, identified by the INN prefix ef–) and some RPI used, unmodified, for clinical applications.

The unified IMGT® approach is of major interest for bridging knowledge from IG repertoire in normal and pathological situations [70,71,72,73,74,75], IG allotypes and immunogenicity [76,77,78], NGS repertoire [23,24], and antibody engineering and humanization [33,40,41,42,79,80,81,82,83,84,85,86].

2. Fundamental Information from IMGT-ONTOLOGY Concepts

2.1. IDENTIFICATION: IMGT® Standardized Keywords

More than 325 IMGT® standardized keywords (189 for sequences and 137 for 3D structures) have been precisely defined [57]. They represent the controlled vocabulary assigned during the annotation process and provide standardized search criteria for querying the IMGT® databases and for extracting sequences and 3D structures. They were entered in BioPortal [87] at the National Center for Biomedical Ontology (NCBO) in 2010.
Standardized keywords are assigned at each step of the molecular synthesis of an IG. Those assigned to a nucleotide sequence are found in the DE (definition) and KW (keyword) lines of the IMGT/LIGM-DB files [7]. They characterize, for instance, the gene type, the configuration type and the functionality type [57]. There are six gene types: variable (V), diversity (D), joining (J), constant (C), conventional-with-leader, and conventional-without-leader. Four of them (V, D, J, and C) identify the IG and TR genes and are specific to immunogenetics. There are four configuration types: germline (for the V, D, and J genes before DNA rearrangement), rearranged (for the V, D, and J genes after DNA rearrangement), partially-rearranged (for the D gene after only one DNA rearrangement) and undefined (for the C gene and for the conventional genes, which do not rearrange). The functionality type depends on the gene configuration. The functionality type of genes in germline or undefined configuration is functional (F), ORF (for open reading frame), or pseudogene (P). The functionality type of genes in rearranged or partially-rearranged configuration is either productive (no stop codon in the V-(D)-J region and in-frame junction) or unproductive (stop codon(s) in the V-(D)-J region, and/or out-of-frame junction).

The 20 usual amino acids (AA) have been classified into 11 IMGT physicochemical classes (IMGT® [1], IMGT Education Aide-mémoire Amino acids). Amino acid changes are described according to the hydropathy (three classes), volume (five classes) and IMGT physicochemical classes (11 classes) [29]. For example, Q1>E (+ + −) means that in the amino acid change (Q>E), the two amino acids at codon 1 belong to the same hydropathy (+) and volume (+) classes but to different IMGT physicochemical (−) classes [29]. Four types of AA changes are identified in IMGT®: very similar (+ + +), similar (+ + −, + − +), dissimilar (− − +, − + −, + − −), and very dissimilar (− − −).
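
The dependency between gene type, configuration type and functionality type described in this section can be written down as a small lookup. The following Python sketch is illustrative only: the function names and the string labels are assumptions chosen for this example and are not IMGT® identifiers or part of any IMGT® tool.

```python
from typing import FrozenSet

# Gene types named in the text; V, D, J and C are specific to IG and TR genes.
REARRANGING_GENE_TYPES = frozenset({"V", "D", "J"})
NON_REARRANGING_GENE_TYPES = frozenset(
    {"C", "conventional-with-leader", "conventional-without-leader"}
)


def allowed_configurations(gene_type: str) -> FrozenSet[str]:
    """Return the configuration types a gene type can take (hypothetical helper)."""
    if gene_type in REARRANGING_GENE_TYPES:
        configurations = {"germline", "rearranged"}
        if gene_type == "D":
            # Only a D gene can be observed after a single DNA rearrangement.
            configurations.add("partially-rearranged")
        return frozenset(configurations)
    if gene_type in NON_REARRANGING_GENE_TYPES:
        # C genes and conventional genes do not rearrange.
        return frozenset({"undefined"})
    raise ValueError(f"unknown gene type: {gene_type!r}")


def allowed_functionalities(configuration: str) -> FrozenSet[str]:
    """Return the functionality types valid for a configuration (hypothetical helper)."""
    if configuration in {"germline", "undefined"}:
        return frozenset({"functional", "ORF", "pseudogene"})
    if configuration in {"rearranged", "partially-rearranged"}:
        return frozenset({"productive", "unproductive"})
    raise ValueError(f"unknown configuration: {configuration!r}")


def rearranged_functionality(has_stop_codon: bool, junction_in_frame: bool) -> str:
    """Productive only if the V-(D)-J region has no stop codon and the junction is in frame."""
    if not has_stop_codon and junction_in_frame:
        return "productive"
    return "unproductive"


if __name__ == "__main__":
    print(sorted(allowed_configurations("D")))
    print(sorted(allowed_functionalities("undefined")))
    print(rearranged_functionality(has_stop_codon=False, junction_in_frame=True))
```

Keeping the two lookups separate mirrors the text: the configuration is determined first, and the set of valid functionality labels follows from it.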

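The four types of amino acid change listed in this section follow directly from the three signs (hydropathy, volume, IMGT physicochemical class). The sketch below shows one way this mapping might be coded. Everything in it is an assumption made for illustration: the function names, the `X<position>>Y` string form used for parsing, and the placeholder class labels for Q and E, which encode only what the Q1>E example states (same hydropathy and volume classes, different physicochemical class). The full IMGT class tables are not reproduced here and would have to be supplied by the user.

```python
import re
from typing import Dict, Tuple

# Sign patterns exactly as listed in the text (hydropathy, volume, physicochemical).
CATEGORY_BY_SIGNS: Dict[str, str] = {
    "+++": "very similar",
    "++-": "similar",
    "+-+": "similar",
    "--+": "dissimilar",
    "-+-": "dissimilar",
    "+--": "dissimilar",
    "---": "very dissimilar",
}

# Hypothetical per-residue annotation: (hydropathy, volume, physicochemical) labels.
ClassTable = Dict[str, Tuple[str, str, str]]


def parse_change(change: str) -> Tuple[str, int, str]:
    """Split a change written like 'Q1>E' into (from_aa, position, to_aa)."""
    match = re.fullmatch(r"([A-Z])(\d+)>([A-Z])", change)
    if match is None:
        raise ValueError(f"cannot parse amino acid change: {change!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def change_signs(from_aa: str, to_aa: str, classes: ClassTable) -> str:
    """Compare the three class labels of two residues: '+' if equal, '-' if not."""
    pairs = zip(classes[from_aa], classes[to_aa])
    return "".join("+" if a == b else "-" for a, b in pairs)


def similarity_category(signs: str) -> str:
    """Map a sign triplet such as '+ + −' or '++-' to its change type."""
    key = signs.replace("\u2212", "-").replace(" ", "")
    try:
        return CATEGORY_BY_SIGNS[key]
    except KeyError:
        # The pattern '-++' is not listed in the text, so it is rejected here.
        raise ValueError(f"sign pattern not listed: {signs!r}") from None


if __name__ == "__main__":
    # Placeholder labels, consistent only with the Q1>E example in the text.
    example_classes: ClassTable = {
        "Q": ("hydropathy_a", "volume_a", "physicochemical_a"),
        "E": ("hydropathy_a", "volume_a", "physicochemical_b"),
    }
    from_aa, position, to_aa = parse_change("Q1>E")
    signs = change_signs(from_aa, to_aa, example_classes)
    print(from_aa, position, to_aa, signs, similarity_category(signs))
```

Passing the class table as an argument keeps the comparison logic independent of any particular version of the class definitions.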
