14. Fields in TIM
In TIM, documents are stored as a combination of fields and values. A field can be, for example, the title of the document or its year. Some fields come directly from the original data (also called raw), while others are the result of further data processing. The field-value pairs are stored in a search engine, which is what a query is run against.
14.1. Field Index


14.1.1. Full list of common fields
The fields in this section are common to all types of documents and not specific to any source.
For the meaning of the Type column, please see the section Terms.
14.1.1.1. Fields relating to the document
Field |
Description |
Searchable |
Type |
---|---|---|---|
|
Doc ID |
string |
|
|
Class |
string |
|
|
Source |
string |
|
|
Title |
text |
|
|
Link |
string |
|
|
Abstract |
text |
|
|
Year |
string |
Examples
guid:S_2-s2.0-0000198629
class:article
emm_year:2012 (for year range searches, use emm_year:[2010 TO 2012])
source:scopus
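When queries are assembled programmatically, the examples above can be combined with a few small helpers. The following is a minimal Python sketch, not part of TIM: the function names are hypothetical, and joining clauses with AND follows the pattern used in other examples on this page.

```python
def term(field: str, value: str) -> str:
    """Build a single field:value clause, e.g. class:article."""
    return f"{field}:{value}"


def year_range(first: int, last: int) -> str:
    """Build a year-range clause in the emm_year:[A TO B] form."""
    if first > last:
        raise ValueError("first year must not be later than last year")
    return f"emm_year:[{first} TO {last}]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND so that every clause must match."""
    return " AND ".join(clauses)


# Articles from Scopus in the 2010 to 2012 range.
query = all_of(
    term("class", "article"),
    term("source", "scopus"),
    year_range(2010, 2012),
)
print(query)
```

The printed string can be pasted into the TIM search box as a single query.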
14.1.1.3. Fields relating to affiliations
For an explanation of the processing applied to affiliations, as well as the meaning of some field names, see Affiliation processing.
Also, please keep in mind that these fields are available only when the respective data exists in the group of documents you are searching. Affiliation information, for example, doesn’t exist for Semantic Scholar data, whereas it does exist for Patstat and Cordis (which are in the same group of documents).
Field |
Description |
Searchable |
Type |
---|---|---|---|
|
Organisation address |
string |
|
|
Organisation reference (raw) |
string |
|
|
City (raw) |
string |
|
|
Country (raw) |
string |
|
|
Country Code |
string |
|
|
Organisation ID from the Entity Matcher |
string |
|
|
City (processed) |
string |
|
|
Country (processed) |
string |
|
|
Country Code (processed) |
string |
|
|
EU countries only (processed) |
string |
|
|
EU vs World countries (processed) |
string |
|
|
EU vs World countries Code (processed) |
string |
|
|
Organisation (processed) |
text |
|
|
NUTS3 region |
string |
|
|
NUTS2 region |
string |
|
|
NUTS2 region (code with description) |
string |
|
|
NUTS3 region (code with description) |
string |
|
|
EU vs World countries (raw) |
string |
|
|
EU vs World countries code (raw) |
string |
|
|
City (merged) |
string |
|
|
Country (merged) |
string |
|
|
Country Code (merged) |
string |
|
|
EU countries only (merged) |
string |
|
|
EU vs World countries (merged) |
string |
|
emm_affiliation__mrgeeucountryCode |
EU vs World countries code (merged) |
string |
|
|
Organisation (merged) |
text |
|
|
NUTS3 code (merged) |
string |
|
|
NUTS2 with description (merged) |
string |
|
|
NUTS3 with description (merged) |
string |
|
|
Organisation (raw) |
text |
|
|
Organisation name variant |
text |
|
|
Organisation type (processed) |
string |
Examples
Some of the affiliation-related fields for two universities, one in Germany and one in the US.
Hamburg University (Germany) |
Portland State University (US) |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
14.1.3. Fields Specific to Semantic Scholar
When working with Semantic Scholar data, some fields can be used that are otherwise not available. These mainly relate to journals, citations, and some categories that apply only to certain publications.
As a reminder, these documents can be identified in TIM by including source:SemanticScholar in your search.
Field Name |
Description |
Searchable |
Type |
---|---|---|---|
|
Semantic Scholar author ID |
string |
|
|
DOI |
string |
|
|
DOI URL |
string |
|
|
Document ID(s) of citing documents |
string |
|
|
Journal Title |
||
|
Journal Pages |
||
|
Journal Volume |
||
|
Document ID(s) of cited documents |
string |
|
|
PDF Link |
string |
|
|
Pubmed ID |
string |
|
|
Source |
string |
|
|
Source path |
string |
|
|
Conference Venue |
string |
Examples
emm_journalName:"The Journal of infectious diseases"
emm_doiUrl:"https://doi.org/10.1093/infdis%2Fjiv078"
emm_journalPages:"694-701"
emm_journalVolume:"212 5"
emm_source:Medline
14.1.4. Fields Specific to CORD-19
When working with CORD-19 data, some fields can be used that are otherwise not available. These mainly relate to IDs that apply only to certain publications and to the original source of the documents.
Field Name |
Description |
Searchable |
Type |
---|---|---|---|
|
More detailed information on the affiliation, if it is referring to a laboratory. |
string |
|
|
Generic regional information for the affiliation, could be province, city, state etc. |
string |
|
|
ZIP code of the affiliation. |
string |
|
|
Unique ID for CORD-19 dataset |
string |
|
|
Fulltext licensing |
string |
|
|
Microsoft Academic (MAG) entity ID supplied by CORD-19 |
string |
|
|
PubMed Central reference number |
string |
|
|
PubMed reference number |
string |
|
|
Specific source of the document, i.e. whether it’s coming from PMC, bioRxiv, medRxiv, WHO, CZI, Elsevier |
string |
|
|
Unique document ID, associated with the dataset provided by WHO |
string |
Examples
emm_source_x:who (papers provided/curated by WHO)
emm_affiliation__laboratory:UMR AND emm_year:2020 (papers from 2020 in which CNRS is involved)
14.1.5. Fields Specific to Cordis
Included here are fields for searching in Cordis (see below).
To restrict a search to Cordis projects only, include class:euproject in the query.
Field Name |
Description |
Searchable |
Type |
---|---|---|---|
|
Project acronym |
string |
|
|
Total cost of the project (in EUR). |
string |
|
|
Role of the organisations involved in the project. |
string |
|
|
Funding programme name |
string |
|
|
Project ID |
string |
|
|
European Science Vocabulary |
string |
14.1.5.1. Fields Specific to Cordis
When working with Cordis data, some fields can be used that are otherwise not available. These mainly relate to the identification of the specific EU research project.
As a reminder, these documents can be identified in TIM by including source:cordis in the search.
Field Name |
Description |
Searchable |
Type |
---|---|---|---|
|
Funding programme name |
string |
|
|
Project acronym |
string |
|
|
Project ID |
string |
|
|
Call for proposal of the EU programme. |
string |
|
|
Topic of the call. They are targeted towards specific topics in a broader scientific field. |
string |
|
|
Funding Scheme of the Call. The scheme will determine: the scope of what is funded, |
string |
|
|
Subject Index Classification code of the project (seems to have been …). See the full list of subjects in FP7. |
string |
|
|
Starting year |
string |
|
|
Country of the coordinator of the project. |
string |
|
|
EU contribution to the project (in EUR). |
string |
|
|
EU contribution to the project (in EUR), in numerical format |
string |
|
|
Total cost of the project (in EUR). |
string |
|
|
Summary of the Project report |
string |
|
|
Work performed during the project |
string |
|
|
Final Results of the Project |
string |
|
|
DOI of related publication |
string |
Examples
emm_programme:h2020
Retrieves all EU research projects under Horizon 2020 (H2020).
emm_acronym:NANOPAD
emm_projectid:33017
emm_call:H2020-MSCA-IF-2014
emm_eutopic:MSCA-IF-2014-EF
emm_euscheme:MSCA-IF-EF-ST
emm_eusubject:(LIF OR MED OR SCI)
emm_countrycoordinator:UK
emm_eugrant:344050
emm_eugrantn:[300000 TO 400000]
emm_totalCost:344050
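Clauses like the ones above can be combined in one query. The following is a minimal Python sketch (the helper name any_of is hypothetical and not part of TIM) that builds the parenthesised OR group shown for emm_eusubject and restricts the search to Cordis projects with class:euproject.

```python
from typing import Sequence


def any_of(field: str, values: Sequence[str]) -> str:
    """Build field:(A OR B OR C), matching documents with any of the values."""
    if not values:
        raise ValueError("at least one value is required")
    return f"{field}:({' OR '.join(values)})"


# Cordis projects tagged with any of three subject codes.
subjects = any_of("emm_eusubject", ["LIF", "MED", "SCI"])
query = f"class:euproject AND {subjects}"
print(query)
```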
Note
Concerning the field emm_call, H2020 includes the following main types of action:
14.1.5.2. Fields relating to Cordis affiliations
The affiliation information related to Cordis is very similar to the information available for other types of documents in TIM (publications, patents etc). For the organisation name, country, and so on, please refer to the same fields already detailed in the section Fields relating to affiliations.
However, some extra fields give more information on the organisations participating in the EU research programmes.
Field Name |
Description |
Searchable field |
Possible Values |
---|---|---|---|
|
Role of the organisations involved in the project. |
coordinator |
|
|
Participant Identification Code (PIC) |
||
|
Participant organisation type. |
HES (Higher or Secondary Education) |
|
|
EU contribution to the specific participant. |
Examples
emm_affiliation__role:coordinator
emm_affiliation__pic:997153502
emm_affiliation__entityType:PRC
emm_affiliation__eugrant:170121,6
14.1.6. Fields Specific to Patstat
Patstat contains bibliographical data relating to more than 100 million patent documents from leading industrialised and developing countries.
Each document in TIM’s search engine is actually a patent family. When working with Patstat data, some fields can be used that are otherwise not available. These mainly relate to the identification of patents.
Both the application and publication numbers are available, for all members of the patent family.
As a reminder, these documents can be identified in TIM by including source:patstat in the search (they all belong to class:patent, so you can use that in your search instead).
Field Name |
Description |
Searchable |
Type |
---|---|---|---|
|
ID of the patent record in the Patstat database. |
string |
|
|
Patent application numbers of all patent family members. |
string |
|
|
Application Number of the most recent patent |
string |
|
|
Date in YYYY-MM-DD format of the latest patent application |
string |
|
|
Patent publication numbers of all patent family members |
string |
|
|
Publication Number of the most recent patent |
string |
|
|
Date in YYYY-MM-DD format of the latest patent publication |
string |
|
|
Priority date of the patent (YYYY-MM-DD format) |
string |
|
|
Year of the priority date of the patent |
string |
|
|
Patent Kind code |
string |
|
|
Earliest patent office |
string |
|
|
Number of patent family members. |
float |
|
|
DOCDB family ID |
N/A |
|
|
INPADOC family ID |
string |
|
|
Indicates if the patent has been granted. |
string |
|
|
NACE Rev.2 code assigned to the patent application |
string |
|
|
NACE Rev.2 weight |
string |
|
|
CPC patent classification |
string |
|
|
IPC patent classification |
string |
|
|
Address of the author (inventor) |
string |
|
|
Author ID (from Patstat) |
string |
|
|
Country of the author |
string |
|
|
Country Code of the author |
string |
|
|
Harmonized Applicant Name (HAN) from OECD |
text |
|
|
A few variants available in the patstat database, joined |
text |
|
|
Indicates the degree of harmonization and standardization which could be achieved |
string |
|
|
Type of organisation |
string |
|
|
Number of Applicants |
float |
14.2. Field Types
Every field has a specific type. The type determines both how the value is stored in the TIM database and how it is queried.
The types that are of interest are:
14.2.1. Text
If the field is of type text, its value is treated as normal text: it is split into words, and each word is stemmed and then stored.
When a field of type text is queried, terms need to be combined, e.g.
title:(rapid AND prototyping)
or else they have to be queried as an exact phrase:
title:"rapid prototyping"
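The two query forms for text fields can be generated from plain input. This is a minimal Python sketch with hypothetical helper names (not part of TIM): one helper combines individual words with AND, the other wraps a phrase in double quotes for an exact match.

```python
from typing import Sequence


def all_words(field: str, words: Sequence[str]) -> str:
    """Build field:(w1 AND w2 ...), so every word must occur in the field."""
    if not words:
        raise ValueError("at least one word is required")
    return f"{field}:({' AND '.join(words)})"


def exact_phrase(field: str, phrase: str) -> str:
    """Build field:"phrase", so the words must occur together as a phrase."""
    if '"' in phrase:
        # Quote escaping is deliberately left out of this sketch.
        raise ValueError("phrase must not contain double quotes")
    return f'{field}:"{phrase}"'


print(all_words("title", ["rapid", "prototyping"]))
print(exact_phrase("title", "rapid prototyping"))
```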
14.2.2. String
If the field is of type string, its value is stored verbatim: no splitting into words and no stemming are performed. This type is better suited to fields that hold exact values, such as the global identifier of the document, the DOI (digital object identifier), a date, or a CPC classification.
Because there is no word-splitting, a value like G09G 3/2092 is stored with its spaces, as one continuous string.
This has some implications for how these fields should be queried.
emm_classificationCPC:G09G 3/2092
will not work.
emm_classificationCPC:"G09G 3/2092"
will work, but what if you need to search for all classification codes that end in /20XX? The asterisk modifier does not work with exact phrases (i.e. terms in quotes).
In this case, the spaces must be escaped, that is, each space needs to be preceded by a backslash character \, like so:
emm_classificationCPC:G09G\ \ 3/20*
Note
Do not use escaped spaces for fields of type text; this might have unintended consequences. Besides, spaces are not stored for such fields; only the separate words are.
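Escaping by hand is error-prone, so a wildcard query on a string field can be generated instead. This is a minimal Python sketch with hypothetical helper names (not part of TIM). It escapes every space in the prefix, so the prefix must contain exactly the spaces found in the stored value; the input below uses two spaces to reproduce the two escaped spaces of the example above.

```python
def escape_spaces(value: str) -> str:
    """Precede every space with a backslash."""
    return value.replace(" ", "\\ ")


def prefix_query(field: str, prefix: str) -> str:
    """Build a wildcard query matching string values that start with prefix."""
    return f"{field}:{escape_spaces(prefix)}*"


# Two spaces between the subclass and the group, written out explicitly.
cpc_prefix = "G09G" + 2 * " " + "3/20"
print(prefix_query("emm_classificationCPC", cpc_prefix))
```

As the note above says, this helper is intended for string fields only.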
14.2.3. Float (decimal value)
If the field is of type float, its value is a decimal number. This type is used for storing numbers that may need to be queried as a range. For example, you might need to find documents that have between 1 and 10 authors.
The field emm_numAuthors is of type float, hence the query should be:
emm_numAuthors:[1 TO 10]
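A range clause for a float field can be built and checked before it is sent. This is a minimal Python sketch with a hypothetical helper name (not part of TIM); it writes whole numbers without a decimal point so that the output matches the form used above.

```python
def number_range(field: str, low: float, high: float) -> str:
    """Build field:[low TO high] for a numeric field."""
    if low > high:
        raise ValueError("low must not be greater than high")

    def fmt(n: float) -> str:
        # Write 10.0 as "10" but keep 2.5 as "2.5".
        return str(int(n)) if float(n).is_integer() else str(n)

    return f"{field}:[{fmt(low)} TO {fmt(high)}]"


print(number_range("emm_numAuthors", 1, 10))
```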