9. Affiliation processing¶
9.1. Affiliation Name¶
The terms organisation and affiliation are used interchangeably throughout the TIM documentation.
Each document originally includes two affiliation fields: emm_affiliation__name and emm_affiliation__nameVariant.
These are dependent on the source providing them, but special care has been taken to make them as consistent as possible.
- Scopus
  - emm_affiliation__nameVariant is the full affiliation name line.
  - emm_affiliation__name is the main subfield of emm_affiliation__nameVariant (department names etc. are disregarded).
- Other data sources
  - emm_affiliation__nameVariant is one or more name variants of the organisation.
  - emm_affiliation__name is the main affiliation field provided.
Example
Source | Field | Value |
---|---|---|
Scopus | emm_affiliation__nameVariant | Dept. of Small Anim. Clin. Sciences,College of Veterinary Medicine,University of Florida |
Scopus | emm_affiliation__name | University of Florida |
Patstat | emm_affiliation__nameVariant | ITM POWER RESEARCH LTD;ITM POWER (RESEARCH);ITM POWER (RESEARCH) LIMITED |
Patstat | emm_affiliation__name | ITM POWER RESEARCH LTD |
Organisation names need further processing in TIM. The main reason is that they must be harmonized across several databases: Scopus, Patstat, Cordis and every other database loaded into TIM each keep affiliation records in their own format and style. Another important reason is that the original data itself often contains duplicates and mistakes. In addition, a common name sometimes has to be chosen irrespective of the affiliation's locality or its place in a group hierarchy (is it a subsidiary? has it been acquired by another organisation? is it an umbrella organisation?).
Therefore, disambiguation algorithms are applied to the data in order to achieve a consistent denomination of the affiliations.
The TIM module that is responsible for this is called the Entity Matcher.
The Entity Matcher matches all the incoming affiliation names against its own internal database of affiliation variants, and if a match is found, it provides a new field called emm_affiliation__ename.
On top of that, a series of location-related fields is delivered, which accompany the respective original fields existing in each document.
In general, the fields provided by the Entity Matcher have the letter “e” prefixed to each attribute, so a field ending in _name becomes _ename, _city becomes _ecity and so on.
This is illustrated in Fig. 9.1.

Fig. 9.1 The Entity Matcher mainly provides an extra field called emm_affiliation__ename to each document, containing a disambiguated value for the affiliation name.¶
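The naming rule and the lookup described above can be sketched in a few lines. This is only an illustration, not the Entity Matcher's actual implementation: the helper names e_field and annotate are hypothetical, and the exact-match dictionary lookup is a stand-in for whatever matching logic TIM really applies.

```python
def e_field(field: str) -> str:
    """Return the Entity Matcher counterpart of an original field name.

    The attribute is the part after the last double underscore; the
    counterpart has the letter "e" prefixed to that attribute.
    """
    prefix, sep, attribute = field.rpartition("__")
    return f"{prefix}{sep}e{attribute}"


def annotate(doc: dict, variant_to_ename: dict) -> dict:
    """Return a copy of doc with emm_affiliation__ename added on a match.

    variant_to_ename is a hypothetical stand-in for the internal database
    of affiliation variants (assumption: exact string lookup).
    """
    result = dict(doc)
    ename = variant_to_ename.get(doc.get("emm_affiliation__name"))
    if ename is not None:
        result[e_field("emm_affiliation__name")] = ename
    return result
```

When no variant matches, the document is returned without the extra field, which mirrors the statement that the field is only provided if a match is found.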
9.2. Affiliation Location¶
The location information is tightly linked to the affiliation name information. In the Entity Matcher internal database, each affiliation ename is linked to name variants kept from the various data sources, and each name variant is in turn linked to location information. The location that is most frequent among those variants is the one that the affiliation ename is going to be associated with. This is illustrated in Fig. 9.2.

Fig. 9.2 The Entity Matcher database contains variations on each affiliation, and location information for each of the variations. The most frequent location characterizes the affiliation.¶
Example
In Fig. 9.2, the most frequently appearing variant for JRC in the Entity Matcher internal database is located in Ispra, Italy.
The location information tied to JRC then will always be “Ispra, Italy”.
Some of the location fields tied to this emm_affiliation__ename will be:
- emm_affiliation__ecity: Ispra
- emm_affiliation__ecountry: Italy
The original location fields will still be retained in each document. These might be, e.g.
- emm_affiliation__city: Sevilla
- emm_affiliation__country: Spain
It must be stressed that all the Entity Matcher-attributed fields depend on the successful matching of the organisation.
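The "most frequent location wins" rule can be sketched as follows. The variant names and their locations are made-up illustration data, the function name most_frequent_location is hypothetical, and how TIM resolves ties is not stated in this documentation (in this sketch the location seen first wins a tie).

```python
from collections import Counter


def most_frequent_location(variant_locations: dict) -> tuple:
    """Pick the (city, country) pair that occurs most often among the variants."""
    counts = Counter(variant_locations.values())
    location, _count = counts.most_common(1)[0]
    return location


# Hypothetical name variants of one affiliation, each linked to a location.
jrc_variants = {
    "JRC": ("Ispra", "Italy"),
    "Joint Research Centre": ("Ispra", "Italy"),
    "JRC Seville": ("Sevilla", "Spain"),
}

# These values would populate emm_affiliation__ecity and emm_affiliation__ecountry.
ecity, ecountry = most_frequent_location(jrc_variants)
```

With this data the Ispra location occurs twice and the Sevilla location once, so a document whose original fields say Sevilla, Spain would still receive Ispra and Italy in its Entity Matcher fields.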
9.2.1. European Locations¶
TIM generates specific fields for the study of the organisations located in Europe.
These fields respond to the need to visualise either all the EU countries together as a single unit, or only the EU countries.
For those cases, the fields emm_affiliation__eucountry and emm_affiliation__eocountry are created, based on the country information, and so are the respective Entity Matcher fields, using the country information of the Entity Matcher.
emm_affiliation__eucountry is generated by replacing the names of countries that are members of the European Union with the value EU.
The values for non-EU countries stay unchanged, as in _country.
This makes it possible to build analyses in which all the EU organisations appear as one unit and the rest of the countries appear as separate entities.
emm_affiliation__eocountry, by contrast, is used to analyse exclusively the affiliations of EU countries.
In this case, the country name is kept when the country is an EU member, and it is replaced by an underscore (“_”) when the country does not belong to the European Union.
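The two derivations can be sketched as a pair of small functions. This is an illustration only: the function names are hypothetical, and the EU_MEMBERS set below (the 27 member states, with English spellings chosen here) is an assumption, since this documentation does not state which membership snapshot or country spellings TIM uses.

```python
# Assumption: the 27 EU member states, spelled as plain English names.
EU_MEMBERS = frozenset({
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
    "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg",
    "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia",
    "Slovenia", "Spain", "Sweden",
})


def eucountry(country: str) -> str:
    """Collapse every EU member into the single value "EU"."""
    return "EU" if country in EU_MEMBERS else country


def eocountry(country: str) -> str:
    """Keep EU members by name; replace every other country with "_"."""
    return country if country in EU_MEMBERS else "_"
```

The same two functions would apply to the Entity Matcher country field to produce the corresponding Entity Matcher variants.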
Examples
- emm_affiliation__country: Germany
- emm_affiliation__eucountry: EU
- emm_affiliation__eocountry: Germany

- emm_affiliation__country: United States
- emm_affiliation__eucountry: United States
- emm_affiliation__eocountry: _

9.3. Merged Values¶
There are cases where the emm_affiliation__ename will be empty (e.g. because the algorithm could not match the emm_affiliation__name with a known organisation), and thus all the associated (location-related) fields will also be empty.
For those cases, a special set of fields starting with emm_affiliation__mrg* exists.
These fields are, in general, identical to their emm_affiliation__e* counterparts, except that, where there is no match from the Entity Matcher, the original name and location fields from the document will be used.
This is a compromise, because it will not save you from duplicates or mistakes; on the other hand it will still provide data when the Entity Matcher fails to.
So, field emm_affiliation__ename becomes emm_affiliation__mrgename, emm_affiliation__ecity becomes emm_affiliation__mrgecity and so on.
The merged fields are all produced by a transformation, as illustrated in Fig. 9.3. You can read more about the transformation mechanism and how you can use it in Transformations.

Fig. 9.3 The field emm_affiliation__mrgename provides emm_affiliation__name when emm_affiliation__ename is empty.¶
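The fallback behind the merged fields can be sketched as below. This is not the TIM transformation itself: the function name merged_fields is hypothetical, the attribute list is an assumption (only the name and city variants are named explicitly in this section), and the sample values are invented.

```python
PREFIX = "emm_affiliation__"


def merged_fields(doc: dict, attributes=("name", "city", "country")) -> dict:
    """Build the emm_affiliation__mrge* fields for one document.

    If the Entity Matcher produced an ename, the e* fields are copied;
    otherwise the original name and location fields are used instead.
    """
    matched = bool(doc.get(PREFIX + "ename"))
    merged = {}
    for attr in attributes:
        source = PREFIX + ("e" + attr if matched else attr)
        merged[PREFIX + "mrge" + attr] = doc.get(source)
    return merged


# Hypothetical unmatched document: no ename, so the originals are used.
unmatched_doc = {
    "emm_affiliation__name": "Some Unknown Org",
    "emm_affiliation__city": "Sevilla",
    "emm_affiliation__country": "Spain",
}
merged = merged_fields(unmatched_doc)
```

For the unmatched document above, emm_affiliation__mrgename carries the original name unchanged, so any duplicates or spelling mistakes in the source data remain visible in the merged field.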