
The Nijmegen Arabic/Dutch Dictionary Project

1.3 Plans for the realization of the project

 

As described on the 'historical page', two voluminous proposals were drawn up before the actual execution of the project started.

However, since planning the compilation of a set of dictionaries is very difficult, some of the plans that were initially made and set out in the second and final proposal have not been realized. The differences between the plans and their realization are treated briefly in this section. Remarks concerning these differences are presented in a different font.

 

the first proposal, 1992
This proposal was the outcome of a preparatory study that was carried out with the support of the Dutch Ministry of Education.
This proposal was translated into English and is available through the link below.

The main decisions that were presented in this report can be summarized as follows:

- Both volumes (Dutch-Arabic and Arabic-Dutch) will be bidirectional, i.e. they will be used by speakers of both Dutch and Arabic, so each volume will have two functions: it will serve as a dictionary for production and as a dictionary for comprehension.

- The languages involved will be Standard Dutch and Modern Standard Arabic.

- Both dictionaries will contain a basic lexicon of 30,000-35,000 entries.

- An existing Dutch corpus will be used for the Dutch-Arabic dictionary.

- For the Arabic-Dutch dictionary, a suitable corpus was still being sought.

Through the following link, the report and a summary can be read, annotated with a number of comments on the differences between the original plans and the actual execution of the project.
First Report on a Preliminary Study to compile a Dutch-Arabic/Arabic-Dutch dictionary

 

the establishment of a Commission for Lexicographic Translation Facilities, 1993
The presentation of the first proposal coincided with the establishment of the CLVV (Commissie Lexicografische Vertaalvoorzieningen - Commission for Lexicographic Translation Facilities).

In order to obtain financial support from this commission, a new proposal had to be drawn up and submitted.
Since the commission had issued detailed instructions for the submission of proposals, the old proposal had to be updated.
These instructions and terms of reference set out a number of criteria with which applicants had to comply.
The CLVV informed all applicants of these conditions in an information leaflet:
The Dutch and Flemish governments act, through the CLVV, as co-financers of multifunctional, reusable electronic lexical databases that will be produced using modern technological and infrastructural facilities.

The multifunctionality of the databases means:

- the database has to be structured in such a way that it will be possible to derive one or more dictionaries from it (stratification);

- the database must be structured in such a way that, as far as possible through automatic reversal, it will be the point of departure for the production of the reverse part of the dictionary pair;

- it must be possible to derive both printed and electronic products from the database;

- the database must be structured in such a way that it will be possible to derive from it dictionaries that are bidirectional, i.e. that can be used by speakers of both languages.
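The requirement of automatic reversion (reversal) above can be illustrated with a toy example. The Python sketch below is not OMBI's data model and not the procedure the project actually used; it assumes a hypothetical, simplified database in which each Dutch headword maps to a list of Arabic translation equivalents, and the function name `reverse_entries` is invented for illustration. It only shows the mechanical core of reversal: every equivalent becomes a new headword that points back to the source headwords it came from.

```python
from collections import defaultdict


def reverse_entries(source: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a hypothetical headword -> equivalents mapping.

    Assumption: a flat mapping with no senses, examples or labels;
    a real lexical database entry is far richer than this.
    """
    reversed_db: dict[str, list[str]] = defaultdict(list)
    for headword, equivalents in source.items():
        for equivalent in equivalents:
            # Each equivalent becomes a headword of the reverse part;
            # avoid listing the same source headword twice.
            if headword not in reversed_db[equivalent]:
                reversed_db[equivalent].append(headword)
    # Sort headwords and their equivalents so the output is deterministic.
    return {hw: sorted(eqs) for hw, eqs in sorted(reversed_db.items())}


if __name__ == "__main__":
    # Hypothetical Dutch-Arabic fragment: 'huis' (house), 'woning' (dwelling).
    dutch_arabic = {"huis": ["بيت", "منزل"], "woning": ["منزل", "مسكن"]}
    arabic_dutch = reverse_entries(dutch_arabic)
    print(arabic_dutch["منزل"])  # ['huis', 'woning']
```

Such a mechanical inversion yields only a draft: a real reverse volume would still need editorial work, for instance regrouping the Arabic headwords by root and splitting or merging senses, which is consistent with the leaflet's wording "as much as possible".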

Another condition is that the dictionaries have to be based on empirically obtained data, i.e. data from modern text corpora.
In the same information leaflet, the CLVV also listed a number of specific criteria for the Arabic-Dutch and Dutch-Arabic learners' dictionaries.

In this publication, the CLVV referred to two different sets of learners' dictionaries: one set for beginners (final years of elementary education and first years of secondary education) containing about 10,000 entries, and one set for advanced learners containing about 30,000 entries.

The database must contain information that makes stratification possible, and it must be rich in expressions presented in pregnant (i.e. illustrative) contexts. In both the macrostructure and the microstructure, much explicit attention has to be paid to matters that are important or difficult for language learners. The microstructure has to be optimally accessible. The metalanguage has to be Dutch and has to be geared to the proficiency level of the users. The microstructure supplies information on the meanings of a word, as well as grammatical information, phonetic information and information about style level and usage. The dictionaries have to contain usage notes and culture notes and, where necessary, visual information in appendices.

 

the second proposal, 1994
The final proposal was written in Dutch; its text is not available on this website.
However, I will briefly mention the differences between the second proposal (presented to the CLVV) and the first proposal (presented to the Ministry of Education).
I have added comments to these remarks in italic print.

- the first proposal mentions 30,000-35,000 entries; the second proposal mentions 30,000 Dutch entries to be selected from the Van Dale lexical databases. The fact that the Van Dale dictionary publisher is mentioned explicitly in the proposal reflects the contacts that existed between the Nijmegen team and the publisher. Under certain conditions, this publisher was interested in adding a set of Arabic-Dutch dictionaries to its list.

However, once the RBN Dutch Lexical Corpus became available, cooperation with Van Dale was no longer needed, all the more so since the CLVV also supplied the database editor OMBI. Access to a Dutch corpus and to an editor had been the main reasons for us to consider cooperation with Van Dale, in addition to the benefit of Van Dale's reputation in the Netherlands.

- in compliance with the conditions set by the CLVV, our proposal also included plans to produce the derived beginners' learners' dictionaries.

In the end, however, the compilation of these beginners' learners' dictionaries was entrusted to our Belgian colleague Mark van Mol.

- both volumes of the dictionary will contain a basic lexicon of 30,000 entries, which will cover about 600 pages for each volume.

The number of entries finally turned out to be lower for the Arabic-Dutch part (more than 24,000 entries) and higher for the Dutch-Arabic part (more than 37,000 entries). This discrepancy is explained on the page about the working methods.

The total number of pages turned out to exceed the estimate by almost 100%: the two volumes together comprise 2,300 pages, against an estimated 1,200 (2 × 600).

- in the final proposal, we still planned to use the two-digit code employed in the Van Dale bilingual dictionaries.

However, the discontinuation of the cooperation with Van Dale and the availability of the OMBI editor opened up other options.

- a phonological description of Arabic headwords in the Arabic-Dutch volume was also planned.

However, for practical reasons it was decided to abandon this plan. The decision was justified by the fact that Arabic has a very transparent relation between the written representation of the phonemes and their pronunciation. Furthermore, grammatical knowledge of Arabic is needed to consult the Arabic-Dutch part, which is ordered by roots, and users who have this grammatical knowledge may certainly be expected to know the elementary pronunciation rules as well.

A final, again practical, reason was the concern that adding another type of script (IPA or another transcription font) to the Arabic and Latin scripts might cause additional problems during the editorial or production stages.

- for the pronunciation of the Dutch entries, information will be presented about word accent, since this is not fully predictable in Dutch.

This plan was not realized, since the Dutch lexical corpus (RBN) did not contain information on pronunciation or word accent. It would have been too much work for the project team to add this information for 37,000 Dutch entries.

- we will strive to add translation equivalents as much as possible, and only in case of insurmountable cultural differences will we present (typographically marked) descriptions.

This aim was pursued throughout. However, as described in another section of this website, the number of descriptions in the Arabic-Dutch part was unexpectedly high.

- pragmatics: since low style levels do not exist in MSA, it was our intention not to include low style levels in Dutch either.

However, since the Dutch lexical corpus (RBN) did contain a certain number of informal and even vulgar entries and expressions, this intention was adjusted during the execution stage, and some low-style units have been included, although their total number is rather limited. The style label 'informal' occurs 1,783 times in the Dutch-Arabic part (761 lexical units and 1,022 example units), and the label 'vulgar' occurs 129 times (75 lexical units and 54 example units).

- usage notes and culture notes will be included.

This plan has been realized only on a limited scale. Usage notes mostly take the form of pragmatic labels, such as labels indicating geographical restrictions on the use of a word or expression, and, to a very limited extent, restrictions to a social group: some expressions have been marked as being used only among Christian speakers of Arabic.

Culture notes mostly consist of additions to the descriptions of typically Dutch words or expressions, such as 'Sinterklaas' (St. Nicholas) and the customs associated with this Dutch folk tradition, or, in the Arabic-Dutch part, of typically Arab or Islamic words and expressions. However, we plan to expand such usage and culture notes in a possible future electronic edition of these dictionaries.

- innovative characteristics
Already in the planning stage it was obvious, on the basis of the raw materials then available, that our dictionaries would be innovative in their inclusion of many collocations.

Since a separate section of this website is devoted to this topic, I will not go into further detail here.

- technological aspects: in the original plan, while we still expected to cooperate with the Van Dale publishing house, we planned to use an editing program for the database that would be supplied by Van Dale.

But since the cooperation with Van Dale did not materialize, and since the CLVV supplied us with the OMBI editor, this was not considered a disadvantage. The Van Dale editor was a UNIX program, whereas OMBI was a Windows application; as project leader, I felt it was a relief to be able to work in an (Arabic-enabled) Windows environment, given my lack of experience with UNIX applications and the UNIX operating system in combination with Arabic.

The second proposal also contained a description of the final product, and at the special request of the CLVV we also presented a number of proof lemmas (sample entries).
These proof lemmas are no longer available as a file, but a scan of the pages containing them is available through this link.

reactions to: j.hoogland@let.kun.nl
last updated 26/10/2003 15:16 +0100