1.4 Working Methods
As described elsewhere on this site, a Dutch lexical corpus (DLC) was made available to us by our principal, the CLVV commission, as well as the database editor OMBI. When the actual execution of the project started, the team of editors was instructed in the use of the editor and in the structure of the Dutch lexical corpus.
Then the work started: translating the Dutch words and expressions into Arabic.
As mentioned elsewhere, the complete DLC was too large for our project, so we had to make a selection from the DLC (RBN). The total DLC contains about 46.000 entries, a number that was reduced to 37.000 Dutch entries in the Dutch-Arabic part of our database. The selection was carried out by two persons, since we felt that a decision to keep or delete a word or expression should always be taken by two persons. We estimate that this selection process cost us 438 hours.
The DLC as a whole comprises the following entries (Form Units): 6.283 verbs, 29.495 nouns, 6.344 adjectives, 128 interjections, 927 adverbs, 426 other function words and 3.000 geographical names.
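The category counts above can be summed as a consistency check against the "about 46.000" entries mentioned earlier and the 37.000 Dutch entries that were kept. The short Python sketch below does only that arithmetic; the category labels are taken from the text, and 37.000 is the rounded figure given above, so the removal percentage is approximate.

```python
# Category counts of the full DLC (Form Units), as listed in the text.
dlc_counts = {
    "verbs": 6283,
    "nouns": 29495,
    "adjectives": 6344,
    "interjections": 128,
    "adverbs": 927,
    "other function words": 426,
    "geographical names": 3000,
}

total = sum(dlc_counts.values())
selected = 37_000  # rounded size of the Dutch-Arabic selection

removed = total - selected
share_removed = removed / total * 100  # approximate, since 37.000 is rounded

print(f"Total DLC entries: {total}")
print(f"Removed during selection: about {removed} ({share_removed:.1f}%)")
```

With these figures the sum comes to 46.603 entries, which agrees with the approximate total given earlier and implies that roughly one fifth of the DLC was removed during selection.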
As for the units that were deleted during the selection process, in general the following remarks can be made:
During the selection process we noticed that part of the DLC needed some editing as well. So while deleting units we also made changes, such as shortening the sentences of expressions or, in some cases, making expressions more explicit to facilitate their translation into Arabic.
Even in the final stage of proofreading, we decided that some editorial changes had to be made to the Dutch definitions of the meanings of polysemous words.
The translation of the DLC into Arabic.
As mentioned elsewhere, most of the existing Dutch-Arabic dictionaries do not contain many examples. Had we only had to translate the 37.000 headwords, without the expressions, we would have saved a disproportionate amount of time, since it was mainly the translation of the examples that required time-consuming tools and discussions. Using a concordancy program or consulting various other dictionaries can take many minutes for a single translation.
As explained elsewhere, a significantly higher percentage of expressions than of words (lexical units) had to be translated with explanatory descriptions.
Whereas adding a bidirectional translation was at the same time an investment in the Arabic lexical corpus (ALC) and in the Arabic-Dutch part, creating such a large number of unidirectional descriptions contributed nothing to the development of the Arabic-Dutch part.
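The asymmetry between translations and descriptions can be illustrated with a small Python sketch. It assumes, hypothetically, that each Dutch unit is stored with either a one-to-one Arabic translation or an explanatory description, marked by a boolean flag; the record layout and field names are illustrative and do not reflect the actual OMBI database structure. Only the translations can be inverted into candidate Arabic-Dutch entries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DutchUnit:
    """Hypothetical record: a Dutch unit and its Arabic rendering."""
    dutch: str
    arabic: str
    is_description: bool  # True if the Arabic is an explanatory description


def reverse_entries(units):
    """Build an Arabic-to-Dutch index from direct translations only."""
    arabic_to_dutch = defaultdict(list)
    for unit in units:
        if unit.is_description:
            # A paraphrase is not an Arabic lexical unit, so it cannot
            # serve as a headword in the Arabic-Dutch part.
            continue
        arabic_to_dutch[unit.arabic].append(unit.dutch)
    return dict(arabic_to_dutch)


# Hypothetical sample data; the description text is a placeholder.
units = [
    DutchUnit("boek", "كتاب", is_description=False),
    DutchUnit("huis", "بيت", is_description=False),
    DutchUnit("gezellig", "<explanatory description>", is_description=True),
]
index = reverse_entries(units)
print(index)
```

Every direct translation thus also yields material for the reverse direction, whereas every description is simply skipped.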
I will now discuss the tools we used in translating the Dutch words and expressions, some of which have already been mentioned, and show some examples.
1 Lexical knowledge of the editors
However, despite the excellent lexical knowledge of the editors, many words and expressions could not simply be translated without consulting reference works or other sources of information.
Other factors contributing to the difficulties in translating the Dutch words and expressions are:
So, although we were convinced we had gathered a team of excellent editors, it turned out that in many cases they had to consult reference works and tools in order to be certain about proposed Arabic translations.
2 Consultation of and discussions with other editors
3 Other dictionaries
We used a large number of dictionaries during all stages of compiling our dictionaries. These dictionaries can be classified into different categories.
An overview of the dictionaries that were used can be seen through a separate link.
One reference dictionary deserves special mention, since we used it very intensively during all stages of the project: the Larousse/ALECSO Basic Dictionary (al-mu'jam al-'asasi المعجم الأساسي). This monolingual Arabic dictionary, compiled especially for non-Arabophone learners of Arabic, has proven to be a reliable and practical reference work. It will not be difficult to trace some borrowing from it in our dictionaries, especially in the field of expressions. Dr. Ali Al-Kasimi, ALECSO coordinator for the Basic Dictionary Project, encouraged me to borrow from his work, and so we did. Now that our project is finished, I want to thank Dr. Al-Kasimi for this permission, as well as for his answers to a number of questions I was allowed to ask him about some 'difficult' words to be translated into Arabic.
Another conclusion drawn from using so many different Arabic dictionaries concerns the so-called specialized dictionaries.
A considerable number of these dictionaries are included in the overview of dictionaries used. In particular, the Unified Dictionaries of the Bureau of Coordination of Arabization of ALECSO in Rabat cover 75% of all the specialized dictionaries.
In many cases these specialized dictionaries disagree with each other on the translation of terminology from different fields. Furthermore, the terminology presented in these dictionaries very often cannot be verified in actual usage, since it cannot be found even in a large text corpus.
Additional research on this topic could lead to interesting conclusions.
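The verification problem can be made concrete by counting how often each proposed term occurs in a corpus: a term with no hits cannot be confirmed from actual usage. The Python sketch below assumes a plain-text corpus and uses naive substring counting, with no normalization of Arabic spelling, vocalization or attached clitics (a real check would need these, and substring counting can also produce hits inside longer words). The sample terms and sentence are placeholders, not data from the project.

```python
def attestation_counts(terms, corpus_text):
    """Count raw substring occurrences of each proposed term."""
    return {term: corpus_text.count(term) for term in terms}


def unattested(terms, corpus_text):
    """Return the proposed terms that never occur in the corpus text."""
    counts = attestation_counts(terms, corpus_text)
    return [term for term, n in counts.items() if n == 0]


# Placeholder data: two Arabic words for 'computer' and a one-line
# "corpus" meaning 'this is a new computer'.
terms = ["حاسوب", "كمبيوتر"]
corpus = "هذا كمبيوتر جديد"
print(attestation_counts(terms, corpus))
print(unattested(terms, corpus))
```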
The other existing Dutch-Arabic dictionaries were hardly used by the editors of the present project. The Van Mol Learner's Dictionaries did not come out until 2001, when we had already finished translating the DLC into Arabic. As for the two other existing dictionaries (Amiens and Al-Manhal), we considered them insufficiently reliable, so although they were available we rarely consulted them.
4 Dutch-Arabic database with rough materials
Over a period of some years preceding the actual execution of the project, I started to collect and combine a number of vocabulary lists, for example from various teaching materials I was preparing for my students. These materials included the complete vocabulary list of the Arabic Textbook for beginners, written by Krahl and Reuschl, but also more specialized materials such as word lists accompanying newspaper articles or radio broadcasts, and texts from a course on 'Business Arabic'. All these lists were combined in an Access database which, when we started the execution of the project, contained 55.000 entries.
This table contained a lot of useful 'rough material', but also a lot of redundant information, since the same basic vocabulary could have been entered from different lists.
The Access database consisted not only of a table with Dutch words and their translations; it also contained a table with collocations and one with idiomatic expressions. In the pre-execution stage I had been collecting not only word lists, but also collocations and idiomatic expressions from all the Arabic texts I read and listened to: texts I used for my courses, as well as texts submitted to me for translation into Dutch. The content of these tables proved useful in the stage of completing the Arabic lexical corpus (ALC), which is discussed later.
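Combining vocabulary lists from many sources inevitably produces duplicates, since the same basic word turns up in several lists. A minimal Python sketch of such a merge, keeping one record per (Dutch, Arabic) pair and remembering which lists it came from, is shown below. It assumes each list is a sequence of (Dutch, Arabic) pairs; the list names and words are hypothetical, and this is not the Access procedure that was actually used.

```python
def merge_word_lists(named_lists):
    """Merge (dutch, arabic) pairs from several lists without duplicates.

    Returns a dict mapping each distinct pair to the names of the lists
    in which it occurs (dicts keep insertion order in Python 3.7+).
    """
    merged = {}
    for list_name, pairs in named_lists.items():
        for dutch, arabic in pairs:
            key = (dutch.strip(), arabic.strip())
            merged.setdefault(key, []).append(list_name)
    return merged


# Hypothetical source lists.
lists = {
    "textbook": [("huis", "بيت"), ("boek", "كتاب")],
    "newspaper": [("boek", "كتاب"), ("regering", "حكومة")],
}
merged = merge_word_lists(lists)
print(len(merged))  # 3 distinct pairs from 4 rows
```

Keeping the source names makes it easy to see which entries are redundant and which occur in only one list.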
5 Text corpus with concordancy program
When we started the execution of the project in March 1997, we had a text corpus of 3 million words at our disposal. More texts were available, since the complete Al Hayat volume alone contains more than 3 million words, but in order to meet the requirement of a balanced corpus I decided not to include all the Hayat articles in the corpus.
A list of all the scanned novels and nonfiction texts is available and can be viewed through a separate link.
During the stage of translating the DLC we felt that we lacked information on sports terminology, but through various newspaper websites it was relatively easy to collect a number of texts on sports. I read a number of these pages and stored interesting terminology, collocations and idiomatic expressions in the Access database described above.
A similar exercise has been carried out with files from the Islamic News Agency.
When Arabic search engines (such as Ayna: www.ayna.com) became available, the whole internet turned into a kind of gigantic corpus of Arabic texts. However, the reliability of the internet as a source of linguistic information is of course doubtful, just as the reliability of much other information on the internet is.
To illustrate the technological progress that took place during the stage of execution of the project, it is worth mentioning that by the end of the project the ALECSO Unified Dictionaries even became available on-line. They can be searched in three languages (Arabic, English and French) and can be reached through the following link: http://www.arabization.org.ma/Dictionnaire.asp
A text corpus of several million words can only be exploited with a concordancy program. For this purpose we used the program Monoconc, produced by Athelstan (http://www.athel.com/). A separate section describes the way we used this concordancy program during the project.
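The core function of a concordancy program is a keyword-in-context (KWIC) display: every occurrence of a search word is listed with a fixed amount of context on either side. The Python sketch below illustrates only that idea, with whitespace tokenization and a hypothetical `width` parameter counted in words; it does not reproduce Monoconc's features or behaviour, and the English sample sentence is used only because left-to-right text aligns more clearly in a console.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


text = "the bank raised the rate and the bank closed early"
for left, kw, right in kwic(text.split(), "bank", width=2):
    # Right-align the left context so that the keywords line up.
    print(f"{left:>15} [{kw}] {right}")
```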
The second stage: completing the Arabic lexical corpus
It is not possible to trace the exact size of the ALC at the beginning of the second stage, but the numbers for the content of the database at that time indicate the following:
So there was an imbalance between the two parts of the dictionary which had to be corrected. Explanations for the existence of this imbalance have been presented elsewhere on this site.
Another explanation for this imbalance lies in the lexical gaps in Arabic relative to Dutch. This phenomenon is described in a separate section of this website.
The ALC had to be extended on different levels. A paragraph about this can be read in section 1.2.3.
So, first of all we needed to add Arabic words, since a vocabulary of only 18.429 words is not sufficient for a translation dictionary. Secondly, the number of examples, expressions and collocations had to be extended, as well as the number of idiomatic expressions.
Several steps were taken to correct this imbalance. I will describe them briefly.
Texts from various sources were read to extract expressions, collocations and missing Arabic words.
The data from these tables were used by the editors of the dictionary to extend the number of Arabic entries, lexical units and expressions.
Examples of these database tables can be seen through a separate link.
3 Other dictionaries were used as reference lists
Another dictionary used as a reference is the already mentioned Larousse/ALECSO Basic Dictionary (المعجم الأساسي).
Finally, another Larousse dictionary was used to check the macrostructure of our dictionary: the As-Sabil Arabic-French dictionary by Daniel Reig. Since its macrostructure contains 25.000 entries, it was considered a good reference list.
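Using a dictionary such as As-Sabil as a reference list amounts to comparing two sets of headwords: reference entries absent from our own list become candidates for addition, to be reviewed by the editors. A minimal Python sketch, assuming both headword lists are available as collections of strings (the sample words are placeholders), could look like this:

```python
def missing_headwords(reference, ours):
    """Return reference headwords not yet in our list, sorted for review."""
    return sorted(set(reference) - set(ours))


def coverage(reference, ours):
    """Percentage of distinct reference headwords already in our list."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(ours)) / len(ref) * 100


# Placeholder headword lists.
reference = ["بيت", "كتاب", "قلم", "مدرسة"]
ours = ["بيت", "كتاب", "حكومة"]
print(missing_headwords(reference, ours))
print(coverage(reference, ours))
```

The same comparison applies to the other reference lists described in this section, such as frequency lists.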
4 A list titled ar-Raseed al-lugawiy was used
5 Frequency lists were used
6 A frequency list was made with the concordancy program and the corpus of texts
Although this is a very rough method, it has contributed to the completion of the ALC.
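A rough frequency list of this kind can be produced by counting word forms in the corpus and sorting them by frequency; frequent forms missing from the ALC then point to possible gaps. The Python sketch below assumes whitespace tokenization and counts surface forms as they stand, so a form with an attached article or conjunction (الوزير, وقال) is counted separately from the bare form, which illustrates why the method is rough. The sample sentence and lexicon are placeholders.

```python
from collections import Counter


def frequency_list(text, top=None):
    """Return (word form, count) pairs sorted by descending frequency."""
    return Counter(text.split()).most_common(top)


def frequent_gaps(text, lexicon, min_count=2):
    """Word forms occurring at least min_count times but absent from lexicon."""
    known = set(lexicon)
    return [(w, n) for w, n in frequency_list(text)
            if n >= min_count and w not in known]


# Placeholder sentence: 'The minister said that the government decided,
# and the minister said'.
text = "قال الوزير إن الحكومة قررت وقال الوزير"
print(frequency_list(text, top=3))
print(frequent_gaps(text, ["قال"]))
```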
7 A list of roots from the dictionary of Hans Wehr was used as reference
However, this list of roots from Hans Wehr turned out to contain a rather large number of roots that could not be found in Hans Wehr at all, neither in the 1980 Arabic-English edition nor in the 1985 Arabic-German edition.
8 Use of the memory of native speakers in order to extend the number of expressions and collocations
The following table shows the numbers of units in both languages after the first stage (same numbers as in the table at the beginning of this paragraph), and after the second stage, including a number indicating the increase in terms of percentage.
From this table it is clear that the ALC increased considerably (by over 30%), whereas the DLC decreased slightly.
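The percentage column in such a table follows from the usual relative-change formula, (after - before) × 100 / before; a decrease, like that of the DLC, comes out negative. The Python sketch below shows only this arithmetic, and the numbers in the example are illustrative, not the project's figures.

```python
def percentage_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    # Multiply before dividing to limit floating-point rounding.
    return (after - before) * 100 / before


# Illustrative numbers only.
print(percentage_increase(100, 130))
print(percentage_increase(200, 190))
```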