NEWSLETTER #13

mid-April 1997


To unsubscribe from this newsletter, send a blank message to newsletter-unsubscribe@imdb.com - *not* newsletter@imdb.com. To subscribe, fill out the survey form on the web site and check the appropriate box.

Welcome to issue 13 of the IMDb newsletter. The newsletter is intended to keep database users and contributors informed of the latest developments from the management team. Comments and suggestions are welcome and should be directed to newsletter@imdb.com. Issue 14 is scheduled for June.

See the further information section at the end of this file for more information about The Internet Movie Database (IMDb).

this issue edited by Jon Reeves



WE ARE NOT SPAMMERS

by Jon Reeves

Some of you may have gotten some mail recently that offered deals on magazines, as well as extolling the virtues of our site. While we appreciate the kind words of the message's author, we were totally unaware of this message until people started forwarding it to us, and we deplore the tactics of its sender. Rest assured that we are not advocates of junk e-mail (in fact, I spend quite a bit of time each week dealing with it) and would never use it; if you're getting this newsletter, you asked to get it. We've also taken steps to prevent people from using us as a relay for their junk mail. In addition, unless you enter a contest (in which case we may give it to the sponsor) or write a bio or plot and don't ask to be anonymous, we don't share your e-mail address with anyone.


WE'D LIKE TO THANK THE ACADEMY...

by Jon Reeves

You may have seen an ad for IMDb on the official Academy Awards site. There's an interesting story behind it.

The designers of that site apparently used an ad from one of our old campaigns, served from our machines, when they were testing their site. We noticed and asked them not to do this, and they said they would stop. However, after their site went live, they continued to serve up that ad to people who had JavaScript turned off. We assumed this meant they wanted to help us, and replaced it with an ad promoting the Internet Movie Database, which got an excellent response. The ad was present on Oscar night and most of the next few weeks but as of this writing is gone again.

Once again, our emphasis on information over glitz meant we were able to do real-time updates and our servers were able to handle the load with ease.

So, if you came here from the Oscars site, welcome! We are not affiliated in any way with the Academy of Motion Picture Arts and Sciences.


FILM THREAT

by Col Needham

By special arrangement with the publisher, we're pleased to welcome the Film Threat Weekly to the IMDb. Check the feature of the day every Monday for the latest issue containing news and information with a focus on independent film. Regular features include reviews of the latest movies; film news; picks of the week in several categories; and the US box office top 10.

All names and titles are linked into the IMDb where appropriate to provide background information and we're also maintaining an archive of past issues.


THE GREAT ISO SWAP

by Michel Hafner

INTRODUCTION

After promising it for a long time, IMDb has finally replaced its old official character set, 7 bit ASCII, with the 8 bit ISO-8859-1 character set (aka ISO Latin 1). This new character set belongs to a family of ISO sets that were designed to cover the majority of the important languages of the world.

ISO-8859-1 is optimized for West European languages and can display almost all characters that are used in Albanian, Catalan, Danish, Dutch, English, Finnish, French, German, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish and Swedish. That's one of the reasons it was chosen. Many of the important film nations are fully covered with this set. In addition it's widely supported by e-mail software, web browsers and operating systems in general.

The main difference between ASCII and ISO Latin 1 is the addition of 96 new characters used in the above mentioned languages among others. These characters are:

    ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
  ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
  À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
  Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
  à á â ã ä å æ ç è é ê ë ì í î ï
  ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

More details about the different ISO sets can be found here.

ISO Latin 1/2/3... and IMDb

ISO Latin 1 is now the official character set of IMDb. This means that

  • all new names, titles and other text entered into the database that need these new characters to be spelled correctly must be submitted using ISO Latin 1.
  • all old names, titles and other text already in the database that need these new characters to be spelled correctly MUST be converted to use ISO Latin 1.

We have used the long preparatory phase to collect the ISO versions of titles and names, so we were able to start with a sufficiently large portion of the data already converted. But there remain literally thousands of names to be adapted, lots of character names to be swapped, attributes in different lists to be replaced and general text to be updated. We hope you will help us here and mail in corrections as time goes by. The corrections can be mailed in like regular corrections using the mail server and the usual keywords.

Since not all computer systems and mail software support ISO Latin 1, we have provided alternative ways of entering data.

  • All names and titles that are mailed in as ASCII and have an ISO Latin 1 counterpart already in the database are automatically swapped to the ISO Latin 1 version, so you can mail them in as ASCII without causing problems. Examples:

    You mail in                   The mail server swaps to
    Bunuel, Luis                  Buñuel, Luis
    Aberg, Anders                 Åberg, Anders
    Aaberg, Anders                Åberg, Anders
    Beart, Emmanuelle             Béart, Emmanuelle
    Bene, Gyoezoe                 Bene, Gyözö
    Bene, Gyozo                   Bene, Gyözö
    Bressler, Gunter              Breßler, Günter
    Forque, Jesus-Maria           Forqué, Jesús-María
    Wer zweimal lugt (1993)       Wer zweimal lügt (1993)
    Was fuer ein Genie (1985)     Was für ein Genie (1985)
    Voeroes grofnoe, A (1984)     Vörös grófnö, A (1984)
    Vi paa Vaeddoe (1958)         Vi på Väddö (1958)
    Ultima pelicula, La (1971)    Última película, La (1971)

  • If a name has an ISO Latin 1 version but only the ASCII version is correct, because the ASCII version belongs to one person and the ISO version to another, you have to use Roman numerals to turn off auto swapping:
    Example: Berger, Pamela (I) versus Berger, Paméla (II)
    The Roman numerals are permanent in this case and are used throughout the database, not just for input purposes.

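  • If you are familiar with the way HTML encodes ISO Latin 1 characters you can use this encoding too in your mailings to the mail server. The relevant mappings are:

    &AElig; --> Æ    &Aacute; --> Á   &Acirc; --> Â    &Agrave; --> À
    &Aring; --> Å    &Atilde; --> Ã   &Auml; --> Ä     &Ccedil; --> Ç
    &ETH; --> Ð      &Eacute; --> É   &Ecirc; --> Ê    &Egrave; --> È
    &Euml; --> Ë     &Iacute; --> Í   &Icirc; --> Î    &Igrave; --> Ì
    &Iuml; --> Ï     &Ntilde; --> Ñ   &Oacute; --> Ó   &Ocirc; --> Ô
    &Ograve; --> Ò   &Oslash; --> Ø   &Otilde; --> Õ   &Ouml; --> Ö
    &THORN; --> Þ    &Uacute; --> Ú   &Ucirc; --> Û    &Ugrave; --> Ù
    &Uuml; --> Ü     &Yacute; --> Ý   &aacute; --> á   &acirc; --> â
    &aelig; --> æ    &agrave; --> à   &aring; --> å    &atilde; --> ã
    &auml; --> ä     &ccedil; --> ç   &eacute; --> é   &ecirc; --> ê
    &egrave; --> è   &eth; --> ð      &euml; --> ë     &iacute; --> í
    &icirc; --> î    &igrave; --> ì   &iuml; --> ï     &ntilde; --> ñ
    &oacute; --> ó   &ocirc; --> ô    &ograve; --> ò   &oslash; --> ø
    &otilde; --> õ   &ouml; --> ö     &szlig; --> ß    &thorn; --> þ
    &uacute; --> ú   &ucirc; --> û    &ugrave; --> ù   &uuml; --> ü
    &yacute; --> ý   &yuml; --> ÿ

    Example:
    NAME
    B&eacute;art, Emmanuelle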

  • If your mailer does support the ISO Latin 1 character set, make sure that the data you mail in is not sent as raw 8 bit ISO Latin 1 but as MIME compatible ISO Latin 1 data using the Quoted-Printable encoding, which uses only ASCII characters. This is necessary because not all mail systems that transport your mail between your computer and ours can handle raw 8 bit characters. Some simply drop the special ISO Latin 1 characters from your additions, so names and titles get mutilated. While we often can and will recognize and correct these amputated versions, they must be avoided at all costs. So please configure your mailer properly, or ask your system administrator if you cannot do it yourself.
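As an illustration of the two ASCII-safe routes described above (HTML-style entities and Quoted-Printable), here is a minimal Python sketch. It only shows what the encodings look like on the wire; the helper names `to_quoted_printable` and `from_quoted_printable` are made up for this example, and nothing here describes how the IMDb mail server itself is implemented.

```python
import html
import quopri


def to_quoted_printable(text):
    """Encode ISO Latin 1 text as Quoted-Printable (7 bit ASCII only)."""
    raw = text.encode("latin-1")          # the raw 8 bit form
    return quopri.encodestring(raw).decode("ascii")


def from_quoted_printable(qp_text):
    """Decode Quoted-Printable back to ISO Latin 1 text."""
    raw = quopri.decodestring(qp_text.encode("ascii"))
    return raw.decode("latin-1")


name = "Béart, Emmanuelle"

# Route 1: HTML-style entity, as in the mapping table above.
entity_form = "B&eacute;art, Emmanuelle"
decoded_from_entity = html.unescape(entity_form)

# Route 2: Quoted-Printable, where the 8 bit byte 0xE9 becomes "=E9".
qp_form = to_quoted_printable(name)
decoded_from_qp = from_quoted_printable(qp_form)

print(entity_form, "->", decoded_from_entity)
print(qp_form, "->", decoded_from_qp)
```

Both forms contain only 7 bit characters, so they survive mail systems that strip or mangle raw 8 bit data.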

Beware of excessive or missing use of ISO Latin 1 characters in culturally biased sources. For example, French sources might use Marlène Dietrich, because Marlene is spelled Marlène when it is a French first name. But Marlene Dietrich was German, not French, and made her career for the most part in Germany and the USA, both of which spell her first name Marlene, so the ISO version is not correct here and must be avoided. Likewise, beware of English and other sources that ignore the need for ISO characters and spell everything in ASCII, which is equally incorrect. This is very widespread! A generally reliable source from country x spells data from its own culture and language correctly but often fails to do so for data outside this area (and competence). So it's safest to use Spanish sources for Spanish data, French sources for French data, Italian sources for Italian data etc.

While ISO Latin 1 covers most of the languages spoken in important film nations, it does not provide all necessary characters for languages such as Czech, Hungarian, Romanian, Estonian, Latvian, Lithuanian, Bulgarian, Macedonian, Russian, Polish, Serbian, Turkish and others. In addition, languages using radically different character sets such as Hindi, Greek, Arabic, Hebrew or pictogram based languages such as Japanese and Chinese are not directly representable. The situation concerning IMDb is as follows for the time being:

  • Czech, Hungarian, Polish, Romanian, Croatian, Slovak, Slovene and other languages whose native character set is ISO Latin 2:
  • Data must be transliterated to ISO Latin 1. The mappings are straightforward. If an accented character is missing in ISO Latin 1, use the non accented version. Examples:
    Svêrák, Jan --> Sverák, Jan, and not Svêrák, Jan (the ê should have the ^ upside down, a character not in ISO Latin 1)
    Kies'lowski, Krzysztof --> Kieslowski, Krzysztof (the s should have a ' on top of it, a character not in ISO Latin 1)

    There is one exception so far: the characters u'' and o'' (the '' should be on top of the u and o) as used in Hungarian are mapped to ü and ö!
    Examples:
    Mihályi, Gyo''zo'' --> Mihályi, Gyözö
    Szu''cs, Gábor --> Szücs, Gábor

    (If you are knowledgeable about any of these ISO Latin 2 languages and feel strongly that the mappings should be different please let me know so we can discuss it.)

    There is also the possibility to mail in ISO Latin 2 data itself! If you want to mail us the correct ISO Latin 2 version of a name or title now in ISO Latin 1 use the new server keywords
    ISO2NAME and
    ISO2TITLE

    Example:
    ISO2NAME
    Szegö, András|Szegõ, András|
    Kieslowski, Krzysztof|Kie¶lowski, Krzysztof|
    ISO2TITLE
    Aniol ciemnosci (1991)|Anio³ ciemno¶ci (1991)|
    Csillagszemü, A (1977)|Csillagszemû, A (1977)|

    The trick here is to encode everything as ISO Latin 1 but to use, for the right side, the characters that are binary identical to the correct ones in ISO Latin 2! So the left side looks correct and the right side looks funny if you use an ISO Latin 1 font, and vice versa if you use an ISO Latin 2 font. The data will not be used directly in the database, since the two character sets cannot be mixed with current web and mail software. It will be used later when this becomes possible. (See UNICODE below.) The data collected so far will, however, be browsable on our WWW servers so you can avoid mailing in data we already have.

  • Galician, Maltese, Turkish and other languages with native character set ISO Latin 3:
  • Data must be transliterated to ISO Latin 1. The mappings are straightforward. If an accented character is missing in ISO Latin 1 use the non accented version. There is no server support for direct ISO Latin 3 data for the time being.

  • Languages with native character set ISO Latin 4: same as ISO Latin 3 (transliterate to Latin 1).
  • Bulgarian, Macedonian, Serbian, Byelorussian, Ukrainian with native character set ISO Latin 5 (Cyrillic): same as ISO Latin 2.
  • The new server keywords are
    ISO5NAME and
    ISO5TITLE

  • Russian:
  • Data must be transliterated to ISO Latin 1. So far no unique system has been enforced but English transliteration standards have been used mostly. There is also the possibility to mail in Cyrillic data itself! If you want to mail us the correct Cyrillic version of a name or title now in ISO Latin 1 use the new server keywords
    RUSSIANNAME and
    RUSSIANTITLE

    These are expecting data in the KOI8-R character set, and not ISO Latin 5! Again the trick here is to encode everything as ISO Latin 1 but using for the right side the characters that are binary identical to the correct ones for KOI8-R. Example:
    RUSSIANNAME
    Tarkovsky, Andrei|ôÁÒËÏ×ÓËÉÊ, áÎÄÒÅÊ|
    RUSSIANTITLE
    Chapayev (1996)|þÁÐÁÅ× (1996)|

    The right side here looks quite strange, but compiling the data is easy if you know Russian and use a KOI8-R font while working on the right side and an ISO Latin 1 font for the left side.

  • Arabic (ISO Latin 6): same as ISO Latin 3 (transliterate to Latin 1).
  • Modern Greek (ISO Latin 7): same as ISO Latin 2.
  • The new server keywords are
    ISO7NAME and
    ISO7TITLE

  • Hebrew (ISO Latin 8): same as ISO Latin 3 (transliterate to Latin 1).
  • Japanese:
  • Data must be transliterated to ISO Latin 1. So far no unique system has been enforced but the official transliteration scheme we are aiming at is modified Hepburn romanization. Circumflexes for long vowels are accepted since macrons are not available. Capitalization is lower case except for the first letter of the first word and proper names in titles.

  • Chinese (Mandarin/Cantonese):
  • Data must be transliterated to ISO Latin 1. So far no unique system has been enforced. Input by knowledgeable users is most welcome so we can look at defining a strict policy. If interested, mail me.

  • Indian languages and all others not yet discussed: same as ISO Latin 3 (transliterate to Latin 1).
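The transliteration rule given above for ISO Latin 2 languages (drop the accent when the accented letter is missing from ISO Latin 1, except that the Hungarian double-acute letters become ö and ü) can be sketched in Python as follows. This is only an illustration under stated assumptions: it relies on Unicode text and Python's `unicodedata` module, the function name `to_latin1` is made up, and the small table for stroke letters such as the Polish l with stroke is an assumption added here because those letters do not split into a base letter plus an accent. It is not a description of what the mail server does.

```python
import unicodedata

# The one exception named in the newsletter: Hungarian double-acute letters.
HUNGARIAN_EXCEPTIONS = {"ő": "ö", "Ő": "Ö", "ű": "ü", "Ű": "Ü"}

# Assumption for this sketch: letters whose mark is a stroke, not a separate
# combining accent, need an explicit base letter.
STROKE_LETTERS = {"ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def _fits_latin1(ch):
    """Return True if the character exists in ISO Latin 1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_latin1(text):
    """Transliterate a name or title to ISO Latin 1, dropping missing accents."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if _fits_latin1(ch):
            out.append(ch)                        # already representable
        elif ch in HUNGARIAN_EXCEPTIONS:
            out.append(HUNGARIAN_EXCEPTIONS[ch])  # double acute -> umlaut
        elif ch in STROKE_LETTERS:
            out.append(STROKE_LETTERS[ch])
        else:
            # Split into base letter + combining accents, keep the base only.
            base = unicodedata.normalize("NFD", ch)[0]
            if not _fits_latin1(base):
                raise ValueError(f"no ISO Latin 1 fallback for {ch!r}")
            out.append(base)
    return "".join(out)


print(to_latin1("Svěrák, Jan"))
print(to_latin1("Kieślowski, Krzysztof"))
print(to_latin1("Mihályi, Győző"))
```

Characters from other scripts (Cyrillic, Greek and so on) raise an error in this sketch, since they need a real transliteration scheme instead of accent stripping.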

Ideally all data should be presented using its native character sets or pictograms. Technically, though, this is not possible with the software currently in widespread use for web access, e-mail and operating systems in general.

In the future there will be a new huge standardized 16 bit character set called Unicode. It will offer the capability to freely combine Japanese Kanji with ISO 1 text and Hindi, for example. We will use it as it becomes widely available and supported by the industry.

I hope you enjoy the new more accurate ISO 1 data we offer now and also use the new possibilities for data addition with ISO 2/5/7 and KOI8-R.
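For anyone preparing ISO2NAME, ISO2TITLE, RUSSIANNAME or RUSSIANTITLE submissions, the "binary identical" trick described earlier can be sketched in Python. The idea is to take the bytes of the native spelling and reinterpret them as ISO Latin 1 characters. The function names below are invented for this illustration, and the codec names (`iso8859_2`, `koi8_r`, `latin-1`) are Python's own; this is a sketch of the byte-level idea, not of the mail server's code.

```python
def disguise_as_latin1(native_text, native_encoding):
    """Return the ISO Latin 1 characters that share bytes with native_text."""
    return native_text.encode(native_encoding).decode("latin-1")


def recover_native(disguised_text, native_encoding):
    """Undo the disguise: read the same bytes in the native character set."""
    return disguised_text.encode("latin-1").decode(native_encoding)


def submission_line(latin1_form, native_text, native_encoding):
    """Build one 'left|right|' line in the format shown in the examples."""
    return f"{latin1_form}|{disguise_as_latin1(native_text, native_encoding)}|"


# ISO Latin 2 example (for ISO2NAME).
print(submission_line("Kieslowski, Krzysztof",
                      "Kieślowski, Krzysztof", "iso8859_2"))

# KOI8-R example (for RUSSIANTITLE).
print(submission_line("Chapayev (1996)", "Чапаев (1996)", "koi8_r"))
```

Reading the right side back with `recover_native` and the matching encoding restores the native spelling, which is why the data can be stored now and used once mixed character sets become practical.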


GENERAL ALTERNATIVE TITLES ARE COMING

by Michel Hafner

Until now only alternative titles in the languages of the co-producing countries were accepted. This policy was reasonable because

  • a firm basis of primary titles had to be compiled before a flood of alternative titles in various languages could be added without creating chaos.
  • the old ASCII character set had to be replaced with the new ISO Latin 1 set so that large numbers of titles could be collected in their native character set, or in a better approximation to it than ASCII.

The prerequisites for general alternative titles are now given and since demand for these is big we will introduce them within the next few weeks. The new server keyword and format will be announced in time. Until then the old policy is valid, so please do not start to mail in Swahili titles for US movies right now! :-)

If you have large collections of such titles (at least several hundred) that you would like to donate please mail me so we can optimize the transfer.


HOT SEARCHES

by Jon Reeves

Here are the most popular searches people have done lately, based on total pages for the week ending April 19.

Titles:

  Rank  Prev  Title
     1     1  Star Wars (1977)
     2   270  Saint, The (1997)
     3     8  Romeo + Juliet (1996)
     4    12  Batman & Robin (1997)
     5     3  Jerry Maguire (1996)
     6     4  English Patient, The (1996)
     7   179  Liar Liar (1997)
     8     -  Grosse Pointe Blank (1997)
     9    79  Devil's Own, The (1997)
    10    10  Scream (1996)
    11    18  Lost World: Jurassic Park, The (1997)
    12     -  Chasing Amy (1997)
    13     7  Star Wars: Episode I (1999)
    14    20  Pulp Fiction (1994)
    15     -  Anaconda (1997)
    16    16  Empire Strikes Back, The (1980)
    17   234  Fifth Element, The (1997)
    18     5  Fargo (1996)
    19    15  Independence Day (1996)
    20    22  Return of the Jedi (1983)

The Star Wars juggernaut rolls on, but it's losing some steam as the films fade from the US screens; the whole series is only 2.5x the number 2 film now. Chasing Amy has dragged Clerks up from #154 to #32 and Mallrats from nowhere to #104. Titanic is at #21, up from #95; it should make the top 20 next time. Huh factor: #22 "Alles Glück dieser Erde" (1993); #49 Dis (1995); #56 "And Everything Nice" (1949). As always, if anyone can explain the sudden popularity of these obscure titles, I'm interested. [Note: since the mailing, I've learned that Dis was high on the "worst movies" list.]

People:

  Rank  Prev  Name
     1     2  Pamela Anderson
     2     1  Tom Cruise
     3     3  Sharon Stone
     4    49  Val Kilmer
     5    21  Brad Pitt
     6     8  Harrison Ford
     7    80  Elisabeth Shue
     8     6  Teri Hatcher
     9    11  Leonardo DiCaprio
    10     4  Demi Moore
    11    14  Alyssa Milano
    12     5  Kim Basinger
    13    10  Sandra Bullock
    14     9  Mel Gibson
    15    12  Ralph Fiennes
    16    17  Michelle Pfeiffer
    17     -  John Cusack
    18     -  Joey Lauren Adams
    19    27  Helen Hunt
    20    13  Bo Derek

The first tie, between Shue and Hatcher (and only one reference behind Ford). That won't last; Hatcher becoming the new Bond girl will raise her score, and Shue should drop as The Saint leaves screens. Otherwise, the usual suspects shuffle around, and Kilmer, Shue, Cusack, and Adams enter the top 20 on the strength of popular new releases. Lots of "halo effect" from Chasing Amy; even Jay (Jason Mewes) scores at #94. #38 Petra Verkaik seems to be the new pinup of the month, with her two titles at #41 and #101. Huh factor: #46 Ricardo Franco (I).


HOT MOVIES

by Col Needham

Movies opening in the US in March and April sorted by number of votes (to April 16):

  Distribution  Votes  Rating  Title
  0000001113     4064     8.3  Return of the Jedi (1983)
  0000000115      591     8.6  Private Parts (1997)
  1000000112      501     6.5  Crash (1996)
  0000000123      352     8.3  Liar Liar (1997)
  0000001113      325     8.1  Saint, The (1997)
  0000011111      277     7.0  Devil's Own, The (1997)
  0.0.000124      213     8.8  Grosse Pointe Blank (1997)
  0.0.000133      105     8.5  Love and Other Catastrophes (1996)
  0000000125       93     8.5  Selena (1997)
  0....00017       91     9.6  Chasing Amy (1997)

Movies opening in the US in March and April sorted by average votes (to April 16):

  Distribution  Votes  Rating  Title
  0....00017       91     9.6  Chasing Amy (1997)
  0.0.000124      213     8.8  Grosse Pointe Blank (1997)
  0000000115      591     8.6  Private Parts (1997)
  0.0.000133      105     8.5  Love and Other Catastrophes (1996)
  0000000125       93     8.5  Selena (1997)
  0000001113     4064     8.3  Return of the Jedi (1983)
  0000000123      352     8.3  Liar Liar (1997)
  0..0001113       56     8.1  Jerusalem (1996)
  0000001113      325     8.1  Saint, The (1997)
  000.001203       63     7.6  Inventing the Abbotts (1997)

IMDb IN THE NEWS

by Jon Reeves

Just a few of the traditional media outlets that have mentioned us lately:

Boston Globe. Curiocity. Tribune-Review (Pittsburgh area). TV-Movie (Germany). Web Week, twice. Internet Oggi (Italy). US News & World Report. NTT telephone directory. MSNBC. LA Times. Kansas City Star. Vanity Fair (not by name, alas). Discovery Channel. Yahoo! Internet Life (one of Lucy Lawless' favorite sites).

Watch for articles in: CNR Magazine (Spain)

We've also won several new awards. See selections from the gallery here.

Point/Lycos top movie site. UK Online Cool Site.

Our readers in the UK should vote for us in the UK Web Awards (and don't forget to use our UK mirror site).

Our good friend Greg Bulmash's WASHED-UPdate has its awards:

PC Magazine site of the day.

And it was mentioned in:

Courier-Mail (Brisbane Australia). US Magazine. Late Show News.

SOFTWARE CHANGES

by Col Needham

Traffic continues to increase across all our sites so we've recently doubled our hardware capacity at the main US site, housed at Exec-PC in Wisconsin.

We've made it much easier to locate the permanent URLs for bookmarking / linking to IMDb pages. A button labelled "Link to this page" appears at the bottom of most pages and will provide the direct URL. For more details please see our linking guide. Remember that linking to the IMDb from your own pages helps build awareness of the IMDb and is very much encouraged.

The navigation menu at the bottom of each of our pages has been enhanced to include a menu of useful and interesting destinations to help people navigate the site easily. Simply select your destination and hit the "Go" button (if you hit the button without making a selection, the system takes you to a random page from the list).

The posters section has been enhanced to include links to posters stored on other sites in addition to those stored locally. For example see the poster for The Saint (1997).

If you're in the mood for browsing titles at random or looking for a good movie to go out and see/rent, try our random title selector (use your browser's RELOAD function to jump to another random title).

The recent/upcoming movie releases section has been expanded to view the upcoming movies as far into the future as we cover so start booking those tickets for Christmas releases now!

A new version of the local UNIX interface to the database has been released with support for the ISO-Latin-1 character set change and for the distributors and crew completion lists.


DATABASE STATISTICS

by Jon Reeves

This is a regular section giving information about the current size and growth of the IMDb. We receive between 50,000 and 75,000 additions every week from users all over the world.

Big month for milestones, with all of the main statistics crossing a threshold:

   Number of filmography entries: 1,538,799
   Number of people covered:        423,633
   Number of movies covered:        104,149

   Size of the database (Mb):           135

Recent milestones:

  • 500 alternate version entries
  • 2,000 miscellaneous company entries
  • 4,000 literature list entries
  • 5,000 business list entries
  • 25,000 biographies
  • 40,000 composer entries
  • 45,000 cinematographer entries
  • 100,000 movies
  • 100,000 country entries
  • 400,000 people
  • 600,000 actor entries
  • 1,500,000 filmography entries

FUTURE DEVELOPMENTS

This is a regular section listing some enhancements we're currently looking at. Please bear in mind that some of these may take quite a while to come to fruition or even fail to materialize because the original volunteer decides not to proceed.

  • a separate list of films in production, with their current status.
  • outline list: a "one line" plot summary, short enough to display on the main title page.
  • a list of "influential scenes"... the scenes that launched a thousand spoofs, became the director's trademark, changed cinema forever, launched a star.
  • a locally installable MS-Windows interface to the database is under final testing for those of you who want to reduce your phone bills!
  • enhanced awards section for the database covering more international festivals, national film institutes etc.
  • general support for alternate titles in languages other than English and the language of the producing country(s).
  • a movie recommendation service that will use your vote records to suggest other movies you might enjoy. Initially available via an E-mail interface. Time to check you're up-to-date with your voting!

Academy Awards and Oscar are registered trademarks of the Academy of Motion Picture Arts and Sciences. UNIX and X Window System are registered trademarks of The Open Group. The WASHED-UPdate is a trademark of Greg Bulmash. All other trademarks are the property of their respective owners.