Our homegrown Translation Memory Toolkit

Recognise translatable strings and insert available translations

A quick-and-dirty translation memory toolkit, written for specific tasks that we faced at a2e.de and growing toward more general usability as new applications arise.

1 Present

Here are our programs and what they currently do:

tratext_update.pl
analyse and upgrade our Deplate source text
  • guess where translatable chunks are
  • surround these with appropriate markers (as pre-determined by us, based on the Deplate macro language)
  • extract text from already-marked chunks
  • insert what it finds into our database
  • insert translations into the text for those chunks where a translation is found
  • interactively resolve conflicts when invoked with option -v
tratext_xget.pl
export to a translatable file
  • extract strings from the database and store them in a form that is easy to translate
  • select a certain range of strings, based on specifiable criteria
tratext_xput.pl
import the translator's work
  • from the translatable source file back into the database
  • use a simple source format, try to be error-tolerant
Tratext.pm
library used by the programs
CString.pm
lower level library used by Tratext.pm
TratextTest.pm
test suite, helping to ensure that our program does what we expect it to do before we use it on real files and commit it to our Subversion repository
cjk2asc.tab
table for mapping multibyte symbols to single-byte ones
cjk2asc_en.tab
table for mapping multibyte symbols to English (for cases where the language-neutral mapping doesn’t work well)
Perl at A2e
the Perl environment in which this toolkit works
  • in particular the package A2E::db.pm is required
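
The steps listed for tratext_update.pl (extract marked chunks, record them, insert known translations) can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not taken from the toolkit. The marker syntax `{ml: ...}` and the `source | translation` output form are hypothetical, chosen only for illustration, and a plain dictionary stands in for the database. Guessing unmarked chunks and interactive conflict resolution are left out.

```python
import re

# Hypothetical marker syntax, for illustration only; the real markers
# are defined by the local Deplate macros and may look different.
MARKER = re.compile(r"\{ml: ([^{}|]+)\}")


def update_text(text, memory):
    """Record marked chunks in `memory` and fill in known translations.

    `memory` is a dict standing in for the database: it maps a source
    string to its translation, or to None if no translation exists yet.
    Returns the updated text.
    """
    def fill(match):
        source = match.group(1).strip()
        # Record the chunk if it is new, keeping any existing translation.
        translation = memory.setdefault(source, None)
        if translation is None:
            return match.group(0)  # nothing known yet, leave chunk as is
        return "{ml: %s | %s}" % (source, translation)

    return MARKER.sub(fill, text)


memory = {"薬物動態": "pharmacokinetics"}
text = "1. {ml: 薬物動態}\n2. {ml: 安全性}"
updated = update_text(text, memory)
print(updated)
```

After the call, the first chunk carries its translation, the second is left untouched, and its source string has been added to `memory` with no translation, ready to be exported for a translator. Chunks that already carry a translation no longer match the marker pattern, so running the function a second time changes nothing.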

2 Future

  1. improve the tools to get our imminent work done
  2. improve the tools to get our next project done too; grow further as we go
  3. generalise, modularise, document, package – go further down the road of all programming projects
  4. add some fuzzy matching
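
To give an idea of what fuzzy matching could mean here, the following sketch looks up the closest stored source string instead of requiring an exact match. This is only one possible approach, not a plan stated by the toolkit: it uses Python's standard `difflib` module, and the function name `fuzzy_lookup`, the cutoff of 0.75, and the sample entries are assumptions made for the example.

```python
import difflib


def fuzzy_lookup(source, memory, cutoff=0.75):
    """Return (matched_source, translation, score) for the closest entry.

    `memory` maps source strings to translations. Returns None when no
    stored source string reaches the similarity cutoff (0.0 to 1.0).
    """
    candidates = difflib.get_close_matches(source, list(memory), n=1, cutoff=cutoff)
    if not candidates:
        return None
    best = candidates[0]
    score = difflib.SequenceMatcher(None, source, best).ratio()
    return best, memory[best], score


# Sample entries, invented for the example.
memory = {
    "Store below 25 degrees.": "Unter 25 Grad lagern.",
    "Keep out of reach of children.": "Für Kinder unzugänglich aufbewahren.",
}

hit = fuzzy_lookup("Store below 30 degrees.", memory)
miss = fuzzy_lookup("Shake well before use.", memory)
print(hit)
print(miss)
```

A fuzzy hit is a suggestion rather than a translation: here the stored sentence differs in the number, so the proposed text would still need correcting by a translator before it is inserted.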

3 Past

  • 2007-08-26 toolset enriched with finer functionality and test suite
  • 2007-08-22 phm creates this directory after developing the first prototypes of this toolset

4 Related Reading

  • Japanese Pharma Document Translation Project jekeK8 – these documents contain much redundancy and yet need to be translated by several translators in a short time. After spending hours and days with regular expression replacement and other quite powerful tricks of semi-automated editing with GNU Emacs, Hartmut decided that writing a set of Perl scripts for some tasks would save time. Thus the tratext prototypes were written, as yet for a rather specialised task, limited in scope.
  • Deplate – the source hypertext format in which the source documents of the jekeK8 project were written and which is currently the favored format at a2e.
  • local Deplate modules – these implement certain macros (in Ruby), such as ‘vok’ and ‘ml’, that are recognised and written by the tratext tools.
  • GNU Gettext: we may strive for more compatibility with some of the formats of this system, so as to make use of po-mode editing and the like.
  • Multilingual Hypertext specification attempt and previous software (which extracts strings from documents in an especially powerful Lisp-based source format, but is of little help when we have to upgrade existing texts to a hypertext form of expression).
  • Translation Memory: links to resources on the net
http://a2e.de/adv/tratext
© 2007-08-22 Hartmut PILCH