Tuesday, April 29, 2014

A nice script to run GIZA++.

Using GIZA++ is not really straightforward, especially as there is no official tutorial. Fortunately, Fabio Ticconi wrote a very nice guide. I took his script and rearranged it, adding a few variables to make it configurable (source and target languages, paths for inputs, outputs and commands). Here it is, along with the config file. It works with GIZA++ 1.0.7.

Monday, April 21, 2014

A small bug in Preview lets you underline/highlight with any color

Yesterday I found a small bug in Mac OS X Mavericks Preview (version 7.0 (826.4)) that lets you underline/highlight with any color. It's not so straightforward, but at least if you really want to use more colors, you can.
Here are the steps:
  1. Write any text, or draw a line or any other figure with color 1.
  2. Change its color to color 2.
  3. Select the highlight or underline tool.
  4. Press cmd-z. This should change the text/figure color back to color 1. (If it does not, make sure the text you have written is not selected.)
  5. You now have the underline/highlight tool selected with the color box accessible. Select your color (even if it is already selected), then highlight/underline.
Another way:
  1. Suppose you want to underline in blue. Highlight something in blue.
  2. Right-click on what you just highlighted, and make it underlined instead.
  3. Select the underline tool.
  4. Press cmd-z twice. This will remove the underline you just made.
  5. Underline what you want: it will be underlined in blue! :)
Steps 2 and 3 are actually interchangeable.
This was my Preview bug finding day :) Enjoy!

Tuesday, April 15, 2014

Europarl corpus v.7 en-fr word-aligned with GIZA++

I finally finished aligning the Europarl corpus with GIZA++. Since this took me several days, I thought some people would be happy to find the word-aligned version directly online (saving processor time and power at the same time!). So here it is, along with the config file that produced it. The source language is English, the target language is French. I basically followed the instructions given here (many thanks to the author!).

Wednesday, April 2, 2014

Install GIZA++ 1.0.7 on Mac OS X 10.9.2

GIZA++ was written for Linux. A few modifications of the code enabled me to compile it for Mac.
Here is how you do it:

Create and move to your install folder, then:

curl http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz -o giza-pp-v1.0.7.tar.gz
tar -xzvf giza-pp-v1.0.7.tar.gz
cd giza-pp
perl -pi -w -e 's/<tr1\//</g;' GIZA++-v2/* mkcls-v2/*
perl -pi -w -e 's/using namespace std::tr1;//g;' GIZA++-v2/* mkcls-v2/*
perl -pi -w -e 's/std::tr1:://g;' GIZA++-v2/* mkcls-v2/*
sed '36d' mkcls-v2/mystl.h > mkcls-v2/mystl.h.tmp
sed '50d' mkcls-v2/mystl.h.tmp > mkcls-v2/mystl.h
rm mkcls-v2/mystl.h.tmp

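(The first 3 lines download and decompress the GIZA++ archive and move into its folder. The last 6 lines remove all the references to tr1: the perl commands rewrite the includes and namespace uses, and the sed commands delete two lines of mkcls-v2/mystl.h.)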
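
If you want to double-check that the substitutions above caught everything before compiling, a quick search for leftover tr1 references can help. This is only a minimal sketch: check_no_tr1 is a hypothetical helper name of my own (it is not part of GIZA++), and it assumes a grep that supports the -r, -n and -w options, which is the case for the grep shipped with OS X and Linux.

```sh
# Hypothetical helper (not part of GIZA++): search the given files or
# folders for "tr1" as a whole word, so that names like "str1" are ignored.
# Returns 0 if nothing is found, 1 if references remain, 2 if grep failed.
check_no_tr1() {
    status=0
    # -r: recurse into folders, -n: show line numbers, -w: whole-word match
    grep -rnw 'tr1' "$@" || status=$?
    case $status in
        0) echo "tr1 references remain" >&2; return 1 ;;
        1) echo "no tr1 references found" ;;
        *) echo "grep failed (status $status)" >&2; return 2 ;;
    esac
}
```

From inside the giza-pp folder, you would call it as: check_no_tr1 GIZA++-v2 mkcls-v2. Any line it prints (file name, line number, text) still mentions tr1 and needs to be looked at by hand.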