Slav Petrov and the Berkeley Parser

Slav Petrov's Google Talk on his Parser


OCaml objects wrap C++

Amended SRILM generateSentence with a prepopulated context, fixed end-of-sentence termination, and wrapped lmclient into a new lmclass. Still need to implement Mauricio’s advice on using Gc properly -- the compiler rejects things like

Gc.finalise self#destroy, or Gc.finalise ignore (Lmclient.destroy handle) -- during runtime!
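
One thing worth checking here, as a guess rather than a diagnosis: the Gc documentation says Gc.finalise raises Invalid_argument when the value is not heap-allocated, and plain integers are not. A minimal sketch, assuming a hypothetical record-based handle and a live counter standing in for the C++ side (neither comes from lmclient):

```ocaml
(* Hypothetical stand-in for the C++ side: counts clients that are still open. *)
let live = ref 0

(* A record with a mutable field is heap-allocated, so Gc.finalise accepts it. *)
type handle = { mutable closed : bool }

(* Idempotent, so an explicit call followed by the finaliser does no harm. *)
let destroy h =
  if not h.closed then begin
    h.closed <- true;
    decr live
  end

let create () =
  incr live;
  let h = { closed = false } in
  Gc.finalise destroy h;   (* safety net if the caller forgets to call destroy *)
  h

(* Documented behaviour: an immediate value such as an int is rejected at runtime. *)
let int_is_rejected =
  try Gc.finalise ignore 42; false with Invalid_argument _ -> true
```

The finaliser takes the value as its argument instead of capturing it in a closure, which is what the Gc documentation recommends so that the value can actually become unreachable.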

The wrapped system now works and the new method Lmclient.complete_sentence maxwords context properly generates exact-size completions.

I also got the book “Q for Mortals” and continued to evaluate J; also got the educational APL from Dyalog, but it’s only for Windows.

Ah, and my C bindings handle is now an abstract

type handle

-- also thanks to Mauricio. I added things like a null () value and is_null, returning a null value from C and checking it there, similar to limited private types in Ada.
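
A rough sketch of what the abstract handle buys, written in pure OCaml so it runs without the C stubs. The module name Lmhandle, the int representation, and of_index are assumptions for illustration; in the real binding null and is_null are implemented in C.

```ocaml
module Lmhandle : sig
  type handle                      (* abstract: callers cannot inspect or forge it *)
  val null : unit -> handle
  val is_null : handle -> bool
  val of_index : int -> handle     (* hypothetical constructor standing in for the C stub *)
end = struct
  type handle = int                (* assumed representation, hidden by the signature *)
  let null () = -1
  let is_null h = h < 0
  let of_index i = if i < 0 then invalid_arg "Lmhandle.of_index" else i
end
```

Outside the module, the only way to test a handle is is_null, much like a limited private type.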

OCaml bindings

Continued making object-wrapping bindings for OCaml representing SRILM C++ classes. The caml-list advice from Mauricio and Filipp Monnier is cool.

In search of OCaml bindings, found PLplot, tried to install it in macports -- doesn’t work with gtk-osx +no_x11, discussed that on macports list and plplot-general. Basically, in an svn checkout of plplot, you have to say,

cmake cmake

-- the latter is a directory -- and it will generate enough to run

ccmake .

-- which will provide graphical displays.

J, kx/Q, APL

Stumbled upon kx/Q in Stevey’s rant “a portrait of a n00b,” containing the silliest quote about “OCaml, Haskell, and their ilk.” That brought me back to perusing kx/Q and a tutorial by Boroff, and also the J primer -- which is a treat to launch and follow. Ken Iverson clearly wrote it. I wish I could program new mobile platforms at 80 as he did.

Quote:
A word is a group of characters from the alphabet that has a meaning.

LM novelties

Found RandLM, based on Bloom filters -- championed by Broder too. I need LM generation for sentence completion -- this can be done rather easily with SRILM, but could be cool with other methods, e.g. associative LMs. Gathered another dozen papers on LMs, including WSME by Roni Rosenfeld, and others.


New NLP findings with OCaml

HunPoS is a part-of-speech tagger in OCaml, implementing an HMM with suffix recognition. This is a superb foundation for sequence HMMs! Has a fast C lib. Didn’t build with my new OCaml 3.11 without ocamlfindlib.cmxa -- make world.opt didn’t do it, neither did make opt.opt.

Also found SWIG for OCaml, looking really mature with C++, representing class objects as closures. Am itching to wrap SRILM systematically in SWIG.

OCaml binding to SRILM ngram works!

I’ve emailed it back to Andreas. My own version of pplFile redirects cerr to an ostringstream instead of cout and captures it; the output is later parsed by three_fourths in The Perp System, since ngram -debug 1 outputs the perplexities of the sentences on the 3rd line, and then on each 4th line after that.
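
The line selection could be sketched like this. Only the name three_fourths comes from the post; the body is an assumption about what it does, namely keeping lines 3, 7, 11, ... of the captured output:

```ocaml
(* Keep the 3rd line and every 4th line after it (1-based: 3, 7, 11, ...). *)
let three_fourths lines =
  let rec go i acc = function
    | [] -> List.rev acc
    | x :: rest ->
      let acc = if i mod 4 = 3 then x :: acc else acc in
      go (i + 1) acc rest
  in
  go 1 [] lines
```

Parsing the perplexity figures out of the kept lines is left out, since the post does not show their layout.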

I’m creating LM clients inside C++ and returning integer handles to OCaml; one is supposed to call lm_destroy for every lm_create, in reverse order, with no creations after deletions. This matches my use case, but it can surely be generalized for better bindings, and switched from static LMClient *clients[] to std::vector<LMClient*>.

Every Nth element of a list

For the general case, I’ve done this:

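(* keeps elements n, 2n, ... in their original order; range is the helper from the transpose post *)
let each_nth n list = List.rev (List.fold_left2
  (fun acc a i -> if i mod n = 0 then a::acc else acc)
  [] list (range (List.length list)))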

For each 4th, OlegFink suggested:

let rec fourth = function _::_::_::a::xs -> a::(fourth xs) | _ -> []
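
A quick self-contained check that the two agree for n = 4. The range helper is copied from the transpose post; the List.rev is there because the fold conses onto the front of the accumulator and would otherwise return the picks back to front.

```ocaml
(* range ~from upto = [from; ...; upto], with from defaulting to 1 *)
let range ?(from = 1) upto =
  let rec go from upto acc =
    if from > upto then acc else go from (upto - 1) (upto :: acc)
  in
  go from upto []

(* General case: pair each element with its 1-based position, keep multiples of n. *)
let each_nth n list =
  List.rev
    (List.fold_left2
       (fun acc a i -> if i mod n = 0 then a :: acc else acc)
       [] list (range (List.length list)))

(* Special case by pattern matching: skip three, keep one. *)
let rec fourth = function
  | _ :: _ :: _ :: a :: xs -> a :: fourth xs
  | _ -> []

let sample = range 9
let same = each_nth 4 sample = fourth sample
```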

list list transpose

I’ve written the following lili (’a list list) transpose for my perp system:

let transpose lili = List.map (fun n -> List.map (fun li -> List.nth li n) lili) (range0 ((List.length (List.hd lili))-1))

The range0 helper is quite trivial: range0 m generates the integer list [0;1;...;m], so range0 (n-1) gives [0;1;...;n-1]:

let range ?(from=1) upto =
  let rec go from upto acc =
    if from > upto then acc else go from (upto-1) (upto::acc)
  in
  go from upto []

let range0 = range ~from:0

-- it can be further simplified with F#-like |>.
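
Roughly what that could look like, as a sketch. The |> operator is in the standard library since OCaml 4.01; it is defined by hand here on the assumption of an older compiler such as 3.11.

```ocaml
let range ?(from = 1) upto =
  let rec go from upto acc =
    if from > upto then acc else go from (upto - 1) (upto :: acc)
  in
  go from upto []

let range0 = range ~from:0

(* Same meaning as the built-in operator; harmless to redefine on newer compilers. *)
let ( |> ) x f = f x

(* Column indices flow left to right instead of nesting inside out.
   Like the original, this raises Failure on an empty outer list. *)
let transpose lili =
  List.hd lili
  |> List.length
  |> pred
  |> range0
  |> List.map (fun n -> List.map (fun li -> List.nth li n) lili)
```

List.nth still makes this quadratic in the row length, so it only reads better, it does not run faster.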

But OlegFink’s solution is simply

let rec transpose = function [] | []::_ -> [] | list -> List.map List.hd list :: transpose (List.map List.tl list)

-- with a leading [] case added here, since without it an empty outer list recurses forever.

Conditional compilation in OCaml

Thanks to Mauricio Fernandez on the IRC, got my with/without pgocaml setup working:

dataframe.cmo: %.cmo: %.ml
        ocamlfind ocamlc -pp "camlp4o Camlp4MacroParser.cmo -DONT_USE_POSTGRES" -c $< -o $@

-- rather than commenting things out, I simply define a nonexistent symbol in place of the needed USE_POSTGRES. Then, in dataframe.ml:

let get ?fromfile () =
  match fromfile with
  | Some file -> load file
  | None -> IFDEF USE_POSTGRES THEN percells_dataframe () ELSE failwith "no database for you" END