Thursday, 31 January 2008

Why OCaml and Haskell are not for scientists?

I am currently a postdoc, and soon it will have been a year. For my new project I decided to try OCaml instead of a more traditional environment, Matlab.

But after a year of reading documentation and books and trying to implement something, I decided to drop it and keep using Matlab. If time permits I will keep an eye on the development of these languages, but I will not use them for scientific calculations in the near future.

Here are my points, and I want someone to convince me that I am wrong. I would like to be convinced.

1) I chose OCaml for the project because it has a lot of recursive functions, and functional languages should handle them better than Matlab.

2) Native integer precision on AMD64.

3) Speed.

I was really impressed with the demos from the book "OCaml for Scientists" by Jon Harrop. However, the price tag was too high, so I ended up getting it through interlibrary loan for three weeks. The book is worth reading; it gave me a few very good ideas. But most examples from the book are not repeatable, and to me that is not scientific.

There is nothing in the book about the versions of the modules used, or even the OS. (It's actually Debian Linux.)

I installed Debian, but I still had problems repeating the examples.

One reason is that the 2D figures in the book were generated using Mathematica rather than OCaml. Why do you think it was done this way? Because there is no binding to gnuplot or similar lib for plotting. Actually there is, but it is not stable and has not been updated for a while. I was eagerly waiting for the end of the OCaml Summer Project, as the visualisation toolkit and mathematical framework looked fairly useful to me. The 2008 summer project has already been announced, but the projects from OSP 2007 are still in svn and are not even pre-alpha.

I think this is one of the major problems with OCaml: you can find very interesting projects or modules, but they are mostly abandoned and never made it even as far as beta. And with the lack of standards and documentation for modules, I spent hours trying to install some necessary modules. As a former Unix system administrator, I could install a server with something like postfix+dovecot+samba+ldap with maia, but I can't install, for example, the fftw module on Linux and on Mac OS X. I was ready to tear my hair out. I think that because the OCaml community is small, you will rarely find real cross-platform modules there.

As a newbie I found it frustrating that I could not completely repeat any OCaml tutorial or book. As soon as you try to move beyond basic calculations, you hit a wall: you try to repeat an example from a book or tutorial and get some obscure message.

With Haskell I was lucky enough to get hmatrix working; it does have proper plotting and matrix manipulation. I even got as far as writing my own Haar wavelet and started on the Hénon map. Guess what? The Hénon map in Haskell is slow. Slower than Perl and slower than C. But the Haskell community is larger and more dynamic, possibly because Haskell has a standard syntax, rather than the current 3 styles of OCaml.

And because OCaml and Haskell are strongly typed, you will have to convert between types, e.g. vector matrix to vector double, if you want to plot your data or pass it from the gsl library to fft or another library.
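To illustrate the kind of conversion meant here, a minimal OCaml sketch, assuming the data sits in a one-dimensional Bigarray of doubles and the plotting code wants a plain float array. The function name float_array_of_bigarray is made up for this example and does not come from any particular library; depending on the OCaml version, the bigarray library may need to be linked explicitly.

```ocaml
(* Hypothetical helper: copy a 1-D Bigarray of doubles into a float array. *)
let float_array_of_bigarray
    (v : (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t)
  : float array =
  (* c_layout arrays are indexed from 0, like ordinary OCaml arrays. *)
  Array.init (Bigarray.Array1.dim v) (fun i -> Bigarray.Array1.get v i)

let () =
  (* Build a small Bigarray, convert it, and print one value per line. *)
  let v =
    Bigarray.Array1.of_array Bigarray.float64 Bigarray.c_layout
      [| 1.0; 2.0; 3.0 |]
  in
  Array.iter (fun x -> Printf.printf "%g\n" x) (float_array_of_bigarray v)
```

The copy is explicit because the two types have different memory representations; the type checker will not insert it for you.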

My conclusion: I don't get paid for rewriting common modules and calculations in different languages. I get paid for inventing new things and new methods.

Although OCaml and Haskell are interesting languages, they are just not ready for scientific use. I don't mean computer scientists, who actually get paid for writing software, but scientists in the more general sense: physics, signal processing, image processing. Also, having a good language is not enough to be useful. Creating a compiler is a mandatory exercise for computer science undergraduates.

It is the community, modules and common standards that allow easy adoption of a new language.

So far neither OCaml nor Haskell, nor even Ruby or Python, has reached the level of Perl. I am talking about CPAN, although Python and Ruby are getting close. So it might be worth trying Perl for number crunching, or falling back to C++ or C. It is worth checking out new technologies, but in the end it is the result that is required. Repeatable and published.

Update: After the initial post Christophe did a lot of work improving fftw3, with new examples.


razo7777 said...

Have you heard of SAGE?

I've recently started learning Python because SAGE is using it. It is a wrapper for e.g. maxima, gap, pari but also implements lots of things natively (in Python and Cython for speed).
The project seems very dynamic and has releases every few weeks.

Colonna said...

Could you briefly describe the kind of project you wanted to implement with OCaml? Which scientific domain, which kind of simulation?


Fred Ross said...

I tried the Perl numerics route some years back. That way lies madness. If you have to hack out something that lies well in linear algebra, MATLAB/Octave remains your best friend.

That being said, I think Haskell could become a nice platform for this kind of thing, but it's going to look really, really different from MATLAB. Somehow the isomorphisms must all become explicit, yet automatically handled. This demands a rather thorough rewrite of the mathematical typeclasses, and some kind of autogeneration of instances of algorithms into particular types. It's possible, it just requires a very delicate touch in abstract algebra.

I had a friend who was inverting 10^6 x 10^6 sparse matrices in OCaml, but it was so special purpose he was writing from scratch. It served him well for that.

Personally, I wish gcc had an ALGOL 68 frontend. Now there was a language to do scientific computing...

wuzzyview said...

I'm a scientist looking to move away from Matlab. Numenta's software releases include Python and all the libraries that have enabled them to move 99% of their development from Matlab to Python (and they're a major machine learning research company). But I was looking for something even more functional, e.g., OCaml, so your experience is valuable to me.

Have you considered common lisp?

ChriS said...

Jon Harrop's book may not be fully up to date with the current development status of some libraries but that's a sign that these libraries are actively developed (and sometimes slightly refactored). Moreover I am sure that, if you ask him, he will send you updated versions of his codes.

there is no binding to gnuplot or similar lib for plotting.

This is not accurate: not only is there a binding for gnuplot (I do not know if it is the one you are referring to with "Actually there is, but it is not stable and has not been updated for a while", but it is perfectly usable as it is, even though it is undergoing some refactoring; it is not dead!) but there is also one for PlPlot (and others, e.g. for Imagemagick, ...).

you can find very interesting projects or modules, but they mostly abandon

You are not specific here, but many modules (e.g. C bindings) are not updated because they are stable! Although it is growing fast, the OCaml community is not large enough to necessarily have the modules you dream of ready to use. The community is lively and friendly and, if you try to develop your modules, you can count on it to respond quickly to your questions. (BTW, I do not think we have seen many questions from you on the mailing list, which is clearly advertised in the "Resources" of the OCaml home page.)

with lack of standards and documentation for modules

Do you know that the documentation is in the .mli files? Also, you said you installed Debian, so many interesting modules are just an aptitude install away.

can't install for example fftw module on linux and on mac os x

FFTW3 bindings are in the works. If you had trouble, why did you not try to contact the author of these libs?

you will have to convert vector matrix to vector double

Not really sure what your concrete problem is here: numerical libraries usually use bigarrays and therefore work well with each other.

As a conclusion, it seems from your post that you were dissatisfied with not being able to repeat the examples in Jon's book (a perfectly comprehensible frustration), but it is not clear how much effort, if any, you put into trying to get some help, e.g. on the mailing list or by contacting the authors, which maybe would have saved you from drawing conclusions a little too hastily without complete information...

Alex UK said...

to chris:
I did contact Jon Harrop; that's how I found out that he used Mathematica for two-dimensional plots.
And I also submitted a bug report about installing fftw3, and I never heard a reply from the author.
Eventually, I managed to compile most of the necessary libraries (I checked gsl, fftw, gnuplot), but I never managed to make all of them work on both Mac OS X and Linux. Only one or two would work, either only on Linux or only on the Mac. And now I think it was not worth it.
My project requires a lot of signal processing - using wavelet packets and fftw as well as permutations.
I can see how I could use functional languages for this and for my old project, but for now I only watch the OCaml and Haskell communities as a hobby.
Lack of documentation is GLOBAL in ocaml.
Why do I need to go to mli file in order to find what the module is doing?
Why are there rarely any comments in the Makefiles for modules?
I never managed to complete a single tutorial in OCaml (wiki, Programming Ocaml).
If you can point me to a comprehensive tutorial for numerical calculations I would be grateful, and I promise to give it a try.
It would also answer your comment about "everything using big arrays".
I would be grateful for the following example:
Generate a chaotic time series, the Hénon map for example (and plot it :))
Apply traditional fftw analysis on it
Apply wavelet analysis
Plot all three - signal and decompositions.
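A minimal OCaml sketch of the first step only (generating the series; the FFT, wavelet and plotting steps are not covered). It assumes the classical Hénon parameters a = 1.4 and b = 0.3 and the starting point (0, 0), and iterates x' = 1 - a*x^2 + y, y' = b*x. The function name henon is made up for this illustration.

```ocaml
(* Return the first n values of the x coordinate of the Hénon map,
   starting from (0, 0). The defaults a = 1.4, b = 0.3 are the classical
   parameter choice, assumed here for illustration. *)
let henon ?(a = 1.4) ?(b = 0.3) n =
  let xs = Array.make n 0.0 in
  let x = ref 0.0 and y = ref 0.0 in
  for i = 0 to n - 1 do
    (* Compute the new x from the old (x, y) before overwriting either. *)
    let x' = 1.0 -. a *. !x *. !x +. !y in
    y := b *. !x;
    x := x';
    xs.(i) <- x'
  done;
  xs

let () =
  (* One value per line, a format that plotting tools such as gnuplot
     can read from a file. *)
  Array.iter (fun v -> Printf.printf "%.6f\n" v) (henon 10)
```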
I finally did half of it in Haskell, but I found it wasn't worth it.
It is possible that I expected too much from OCaml and was really disappointed, but I still think that OCaml is not ready for serious number crunching and, unless some miracle happens, never will be. The existing state of the language is not encouraging.

Colonna said...

I just want to answer this remark:

"Why do I need to go to mli file in order to find what the module is doing?"

Because if you need to know what a module is doing you should not be bothered with the implementation (the ml files).

If you want to see both, you have the HTML or LaTeX automatic documentation tool, which does the job.
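As a minimal sketch of that split, assuming a hypothetical module Stats: the signature below stands in for what would go in stats.mli, with the documentation comment attached to the declaration, and the structure stands in for stats.ml. Both are placed in one file here only so that the example is self-contained.

```ocaml
(* Stand-in for a hypothetical stats.mli: what the module offers,
   documented with a (** ... *) comment next to the declaration. *)
module type STATS = sig
  (** [mean xs] is the arithmetic mean of [xs].
      Raises [Invalid_argument] if [xs] is empty. *)
  val mean : float array -> float
end

(* Stand-in for the matching stats.ml: how it is done. A user of the
   module only needs to read the signature above. *)
module Stats : STATS = struct
  let mean xs =
    let n = Array.length xs in
    if n = 0 then invalid_arg "Stats.mean: empty array";
    Array.fold_left (+.) 0.0 xs /. float_of_int n
end

let () = Printf.printf "%g\n" (Stats.mean [| 1.0; 2.0; 3.0 |])
```

With a real stats.mli, running ocamldoc -html on the interface file turns such comments into HTML pages.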

By the way, I am writing general-purpose simulation software in OCaml. I agree with some of your complaints about the lack of documentation (the OCaml team is small and very busy, why don't you help them?). On the other hand, I do think OCaml is well suited for expressing scientific problems.

I have begun to gather solutions to the most common programming problems in OCaml

if this can help ....