PhDing ;-)

Wednesday, August 09, 2006

First Writing (CLEF Paper)

Now that it has finally been decided that I am going to CLEF, I have to get my
feet wet with writing. My first attempts (not surprisingly) were not all that great.

The first paragraphs of my Introduction were like this:
--------------------------------------
In the past the Information Retrieval community spent most of its effort
on enabling the user to retrieve documents which were available as text.
Because a huge amount of information exists in spoken documents
and other multimedia content, it would be very beneficial for the user
if he was supplied with information from this type of content too.
--
Comment: The first sentence is kind of obvious to the audience, so it could
be left out. The second part also leads towards a different type of question
than the ones covered in the paper. It is a CLSR track, so everybody should
already believe that it is beneficial. It is also kind of obvious to other audiences.
--------------------------------------
Obviously there exists an impedance mismatch between the format of
the documents and the way the user enters his query. Even if
he uses speech input to express his query, it is unlikely that he
is actually looking for the acoustic properties (i.e. pronunciation) of his
input. That is why it is a big issue to bridge the
gap between a query existing in a textual query language and the relevant
documents being stored as acoustic waves.
--
Comment: These are also very general statements. They don't get
investigated further in the paper. The impedance mismatch was also
inspired by my database background (RDBMS vs. OO programming).
I think it does play a role, and it is the motivation for using annotations
(we can't query a collection of audio files, at least not online). Another
notion which could be derived from here is the way users enter
their information request (query) into the computer. Especially in the
case of rather structured queries ("somewhere in the middle of the film"),
this is not addressed much, to my limited knowledge.
--------------------------------------

On the structure: My approach was generally to first do some blabla about the
current state of research (of which, moreover, I have no
clue). Then I wanted to introduce the "small" problem which we tried to
solve with PFTijah. Section 2 would contain the intro to PFTijah, its
application to the CLEF-CLSR topics and the procedure of simulating
Dutch people who request the information in Dutch. Then I would
have concluded that ASR is beneficial together with MK/SUM and
presented the outline of (my) future work.

Roeland changed this so that the problem with multiple annotations is
introduced at the beginning, followed in section 2 by PFTijah and its application
to the task. In the conclusion he closed the "ring" by repeating what remains
open and what does not. He also removed a lot of my too vague and off-topic
statements.

Writing craft: A lesson learned while writing was also to fix each
paragraph to a single statement (little topic). Care must be taken not
to write about various items in one paragraph. This makes the paper easier to
read, as it is presented in clear units. It also helps to assess the correctness
of the logic used in the paper, because the punchline (the main line of argument)
through the paper can be followed and analysed much more easily. Furthermore, one
can check each paragraph against its "heading". If it contains off-topic material,
it could mean that:
  1. the paragraph should be split into two units of separate meaning.
    Afterwards the punchline (at least in the section) should be reassessed.
  2. the additional content is just filler ("nice to have"). So delete it.
  3. the presentation of this aspect is actually useful. Thus the heading
    should be changed to cover all the items touched on. The punchline should
    then be re-examined.
It might also be that two paragraphs (not necessarily adjacent) should
be combined because their content actually belongs together.
The punchline then obviously needs to be re-examined.