Angel Cruz-Roa: 2009

Tuesday, November 24, 2009

CIARP 2009 - Report

After returning from CIARP 2009 in Guadalajara, Mexico, where I presented the paper Visual Pattern Analysis in Histopathology Images Using Bag of Features as a poster, and then getting back to work and academic activities (including attending SIB-SIPAIM 2009), I want to describe what I experienced, both academically and personally, on my first trip abroad.

Feedback

First of all, I must thank Francisco Gomez for helping me prepare the 30-second poster presentation; it was really useful for presenting the idea in such a short time. In the poster session I presented this work several times (I was lucky that everyone was from Hispanic America, so it was in Spanish, jeje :P). The main remarks were about the possibility of building the visual vocabulary automatically with an unsupervised clustering method, about how to control for or link semantic information and the magnification of the images, and finally about the methodology designed for automatic annotation tasks.

Works of interest

Texture analysis methods and applications. Prof. Maria Petrou University of Cambridge, UK. This was an interesting tutorial about texture analysis and description methods; she also showed her book on the subject. This is very relevant to histology images, where texture is used for tissue description.
We are Building a Topological Pyramid. Prof. Walter G. Kropatsch. Vienna University of Technology, Vienna. This was another interesting tutorial, about graph representations of objects in image processing (for segmentation and other tasks) that use topological and connectivity information in order to reduce the representation.
Randomized Probabilistic Latent Semantic Analysis for Scene Recognition. Erik Rodner and Joachim Denzler. This was an interesting work that uses bag of features and pLSA for image categorization of natural scene images.
Classifier Selection in a Family of Polyhedron Classifiers. Tetsuji Takahashi, Mineichi Kudo and Atsuyoshi Nakamura. An interesting paper in which the authors propose classifiers, related to SVMs, whose decision regions are chosen as polyhedra that reduce the convex hull, with relatively better results.
Clustering Ensemble Method for Heterogeneous Partitions. Sandro Vega-Pons and José Ruiz-Shulcloper. This work is about a clustering ensemble method for heterogeneous partitions. Its application to hierarchical clustering could be explored.
Improved Online Support Vector Machines Spam Filtering Using String Kernels. Ola Amayri and Nizar Bouguila. A good paper about machine learning using string kernels and Transductive Support Vector Machines. The most interesting part of this work is the exhaustive experimentation and configuration setup.
A New Incremental Algorithm for Overlapped Clustering. Airel Pérez Suárez, José Fco. Martínez Trinidad, Jesús A. Carrasco Ochoa, and José E. Medina Pagola. This is a good work on incremental clustering, very useful when we have large datasets and do not want to run the clustering again when new data arrive. It is closely related to our work, and a collaboration may be possible.

The proceedings of CIARP 2009 are available here.

Contacts

Ioannis A. Kakadiaris, Ph.D. He is Director of the Computational Biomedicine Laboratory and of the Division of Bioimaging and Biocomputation, Institute for Digital Informatics and Analysis, at the University of Houston, Houston, Texas, USA. He was the only keynote speaker who works in biomedical imaging. We spoke about cardiac imaging problems in 4D (the trade-off between spatial and temporal resolution), about the possibility of collaborative work, and about an invitation to the Bioingenium Research Group on similar research topics, in which he showed interest, subject to his schedule.

Lic. Airel Perez Suarez. He is a computer science researcher at the Centro de Aplicaciones de Tecnologías Avanzadas (CENATAV). He works in data mining and information retrieval. He presented a paper about an incremental algorithm for overlapped clustering, and he is interested in testing his algorithm on our histology image dataset and visual words as a collaboration.

Dr. José Ruiz Shulcloper. He is Director of the Centro de Aplicaciones de Tecnologías Avanzadas (CENATAV) and president of the Cuban Association for Pattern Recognition. He invited us to found a Colombian Association for Pattern Recognition in order to join the International Association for Pattern Recognition (IAPR). He said that we can start from an existing association related to pattern recognition (for example, the Sociedad Colombiana de Computación); we only need to send the request for information to his email.

Erik Rodner. He is a researcher at the Chair of Computer Vision, Institute of Computer Science, University of Jena. He presented two works: one about image categorization of natural scene images using bag of features, and one about object recognition using a combination of visual features with bag of features from 2D and 3D images.

Note: For future conferences it is very important to carry business cards.

Photos

The Poster and me (academic evidence)

Pyramid of the Sun, Teotihuacan. (tourist evidence)

El Lago, Chapultepec Forest. (like a picture-postcard)

Thursday, November 12, 2009

Final version of CIARP poster

Hi, I have finally published the final version of the CIARP poster. Sorry for the delay. I would appreciate your comments, because tomorrow I am going to print it. The trip is on Saturday!! :P. The Flickr link with more specific annotations is here.

Sunday, October 25, 2009

Designing the poster for CIARP

Hi buddies, in a few days I hope to be in Guadalajara presenting a poster about the work that was accepted at the 14th Iberoamerican Congress on Pattern Recognition. So I have to make the poster, and I have started designing it. Here is the preliminary design; please give me your opinion so that I can improve it. Thank you very much.

The image is also available on Flickr here.

Friday, October 23, 2009

How to evaluate the quality of clustering?

For the last 3 days I have been working on how to evaluate clustering performance (or quality). The reason is that I need to determine the number of prototype blocks ("good" example visual blocks) given a set of them for each category (e.g. nervous, muscle, etc.). For this, I have the similarity matrix of the combined visual features of the image blocks (a linear combination using weights found by the kernel alignment method).

I am using the k-centers (also called k-medoids) method to find the k image blocks that can be a good representation of the visual variability in each concept. The problem is: what value of k is appropriate for selecting the representative blocks, given the visual variability within the set of image blocks for a given concept?
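To make the k-medoids step concrete, here is a minimal Python sketch (not the code used in these experiments). It assumes a precomputed dissimilarity matrix; the function name `k_medoids`, the simple alternating update and the random initialization are illustrative assumptions, and a similarity matrix would first have to be converted to dissimilarities (for example with 1 - similarity when values lie in [0, 1]).

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed dissimilarity matrix.

    dist: (n, n) symmetric array with zeros on the diagonal.
    Returns the indices of the k medoids and the cluster label of each point.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    # Assumption: start from k distinct points chosen at random.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every point to its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # The new medoid is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

The returned medoid indices point at actual image blocks, which is what makes this method convenient for choosing prototype blocks.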

One of the most popular methods for selecting the right value of k is the silhouette coefficient.

The method to calculate the silhouette coefficient is:

For a given point i in a cluster A, the silhouette of i, s(i), is defined as follows:

s(i) = [ b(i) - a(i) ] / max { a(i) , b(i) }

where a(i) is the average dissimilarity between point i and all other points in A (the cluster to which i belongs), and b(i) is the average dissimilarity between point i and the points in the cluster closest to A, which is B in this case. Note that -1 <= s(i) <= 1. That means:

s(i) close to 1: the object i is well classified
s(i) close to 0: the object i lies between two clusters
s(i) close to -1: the object i is misclassified

The average of s(i) over all points in the data set is called the average silhouette width, S'. For a given number of clusters k it is denoted S'(k), and it is used to select the right number of clusters by choosing the k for which S'(k) is as high as possible. The Silhouette Coefficient (SC) is then defined as follows:

SC = max { S'(k) }

Partial solution
First, I had to adapt the silhouette coefficient method to compute it from a similarity matrix. The method was implemented in MATLAB and can be downloaded here.

I ran several experiments varying the value of k for a specific concept, but I found that when k increases, the SC value increases too.

The reason I found is that when a cluster has only one object, the a(i) term of the s(i) formula is NaN, because there are no other objects in the same cluster. This happens when using the following formula:

SumDist := sum(distances(i,j) | i ~= j, for all j belonging to Ai)
NObjs := count( j | i ~= j, for all j belonging to Ai)

where i is the object, which belongs to cluster Ai, and j ranges over the other objects in that cluster. Then I calculate a(i) as:

a(i) = SumDist/NObjs

Then the question is: what must we do when NObjs is zero? That is, how do we take into account a cluster that has just one element? How can the silhouette measure be improved using this? Is it possible to include some term that penalizes clusters whose number of objects is smaller than a given value?
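One common answer to the single-element question is the convention described by Kaufman and Rousseeuw: when a cluster contains only one object, set s(i) = 0 for it. The Python sketch below (not the MATLAB code mentioned above) computes s(i) from a dissimilarity matrix with that convention. The function names are hypothetical, and the conversion 1 - similarity is an assumption that only makes sense for similarities scaled to [0, 1].

```python
import numpy as np

def similarity_to_dissimilarity(sim):
    # Assumption: similarities lie in [0, 1], with 1 meaning identical.
    return 1.0 - np.asarray(sim, dtype=float)

def silhouette_values(dist, labels):
    """Silhouette s(i) of every point from a dissimilarity matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(labels.size)
    if clusters.size < 2:
        return s  # the silhouette needs at least two clusters
    for i in range(labels.size):
        others = labels == labels[i]
        others[i] = False  # exclude i itself from its own cluster
        if not others.any():
            continue  # single-element cluster: s(i) stays 0 by convention
        a = dist[i, others].mean()
        # b(i): smallest average dissimilarity to any other cluster.
        b = min(dist[i, labels == c].mean()
                for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

def average_silhouette_width(dist, labels):
    """S'(k) for the clustering given by labels."""
    return float(silhouette_values(dist, labels).mean())
```

With this convention a single-element cluster neither raises nor lowers its own score to NaN; it simply contributes 0 to the average, so splitting the data into many tiny clusters is no longer rewarded automatically.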

Now my idea is to vary the value of k until a cluster with just one element appears. In other words, maximize the SC value while minimizing the number of clusters with one element.
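A possible way to express this stopping rule in Python is sketched below. It assumes the average silhouette width and the cluster labels have already been computed for each candidate k; the names `count_singletons` and `pick_k` and the dictionary layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def count_singletons(labels):
    """Number of clusters that contain exactly one element."""
    _, sizes = np.unique(labels, return_counts=True)
    return int((sizes == 1).sum())

def pick_k(candidates):
    """Scan k in increasing order and stop at the first singleton cluster.

    candidates: dict mapping k -> (average silhouette width, labels).
    Returns the k with the highest silhouette width among the values
    scanned before a single-element cluster appears, or None.
    """
    best_k, best_sc = None, -np.inf
    for k in sorted(candidates):
        sc, labels = candidates[k]
        if count_singletons(labels) > 0:
            break  # a cluster with just one element appeared
        if sc > best_sc:
            best_k, best_sc = k, sc
    return best_k
```

A softer variant would keep scanning and subtract a penalty proportional to `count_singletons(labels)` from the score instead of stopping, but the weight of that penalty would be one more parameter to choose.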

I am going to try some experiments with this idea...

If you have any comments, if I am doing something wrong, or if you have any idea for finding a good value of k in an objective way, your comments are welcome.

Meanwhile, I will keep testing so that I can finally show the results of these experiments.

P.S.: Solving this problem can also be useful for summarizing a large collection of images and visualizing it in 2D. I believe...

References
Kaufman, L. and P. Rousseeuw, 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, London. ISBN-10: 0471878766.

Wednesday, October 14, 2009

Sparse representations

A "recent" paradigm for representing digital images, which had previously been used for signals, promises to be the holy grail of computer vision because of its striking results. So the question is: why not study it?

The first motivation is that there is a possible relation with my master's thesis.

In bag of features, we split the images into blocks, commonly called visual words, and then compute a feature description to represent these visual words. The process is performed on all images in a given image collection. Then a visual codebook is built from the most representative visual words in the collection; a commonly used approach is clustering, e.g. k-means. Once the visual codebook, or dictionary, is built, each image is represented by the occurrences of the codebook's visual words in the image; each visual word in the image is assigned to the most similar visual word of the codebook.
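The pipeline described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: grayscale images as 2D arrays, non-overlapping square blocks, raw pixel values as the block descriptor, and a very plain k-means; all function names are hypothetical.

```python
import numpy as np

def extract_blocks(image, size):
    """Split a 2D image into non-overlapping size x size blocks (flattened)."""
    h, w = image.shape
    blocks = [image[r:r + size, c:c + size].ravel()
              for r in range(0, h - size + 1, size)
              for c in range(0, w - size + 1, size)]
    return np.array(blocks, dtype=float)

def assign(features, codebook):
    """Index of the closest codebook word (Euclidean) for every block."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k centers used as the codebook."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(features, centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return centers

def bof_histogram(image, codebook, size):
    """Normalized histogram of codebook word occurrences in one image."""
    words = assign(extract_blocks(image, size), codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In practice the codebook would be learned from the blocks of the whole collection (for example by stacking the outputs of `extract_blocks` for every image), and the raw pixels would be replaced by a proper feature descriptor.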

In sparse representation, we choose random blocks from an image to form a dictionary D. Then we want to obtain a vector x that helps to reconstruct the original image as a linear combination of the dictionary blocks. The optimization problem uses the zero norm as the sparsity measure. However, D is not a square matrix: the number of blocks (columns) is greater than the dimension of each block (rows), so the problem is underdetermined and has infinitely many solutions. The best solution is the sparsest x in the zero norm, but finding it is an NP-hard problem. The one norm is a good approximation of this sparsity measure, and in some cases it gives the same solution as the original optimization problem, with the advantage that it can be solved with linear programming (basis pursuit); greedy methods such as matching pursuit and orthogonal matching pursuit, among others, are also used to approximate the sparse solution. With sparse solutions, the dictionary is the best set of basis elements for representing the image content, and it gives a more compact representation of the image than Fourier, wavelets, curvelets, etc.
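As an illustration of one of the methods named above, here is a minimal Python sketch of orthogonal matching pursuit. It is a greedy approximation of the sparsest solution, not the linear-programming formulation, and it assumes the columns of D have unit norm; the function name `omp` and its parameters are illustrative.

```python
import numpy as np

def omp(D, y, n_nonzero, tol=1e-10):
    """Greedy sparse coding of y over the dictionary D.

    D: (rows, cols) dictionary, columns assumed to have unit norm.
    y: signal of length rows.
    n_nonzero: maximum number of dictionary columns to use.
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        # Pick the column most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected columns by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef
    return x
```

For image blocks, y would be one flattened block and the columns of D the flattened dictionary blocks, each divided by its norm.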

I am currently working on this approach and on how it can help me in my master's thesis... I hope :)

Tuesday, September 8, 2009

Installing Code::Blocks, wxWidgets and MinGW on Windows

In recent weeks I have returned to developing in C++, but my latest experience was in Visual C++ 6.0 using the MFC library for my undergraduate thesis. Now I have to build an application in C++, but I need to develop it in an open-source IDE with an open-source, multiplatform GUI library. So I am just going to share my experience with the minimum-effort way to develop an application on Windows. The software that I used was:

First, we must install wxPack; this application installs wxWidgets Compiled 2.8.9 (GUI library), wxAdditions and wxFormBuilder (a GUI designer for wxWidgets).

Second, we must install Code::Blocks; when the installer asks for the wxWidgets path, the default path is "C:\SourceCode\Libraries\wxWidgets2.8".

Finally, let's go! The framework is ready to use! You only have to create a project, choose the GUI category and select the wxWidgets project.
