Aalto computer scientists in ECCV 2024
The European Conference on Computer Vision (ECCV) is a biennial conference in Computer Vision and Machine Learning, managed by the European Computer Vision Association (ECVA).
This year's conference is organised from 29 September to 4 October 2024 at MiCo Milano.
Accepted papers
The papers are listed in alphabetical order, each with its authors and abstract. Links to the papers open on an external website.
Authors
Yogesh Kumar, Pekka Marttinen
Abstract
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal interoperability. eCLIP integrates a heatmap processor and leverages mixup augmentation to efficiently utilize the scarce expert annotations, thus boosting the model's learning effectiveness. eCLIP is designed to be generally applicable to any variant of CLIP without requiring any modifications of the core architecture. Through detailed evaluations across several tasks, including zero-shot inference, linear probing, cross-modal retrieval, and Retrieval Augmented Generation (RAG) of radiology reports using a frozen Large Language Model, eCLIP showcases consistent improvements in embedding quality. The outcomes reveal enhanced alignment and uniformity, affirming eCLIP's capability to harness high-quality annotations for enriched multi-modal analysis in the medical imaging domain.
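The mixup augmentation the abstract mentions is a generic technique that blends pairs of training samples by a convex combination. The sketch below shows only this baseline operation, not eCLIP's heatmap processor or its specific integration, which are described in the paper; the function and array names are illustrative.

```python
import numpy as np

def mixup(x1, x2, alpha=0.4, rng=np.random.default_rng(0)):
    """Classic mixup: a convex combination of two samples.

    The mixing weight lam is drawn from a Beta(alpha, alpha) distribution.
    eCLIP builds on this idea to stretch scarce expert annotations; the
    exact usage there differs from this minimal sketch.
    """
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam

# Mixing two toy "embeddings" yields a point on the segment between them.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mixed, lam = mixup(a, b)
```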
Authors
Juuso Korhonen, Goutham Rangu, Hamed R. Tavakoli, Juho Kannala
Abstract
We propose an application of online hard sample mining for efficient training of Neural Radiance Fields (NeRF). NeRF models produce state-of-the-art quality for many 3D reconstruction and rendering tasks but require substantial computational resources. The encoding of the scene information within the NeRF network parameters necessitates stochastic sampling. We observe that during the training, a major part of the compute time and memory usage is spent on processing already learnt samples, which no longer affect the model update significantly. We identify the backward pass on the stochastic samples as the computational bottleneck during the optimization. We thus perform the first forward pass in inference mode as a relatively low-cost search for hard samples. This is followed by building the computational graph and updating the NeRF network parameters using only the hard samples. To demonstrate the effectiveness of the proposed approach, we apply our method to Instant-NGP, resulting in significant improvements of the view-synthesis quality over the baseline (1 dB improvement on average per training time, or 2x speedup to reach the same PSNR level) along with approx. 40% memory savings coming from using only the hard samples to build the computational graph. As our method only interfaces with the network module, we expect it to be widely applicable.
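The core loop the abstract describes, a cheap inference-mode forward pass to find hard samples followed by a gradient update on only those samples, can be sketched on a toy regression problem. This stand-in uses a linear model in place of the NeRF network and invented sizes and learning rates; it illustrates the mining pattern, not the paper's Instant-NGP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                       # toy linear model standing in for the NeRF MLP
X = rng.normal(size=(1024, 3))        # stochastic training samples
y = X @ np.array([1.0, -2.0, 0.5])    # ground-truth targets

for _ in range(200):
    # 1) low-cost forward pass in "inference mode": per-sample squared error
    errors = (X @ w - y) ** 2
    # 2) keep only the hardest 25% of samples for the expensive backward pass
    hard = np.argsort(errors)[-256:]
    Xh, yh = X[hard], y[hard]
    # 3) gradient step using only the hard subset
    grad = 2 * Xh.T @ (Xh @ w - yh) / len(hard)
    w -= 0.05 * grad
```

Because easy (already learnt) samples contribute little to the update, skipping their backward pass saves compute and memory while the model still converges.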
Authors
Zakaria Laskar, Iaroslav Melekhov, Assia Benbihi, Shuzhe Wang, Juho Kannala
Abstract
Camera relocalization relies on 3D models of the scene with a large memory footprint that is incompatible with the memory budget of several applications. One solution to reduce the scene memory size is map compression by removing certain 3D points and descriptor quantization. This achieves high compression but leads to a performance drop due to information loss. To address the memory-performance trade-off, we train a light-weight scene-specific auto-encoder network that performs descriptor quantization-dequantization in an end-to-end differentiable manner, updating both product quantization centroids and network parameters through back-propagation. In addition to optimizing the network for descriptor reconstruction, we encourage it to preserve the descriptor-matching performance with margin-based metric loss functions. Results show that for a local descriptor memory of only 1MB, the synergistic combination of the proposed network and map compression achieves the best performance on the Aachen Day-Night benchmark compared to existing compression methods.
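The product quantization step underlying this work splits a descriptor into subvectors and replaces each with its nearest codebook centroid, so only small integer codes need to be stored. The sketch below shows this baseline quantize/dequantize round trip with invented toy dimensions; the paper's contribution is making this step differentiable and learning the centroids jointly with an auto-encoder, which is not reproduced here.

```python
import numpy as np

def pq_round_trip(desc, codebooks):
    """Plain product quantization of one descriptor.

    `codebooks` is a list of (K, d_sub) centroid arrays, one per subvector.
    Returns the integer codes (what gets stored) and the dequantized
    descriptor reconstructed from the selected centroids.
    """
    subs = np.split(desc, len(codebooks))
    codes, recon = [], []
    for sub, cb in zip(subs, codebooks):
        idx = int(np.argmin(np.linalg.norm(cb - sub, axis=1)))
        codes.append(idx)
        recon.append(cb[idx])
    return codes, np.concatenate(recon)

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 2)) for _ in range(2)]  # 2 subvectors, 4 centroids each
desc = rng.normal(size=4)
codes, recon = pq_round_trip(desc, codebooks)
```

Storing two small codes instead of four floats is where the memory saving comes from; the information lost in this rounding is what the paper's learned auto-encoder compensates for.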
Authors
Heikki Rantala, Petri Leskinen, Lilli Peura, Eero Hyvönen
Abstract
This paper presents how relations or associations between entities, such as persons and places in cultural heritage knowledge graphs, can be searched and analyzed using faceted search and visualizations. Faceted search using well-formed ontologies allows search and comparison of relative numbers in associations of groups of entities, such as artists from different countries, and can reveal patterns of interest in the data. This paper presents examples of how this can be done in practice, how the associations can be conceptualized in different ways that affect the performance of the search, and how the associations can be analyzed. The concept of faceted relational search is examined through case studies including searching relations in collections of biographies from various European countries, relations in the Union List of Artist Names (ULAN) thesaurus, and relations formed by links between Wikipedia pages of persons.
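The comparison of relative association numbers across facet groups can be illustrated with a minimal sketch. The triples, names, and fields below are invented stand-ins for a cultural-heritage knowledge graph, not data from the systems the paper describes.

```python
from collections import Counter

# Hypothetical (person, country, associated_place) triples standing in
# for relations in a knowledge graph.
relations = [
    ("artist_a", "Finland", "Paris"),
    ("artist_b", "Finland", "Paris"),
    ("artist_c", "Finland", "Rome"),
    ("artist_d", "Italy", "Rome"),
    ("artist_e", "Italy", "Paris"),
]

def facet_share(triples, country):
    """Relative frequency of associated places for one facet selection."""
    places = Counter(p for _, c, p in triples if c == country)
    total = sum(places.values())
    return {place: n / total for place, n in places.items()}

# Comparing the distributions for different facet groups (here, countries)
# is what reveals patterns of interest in the data.
finland = facet_share(relations, "Finland")
italy = facet_share(relations, "Italy")
```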
Department of Computer Science
cs.aalto.fi
School of Science