Publications

Major works


Colorful Diffuse Intrinsic Image Decomposition in the Wild
Chris Careaga and Yağız Aksoy
ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2024
Intrinsic image decomposition aims to separate the surface reflectance and the effects from the illumination given a single photograph. Due to the complexity of the problem, most prior works assume a single-color illumination and a Lambertian world, which limits their use in illumination-aware image editing applications. In this work, we separate an input image into its diffuse albedo, colorful diffuse shading, and specular residual components. We arrive at our result by gradually removing first the single-color illumination and then the Lambertian-world assumptions. We show that by dividing the problem into easier sub-problems, in-the-wild colorful diffuse shading estimation can be achieved despite the limited ground-truth datasets. Our extended intrinsic model enables illumination-aware analysis of photographs and can be used for image editing applications such as specularity removal and per-pixel white balancing.
@ARTICLE{careagaColorful,
author={Chris Careaga and Ya\u{g}{\i}z Aksoy},
title={Colorful Diffuse Intrinsic Image Decomposition in the Wild},
journal={ACM Trans. Graph.},
year={2024},
volume = {43},
number = {6},
articleno = {178},
numpages = {12},
}
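
For illustration, a minimal numpy sketch of how the three predicted components described above can be recombined and manipulated; the function names and the per-pixel white-balance heuristic are our own, not the authors' code.

import numpy as np

def recompose(albedo, diffuse_shading, specular):
    # Extended intrinsic model: image = albedo * colorful diffuse shading + specular residual.
    # All inputs are HxWx3 float arrays in linear RGB.
    return albedo * diffuse_shading + specular

def remove_specularity(albedo, diffuse_shading):
    # Dropping the specular residual keeps only the diffuse image.
    return albedo * diffuse_shading

def per_pixel_white_balance(albedo, diffuse_shading, specular):
    # One simple heuristic: replace the colorful shading with its per-pixel
    # luminance so the illumination color is neutralized everywhere.
    neutral_shading = diffuse_shading.mean(axis=2, keepdims=True)
    return albedo * neutral_shading + specular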

Intrinsic Single-Image HDR Reconstruction
Sebastian Dille*, Chris Careaga*, and Yağız Aksoy
ECCV, 2024
The low dynamic range (LDR) of common cameras fails to capture the rich contrast in natural scenes, resulting in loss of color and details in saturated pixels. Reconstructing the high dynamic range (HDR) of luminance present in the scene from single LDR photographs is an important task with many applications in computational photography and realistic display of images. The HDR reconstruction task aims to infer the lost details using the context present in the scene, requiring neural networks to understand high-level geometric and illumination cues. This makes it challenging for data-driven algorithms to generate accurate and high-resolution results. In this work, we introduce a physically-inspired remodeling of the HDR reconstruction problem in the intrinsic domain. The intrinsic model allows us to train separate networks to extend the dynamic range in the shading domain and to recover lost color details in the albedo domain. We show that dividing the problem into two simpler sub-tasks improves performance in a wide variety of photographs.
@INPROCEEDINGS{dilleIntrinsicHDR,
author={Sebastian Dille and Chris Careaga and Ya\u{g}{\i}z Aksoy},
title={Intrinsic Single-Image HDR Reconstruction},
booktitle={Proc. ECCV},
year={2024},
}
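
A minimal sketch of the intrinsic-domain pipeline described above. The two estimators, shading_net and albedo_net, are hypothetical placeholders for the paper's networks, and the grayscale initialization of the shading is a simplification for illustration only.

import numpy as np

def reconstruct_hdr(ldr_rgb, shading_net, albedo_net, eps=1e-6):
    # ldr_rgb: HxWx3 linear-RGB image in [0, 1].
    shading = ldr_rgb.mean(axis=2, keepdims=True)   # crude initial shading estimate
    albedo = ldr_rgb / np.maximum(shading, eps)     # intrinsic model: I = A * S
    hdr_shading = shading_net(shading)              # extend the dynamic range of the shading
    hdr_albedo = albedo_net(albedo)                 # recover lost color details in the albedo
    return hdr_albedo * hdr_shading                 # recombine into an HDR estimate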

A Bottom-Up Approach to Class-Agnostic Image Segmentation
Sebastian Dille, Ari Blondal, Sylvain Paris, and Yağız Aksoy
ECCV Workshops, 2024
Class-agnostic image segmentation is a crucial component in automating image editing workflows, especially in contexts where object selection traditionally involves interactive tools. Existing methods in the literature often adhere to top-down formulations, following the paradigm of class-based approaches, where object detection precedes per-object segmentation. In this work, we present a novel bottom-up formulation for addressing the class-agnostic segmentation problem. We supervise our network directly on the projective sphere of its feature space, employing losses inspired by metric learning literature as well as losses defined in a novel segmentation-space representation. The segmentation results are obtained through a straightforward mean-shift clustering of the estimated features. Our bottom-up formulation exhibits exceptional generalization capability, even when trained on datasets designed for class-based segmentation. We further showcase the effectiveness of our generic approach by addressing the challenging task of cell and nucleus segmentation. We believe that our bottom-up formulation will offer valuable insights into diverse segmentation challenges in the literature.
@INPROCEEDINGS{dilleBottomup,
author={Sebastian Dille and Ari Blondal and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={A Bottom-Up Approach to Class-Agnostic Image Segmentation},
booktitle={Proc. ECCV Workshop},
year={2024},
}
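
The final clustering step mentioned in the abstract can be sketched as follows, assuming a per-pixel feature map has already been predicted by the network; the bandwidth value is illustrative.

import numpy as np
from sklearn.cluster import MeanShift

def segment_from_features(features, bandwidth=0.5):
    # features: HxWxD per-pixel embedding predicted by the network.
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    # Normalize onto the unit sphere, mirroring the projective-sphere supervision.
    flat = flat / np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-8)
    # Straightforward mean-shift clustering; subsample pixels first for speed in practice.
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(flat)
    return labels.reshape(h, w)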

Scale-Invariant Monocular Depth Estimation via SSI Depth
S. Mahdi H. Miangoleh, Mahesh Reddy, and Yağız Aksoy
SIGGRAPH, 2024
Existing methods for scale-invariant monocular depth estimation (SI MDE) often struggle due to the complexity of the task and limited, non-diverse datasets, hindering generalizability in real-world scenarios. In contrast, shift-and-scale-invariant (SSI) depth estimation simplifies the task and enables training with abundant stereo datasets, achieving high performance. We present a novel approach that leverages SSI inputs to enhance SI depth estimation, streamlining the network's role and facilitating in-the-wild generalization for SI depth estimation while only using a synthetic dataset for training. Emphasizing the generation of high-resolution details, we introduce a novel sparse ordinal loss that substantially improves detail generation in SSI MDE, addressing critical limitations in existing approaches. Through in-the-wild qualitative examples and zero-shot evaluation, we substantiate the practical utility of our approach in computational photography applications, showcasing its ability to generate highly detailed SI depth maps and achieve generalization in diverse scenarios.
@INPROCEEDINGS{miangolehSIDepth,
author={S. Mahdi H. Miangoleh and Mahesh Reddy and Ya\u{g}{\i}z Aksoy},
title={Scale-Invariant Monocular Depth Estimation via SSI Depth},
booktitle={Proc. SIGGRAPH},
year={2024},
}
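
As background for the SSI/SI distinction above, the sketch below shows the standard least-squares scale-and-shift alignment commonly used when comparing SSI depth predictions against references; it is generic practice, not the paper's network.

import numpy as np

def align_scale_and_shift(pred, target, mask=None):
    # pred, target: HxW maps; mask: optional boolean HxW of valid pixels.
    if mask is None:
        mask = np.isfinite(target)
    p, t = pred[mask], target[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)      # solve min_{s,b} || s*p + b - t ||^2
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return s * pred + b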

Intrinsic Image Decomposition via Ordinal Shading
Chris Careaga and Yağız Aksoy
ACM Transactions on Graphics, 2023
Intrinsic decomposition is a fundamental mid-level vision problem that plays a crucial role in various inverse rendering and computational photography pipelines. Generating highly accurate intrinsic decompositions is an inherently under-constrained task that requires precisely estimating continuous-valued shading and albedo. In this work, we achieve high-resolution intrinsic decomposition by breaking the problem into two parts. First, we present a dense ordinal shading formulation using a shift- and scale-invariant loss in order to estimate ordinal shading cues without restricting the predictions to obey the intrinsic model. We then combine low- and high-resolution ordinal estimations using a second network to generate a shading estimate with both global coherency and local details. We encourage the model to learn an accurate decomposition by computing losses on the estimated shading as well as the albedo implied by the intrinsic model. We develop a straightforward method for generating dense pseudo ground truth using our model's predictions and multi-illumination data, enabling generalization to in-the-wild imagery. We present exhaustive qualitative and quantitative analysis of our predicted intrinsic components against state-of-the-art methods. Finally, we demonstrate the real-world applicability of our estimations by performing otherwise difficult editing tasks such as recoloring and relighting.
@ARTICLE{careagaIntrinsic,
author={Chris Careaga and Ya\u{g}{\i}z Aksoy},
title={Intrinsic Image Decomposition via Ordinal Shading},
journal={ACM Trans. Graph.},
year={2023},
volume = {43},
number = {1},
articleno = {12},
numpages = {24},
}
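
Two quantities referenced in the abstract, sketched with numpy: the albedo implied by the intrinsic model and a simple scale-invariant error. These illustrate the idea only and are not the exact training losses used in the paper.

import numpy as np

def implied_albedo(image, shading, eps=1e-6):
    # Intrinsic model I = A * S, so a shading estimate implies A = I / S.
    # image: HxWx3, shading: HxWx1 grayscale shading.
    return image / np.maximum(shading, eps)

def scale_invariant_mse(pred, target, eps=1e-8):
    # Fit the least-squares optimal scale before measuring the error, making
    # the comparison invariant to the global scale of the prediction.
    s = (pred * target).sum() / max((pred * pred).sum(), eps)
    return float(np.mean((s * pred - target) ** 2))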

Intrinsic Harmonization for Illumination-Aware Compositing
Chris Careaga, S. Mahdi H. Miangoleh, and Yağız Aksoy
SIGGRAPH Asia, 2023
Despite significant advancements in network-based image harmonization techniques, there still exists a domain disparity between typical training pairs and real-world composites encountered during inference. Most existing methods are trained to reverse global edits made on segmented image regions, which fail to accurately capture the lighting inconsistencies between the foreground and background found in composited images. In this work, we introduce a self-supervised illumination harmonization approach formulated in the intrinsic image domain. First, we estimate a simple global lighting model from mid-level vision representations to generate a rough shading for the foreground region. A network then refines this inferred shading to generate a harmonious re-shading that aligns with the background scene. In order to match the color appearance of the foreground and background, we utilize ideas from prior harmonization approaches to perform parameterized image edits in the albedo domain. To validate the effectiveness of our approach, we present results from challenging real-world composites and conduct a user study to objectively measure the enhanced realism achieved compared to state-of-the-art harmonization methods.
@INPROCEEDINGS{careagaCompositing,
author={Chris Careaga and S. Mahdi H. Miangoleh and Ya\u{g}{\i}z Aksoy},
title={Intrinsic Harmonization for Illumination-Aware Compositing},
booktitle={Proc. SIGGRAPH Asia},
year={2023},
}

Realistic Saliency Guided Image Enhancement
S. Mahdi H. Miangoleh, Zoya Bylinskii, Eric Kee, Eli Shechtman, and Yağız Aksoy
CVPR, 2023
Common editing operations performed by professional photographers include the cleanup operations: de-emphasizing distracting elements and enhancing subjects. These edits are challenging, requiring a delicate balance between manipulating the viewer's attention and maintaining photo realism. While recent approaches can boast successful examples of attention attenuation or amplification, most of them also suffer from frequent unrealistic edits. We propose a realism loss for saliency-guided image enhancement to maintain high realism across varying image types, while attenuating distractors and amplifying objects of interest. Evaluations with professional photographers confirm that we achieve the dual objective of realism and effectiveness, and outperform the recent approaches on their own datasets, while requiring a smaller memory footprint and runtime. We thus offer a viable solution for automating image enhancement and photo cleanup operations.
@INPROCEEDINGS{Miangoleh2023Realistic,
author={S. Mahdi H. Miangoleh and Zoya Bylinskii and Eric Kee and Eli Shechtman and Ya\u{g}{\i}z Aksoy},
title={Realistic Saliency Guided Image Enhancement},
booktitle={Proc. CVPR},
year={2023},
}

Computational Flash Photography through Intrinsics
Sepideh Sarajian Maralan, Chris Careaga, and Yağız Aksoy
CVPR, 2023
Flash is an essential tool as it often serves as the sole controllable light source in everyday photography. However, the use of flash is a binary decision at the time a photograph is captured, with limited control over its characteristics such as strength or color. In this work, we study the computational control of the flash light in photographs taken with or without flash. We present a physically motivated intrinsic formulation for flash photograph formation and develop flash decomposition and generation methods for flash and no-flash photographs, respectively. We demonstrate that our intrinsic formulation outperforms alternatives in the literature and allows us to computationally control flash in in-the-wild images.
@INPROCEEDINGS{Maralan2023Flash,
author={Sepideh Sarajian Maralan and Chris Careaga and Ya\u{g}{\i}z Aksoy},
title={Computational Flash Photography through Intrinsics},
booktitle={Proc. CVPR},
year={2023},
}
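
A simplified reading of the intrinsic flash formation model, useful for seeing why decomposition enables computational flash control. The separation into ambient and flash shading components follows the abstract, while the strength and color controls below are illustrative.

import numpy as np

def recompose_flash_photo(albedo, ambient_shading, flash_shading,
                          flash_strength=1.0, flash_color=(1.0, 1.0, 1.0)):
    # Once a flash photograph is decomposed into albedo and the two shading
    # components, the flash contribution can be rescaled or tinted before
    # recombining, giving post-capture control over the flash.
    tint = np.asarray(flash_color, dtype=np.float32).reshape(1, 1, 3)
    return albedo * (ambient_shading + flash_strength * tint * flash_shading)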

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging
S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, and Yağız Aksoy
CVPR, 2021
Neural networks have shown great abilities in estimating depth from a single image. However, the inferred depth maps are well below one-megapixel resolution and often lack fine-grained details, which limits their practicality. Our method builds on our analysis of how the input resolution and the scene structure affect depth estimation performance. We demonstrate that there is a trade-off between a consistent scene structure and the high-frequency details, and merge low- and high-resolution estimations to take advantage of this duality using a simple depth merging network. We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details to the final result. We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail using a pre-trained model.
@INPROCEEDINGS{Miangoleh2021Boosting,
author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
booktitle={Proc. CVPR},
year={2021},
}

Video-Based Rendering of Dynamic Stationary Environments from Unsynchronized Inputs
Theo Thonat, Yağız Aksoy, Miika Aittala, Sylvain Paris, Frédo Durand, and George Drettakis
Computer Graphics Forum (Proc. EGSR), 2021
Image-Based Rendering allows users to easily capture a scene using a single camera and then navigate freely with realistic results. However, the resulting renderings are completely static, and dynamic effects – such as fire, waterfalls or small waves – cannot be reproduced. We tackle the challenging problem of enabling free-viewpoint navigation including such stationary dynamic effects, but still maintaining the simplicity of casual capture. Using a single camera – instead of previous complex synchronized multi-camera setups – means that we have unsynchronized videos of the dynamic effect from multiple views, making it hard to blend them when synthesizing novel views. We present a solution that allows smooth free-viewpoint video-based rendering (VBR) of such scenes using a temporal Laplacian pyramid decomposition of the videos, enabling spatio-temporal blending. For effects such as fire and waterfalls, which are semi-transparent and occupy 3D space, we first estimate their spatial volume. This allows us to create per-video geometries and alpha-matte videos that we can blend using our frequency-dependent method. We also extend Laplacian blending to the temporal dimension to remove additional temporal seams. We show results on scenes containing fire, waterfalls or rippling waves at the seaside, bringing these scenes to life.
@ARTICLE{Thonat2021,
author={Thonat, Theo and Aksoy, Ya\u{g}{\i}z and Aittala, Miika and Paris, Sylvain and Durand, Fr\'edo and Drettakis, George},
title={Video-Based Rendering of Dynamic Stationary Environments from Unsynchronized Inputs},
journal={Computer Graphics Forum (Proc. EGSR)},
year={2021},
volume = {40},
number = {4}
}
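
The spatial Laplacian-pyramid blending that the method extends into the temporal dimension can be sketched with OpenCV as follows; this is the classical frequency-dependent blend, not the paper's full VBR pipeline.

import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
          for i in range(levels)]
    lp.append(gp[-1])
    return lp

def laplacian_blend(img_a, img_b, mask, levels=5):
    # img_a, img_b: float32 HxWx3 images; mask: float32 HxW in [0, 1], 1 = take img_a.
    la, lb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    gm = gaussian_pyramid(mask, levels)
    out = None
    for a, b, m in zip(reversed(la), reversed(lb), reversed(gm)):
        level = m[..., None] * a + (1.0 - m[..., None]) * b   # blend each frequency band
        if out is None:
            out = level
        else:
            out = cv2.pyrUp(out, dstsize=(level.shape[1], level.shape[0])) + level
    return out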

Learning-based Sampling for Natural Image Matting
Jingwei Tang, Yağız Aksoy, Cengiz Öztireli, Markus Gross and Tunç Ozan Aydın
CVPR, 2019
The goal of natural image matting is the estimation of opacities of a user-defined foreground object that is essential in creating realistic composite imagery. Natural matting is a challenging process due to the high number of unknowns in the mathematical modeling of the problem, namely the opacities as well as the foreground and background layer colors, while the original image serves as the single observation. In this paper, we propose the estimation of the layer colors through the use of deep neural networks prior to the opacity estimation. The layer color estimation is a better match for the capabilities of neural networks, and the availability of these colors substantially increases the performance of opacity estimation due to the reduced number of unknowns in the compositing equation. A prominent approach to matting in parallel to ours is called sampling-based matting, which involves gathering color samples from known-opacity regions to predict the layer colors. Our approach outperforms not only the previous hand-crafted sampling algorithms, but also current data-driven methods. We hence classify our method as a hybrid sampling- and learning-based approach to matting, and demonstrate the effectiveness of our approach through detailed ablation studies using alternative network architectures.
@INPROCEEDINGS{samplenet,
author={Tang, Jingwei and Aksoy, Ya\u{g}{\i}z and \"Oztireli, Cengiz and Gross, Markus and Ayd{\i}n, Tun\c{c} Ozan},
booktitle={Proc. CVPR},
title={Learning-based Sampling for Natural Image Matting},
year={2019},
}
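
To make the reduced-unknowns argument concrete: once the layer colors F and B are estimated, the compositing equation I = alpha*F + (1-alpha)*B yields a per-pixel least-squares alpha in closed form. A minimal numpy sketch, separate from the paper's networks:

import numpy as np

def alpha_from_layer_colors(image, fg, bg, eps=1e-6):
    # image, fg, bg: HxWx3 float arrays in [0, 1].
    diff = fg - bg
    num = ((image - bg) * diff).sum(axis=2)
    den = np.maximum((diff * diff).sum(axis=2), eps)
    return np.clip(num / den, 0.0, 1.0)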

Semantic Soft Segmentation
Yağız Aksoy, Tae-Hyun Oh, Sylvain Paris, Marc Pollefeys and Wojciech Matusik
ACM Transactions on Graphics (Proc. SIGGRAPH), 2018
Accurate representation of soft transitions between image regions is essential for high-quality image editing and compositing. Current techniques for generating such representations depend heavily on interaction by a skilled visual artist, as creating such accurate object selections is a tedious task. In this work, we introduce semantic soft segments, a set of layers that correspond to semantically meaningful regions in an image with accurate soft transitions between different objects. We approach this problem from a spectral segmentation angle and propose a graph structure that embeds texture and color features from the image as well as higher-level semantic information generated by a neural network. The soft segments are generated fully automatically via eigendecomposition of the carefully constructed Laplacian matrix. We demonstrate that otherwise complex image editing tasks can be done with little effort using semantic soft segments.
@ARTICLE{sss,
author={Ya\u{g}{\i}z Aksoy and Tae-Hyun Oh and Sylvain Paris and Marc Pollefeys and Wojciech Matusik},
title={Semantic Soft Segmentation},
journal={ACM Trans. Graph. (Proc. SIGGRAPH)},
year={2018},
pages = {72:1-72:13},
volume = {37},
number = {4}
}
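
The spectral step mentioned in the abstract, sketched with scipy: build a graph Laplacian from a sparse pixel-affinity matrix and take the eigenvectors associated with its smallest eigenvalues. Constructing the actual color, texture, and semantic affinities is the core of the paper and is not reproduced here.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def soft_segment_basis(affinity, k=20):
    # affinity: symmetric sparse NxN matrix of pixel-to-pixel affinities.
    degrees = np.asarray(affinity.sum(axis=1)).ravel()
    laplacian = sp.diags(degrees) - affinity
    # Smallest-eigenvalue eigenvectors span the soft-segmentation space;
    # which='SM' is simple but slow, shift-invert is preferable for large images.
    vals, vecs = eigsh(laplacian.tocsc(), k=k, which='SM')
    return vecs   # N x k matrix, later grouped and refined into soft segments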

A Dataset of Flash and Ambient Illumination Pairs from the Crowd
Yağız Aksoy, Changil Kim, Petr Kellnhofer, Sylvain Paris, Mohamed Elgharib, Marc Pollefeys and Wojciech Matusik
ECCV, 2018
Illumination is a critical element of photography and is essential for many computer vision tasks. Flash light is unique in the sense that it is a widely available tool for easily manipulating the scene illumination. We present a dataset of thousands of ambient and flash illumination pairs to enable studying flash photography and other applications that can benefit from having separate illuminations. Different than the typical use of crowdsourcing in generating computer vision datasets, we make use of the crowd to directly take the photographs that make up our dataset. As a result, our dataset covers a wide variety of scenes captured by many casual photographers. We detail the advantages and challenges of our approach to crowdsourcing as well as the computational effort to generate completely separate flash illuminations from the ambient light in an uncontrolled setup. We present a brief examination of illumination decomposition, a challenging and underconstrained problem in flash photography, to demonstrate the use of our dataset in a data-driven approach.
@INPROCEEDINGS{flashambient,
author={Ya\u{g}{\i}z Aksoy and Changil Kim and Petr Kellnhofer and Sylvain Paris and Mohamed Elgharib and Marc Pollefeys and Wojciech Matusik},
booktitle={Proc. ECCV},
title={A Dataset of Flash and Ambient Illumination Pairs from the Crowd},
year={2018},
}

Crowd-Guided Ensembles: How Can We Choreograph Crowd Workers for Video Segmentation?
Alexandre Kaspar, Geneviève Patterson, Changil Kim, Yağız Aksoy, Wojciech Matusik and Mohamed Elgharib
ACM CHI Conference on Human Factors in Computing Systems, 2018
In this work, we propose two ensemble methods leveraging a crowd workforce to improve video annotation, with a focus on video object segmentation. Their shared principle is that while individual candidate results may likely be insufficient, they often complement each other so that they can be combined into something better than any of the individual results - the very spirit of collaborative working. For one, we extend a standard polygon-drawing interface to allow workers to annotate negative space, and combine the work of multiple workers instead of relying on a single best one as commonly done in crowdsourced image segmentation. For the other, we present a method to combine multiple automatic propagation algorithms with the help of the crowd. Such combination requires an understanding of where the algorithms fail, which we gather using a novel coarse scribble video annotation task. We evaluate our ensemble methods, discuss our design choices for them, and make our web-based crowdsourcing tools and results publicly available.
@INPROCEEDINGS{crowdensembles,
author={Alexandre Kaspar and Genevi\`eve Patterson and Changil Kim and Ya\u{g}{\i}z Aksoy and Wojciech Matusik and Mohamed Elgharib},
title={Crowd-Guided Ensembles: How Can We Choreograph Crowd Workers for Video Segmentation?},
booktitle={Proc. ACM CHI},
year={2018},
}

Designing Effective Inter-Pixel Information Flow for Natural Image Matting
Yağız Aksoy, Tunç Ozan Aydın and Marc Pollefeys
CVPR, 2017 (spotlight)
We present a novel, purely affinity-based natural image matting algorithm. Our method relies on carefully defined pixel-to-pixel connections that enable effective use of information available in the image and the trimap. We control the information flow from the known-opacity regions into the unknown region, as well as within the unknown region itself, by utilizing multiple definitions of pixel affinities. This way we achieve significant improvements on matte quality near challenging regions of the foreground object. Among other forms of information flow, we introduce color-mixture flow, which builds upon local linear embedding and effectively encapsulates the relation between different pixel opacities. Our resulting novel linear system formulation can be solved in closed-form and is robust against several fundamental challenges in natural matting such as holes and remote intricate structures. While our method is primarily designed as a standalone natural matting tool, we show that it can also be used for regularizing mattes obtained by various sampling-based methods. Our evaluation using the public alpha matting benchmark suggests a significant performance improvement over the state-of-the-art.
@INPROCEEDINGS{ifm,
author={Aksoy, Ya\u{g}{\i}z and Ayd{\i}n, Tun\c{c} Ozan and Pollefeys, Marc},
booktitle={Proc. CVPR},
title={Designing Effective Inter-Pixel Information Flow for Natural Image Matting},
year={2017},
}

Unmixing-Based Soft Color Segmentation for Image Manipulation
Yağız Aksoy, Tunç Ozan Aydın, Aljoša Smolić and Marc Pollefeys
ACM Transactions on Graphics, 2017
We present a new method for decomposing an image into a set of soft color segments, which are analogous to color layers with alpha channels that have been commonly utilized in modern image manipulation software. We show that the resulting decomposition serves as an effective intermediate image representation, which can be utilized for performing various, seemingly unrelated image manipulation tasks. We identify a set of requirements that soft color segmentation methods have to fulfill, and present an in-depth theoretical analysis of prior work. We propose an energy formulation for producing compact layers of homogeneous colors and a color refinement procedure, as well as a method for automatically estimating a statistical color model from an image. This results in a novel framework for automatic and high-quality soft color segmentation, which is efficient, parallelizable, and scalable. We show that our technique is superior in quality compared to previous methods through quantitative analysis as well as visually through an extensive set of examples. We demonstrate that our soft color segments can easily be exported to familiar image manipulation software packages and used to produce compelling results for numerous image manipulation applications without forcing the user to learn new tools and workflows.
@ARTICLE{scs,
author={Ya\u{g}{\i}z Aksoy and Tun\c{c} Ozan Ayd{\i}n and Aljo\v{s}a Smoli\'{c} and Marc Pollefeys},
title={Unmixing-Based Soft Color Segmentation for Image Manipulation},
journal={ACM Trans. Graph.},
year={2017},
pages = {19:1-19:19},
volume = {36},
number = {2}
}
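
A generic stand-in for the statistical color model mentioned above, using scikit-learn's Gaussian mixture. The paper proposes its own estimation procedure, so this only illustrates what such a per-layer color model contains.

import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_color_model(image, n_layers=6, n_samples=20000, seed=0):
    # image: HxWx3 float array in [0, 1]; returns per-layer color means and covariances.
    pixels = image.reshape(-1, 3)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=min(n_samples, len(pixels)), replace=False)
    gmm = GaussianMixture(n_components=n_layers, covariance_type='full', random_state=seed)
    gmm.fit(pixels[idx])
    return gmm.means_, gmm.covariances_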

Interactive High-Quality Green-Screen Keying via Color Unmixing
Yağız Aksoy, Tunç Ozan Aydın, Marc Pollefeys and Aljoša Smolić
ACM Transactions on Graphics, 2016
Due to the widespread use of compositing in contemporary feature films, green-screen keying has become an essential part of post-production workflows. To comply with the ever-increasing quality requirements of the industry, specialized compositing artists spend countless hours using multiple commercial software tools, while eventually having to resort to manual painting because of the many shortcomings of these tools. Due to the sheer amount of manual labor involved in the process, new green-screen keying approaches that produce better keying results with less user interaction are welcome additions to the compositing artist's arsenal. We found that --- contrary to the common belief in the research community --- production-quality green-screen keying is still an unresolved problem with its unique challenges. In this paper, we propose a novel green-screen keying method utilizing a new energy minimization-based color unmixing algorithm. We present comprehensive comparisons with commercial software packages and relevant methods in literature, which show that the quality of our results is superior to any other currently available green-screen keying solution. Importantly, using the proposed method, these high-quality results can be generated using only one-tenth of the manual editing time that a professional compositing artist requires to process the same content having all previous state-of-the-art tools at his disposal.
@ARTICLE{keying,
author={Ya\u{g}{\i}z Aksoy and Tun\c{c} Ozan Ayd{\i}n and Marc Pollefeys and Aljo\v{s}a Smoli\'{c}},
title={Interactive High-Quality Green-Screen Keying via Color Unmixing},
journal={ACM Trans. Graph.},
year={2016},
volume = {35},
number = {5},
pages = {152:1--152:12},
}

Theses


Setting up a Computational Photography Research Studio
Obumneme Stanley Dukor
MSc Thesis, Simon Fraser University, 2024
AI research is transforming creative tasks, with advancements in AI tools rapidly changing post-production expectations. However, the development of these technologies is mostly driven by technologists, often without involving the creatives who will use them. This thesis presents the development of a Computational Photography Research Studio aimed at bridging this gap. The goal is to create a practical and flexible studio setup that allows collaboration between creatives and researchers so that production and research can occur simultaneously. This new type of research involves stakeholders, like filmmakers, to ensure the research addresses their needs and benefits creative professionals. The studio setup includes portable production cameras and lighting, enabling the capture of high-quality live-action footage and datasets necessary for developing computational photography algorithms for post-production. This environment aims to direct AI research to better serve the filmmaking community, ultimately enhancing the quality of visual storytelling.
@MASTERSTHESIS{studio-msc,
author={Obumneme Stanley Dukor},
title={Setting up a Computational Photography Research Studio},
year={2024},
school={Simon Fraser University},
}

Metric Monocular Reconstruction through Ordinal Depth
Mahesh Kumar Krishna Reddy
MSc Thesis, Simon Fraser University, 2022
Training a single network for high resolution and geometrically consistent monocular depth estimation is challenging due to varying scene complexities in the real world. To address this, we present a dual depth estimation setup to decompose the estimations into ordinal and metric depth. The goal of ordinal depth estimation is to leverage novel ordinal losses with relaxed geometric constraints to model local and global ordinal relations for capturing better high-frequency depth details and scene structure. However, ordinal depth inherently lacks geometric structure, and to resolve this, we introduce a metric depth estimation method to enforce geometric constraints on the prior ordinal depth estimations. The estimated scale-invariant metric depth achieves high resolution and is geometrically consistent, generating a meaningful 3D point cloud representation for scene reconstruction. We demonstrate the effectiveness of our ordinal and metric networks by performing zero-shot and in-the-wild depth evaluations with state-of-the-art depth estimation networks.
@MASTERSTHESIS{mmd-msc,
author={Mahesh Kumar Krishna Reddy},
title={Metric Monocular Reconstruction through Ordinal Depth},
year={2022},
school={Simon Fraser University},
}

Computational Flash Photography
Sepideh Sarajian Maralan
MSc Thesis, Simon Fraser University, 2022
The majority of common cameras have an integrated flash that improves lighting in a variety of situations, particularly in low-light environments. Before capturing an image, the photographer must make a decision regarding the usage of flash. However, flash strength cannot be adjusted once it has been utilised in an image. In this work, we target two application scenarios in computational flash photography: decomposition of a flash photograph into its illumination components and generating the flash illumination from a given single no-flash photograph. Two distinct approaches based on image-to-image transfer and intrinsic decomposition with the use of convolutional neural networks are employed to address these tasks. An additional network boosts and upscales the estimated results to generate the final illuminations. Key advantages of our approach include the preparation of a large flash/no-flash dataset and the presentation of models based on state-of-the-art methods to address subtasks specific to our problem.
@MASTERSTHESIS{cfp-msc,
author={Sepideh Sarajian Maralan},
title={Computational Flash Photography},
year={2022},
school={Simon Fraser University},
}

Boosting Monocular Depth Estimation to High Resolution
Seyed Mahdi Hosseini Miangoleh
MSc Thesis, Simon Fraser University, 2022
Convolutional neural networks have shown a remarkable ability to estimate depth from a single image. However, the estimated depth maps are low resolution due to network structure and hardware limitations, only showing the overall scene structure and lacking fine details, which limits their applicability. We demonstrate that there is a trade-off between the consistency of the scene structure and the high-frequency details concerning input content and resolution. Building upon this duality, we present a double estimation framework to improve the depth estimation of the whole image and a patch selection step to add more local details. Our approach obtains multi-megapixel depth estimations with sharp details by merging estimations at different resolutions based on image content. A key strength of our approach is that we can employ any off-the-shelf pre-trained CNN-based monocular depth estimation model without requiring further finetuning.
@MASTERSTHESIS{bmd-msc,
author={Seyed Mahdi Hosseini Miangoleh},
title={Boosting Monocular Depth Estimation to High Resolution},
year={2022},
school={Simon Fraser University},
}

Soft Segmentation of Images
Yağız Aksoy
PhD Thesis, ETH Zurich, 2019
@phdthesis{ssi,
author={Ya\u{g}{\i}z Aksoy},
title={Soft Segmentation of Images},
year={2019},
school={ETH Zurich},
}

Efficient Inertially Aided Visual Odometry towards Mobile Augmented Reality
Yağız Aksoy
Master's Thesis, Middle East Technical University, 2013
With the increase in the number and computational power of commercial mobile devices like smart phones and tablet computers, augmented reality applications are gaining more and more volume. In order to augment virtual objects effectively in real scenes, pose of the camera should be estimated with high precision and speed. Today, most of the mobile devices feature cameras and inertial measurement units which carry information on change in position and attitude of the camera. In this thesis, utilization of inertial sensors on mobile devices in aiding visual pose estimation is studied. Error characteristics of the inertial sensors on the utilized mobile device are analyzed. Gyroscope readings are utilized for aiding 2D feature tracking while accelerometer readings are used to help create a sparse 3D map of features later to be used for visual pose estimation. Metric velocity estimation is formulated using inertial readings and observations of a single 2D feature. Detailed formulations of uncertainties on all the estimated variables are provided. Finally, a novel, lightweight filter, which is capable of estimating the pose of the camera together with the metric scale, is proposed. The proposed filter runs without any heuristics needed for covariance propagation, which enables it to be used in different devices with different sensor characteristics without any modifications. Sensors with poor noise characteristics are successfully utilized to aid the visual pose estimation.
@MASTERSTHESIS{yaksoymetu13,
author = {Aksoy, Ya\u{g}{\i}z},
title = {Efficient Inertially Aided Visual Odometry towards Mobile Augmented Reality},
school = {Middle East Technical University},
year = {2013},
month = {August}}

Published CMPT 461 / 769 projects


Interactive RGB+NIR Photo Editing
Samuel Antunes Miranda*, Shahrzad Mirzaei*, Mariam Bebawy*, Sebastian Dille, and Yağız Aksoy
SIGGRAPH Posters, 2024
Near-infrared imagery offers great possibilities for creative image editing. Lying outside the visible spectrum, the NIR information can effectively serve as a fourth color channel to common RGB. Compared to the latter, it shows interesting and complementary behavior: its intensity strongly varies with the surface materials in the scene and is less affected by atmospheric perturbations. For these reasons, NIR imaging has been a long-standing topic of interest in research and its integration has been proven successful for applications like false coloring, contrast enhancement, image dehazing, and purification of low-light images. Recent developments in smartphone technology have simplified the capturing process, making NIR data readily available for broader use outside the research community. At the same time, existing tools for NIR processing and manipulation are rare and still limited in functionality. With many solutions lacking specialized features, the editing process is inefficient, cumbersome, and prone to suboptimal results. To tackle this issue, we introduce a simple and intuitive photo editing tool that combines RGB and NIR properties, offering functions tailored specifically for the RGB+NIR combination, and granting the user the ability to edit and refine images more creatively.
@INPROCEEDINGS{NIREditing,
author={Samuel Antunes Miranda and Shahrzad Mirzaei and Mariam Bebawy and Sebastian Dille and Ya\u{g}{\i}z Aksoy},
title={Interactive RGB+NIR Photo Editing},
booktitle={SIGGRAPH Posters},
year={2024},
}

Datamoshing with Optical Flow
Chris Careaga, Mahesh Kumar Krishna Reddy, and Yağız Aksoy
SIGGRAPH Asia Posters, 2023
We propose a simple method for emulating the effect of data moshing, without relying on the corruption of encoded video, and explore its use in different application scenarios. Like traditional data moshing, we apply motion information to mismatched visual data. Our approach uses off-the-shelf optical flow estimation to generate motion vectors for each pixel. Our core algorithm can be implemented in a handful of lines but unlocks multiple video editing effects. The use of accurate optical flow rather than compression data also creates a more natural transition without block artifacts. We hope our method provides artists and content creators with more creative freedom over the process of data moshing.
@INPROCEEDINGS{datamosh,
author={Chris Careaga and Mahesh Kumar Krishna Reddy and Ya\u{g}{\i}z Aksoy},
title={Datamoshing with Optical Flow},
booktitle={SIGGRAPH Asia Posters},
year={2023},
}
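
The core idea really does fit in a handful of lines; the sketch below uses OpenCV's Farneback optical flow as the off-the-shelf estimator and drags mismatched content along the estimated motion. Parameter values are illustrative.

import cv2
import numpy as np

def mosh_step(motion_prev, motion_next, content):
    # motion_prev, motion_next: consecutive grayscale frames of the motion source.
    # content: BGR frame whose pixels are dragged along the estimated motion.
    flow = cv2.calcOpticalFlowFarneback(motion_prev, motion_next, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(content, map_x, map_y, interpolation=cv2.INTER_LINEAR)

Feeding each output back in as the next content frame accumulates the characteristic smearing of traditional datamoshing.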

DynaPix: Normal Map Pixelization for Dynamic Lighting
Gerardo Gandeaga, Denys Iliash, Chris Careaga, and Yağız Aksoy
SIGGRAPH Posters, 2022
This work introduces DynaPix, a Krita extension that automatically generates pixelated images and surface normals from an input image. DynaPix helps pixel artists and game developers develop 8-bit-style games more efficiently and bring them to life with dynamic lighting through normal maps that can be used in modern game engines such as Unity. The extension offers artists a degree of flexibility and allows for further refinements to the generated artwork. Powered by out-of-the-box solutions, DynaPix integrates seamlessly into the artistic workflow.
@INPROCEEDINGS{dynapix,
author={Gerardo Gandeaga and Denys Iliash and Chris Careaga and Ya\u{g}{\i}z Aksoy},
title={Dyna{P}ix: Normal Map Pixelization for Dynamic Lighting},
booktitle={SIGGRAPH Posters},
year={2022},
}

Parallax Background Texture Generation
Brigham Okano, Shao Yu Shen, Sebastian Dille, and Yağız Aksoy
SIGGRAPH Posters, 2022
Art assets for games can be time-intensive to produce. Whether it is a full 3D world or a simpler 2D background, creating good-looking assets takes time and skills that are not always readily available. Time can be saved by using repeating assets, but visible repetition hurts immersion. Procedural generation techniques can help make repetition less uniform, but do not remove it entirely. Both approaches leave noticeable levels of repetition in the image and require significant time and skill investments to produce. Video game developers in hobby, game jam, or early prototyping situations may not have access to the required time and skill. We propose a framework to produce layered 2D backgrounds without the need for significant artist time or skill. In our pipeline, the user provides segmented photographic input, instead of creating traditional art, and receives game-ready assets. By utilizing photographs as input, we achieve both a high level of realism for the resulting background texture and a shift away from manual work towards computational run-time, freeing up developers for other work.
@INPROCEEDINGS{parallaxBG,
author={Brigham Okano and Shao Yu Shen and Sebastian Dille and Ya\u{g}{\i}z Aksoy},
title={Parallax Background Texture Generation},
booktitle={SIGGRAPH Posters},
year={2022},
}

Other conferences and workshops


Interactive Editing of Monocular Depth
Obumneme Stanley Dukor, S. Mahdi H. Miangoleh, Mahesh Kumar Krishna Reddy, Long Mai, and Yağız Aksoy
SIGGRAPH Posters, 2022
Recent advances in computer vision have made 3D structure-aware editing of still photographs a reality. Such computational photography applications use a depth map that is automatically generated by monocular depth estimation methods to represent the scene structure. In this work, we present a lightweight, web-based interactive depth editing and visualization tool that adapts low-level conventional image editing operations for geometric manipulation to enable artistic control in the 3D photography workflow. Our tool provides real-time feedback on the geometry through a 3D scene visualization to make the depth map editing process more intuitive for artists. Our web-based tool is open-source and platform-independent to support wider adoption of 3D photography techniques in everyday digital photography.
@INPROCEEDINGS{interactiveDepth,
author={Obumneme Stanley Dukor and S. Mahdi H. Miangoleh and Mahesh Kumar Krishna Reddy and Long Mai and Ya\u{g}{\i}z Aksoy},
title={Interactive Editing of Monocular Depth},
booktitle={SIGGRAPH Posters},
year={2022},
}

AR Museum: A Mobile Augmented Reality Application for Interactive Painting Recoloring
Mattia Ryffel, Fabio Zünd, Yağız Aksoy, Alessia Marra, Maurizio Nitti, Tunç Ozan Aydın and Bob Sumner
International Conference on Game and Entertainment Technologies, 2017
We present a mobile augmented reality application that allows its users to modify colors of paintings via simple touch interactions. Our method is intended for museums and art exhibitions and aims to provide an entertaining way for interacting with paintings in a non-intrusive manner. Plausible color edits are achieved by utilizing a set of layers with corresponding alpha channels, which needs to be generated for each individual painting in a pre-processing step. Manually performing such a layer decomposition is a tedious process and makes the entire system infeasible for most practical use cases. In this work, we propose the use of a fully automatic soft color segmentation algorithm for content generation for such an augmented reality application. This way, we significantly reduce the amount of manual labor needed for deploying our system and thus make our system feasible for real-world use.
@INPROCEEDINGS{armuseum,
author={Mattia Ryffel and Fabio Z\"und and Ya\u{g}{\i}z Aksoy and Alessia Marra and Maurizio Nitti and Tun\c{c} Ozan Ayd{\i}n and Bob Sumner},
title={AR Museum: A Mobile Augmented Reality Application for Interactive Painting Recoloring},
booktitle={International Conference on Game and Entertainment Technologies},
year={2017},
}

MasterCam FVV: Robust Registration of Multiview Sports Video to a Static High-Resolution Master Camera for Free Viewpoint Video
Florian Angehrn, Oliver Wang, Yağız Aksoy, Markus Gross and Aljoša Smolić
IEEE ICIP, 2014
Free viewpoint video enables interactive viewpoint selection in real world scenes, which is attractive for many applications such as sports visualization. Multi-camera registration is one of the difficult tasks in such systems. We introduce the concept of a static high resolution master camera for improved long-term multiview alignment. All broadcast cameras are aligned to a common reference. Our approach builds on frame-to-frame alignment, extended into a recursive long-term estimation process, which is shown to be accurate, robust and stable over long sequences.
@INPROCEEDINGS{anghernicip14,
author={Florian Angehrn and Oliver Wang and Ya\u{g}{\i}z Aksoy and Markus Gross and Aljo\v{s}a Smoli\'{c}},
title={MasterCam FVV: Robust Registration of Multiview Sports Video to a Static High-Resolution Master Camera for Free Viewpoint Video},
booktitle={IEEE International Conference on Image Processing (ICIP)},
year={2014}}

Uncertainty Modeling for Efficient Visual Odometry via Inertial Sensors on Mobile Devices
Yağız Aksoy and A. Aydın Alatan
IEEE ICIP, 2014
Most of the mobile applications require efficient and precise computation of the device pose, and almost every mobile device has inertial sensors already equipped together with a camera. This fact makes sensor fusion quite attractive for increasing efficiency during pose tracking. However, the state-of-the-art fusion algorithms have a major shortcoming: lack of well-defined uncertainty introduced to the system during the prediction stage of the fusion filters. Such a drawback results in determining covariances heuristically, and hence, requirement for data-dependent tuning to achieve high performance or even convergence of these filters. In this paper, we propose an inertially-aided visual odometry system that requires neither heuristics nor parameter tuning; computation of the required uncertainties on all the estimated variables are obtained after minimum number of assumptions. Moreover, the proposed system simultaneously estimates the metric scale of the pose computed from a monocular image stream. The experimental results indicate that the proposed scale estimation outperforms the state-of-the-art methods, whereas the pose estimation step yields quite acceptable results in real-time on resource constrained systems.
@INPROCEEDINGS{yaksoyicip14a,
author={Aksoy, Ya\u{g}{\i}z and Alatan, A. Ayd{\i}n},
title={Uncertainty Modeling for Efficient Visual Odometry via Inertial Sensors on Mobile Devices},
booktitle={IEEE International Conference on Image Processing (ICIP)},
year={2014}}

Impact of transrectal prostate needle biopsy on erectile function: Results of power Doppler ultrasonography of the prostate
Altug Tuncel, Ugur Toprak, Melih Balci, Ersin Koseoglu, Yağız Aksoy, Alp Karademir and Ali Atan
The Kaohsiung Journal of Medical Sciences, 2014
We evaluated the impact of transrectal prostate needle biopsy (TPNB) on erectile function and on the prostate and bilateral neurovascular bundles using power Doppler ultrasonography imaging of the prostate. The study consisted of 42 patients who had undergone TPNB. Erectile function was evaluated prior to the biopsy, and in the 3rd month after the biopsy using the first five-item version of the International Index of Erectile Function (IIEF-5). Prior to and 3 months after the biopsy, the resistivity index of the prostate parenchyma and both neurovascular bundles was measured. The mean age of the men was 64.2 (47–78) years. Prior to TPNB, 10 (23.8%) patients did not have erectile dysfunction (ED) and 32 (76.2%) patients had ED. The mean IIEF-5 score was 20.8 (range: 2–25) prior to the biopsies, and the mean IIEF-5 score was 17.4 (range: 5–25; p < 0.001) after 3 months. For patients who were previously potent in the pre-biopsy period, the ED rate was 40% (n = 4/10) at the 3rd month evaluation. In these patients, all the resistivity index values were significantly decreased. Our results showed that TPNB may lead to an increased risk of ED. The presence of ED in men after TPNB might have an organic basis.
@ARTICLE{Tuncel2014,
author={Altug Tuncel and Ugur Toprak and Melih Balci and Ersin Koseoglu and Yagiz Aksoy and Alp Karademir and Ali Atan},
title={Impact of transrectal prostate needle biopsy on erectile function: Results of power Doppler ultrasonography of the prostate},
journal={The Kaohsiung Journal of Medical Sciences},
volume={30},
number={4},
pages={194--199},
year={2014},
}

Utilization of False Color Images in Shadow Detection
Yağız Aksoy and A. Aydın Alatan
ECCV Workshops, 2012
Shadows are illuminated as a result of the Rayleigh scattering phenomenon, which is more effective at short wavelengths of light. We propose the utilization of false color images for shadow detection, since the transformation eliminates the high-frequency blue component and introduces the low-frequency near-infrared channel. The effectiveness of the approach is tested by using several shadow-variant texture and color-related cues proposed in the literature. Performances of these cues in regular and false color images are compared and analyzed within a supervised system by using a support vector machine classifier.
@INPROCEEDINGS{yaksoyeccvw12,
author={Aksoy, Yagiz and Alatan, A. Aydin},
title={Utilization of False Color Images in Shadow Detection},
booktitle={European Conference on Computer Vision (ECCV) Workshops},
year={2012}}
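
The false-color construction referenced above follows the standard color-infrared mapping: drop the blue channel and shift NIR, red, and green into the R, G, B slots. A one-function numpy sketch:

import numpy as np

def false_color(rgb, nir):
    # rgb: HxWx3, nir: HxW, both float arrays in [0, 1].
    return np.stack([nir, rgb[..., 0], rgb[..., 1]], axis=-1)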

Interactive 2D-3D Image Conversion for Mobile Devices
Yağız Aksoy, Ozan Şener, A. Aydın Alatan and Kemal Uğur
IEEE ICIP, 2012
We propose a complete still image based 2D-3D mobile conversion system for touch screen use. The system consists of interactive segmentation followed by 3D rendering. The interactive segmentation is conducted dynamically by color Gaussian mixture model updates and dynamic-iterative graph-cut. A coloring gesture is used to guide the way and entertain the user during the process. Output of the image segmentation is then fed to the 3D rendering stage of the system. For rendering stage, two novel improvements are proposed to handle holes resulting from depth image based rendering process. These improvements are also expected to enhance the 3D perception. These two methods are subjectively tested and their results are presented.
@INPROCEEDINGS{yaksoyicip12,
author={Aksoy, Yagiz and Sener, Ozan and Alatan, A. Aydin and Ugur, Kemal},
title={Interactive 2D-3D Image Conversion for Mobile Devices},
booktitle={IEEE International Conference on Image Processing (ICIP)},
year={2012}}

Patents


Intrinsic Image Decomposition via Ordinal Shading
Inventors: Yağız Aksoy, Chris Careaga
Assignee: Simon Fraser University
Provisional Patent Application, 2023

Intrinsic Harmonization for Illumination-Aware Compositing
Inventors: Yağız Aksoy, Chris Careaga
Assignee: Simon Fraser University
Provisional Patent Application, 2023

Inventors: Tunc Ozan Aydin, Ahmet Cengiz Öztireli, Jingwei Tang, Yağız Aksoy
Assignee: ETH Zurich, Disney Enterprises
US Patent US11615555B2, granted 2023

Inventors: Yağız Aksoy, Tunc Ozan Aydin
Assignee: ETH Zurich, Disney Enterprises
US Patent US10650524B2, granted 2020

Inventors: Yağız Aksoy, Tunc Ozan Aydin, Marc Pollefeys, Aljosa Smolic
Assignee: ETH Zurich, Disney Enterprises
US Patent US10037615B2, granted 2018