Aligning Non-Causal Factors for Transformer-based Source-Free Domain Adaptation

Sunandini Sanyal^*, Ashish Ramayee Asokan^*, Suvaansh Bhambri, Pradyumna YM, Akshay Kulkarni, Jogendra Nath Kundu, R. Venkatesh Babu

Vision and AI Lab, Indian Institute of Science
(* indicates equal contribution)

A. Proposed method. We incorporate non-causal factors to learn domain-invariant representations using a subsidiary non-causal factor classification task. B. w/o Non-Causal Factor Alignment. Conventional domain-invariance methods aim to align only causal factors, leading to sub-optimal alignment between the source and target domain. C. w/ Non-Causal Factor Alignment. Non-causal factor alignment improves global alignment, leading to better task-discriminative causal factor alignment.

Abstract

Conventional domain adaptation algorithms aim to achieve better generalization by aligning only the task-discriminative causal factors between a source and target domain. However, we find that retaining the spurious correlation between causal and non-causal factors plays a vital role in bridging the domain gap and improving target adaptation. Therefore, we propose to build a framework that disentangles and supports causal factor alignment by aligning the non-causal factors first. We also investigate and find that the strong shape bias of vision transformers, coupled with their multi-head attention, makes them a suitable architecture for realizing our proposed disentanglement. Hence, we propose to build a Causality-enforcing Source-Free Transformer framework (C-SFTrans) to achieve disentanglement via a novel two-stage alignment approach: a) non-causal factor alignment: non-causal factors are aligned using a style classification task, which leads to an overall global alignment; b) task-discriminative causal factor alignment: causal factors are aligned via target adaptation. We are the first to investigate the role of vision transformers (ViTs) in a privacy-preserving source-free setting. Our approach achieves state-of-the-art results on several domain adaptation (DA) benchmarks.

Exploring Domain Invariance in ViTs

A. SOTA domain-invariance-based DA works, Feature-Mixup (ICML 2022) and DIPE (CVPR 2022), do not improve over the baseline for vision transformers despite large gains for CNNs. B. We observe that the correlation between $S$ and $Z$ is preserved after target adaptation (Office-Home). C. With our proposed style task training, the overall source-target domain gap (pink) is lower, indicating better domain-invariance. Further, we observe a lower domain gap when considering only causal factors (yellow), indicating that non-causal alignment helps causal alignment. D. Causal graph representing causal factors $S$ and non-causal factors $Z$, which are spuriously correlated via confounder $C$.

Proposed Method

We estimate the causal importance of each attention head in a block by training convex weights $\beta_1$ and $\beta_2$ that combine the outputs of the attention head for the normal input $x$ and the style-characterizing input $x_{SCI}$. A higher $\beta_2$ indicates that the attention head gives more importance to non-causal style factors, so the head can be chosen as a non-causal head.
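The head-selection idea above can be illustrated with a minimal sketch. It rests on several assumptions that this page does not state: that convexity of $\beta_1, \beta_2$ is enforced with a softmax over two trainable logits per head, that non-causal heads are picked as the top-k heads by $\beta_2$, and that the logit values shown are purely hypothetical. All function names are illustrative.

```python
import math

def convex_weights(logit_normal, logit_style):
    # Softmax over two logits gives (beta_1, beta_2): both non-negative, summing to 1.
    # Assumption: the page only says the weights are convex, not how that is enforced.
    m = max(logit_normal, logit_style)
    e1 = math.exp(logit_normal - m)
    e2 = math.exp(logit_style - m)
    total = e1 + e2
    return e1 / total, e2 / total

def combine_head_outputs(out_x, out_sci, beta_1, beta_2):
    # Convex combination of one head's outputs for x and for x_SCI.
    return [beta_1 * a + beta_2 * b for a, b in zip(out_x, out_sci)]

def select_non_causal_heads(head_logits, num_non_causal):
    # Rank heads by beta_2 and keep the top-k as non-causal heads h_n.
    # Assumption: top-k selection is a hypothetical rule chosen for illustration.
    beta_2 = [convex_weights(l1, l2)[1] for l1, l2 in head_logits]
    ranked = sorted(range(len(beta_2)), key=lambda i: beta_2[i], reverse=True)
    return sorted(ranked[:num_non_causal])

# Hypothetical trained logits (normal, style) for four attention heads.
head_logits = [(2.0, -1.0), (-0.5, 1.5), (0.3, 0.2), (-2.0, 2.0)]
non_causal = select_non_causal_heads(head_logits, num_non_causal=2)
print(non_causal)
```

With these made-up logits, heads 1 and 3 carry the largest $\beta_2$ and would be treated as non-causal; the remaining heads stay in the causal set.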
The selected non-causal heads $h_n$ are updated to train the style classifier $f_n$ with the style classification loss $\mathcal{L}_{style}$ via a style token $z_n$. The task classifier is trained by updating only causal heads $\mathcal{H}\!\setminus\!h_n$ with the task classification loss $\mathcal{L}_{cls}$ via the class token $z_c$. The two steps are executed alternately.
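The alternation between the two steps can be sketched as a simple schedule that decides which heads receive updates at each step. This is a minimal sketch under assumptions: the page says only that the steps alternate, so switching on every step (rather than every epoch or every few iterations) is a guess, and the head indices and the `trainable_heads` helper are hypothetical.

```python
def trainable_heads(step, all_heads, non_causal_heads):
    # Even steps: style step, only the non-causal heads h_n are updated (loss L_style).
    # Odd steps: task step, only the causal heads (H minus h_n) are updated (loss L_cls).
    # Assumption: per-step alternation; the actual granularity is not given on the page.
    non_causal = set(non_causal_heads)
    if step % 2 == 0:
        return "style", sorted(non_causal)
    return "task", sorted(set(all_heads) - non_causal)

# Hypothetical block with six heads, two of which were selected as non-causal.
all_heads = list(range(6))
non_causal_heads = [1, 3]
schedule = [trainable_heads(t, all_heads, non_causal_heads) for t in range(4)]
for name, heads in schedule:
    print(name, heads)
```

In a real training loop, the returned head list would typically be used to freeze or mask the parameters of all other heads before the corresponding loss is backpropagated.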

Main Results

Table 1. Single-Source Domain Adaptation (SSDA) on Office-Home benchmark. SF denotes source-free adaptation. ResNet-based methods (top) and Transformer-based methods (bottom). * indicates results taken from CDTrans (ICML 2021).

Table 2. Single-Source Domain Adaptation (SSDA) on the VisDA benchmark. SF denotes source-free adaptation. ResNet-based methods (top) and Transformer-based methods (bottom). * indicates results taken from CDTrans (ICML 2021).

Table 3. Single-Source Domain Adaptation (SSDA) on the DomainNet benchmark. SF denotes source-free adaptation. * indicates results taken from SSRT (CVPR 2022).