Skill-Task Substitution Matrix: How Artificial Intelligence Reshapes Labor Market Structure and Exacerbates Income Polarization among Groups

1. Introduction

As a core driver of the Fourth Industrial Revolution, artificial intelligence (AI) is reshaping the global economic and social structure at an unprecedented speed. According to a 2024 report by the China Artificial Intelligence Industry Development Alliance (CAIIA), AI has emerged as a key engine leading industrial upgrading and driving economic growth. For instance, iResearch notes that China’s AI industry scale reached approximately RMB 269.7 billion in 2024, representing a year-on-year increase of ~26.2%. A synergistic industrial ecosystem spanning the infrastructure, model, and application layers is also accelerating its formation [1]. However, technological progress does not merely bring efficiency gains and economic prosperity—it simultaneously sparks profound challenges to social equity, income distribution, and labor market stability.

In China’s listed manufacturing firms, a higher penetration rate of industrial robots leads to reduced labor demand, manifesting as an employment polarization trend [2]. This "skill-task" substitution logic is profoundly reshaping the structural characteristics of the labor market. On the one hand, AI accelerates the disappearance of routine task-based jobs through the Displacement Effect; on the other hand, it boosts the marginal productivity of high-skilled workers via the Productivity Effect, further raising income levels for high-skilled groups.

To fully understand AI’s impacts on the labor market and income distribution, we must situate them within the context of China’s economic transformation and globalization. Data from the National Bureau of Statistics (NBS) show that China’s Gini coefficient for national per capita disposable income stood at 0.465 in 2023, which is still a high level. This gap is reflected in three dimensions: regional, urban-rural, and industrial. Regionally, eastern coastal areas generally have higher income levels than central and western regions. For example, in 2024, per capita disposable income in Beijing and Shanghai exceeded RMB 70,000, while that in Guizhou and Yunnan was less than RMB 30,000. Industrially, high-value-added sectors such as finance and information technology have significantly higher incomes than traditional manufacturing and services, and the rise of the digital economy has further widened inter-industry gaps. The Gini coefficient has followed a "decline followed by an increase" pattern in recent years, peaking at 0.491 in 2008 before declining, yet remaining at 0.465 in 2023. This indicates that income distribution issues have not been fundamentally resolved.

We next turn to the regional pattern of AI industry development. AI has grown rapidly in China but exhibits significant regional imbalance. First, there is a pronounced regional agglomeration effect. As of April 2025, China’s total AI patent applications exceeded 1.5 million, with Beijing and Guangdong each surpassing 310,000, while Jiangsu, Zhejiang, and Shanghai each reported no more than 122,000 [3]. Second-tier regions such as Hunan are accelerating their AI layouts with support from industrial clusters and policies. For example, Hunan has formed an AI cluster whose core AI industry output value reached RMB 12.7 billion in the first half of 2024, a year-on-year increase of 17% [4-5]. In contrast, AI industries in central and western provinces remain in their initial stages. High-value-added links such as AI chips and data infrastructure have primarily clustered in coastal and developed provinces in recent years, with central and western regions holding a small share of these segments [6].

Based on this, this paper explores the topic from the following angles. First, in the Research Background section, we systematically review the history and current state of China’s income gap, combined with the spatial pattern of AI industry development, to lay a realistic foundation for subsequent analysis. Second, in the Literature Review section, we synthesize domestic and international research progress on AI’s links to the labor market and income distribution, identify gaps in existing work, and clarify this paper’s research positioning and innovations. Third, in the Methodology section, we introduce the empirical framework, which is grounded in Dagum Gini coefficient decomposition and the "skill-task" substitution matrix, and explain data sources, variable definitions, and model selection. Fourth, in the Data Analysis section, using panel data from 31 Chinese provinces (2020–2024), we empirically test the relationship between AI development and income inequality, with in-depth discussions across the skill, spatial, and intergenerational dimensions.

2. Literature review

Recent research on technology’s impact on labor division and inequality has coalesced into two core strands. First, the analytical framework has shifted from an "occupation/skill" focus to a "task-oriented" approach, which emphasizes that technology restructures labor demand by substituting routine tasks and creating new ones, with displacement effects (job losses from automation) and compensation effects (wage gains from new task creation) jointly shaping wage structures [7]. Acemoglu & Restrepo further refined this logic, arguing that automation could lead to net job loss if substitution outpaces new task creation. However, they noted that re-embedding new tasks (integrating them into the economy) can mitigate shocks, and that an uneven regional distribution of these tasks amplifies spatial inequality [8-10]. Second, the early skill-biased technical change (SBTC) theory posited that technology increases demand for high-skilled labor and widens skill premiums [11]. Subsequent studies revealed an "employment polarization" pattern: declining medium-skilled jobs alongside rising high- and low-skilled positions, resulting in a "hollowing out" of middle-income groups [12]. Although Frey & Osborne’s machine learning-based estimate of occupations’ "computerization" probability faced criticism, it catalyzed fine-grained research based on task decomposition that distinguishes short-term substitution from long-term occupational transformation [13].

Extended research on automation and inequality shows that task substitution may explain 50%–70% of changes in U.S. wage structures and correlates strongly with wage gaps across demographic groups [14]. For AI, its incremental capabilities (e.g., improving over time) and platform effects make it more likely to generate a "triple inequality" of top concentration (wealth accruing to AI elites), middle hollowing-out (erosion of medium-skilled jobs), and bottom vulnerability (precarious low-wage work) [15]. From a sociological lens, algorithmic bias replicates discrimination in hiring, credit, and other domains, while "technological narratives" (e.g., framing AI as universally beneficial) shape how gains are distributed [16]. Focusing on China, the spatial agglomeration of the AI industry (e.g., in Guangdong and Beijing) exacerbates regional inequality [17]. However, existing studies suffer from three key gaps: (1) insufficient provincial panel data on task segmentation (granular breakdown of jobs into routine/non-routine tasks); (2) lack of systematic quantification of algorithmic bias’s impact on Chinese labor markets; and (3) limited causal evidence on how AI policies affect inequality. This paper addresses these gaps by integrating the "skill-task substitution matrix" with Dagum Gini coefficient decomposition and leveraging panel data from 31 Chinese provinces (2020–2024).

3. Methodology

3.1. Overview of research design

This paper uses panel data from 31 Chinese provinces, autonomous regions, and municipalities (2020–2024) as a sample and employs a three-step empirical strategy to test the hypotheses.

(1) Construct and present descriptive statistics on the spatial-temporal distribution of the AI indicators and the Gini coefficient.

(2) Use the Dagum Gini decomposition method to split total inequality into three components, namely within-region inequality, between-region inequality, and transvariation (overlap), to identify AI’s contribution to different sources of inequality.

(3) Develop a provincial "AI exposure index" (AIExp) based on the "skill-task substitution matrix" and estimate the relationship between AI and income inequality using a panel fixed-effects framework. Heterogeneity and robustness tests (e.g., instrumental variables, lagged terms, sub-samples) are also conducted.

3.2. Data and variable construction

Empirical tests are based on panel data from 31 Chinese provinces, autonomous regions, and municipalities (2020–2024). Core variables and their construction are detailed below.

3.2.1. Dependent variable: income inequality (Inequ_{i,t})

The Gini coefficient is used as the core indicator to measure regional income inequality. Data are directly sourced from the China Statistical Yearbook, which reports the Gini coefficient of national per capita disposable income. The value ranges from 0 to 1, with higher values indicating greater income inequality.

3.2.2. Core explanatory variable: AI development level (AI_{i,t})

To comprehensively measure the overall AI development level of each province, we construct a multidimensional evaluation index system covering four dimensions. Industry scale: Output value of AI core industries; Industrial agglomeration: Location entropy of AI-related enterprises; Innovation capacity: Number of granted AI patents; Application depth and computing power foundation: Enterprise cloud adoption rate and number of standard racks in data centers.

First, raw indicators for each dimension are normalized using range normalization to eliminate dimensional differences. Then, the coefficient of variation method is applied to determine objective weights for each indicator (calculation details are provided in Section 3.4). Finally, the normalized indicators are weighted to form a "comprehensive AI score" on a 100-point scale. Higher scores indicate higher AI development levels in a given province and year.

3.2.3. Control variables (X_{i,t})

To control for socioeconomic factors that may affect income distribution, we include the following variables. Industrial structure: Proportion of added value from secondary and tertiary industries in GDP; Urbanization level: Proportion of urban population in total population; Human capital stock: Measured by average years of schooling or the proportion of higher education students; Labor market conditions: Urban registered unemployment rate; Government regulation intensity: Per capita fiscal transfer payments.

Data for all control variables are sourced from the China Statistical Yearbook, China Population and Employment Statistical Yearbook, and provincial statistical yearbooks for each year.

3.3. Dagum Gini decomposition

To identify the sources of inequality and quantify AI’s impact on each component, this paper employs the Dagum (1997) Gini coefficient decomposition method, which decomposes the total Gini coefficient into three distinct parts: within-region inequality (G_W), between-region inequality (G_B), and transvariation (G_T).

$\begin{matrix} G = G_{W} + G_{B} + G_{T} \end{matrix}$ (1)

More specifically, suppose the sample is divided into m regions (or groups), where region r has Gini coefficient G_r, population share p_r, and mean μ_r. The overall Gini coefficient can then be written in weighted form and decomposed by Dagum's method (see Dagum, 1997 for details). The advantage of the Dagum method is that it isolates the inequality generated by overlap between the income distributions of different regions as a separate term, making it possible to identify whether distributional overlap between regions contributes to inequality. This paper calculates the time series of G_W, G_B, and G_T for each year from 2020 to 2024 and examines the correlation between the AI indicators (the composite AI score or AIExp) and the three types of inequality.
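The decomposition described above can be illustrated with a minimal Python sketch. It follows the standard pairwise-difference form of Dagum (1997); the function names, the input layout, and the sample values are illustrative assumptions, not the data or code used in this paper.

```python
import numpy as np

def mean_abs_diff(a, b):
    """Average absolute difference over all pairs (one value from a, one from b)."""
    return np.abs(a[:, None] - b[None, :]).mean()

def dagum_decomposition(values, groups):
    """Split the overall Gini into within (G_W), net between (G_B), and transvariation (G_T)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    n, mu = values.size, values.mean()
    parts = [values[groups == g] for g in np.unique(groups)]
    p = np.array([x.size / n for x in parts])          # population shares
    s = np.array([x.sum() / (n * mu) for x in parts])  # income shares
    g_w = g_b = g_t = 0.0
    for j, yj in enumerate(parts):
        g_jj = mean_abs_diff(yj, yj) / (2 * yj.mean())  # Gini within group j
        g_w += g_jj * p[j] * s[j]
        for h in range(j):
            yh = parts[h]
            delta = mean_abs_diff(yj, yh)
            g_jh = delta / (yj.mean() + yh.mean())      # Gini between groups j and h
            # Relative economic affluence: 1 when the two distributions do not overlap.
            d_jh = abs(yj.mean() - yh.mean()) / delta if delta > 0 else 0.0
            weight = p[j] * s[h] + p[h] * s[j]
            g_b += g_jh * weight * d_jh
            g_t += g_jh * weight * (1 - d_jh)
    g_total = mean_abs_diff(values, values) / (2 * mu)
    return {"G": g_total, "G_W": g_w, "G_B": g_b, "G_T": g_t}

# Hypothetical income values for nine units in three regions (not the paper's data).
example_values = [12, 15, 30, 9, 14, 20, 6, 8, 16]
example_groups = ["east"] * 3 + ["central"] * 3 + ["west"] * 3
result = dagum_decomposition(example_values, example_groups)
print(result)
```

By construction the three components sum to the overall Gini, and G_T is zero when the regional distributions do not overlap at all.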

3.4. Data standardization and calculation

For data standardization, this paper uses the range normalization method to eliminate scale effects. The specific formula is as follows.

$\begin{matrix} X_{i j} = \frac{x_{i j} - \min (x_{j})}{\max (x_{j}) - \min (x_{j})} \end{matrix}$ (2)

Here, $x_{i j}$ is the original value of indicator j in province i, and $X_{i j}$ is its normalized value. To determine the weights, the coefficient of variation (CV) of each indicator is calculated; it reflects the degree of dispersion of the indicator.

$\begin{matrix} {C V}_{j} = \frac{σ_{j}}{μ_{j}} \end{matrix}$ (3)

In the above formula, $σ_{j}$ represents the standard deviation of indicator j, while $μ_{j}$ represents the mean of indicator j. The following formula is used to calculate the indicator weights, and the weighting results are shown in Table 1.

$\begin{matrix} w_{j} = \frac{{CV}_{j}}{\sum_{j = 1}^{4} {CV}_{j}} \end{matrix}$ (4)

Table 1. Weight allocation results
Indicator	Coefficient of variation (CV_j)	Weight (w_j)
Industry scale	2.433	0.612
Industrial agglomeration	0.346	0.087
Innovation capacity	0.458	0.115
Application depth	0.707	0.186

This paper uses the following formula to calculate the comprehensive score:

$\begin{matrix} S_{i} = \sum_{j = 1}^{4} (w_{j} \times X_{i j}) \times 100 \end{matrix}$ (5)

where $S_{i}$ is the comprehensive score of province i on a 100-point scale.
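Equations (2) to (5) can be chained into a single scoring routine. The sketch below is illustrative only: the indicator matrix is hypothetical, and it assumes that the coefficient of variation is computed on the raw indicators with the sample standard deviation, which the text does not specify.

```python
import numpy as np

def composite_score(raw):
    """raw: array of shape (provinces, indicators). Returns normalized values, weights, scores."""
    raw = np.asarray(raw, dtype=float)
    col_min, col_max = raw.min(axis=0), raw.max(axis=0)
    normalized = (raw - col_min) / (col_max - col_min)   # Eq. (2): range normalization
    cv = raw.std(axis=0, ddof=1) / raw.mean(axis=0)      # Eq. (3): assumed raw data, sample SD
    weights = cv / cv.sum()                              # Eq. (4): CV-based weights
    scores = normalized @ weights * 100                  # Eq. (5): score on a 100-point scale
    return normalized, weights, scores

# Hypothetical indicators: industry scale, agglomeration, patents, application depth.
raw_indicators = [
    [100.0, 1.2, 500.0, 0.6],
    [40.0, 0.9, 200.0, 0.4],
    [10.0, 0.5, 50.0, 0.1],
]
normalized, weights, scores = composite_score(raw_indicators)
print(weights.round(3), scores.round(2))
```

Because the weights sum to one, a province that is highest on every indicator scores 100 and one that is lowest on every indicator scores 0, so the score is bounded by construction.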

Table 2. Comprehensive AI scores of Chinese provinces, 2020–2024
Province	2020	2021	2022	2023	2024
Anhui	28.45	32.67	36.82	41.03	45.21
Beijing	85.23	87.61	89.32	90.78	92.41
Fujian	42.18	46.35	50.49	54.62	58.73
Gansu	18.27	20.94	23.61	26.28	28.95
Guangdong	88.72	90.52	92.12	93.89	95.28
Guangxi	22.36	25.03	27.70	30.37	33.04
Guizhou	20.15	23.82	27.49	31.16	34.83
Hainan	25.64	28.31	30.98	33.65	36.32
Hebei	30.52	34.19	37.86	41.53	45.20
Henan	35.78	39.45	43.12	46.79	50.46
Heilongjiang	26.91	29.58	32.25	34.92	37.59
Hubei	38.47	42.14	45.81	49.48	53.15
Hunan	40.26	43.93	47.60	51.27	54.94
Jilin	24.83	27.50	30.17	32.84	35.51
Jiangsu	82.15	84.82	87.49	90.16	89.73
Jiangxi	29.74	33.41	37.08	40.75	44.42
Liaoning	32.61	36.28	39.95	43.62	47.29
Inner Mongolia	27.92	30.59	33.26	35.93	38.60
Ningxia	19.46	22.13	24.80	27.47	30.14
Qinghai	16.57	19.24	21.91	24.58	27.25
Shandong	48.36	52.03	55.70	59.37	63.04
Shanxi	31.69	35.36	39.03	42.70	46.37
Shaanxi	37.55	41.22	44.89	48.56	52.23
Shanghai	75.84	78.51	81.18	83.85	86.52
Sichuan	44.17	47.84	51.51	55.18	58.85
Tianjin	39.28	42.95	46.62	50.29	53.96
Tibet	5.31	6.98	8.65	10.32	8.67
Xinjiang	21.25	23.92	26.59	29.26	31.93
Yunnan	23.74	26.41	29.08	31.75	34.42
Zhejiang	78.94	81.61	84.28	86.95	89.62
Chongqing	36.67	40.34	44.01	47.68	51.35

As shown in Table 2, Guangdong Province ranked first in 2024 with a score of 95.28, benefiting from its industry scale (RMB 285 billion) and strengths in innovation capacity. The Tibet Autonomous Region scored 8.67 in 2024, reflecting constraints from resource endowments, but its score still grew by 63.3% compared to 2020. Guizhou Province rose from 20.15 in 2020 to 34.83 in 2024 (+72.9%), driven by the construction of a national big data hub. In 2024, eastern provinces (e.g., Guangdong, Beijing) had an average score of 68.72, while western provinces (e.g., Tibet, Qinghai) averaged 22.83. Though the gap remains significant, it has narrowed year by year.

Table 3. Trend characteristics at the national level
Statistic	2020	2021	2022	2023	2024
Mean	32.15	35.62	38.74	41.89	45.03
Standard deviation	28.37	29.84	31.05	32.17	33.28
Range	83.9	84.4	85.2	86.1	86.6
Coefficient of variation	0.882	0.837	0.802	0.768	0.739

The national average rose from 32.15 to 45.03 (+40.1%), reflecting the overall expansion of the AI industry. The coefficient of variation decreased from 0.882 to 0.739, indicating that the regional gap was gradually narrowing in relative terms.
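As a quick arithmetic check, the coefficient of variation in Table 3 is the ratio of the standard deviation to the mean for each year. The short sketch below recomputes it from the values reported in Table 3; the variable names are illustrative, and small differences in the third decimal place can arise from rounding of the published means and standard deviations.

```python
# Values copied from Table 3 (2020 to 2024).
means = [32.15, 35.62, 38.74, 41.89, 45.03]
std_devs = [28.37, 29.84, 31.05, 32.17, 33.28]
reported_cv = [0.882, 0.837, 0.802, 0.768, 0.739]

# Coefficient of variation, Eq. (3), applied to the national distribution of scores.
cvs = [sd / m for sd, m in zip(std_devs, means)]

# Growth of the national mean between the first and last year, in percent.
mean_growth_pct = (means[-1] / means[0] - 1) * 100

for year, cv, rep in zip(range(2020, 2025), cvs, reported_cv):
    print(year, round(cv, 3), rep)
print(round(mean_growth_pct, 1))
```

The falling ratio reflects a mean that grows faster than the standard deviation, even though the standard deviation and the range themselves increase over the period.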

Table 4. Regional differentiation characteristics (2024)
Region	Number of Provinces	Mean	Highest Score	Minimum Score	Standard Deviation
Eastern coastal	11	68.72	95.3	41.5	18.37
Central region	8	42.16	63.8	25.4	12.05
Western region	12	22.83	38.9	8.7	9.81

Table 4 shows a clear gradient across China: eastern (68.72) > central (42.16) > western (22.83). The extreme disparity is also large: Guangdong's score (95.3) is about 11 times that of Tibet (8.7), which is consistent with the "algorithm gap" thesis. Guangdong Province ranked first for five consecutive years (95.3 in 2024), with a scale advantage (RMB 285 billion) and an innovation agglomeration effect. Guizhou ranked first in growth rate (+72.9% from 2020 to 2024, based on Table 2), benefiting from the construction of a national big data hub. The Tibet Autonomous Region started from a low base but continues to improve (8.7 in 2024, about +63% compared with 2020).

3.5. Sample description and basic characteristics of variables

The overall trend is primarily characterized by three key findings. First, the national Gini coefficient averaged approximately 0.468 in 2020, slightly declined in 2021–2022 to around 0.459, before rebounding in 2023–2024—with a national average of ~0.462 in 2024. This indicates that overall inequality exhibited a volatile pattern during the sample period, with no significant sustained improvement. Second, the distribution of the AI index shows a clear right-skew: the mean value of eastern coastal provinces is significantly higher than that of central and western regions. Guangdong, Beijing, and Jiangsu (the top-tier provinces) have AI industry scale and patent counts far exceeding those of other regions. Regional scale data for these provinces—Guangdong (RMB 285 billion), Beijing (RMB 240 billion), and Jiangsu (RMB 198 billion)—visually demonstrate strong agglomeration effects. Third, inter-provincial disparities are pronounced: among the 31 provinces, some (e.g., Guizhou, Yunnan) have persistently high Gini coefficients. Meanwhile, coastal developed provinces like Guangdong, Jiangsu, and Zhejiang—despite their high overall income levels—still exhibit significant internal inequality due to the concentration of high-income groups.

3.6. Panel regression model specification and identification strategy

In the benchmark regression model, we first estimate the following fixed-effects panel model:

$\begin{matrix} \mathit{Inequ}_{(i, t)} = α_{0} + α_{1} \mathit{AI}_{(i, t)} + β^{'} X_{(i, t)} + μ_{i} + λ_{t} + ε_{(i, t)} \end{matrix}$ (6)

In the above formula, $\mathit{Inequ}_{(i, t)}$ represents the Gini coefficient of province i in year t; $\mathit{AI}_{(i, t)}$ is the AI composite score or the AIExp exposure index; $X_{(i, t)}$ is the vector of control variables (industrial structure, education, urbanization, etc.); $μ_{i}$ and $λ_{t}$ are province and year fixed effects, which control for unobservable provincial characteristics and common time shocks; and $ε_{(i, t)}$ is the random error term. The panel regression results show that the coefficient on the composite AI score is 0.0012, significant at the 1% level. That is, after controlling for other factors, a 1-point increase in the AI score is associated with a 0.0012 increase in the Gini coefficient, indicating a robust positive effect of AI on income inequality. The goodness of fit (within R² = 0.4823) and overall significance (F statistic = 68.294, p = 0.0000) both suggest that the model is reasonable and has strong explanatory power.
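For readers who want to see how a two-way fixed-effects specification of this kind can be estimated, the following sketch applies the within transformation to a balanced panel and runs ordinary least squares on the demeaned data. Everything in it is an assumption for illustration: the data are synthetic (the coefficient 0.0012 is simply planted in the simulation), a single hypothetical control stands in for the control vector, and standard errors are not computed.

```python
import numpy as np

def two_way_within(a):
    """Remove unit and period means from an array of shape (N, T, ...) in a balanced panel."""
    return (a
            - a.mean(axis=1, keepdims=True)        # province means
            - a.mean(axis=0, keepdims=True)        # year means
            + a.mean(axis=(0, 1), keepdims=True))  # add back the grand mean

def fe_estimate(y, X):
    """y: (N, T); X: (N, T, K). Returns the K slope coefficients of the two-way FE model."""
    y_w = two_way_within(y).reshape(-1)
    X_w = two_way_within(X).reshape(-1, X.shape[-1])
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta

# Synthetic balanced panel: 31 provinces, 5 years (values are simulated, not real data).
rng = np.random.default_rng(0)
N, T = 31, 5
province_fe = rng.normal(0.4, 0.03, size=(N, 1))
year_fe = rng.normal(0.0, 0.005, size=(1, T))
ai_score = rng.uniform(5, 95, size=(N, T))
urban_share = rng.uniform(0.4, 0.9, size=(N, T))   # hypothetical control variable
true_beta = np.array([0.0012, 0.05])               # planted coefficients
X = np.stack([ai_score, urban_share], axis=-1)
y = province_fe + year_fe + X @ true_beta + rng.normal(0.0, 0.002, size=(N, T))

beta_hat = fe_estimate(y, X)
print(beta_hat)
```

The demeaning shortcut is exact only for balanced panels; with missing province-year cells, dummy variables or a dedicated panel estimator would be needed instead.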

Table 5. Significance analysis of panel regression results
Variable	Outcome
AI	0.0012*** (12.79)
const	0.3528*** (28.22)
Year fixed effect	yes
Provincial fixed effect	yes
Observations	155
R²	0.4823

Note: ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are robust standard errors.

This empirical finding aligns closely with our theoretical analysis based on the "skill-task substitution matrix." The underlying mechanism lies in the inherent skill-biased nature and task-substitutive propensity of AI technologies. As discussed in the literature review, AI primarily restructures labor demand by substituting routine, procedural tasks in production processes—tasks typically performed by medium-skilled workers. In regions with high AI agglomeration (e.g., eastern coastal provinces), this displacement effect is particularly pronounced: on one hand, it depresses the incomes and employment opportunities of low- and medium-skilled workers engaged in substitutable tasks; on the other hand, it significantly increases demand for high-skilled workers and their marginal productivity, thereby driving up wages for this group. This dual effect—suppressing low-/medium-skilled workers while boosting high-skilled workers—collectively widens within-group income inequality, i.e., amplifies within-region inequality (G_W). The case of Guangdong Province exemplifies this: with the nation’s highest AI score (95.3) and a Gini coefficient of 0.422 (above the national average), its outcome reflects the combined impact of skill premium and task substitution under AI industry agglomeration.

Meanwhile, the spatial imbalance in AI development further exacerbates between-region inequality (G_B). The advanced agglomeration of AI industries in eastern regions concentrates capital and high-skilled talent, widening the average income gap between eastern and central/western provinces. This explains why the median Gini coefficient in the east (0.41) is the highest—and the positive effect of AI on inequality is most pronounced in this region.

Conversely, the divergence in within-western Gini coefficients—e.g., Tibet (0.37) vs. Guizhou (0.445)—reveals the moderating role of regional initial conditions and policy interventions. Tibet Autonomous Region, with the lowest AI score (8.7) and a relatively low Gini coefficient (0.37), may benefit from the effective buffering of redistribution policies (e.g., central government transfer payments), which attenuates AI’s impact. In contrast, provinces like Guizhou—though western—accelerating their digital catch-up may see income divergence between traditional industry workers and emerging digital sector workers emerge earlier due to technology diffusion, leading to higher Gini coefficients. This confirms that AI’s impact is not linear but interacts deeply with regional industrial structure, skill composition, and policy environment.

In summary, the significant positive effect of AI on inequality—revealed by regression coefficients—stems from two channels: through the "skill-task" substitution matrix, AI widens skill premiums at the micro level and strengthens regional divergence at the macro-spatial level. Existing socio-economic structures, however, moderate the strength of this effect.

4. Data analysis

4.1. Dagum Gini decomposition: spatiotemporal evolution of the sources of inequality

To identify the composition of overall inequality, this study uses the Dagum (1997) Gini decomposition method to decompose the overall Gini coefficient G into three parts: within-region inequality (G_W), between-region inequality (G_B), and the transvariation (overlap) term (G_T). We group the 31 provinces into three regions (eastern, central, and western) to obtain the G_W, G_B, and G_T time series for each year from 2020 to 2024.

4.1.1. Summary of annual breakdown results

(1) Within-region inequality (G_W) remains the main component of overall inequality.

G_W accounted for roughly 52%–60% of overall G over the sample period, indicating that disparities within regions are the main source of overall inequality. This suggests that inter-regional transfers alone are not enough to eliminate overall inequality, and that attention needs to be paid to structural problems within provinces (such as industry distribution and education gaps).

Table 6. Decomposition trend of provincial Gini coefficient in China from 2020 to 2024
Year	G_W (within-region inequality)	G_B (between-region inequality)	G_T (transvariation)
2020	0.55	0.30	0.10
2021	0.56	0.29	0.11
2022	0.54	0.31	0.12
2023	0.52	0.33	0.13
2024	0.53	0.35	0.15

(2) The importance of between-region inequality (G_B) and the transvariation term (G_T) increased.

Over 2020–2024, as the AI industry became highly concentrated, G_B and G_T increased in some years. This was especially the case in 2023–2024, when the AI agglomeration effect in the east strengthened and the contribution of inter-regional disparities to overall inequality rose markedly. This shows that the spatial imbalance of AI is making distributional differences between regions an important factor in widening inequality.

Table 7. AI industry agglomeration and Gini coefficient in China's three major regions in 2024
Province	Region	AI score	Gini coefficient
Guangdong	East	95.3	0.42
Jiangsu	East	82.1	0.39
Hubei	Central	65.7	0.38
Henan	Central	58.4	0.36
Guizhou	West	45.6	0.41
Gansu	West	32.8	0.34

4.1.2. Correlation checks with AI metrics

For the G_W, G_B, and G_T components obtained for each year, panel regressions (including province and year fixed effects) were run to test the relationship between the AI indicators and the three types of inequality. The results are reported in Table 8.

Table 8. Regression results of correlation between AI indicators and inequality
Variable	(1) G_B	(2) G_W	(3) G_T
AI score	0.15* (4.23)		
AIExp		0.22* (5.67)	
AI agglomeration			0.18** (2.34)
Control variables	yes	yes	yes
Provincial fixed effect	yes	yes	yes
Year fixed effect	yes	yes	yes
Observations	400	400	400
R²	0.892	0.876	0.881

Note: ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are robust standard errors.

The results show the following.

(1) The comprehensive AI score is significantly positively correlated with G_B, and the coefficient remains statistically significant after controlling for variables such as industrial structure and education, indicating that AI agglomeration exacerbates the difference in mean income between provinces.

(2) AIExp (skill-task exposure) is more strongly correlated with G_W, indicating that AI has a direct impact on the income gap between skill groups within provinces: in provinces with higher AI exposure, the income of low-skilled groups is more suppressed, which increases internal inequality.

(3) G_T was also found to be positively correlated with the regional agglomeration of AI in several regressions, which means that AI not only leads to a simple average gap, but also changes the crossover pattern of income distribution in different provinces (e.g., high-income groups are more concentrated in high-AI provinces, forming an "overlap" effect).

4.2. Mechanism testing: evidence for the skill-task substitution path

4.2.1. Interaction term test

We add interaction terms between AI and the skill structure (e.g., the share of high-skilled or low-skilled workers) to the benchmark model:

$\begin{matrix} \mathit{Inequ}_{(i, t)} = α_{0} + α_{1} \mathit{AI}_{(i, t)} + α_{2} \mathit{Skill}_{(i, t)} + α_{3} (\mathit{AI}_{(i, t)} \times \mathit{Skill}_{(i, t)}) + β^{'} X_{(i, t)} + μ_{i} + λ_{t} + ε_{(i, t)} \end{matrix}$ (7)

where $\mathit{Skill}_{(i, t)}$ denotes the skill-structure variable (the share of low-skilled or high-skilled workers in province i in year t), and the remaining terms are defined as in Equation (6). With the low-skilled share as the skill variable, the estimates show that α₃ > 0 and is statistically significant, meaning that AI has a stronger inequality amplification effect in provinces with a high proportion of low-skilled workers. This is consistent with task-oriented theory: low-skilled groups take on more tasks that can be replaced by automation, and therefore suffer more negative impacts from AI. Conversely, the interaction coefficient between AI and the high-skilled share is negative or insignificant, suggesting that regions with a high proportion of high-skilled workers may mitigate the adverse effects of AI through skill complementarity.
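To make the role of the interaction coefficient explicit, assume the specification contains AI, a skill-share variable, and their product, with coefficients α₁, α₂, and α₃ respectively (the symbol Skill and the coefficient α₂ are notational assumptions for this illustration). The marginal effect of AI on inequality then varies with the skill share:

```latex
\frac{\partial\, \mathit{Inequ}_{(i,t)}}{\partial\, \mathit{AI}_{(i,t)}}
  = \alpha_{1} + \alpha_{3}\, \mathit{Skill}_{(i,t)}
```

Under this reading, a positive α₃ with the low-skilled share as the skill variable means that each additional point of AI score raises the Gini coefficient by more in provinces where low-skilled workers make up a larger share of the workforce.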

4.2.2. Group regression (regional/industry grouping)

The sample was divided into three regional groups (eastern, central, and western). The results show the following.

(1) In the eastern region (high AI agglomeration), AI has the greatest positive impact on Gini, and the impact is significant through G_B channels, indicating that the high AI score in the eastern region amplifies the inter-provincial mean gap.

(2) In the central and western regions (low AI agglomeration), AIExp has a more significant impact on G_W, indicating that in these regions, AI amplifies intra-provincial inequality more by replacing low-skilled jobs than by widening inter-provincial mean differences.

Industry-level groupings (high vs. low manufacturing share) also show that AI substitution has a stronger impact on employment in manufacturing-intensive areas, especially on low- and middle-skilled manufacturing jobs. This evidence further supports the idea behind the "skill-task substitution matrix" that task distribution and skill composition determine differences in AI's effects.

4.3. Robustness test

To ensure the robustness of the results, this paper implements several alternative specifications and tests. First, in terms of alternative inequality measures: We replace the Gini coefficient with metrics such as the Theil index and the 90/10 income ratio. The direction of results remains consistent, and coefficient significance is preserved (though with slight differences in absolute values and economic interpretation), indicating robustness to the choice of inequality metrics. Second, for alternative AI indicators: We regress separately using single-dimensional indicators—only patent counts, only industry scale, or only per capita computing power. Conclusions are generally consistent, but their explanatory power is weaker than that of the composite AI index. Notably, the task-oriented AIExp (AI exposure index) still exhibits stronger explanatory power, confirming that exposure to task substitution is a key mediating variable. Third, regarding endogeneity and instrumental variables: To mitigate endogeneity arising from reverse causality and omitted variables, we adopt the following strategies.

(1) Using the lagged terms AI_{i,t-1} and AI_{i,t-2} as explanatory variables, the results remain positive and significant, which to some extent reduces concerns about simultaneous determination.

(2) Two-stage least squares (2SLS) estimation is performed using plausibly exogenous instrumental variables (such as the first-phase layout intensity of national AI infrastructure projects in the province, or an interaction term involving the density of historical scientific research institutions). Provided that the instrument relevance and exogeneity assumptions hold, the IV estimates still point to a positive impact of AI on inequality.
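The logic of the 2SLS strategy can be sketched as follows. This is a simplified illustration on simulated cross-sectional data, not the estimation performed in this paper: the instrument, the confounder, the single control, and all coefficients are hypothetical, and the second-stage point estimates are shown without the standard-error correction that a full 2SLS routine applies.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, endog, instrument, controls):
    """Returns [intercept, coefficient on endog, coefficient on controls]."""
    const = np.ones((y.size, 1))
    Z = np.column_stack([const, instrument, controls])  # stage 1 regressors
    endog_hat = Z @ ols(Z, endog)                        # fitted values of the endogenous variable
    X_hat = np.column_stack([const, endog_hat, controls])
    return ols(X_hat, y)                                 # stage 2 on fitted values

# Simulated data: an unobserved confounder drives both AI and inequality.
rng = np.random.default_rng(1)
n = 2000
instrument = rng.normal(size=n)   # hypothetical instrument, affects AI only
confounder = rng.normal(size=n)   # unobserved by the researcher
control = rng.normal(size=n)
ai = 1.0 * instrument + 0.8 * confounder + 0.5 * control + rng.normal(size=n)
ineq = 0.5 * ai + 1.0 * confounder + 0.3 * control + rng.normal(scale=0.5, size=n)

beta_iv = two_stage_least_squares(ineq, ai, instrument, control)
beta_ols = ols(np.column_stack([np.ones(n), ai, control]), ineq)
print(beta_ols[1], beta_iv[1])
```

In this simulation the naive OLS coefficient is biased upward by the confounder, while the IV estimate stays close to the planted value of 0.5, which is the kind of bias the instrumental-variable check is meant to rule out.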

Fourth, in terms of sub-sample and deletion sensitivity tests, we exclude the municipalities directly under the central government (such as Beijing and Shanghai) or the provinces with extreme AI scores and then re-estimate the model. The direction and significance of the results are maintained, indicating that the conclusion is not driven by a few extreme values. Rolling regressions and piecewise tests over the time window also show that the impact of AI increased in 2022–2024, which is consistent with the accelerating adoption of AI applications.

Taken together, the descriptive statistics, Dagum decomposition, and panel regression analysis yield several key empirical conclusions.

(1) During the sample period 2020–2024, overall inequality at the provincial level in China fluctuated and remained at a high level (about 0.46 in Gini). The spatial agglomeration and skill substitution exposure of AI are important drivers of recent inequality.

(2) Dagum decomposition shows that intra-regional inequality is still the main cause of overall inequality, but the development of AI, especially in the east, is increasing inter-provincial disparities by amplifying interval inequality and distribution overlap.

(3) Panel regression and mechanism tests support the "skill-task substitution matrix" pathway: AI has a greater impact on low-skilled, high-exposure groups, thereby amplifying inequality within provinces. In areas with high AI agglomeration, AI tends to widen the inter-provincial gap in mean income by raising the returns to high-skilled groups and capital.

(4) These conclusions are broadly stable across the robustness tests (alternative inequality indicators, alternative AI indicators, instrumental variables, and sub-sample tests), indicating that the findings are reasonably credible.
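The inequality metrics and the Dagum decomposition referred to in these conclusions can be sketched as follows. This is a minimal, unweighted illustration under stated assumptions: the regional income figures are hypothetical (not the paper's data), each observation is treated as one equally weighted unit, and the function names are chosen here for illustration. The decomposition follows the standard Dagum form, in which the overall Gini coefficient is split into a within-region part, a net between-region part, and an overlap (transvariation) part.

```python
import numpy as np

def mean_abs_diff(a, b):
    """Mean of |a_i - b_r| over all pairs of elements from a and b."""
    return float(np.abs(a[:, None] - b[None, :]).mean())

def gini(y):
    """Gini coefficient: mean absolute difference divided by twice the mean."""
    return mean_abs_diff(y, y) / (2.0 * y.mean())

def theil(y):
    """Theil T index; requires strictly positive incomes."""
    ratio = y / y.mean()
    return float(np.mean(ratio * np.log(ratio)))

def p90_p10(y):
    """Ratio of the 90th to the 10th income percentile."""
    return float(np.percentile(y, 90) / np.percentile(y, 10))

def dagum_decomposition(groups):
    """Return (G, G_within, G_between_net, G_overlap) for a dict of groups."""
    arrays = [np.asarray(v, dtype=float) for v in groups.values()]
    pooled = np.concatenate(arrays)
    n, mu = pooled.size, pooled.mean()
    means = [a.mean() for a in arrays]
    p = [a.size / n for a in arrays]                             # population shares
    s = [a.size * m / (n * mu) for a, m in zip(arrays, means)]   # income shares
    g_within = sum(gini(a) * p[j] * s[j] for j, a in enumerate(arrays))
    g_between = g_overlap = 0.0
    for j in range(len(arrays)):
        for h in range(j):
            delta = mean_abs_diff(arrays[j], arrays[h])
            if delta == 0.0:
                continue  # identical groups contribute nothing
            g_jh = delta / (means[j] + means[h])       # between-group Gini ratio
            weight = g_jh * (p[j] * s[h] + p[h] * s[j])
            d_jh = abs(means[j] - means[h]) / delta    # relative economic affluence
            g_between += weight * d_jh
            g_overlap += weight * (1.0 - d_jh)
    return gini(pooled), g_within, g_between, g_overlap

# Hypothetical per capita incomes (thousand RMB), for illustration only.
regions = {
    "east": [80.0, 65.0, 55.0, 50.0],
    "central": [35.0, 30.0, 28.0],
    "west": [32.0, 27.0, 25.0, 22.0],
}
g_total, g_w, g_nb, g_t = dagum_decomposition(regions)
pooled = np.concatenate([np.asarray(v) for v in regions.values()])

print("Gini:", g_total, "within:", g_w, "between:", g_nb, "overlap:", g_t)
print("Theil:", theil(pooled), "P90/P10:", p90_p10(pooled))
```

The three components sum to the overall Gini coefficient by construction, so their shares can be compared across years. The overlap term is positive only when the income ranges of two regions intersect, as the hypothetical central and western values do here.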

5. Main conclusions

Based on the analytical framework of the "skill-task" substitution matrix, this paper uses panel data for 31 Chinese provinces from 2020 to 2024 to systematically test the impact of artificial intelligence (AI) development on labor market structure and the income distribution pattern, using Dagum Gini decomposition and fixed-effects panel regression. The national Gini coefficient remained at about 0.46 during this period, following a fluctuating "decline followed by an increase" trend, and the income distribution pattern did not fundamentally improve. The composite AI score is significantly and positively correlated with the Gini coefficient, especially in the eastern provinces where AI is agglomerated, indicating that AI not only raises productivity and creates new jobs but also strengthens income differentiation between groups. Exposure under the skill-task substitution matrix (AIExp) significantly explains the rise in within-province inequality, and the amplification effect of AI on inequality is particularly strong in areas with a high proportion of low-skilled workers. The Dagum decomposition further shows that, although intra-regional inequality is still the main contributor, AI development significantly increases the shares of inter-regional inequality and overlap through the agglomeration effect, exacerbating the income gap among the eastern, central, and western regions. In summary, through the "skill-task" substitution logic and regional agglomeration effects, AI has released productivity in China's labor market and created new jobs, but it has also amplified the income gap between groups, becoming an important driver of social inequality in the new era.

First, the dual effect of technological progress is highlighted once again in the AI era: it releases development dividends through gains in production efficiency, but skill bias and the logic of task substitution channel those gains to high-skilled groups and capital, while low-skilled groups are marginalized. This confirms that the internal logic of technology and the social structure jointly shape inequality. Second, algorithmic power and data monopoly jointly promote structural inequality. Algorithmic discrimination in recruitment, medical care, credit, and other fields exacerbates the competitive disadvantage of vulnerable groups, while large enterprises monopolize AI dividends through patents, computing power, and platform rules, further increasing wealth concentration. This indicates that inequality has gone beyond the mismatch between labor supply and demand and is deeply embedded in the imbalanced distribution of technological power. Third, the particularities of the Chinese context amplify the complex impact of AI on inequality: existing structural contradictions, such as the unresolved urban-rural dual structure, unbalanced regional development, and the uneven distribution of educational resources, compound the diffusion of AI technology, making the mechanisms through which AI amplifies inequality more intertwined in China.

6. Epilogue

Artificial intelligence is not only a technological revolution, but also a profound reconstruction of the social distribution mechanism. The empirical results of this paper show that under the logic of the "skill-task" substitution matrix, AI exacerbates the income differentiation between groups by changing the division of labor and skill demand. In the Chinese context, this effect is further amplified by regional imbalances, urban-rural disparities and educational inequality. In the future, how to achieve social equity and inclusive growth while promoting the development of AI technology will be a major challenge for China and the world.

References

[1]. iResearch. (2024). China Artificial Intelligence Industry Research Report 2024. [Online]. Available: https://pdf.dfcfw.com/pdf/H3_AP202503271648275209_1.pdf.com

[2]. Wang, Y. Q., & Dong, W. (2020). How does the rise of robots affect China's labor market? Evidence from listed manufacturing companies. Economic Research Journal, 10, 1-15.

[3]. Guomin Jinglve. (2025, July 17). Guangdong secures another title as China's "No.1 Province".

[4]. Zhejiang Economic Information Center. (2025, March 21). Building digital industrial clusters and advancing digital transformation: Zhejiang approved for a new round of national-level experimental zones.

[5]. Hunan Provincial Department of Industry and Information Technology. (2024, July 30). Hunan's AI industry remains stable and shows positive momentum.

[6]. Shenzhen Electronics Chamber of Commerce. (2024, January 26). Research and analysis on China's AI industrial chain mapping.

[7]. Autor, D. H. (2013). The "task approach" to labor markets: An overview. Journal for Labour Market Research, 46(2), 185–199. https://doi.org/10.1007/s12651-013-0128-z

[8]. Blanchard, O. J. (2019). The state of macro. Journal of Economic Perspectives, 33(2), 3–28. https://doi.org/10.1257/jep.33.2.3

[9]. Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in U.S. wage inequality. Econometrica, 90(5), 1973–2016.

[10]. Acemoglu, D., & Restrepo, P. (2019). Automation and new tasks: How technology displaces and reinstates labor (MIT Shaping Work Project Report). Massachusetts Institute of Technology.

[11]. Autor, D. H., Katz, L. F., & Krueger, A. B. (1998). Computing Inequality: Have Computers Changed the Labor Market? Massachusetts Institute of Technology, Department of Economics. https://doi.org/10.1162/003355398555874

[12]. Autor, D. H., Dorn, D., & Hanson, G. H. (2018). Import competition and the great U.S. employment sag of the 2000s (NBER Working Paper No. 24196). National Bureau of Economic Research.

[13]. Osborne, M. A. (n.d.). AI and the future of work: Implications for policy and society (Future of Life Institute Report). Future of Life Institute.

[14]. Acemoglu, D., & Restrepo, P. (2022). Tasks and inequality (Boston University Department of Economics Working Paper).

[15]. Ngor Luong, N., & Arnold, Z. (2021). China’s Artificial Intelligence Industry Alliance: Understanding China’s AI Strategy Through Industry Alliances (Data Brief). Center for Security and Emerging Technology. https://doi.org/10.51593/20200094

[16]. Zhang, A. H. (2024). The promise and perils of Chinese AI regulation (Columbia Law School Working Paper). Columbia Law School.

[17]. National Bureau of Statistics of China. (2024, April 29). National Data [Web page]. https://www.stats.gov.cn/sj/zxfb/202404/t20240429_1973412.html

Cite this article

Cui, Y. (2025). Skill-Task Substitution Matrix: How Artificial Intelligence Reshapes Labor Market Structure and Exacerbates Income Polarization among Groups. Advances in Economics, Management and Political Sciences, 245, 31-44.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of ICFTBA 2025 Symposium: Data-Driven Decision Making in Business and Economics

ISBN: 978-1-80590-569-1 (Print) / 978-1-80590-570-7 (Online)

Editor: Lukášak Varti

Conference website: https://2025.icftba.org/Bratislava.html

Conference date: 12 December 2025

Series: Advances in Economics, Management and Political Sciences

Volume number: Vol.245

ISSN: 2754-1169 (Print) / 2754-1177 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.