| Issue | Sust. Build. Volume 8, 2025 |
|---|---|
| Article Number | 8 |
| Number of page(s) | 15 |
| Section | Innovative Building Designs |
| DOI | https://doi.org/10.1051/sbuild/2025004 |
| Published online | 22 September 2025 |
Original Article
The application of deep learning algorithm in intelligent optimization of architectural planning spatial layout
School of Art and Design, Sias University, Xinzheng 451100, China
* e-mail: darlue@126.com
Received: 27 April 2025
Accepted: 21 August 2025
To improve the intelligent optimization of architectural planning spatial layouts, this paper uses deep learning to propose a building roof style recognition method and design model based on salient region suppression and multi-scale feature fusion (SRSMFF). To address incomplete feature extraction of architectural elements, a salient region suppression module (SRSM) is designed. To address redundant data processing across different buildings, a multi-scale feature fusion method (MSFF) is proposed that extracts architectural element information by fusing feature maps of different resolutions. The experimental results show that, for buildings with complex boundary contours, SRSMFF produces segmentations that conform more closely to the building shape than the other methods tested. Moreover, the proposed model effectively reduces system redundancy and training loss in building space planning and layout, thereby improving system efficiency. It can also perform more optimization operations within a limited time, and the system's design evaluation scores reach 90 points or higher. The proposed deep learning fusion algorithm can therefore not only serve as a reference for architectural planning but also support the intelligent layout of indoor spaces and promote the intelligent development of subsequent architectural design.
Key words: Deep learning / architectural planning / spatial layout / intelligent optimization
© X. Liao, Published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Layout design is the task of arranging specific objects in space according to business and living needs. This need arises widely in modern industry, architectural design, housing, and other sectors. Spatial layout design is the main component of layout design: it divides a given space, or places objects within it, under a range of objective and subjective constraints and standards. In recent years, traditional design methods have become too inefficient to keep up with growing design demand, and improved approaches have been introduced into and developed within spatial layout design. Traditional methods rely mainly on manual design and interactive modeling software; the process is largely trial and error, and the quality of the result depends entirely on the designer's professional knowledge and personal experience. As a result, design consumes considerable time and labor and is inefficient. To solve this problem, intelligent spatial layout design has become a hot topic in academic research [1]. With the rapid development of real estate, the connotation of indoor space layout is also changing, and it has gradually become an important research direction within layout design. Since ancient times, housing has been, after food, people's most basic need.
With continuous scientific and technological innovation, human society keeps advancing. In recent years, as digitalization has accelerated, BIM (building information modeling) and VR (virtual reality), two emerging intelligent digital technologies, have provided strong support for intelligent and efficient architectural space design. In architectural space design, BIM and VR can combine design data with models to provide accurate data and information for the manufacturing, transportation, and installation of building modules. They can also provide more advanced technical support for building operation and maintenance after completion, adding new impetus to the application of digital technology in the construction industry [2].
This paper uses deep learning to propose a building roof style recognition method and design model based on salient region suppression and multi-scale feature fusion (SRSMFF). To address incomplete feature extraction of architectural elements, a salient region suppression module (SRSM) is designed. To address redundant data processing across different buildings, a multi-scale feature fusion method (MSFF) is proposed that extracts architectural element information by fusing feature maps of different resolutions. The experimental results show that, for buildings with complex boundary contours, SRSMFF produces segmentations that conform more closely to the building shape than the other methods tested.
2 Related works
2.1 Optimization of space layout design
Indoor space layout design divides the overall apartment space into space units according to specific constraints, with rooms or independent spaces as the layout elements; the ultimate goal is to generate residential floor plans. Early scholars proposed a large number of constraint-based methods for this task [3]. Zhao et al. [4] proposed a design system that generates configuration rules in advance from given constraints and ranks them by priority using multiple scores, in order to find the best functional configuration for a layout. Shi et al. [5] proposed a semi-automatic modeling system that generates a building model from given map data combined with the building's interior plan. Following relevant design specifications and concepts, Zhang et al. [6] encoded professional knowledge in an expert system and proposed an evaluation system that assesses existing indoor floor plans and gives reasonable suggestions. Indoor layout design can be regarded as cutting and redistributing indoor space, carving out areas of different sizes and functions as functional rooms. Becerik-Gerber et al. [7] hold that the spatial layout problem is NP-complete and shares all the difficulties of this class of problems: because of computation-time limits, an algorithm cannot find the optimal solution in reasonable time. However, search methods such as genetic algorithms have been successful on combinatorial optimization problems and have therefore been applied to spatial layout. Pena et al. [8] pointed out that evolutionary algorithms, especially genetic algorithms, work differently from mainstream search techniques.
The latter follow a point-search mechanism, moving from one candidate solution to the next within a given solution space. Genetic algorithms instead maintain a population of points to explore, and they identify and exploit well-structured components that good solutions share. In other words, the useful information held by the whole population is not lost during the search, because the genetic algorithm learns across generations and retains the fittest structures. For this reason, combinations of evolutionary algorithms with interior layout design have been proposed one after another. Zhao et al. [9] proposed an optimization model for the quantifiable aspects of architectural floor plan design that combines mathematical optimization with subjective decision-making in conceptual design; their floor plan optimization method uses efficient gradient-based optimization where appropriate and an evolutionary algorithm for discrete decisions and global search. Based on an improved hybrid algorithm, Pizarro et al. [10] proposed a series of algorithms for generating layout plans, mainly serving the early stages of architectural design. However, because the optimization objective for indoor layout is complex, automated systems built on traditional evolutionary algorithms have not achieved satisfactory results.
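To make the contrast between population-based and point-based search concrete, the following is a minimal genetic-algorithm sketch for a toy layout problem: six rooms are assigned to the cells of a 2 × 3 grid so that rooms with high adjacency preference end up close together. The grid, the weight matrix, the Manhattan-distance cost, and the operators (tournament selection, order crossover, swap mutation, elitism) are all illustrative assumptions, not the methods of [8]-[10].

```python
import random

def layout_cost(perm, slots, weights):
    """Sum of adjacency weight times Manhattan distance; perm[r] is room r's cell."""
    cost = 0.0
    n = len(perm)
    for a in range(n):
        for b in range(a + 1, n):
            (xa, ya), (xb, yb) = slots[perm[a]], slots[perm[b]]
            cost += weights[a][b] * (abs(xa - xb) + abs(ya - yb))
    return cost

def order_crossover(p1, p2, rng):
    """Order crossover (OX): copy a slice of p1, fill the gaps in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(fill)
    return child

def genetic_layout(slots, weights, pop_size=30, generations=200,
                   mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(weights)

    def cost(p):
        return layout_cost(p, slots, weights)

    # The whole population is kept, so information about good partial
    # layouts survives across generations (unlike single-point search).
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    initial_best = min(cost(p) for p in population)
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism: the two best layouts survive unchanged
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=cost)  # tournament selection
            p2 = min(rng.sample(population, 3), key=cost)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:  # swap mutation
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    return best, cost(best), initial_best

# Hypothetical 2 x 3 grid of cells and a made-up adjacency-preference matrix.
slots = [(x, y) for y in range(2) for x in range(3)]
weights = [
    [0, 5, 1, 0, 0, 2],
    [5, 0, 4, 0, 1, 0],
    [1, 4, 0, 3, 0, 0],
    [0, 0, 3, 0, 2, 0],
    [0, 1, 0, 2, 0, 4],
    [2, 0, 0, 0, 4, 0],
]
best, best_cost, start_cost = genetic_layout(slots, weights)
print(best, best_cost, start_cost)
```

Because of elitism, the final best cost can never exceed the best cost in the initial population; real layout objectives would add area, shape, and circulation terms to the cost.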
2.2 Computer-aided layout design optimization
Layout design principles are widely used in modern engineering and daily life. Early research studied general layout theory [11]. Existing theory abstracts the layout problem as a packing problem and derives general layout theory from it; research on packing problems is therefore of considerable scientific significance and wide practical value for layout planning. Complex layout design problems are essentially constrained layout problems, the most typical and common of which is the position-constraint problem [12]. By spatial dimension, layout problems fall into three categories: one-, two-, and three-dimensional. Three-dimensional problems can usually be solved by projecting them into two-dimensional space, but genuinely three-dimensional problems are beyond the scope of this paper [13]. With the development of computer science and the maturing of layout design technology, more and more design software has appeared. Because layout design is closely related to scene modeling, layout features are also embedded in mainstream design software such as AutoCAD and 3ds Max [14]. Current professional design software falls into two categories: tools that provide integrated solutions for industry professionals, such as Archicad and Vectorworks, and tools for ordinary users, such as Room Arranger and Live Home 3D. These tools require considerable professional knowledge, are costly to learn, and cannot be adopted on a large scale [15].
In general, existing design software has a low degree of intelligence and automation, which limits it and prevents it from meeting growing market demand. Combining intelligent algorithms with layout design is therefore an important way to make such software more practical. As computer technology develops, computer-aided design tools are becoming ever more closely connected with industrial production [16]. Layout design has attracted wide attention in graphics research and plays an important role in applications such as virtual reality and games. Depending on practical needs and design concepts, layout optimization in graphics can be divided into architectural layout, furniture placement, page layout, urban planning, and so on, of which interior layout is a common problem [17].
3 Building visual perception model
This paper proposes a building roof style recognition method based on salient region suppression and multi-scale feature fusion (SRSMFF). First, an improved ResNet18 serves as the backbone network to extract initial building features. Second, a salient region suppression module (SRSM) is designed that both expands the perceptual range for architectural elements and, by processing regional information, highlights key features. Third, a multi-scale feature fusion (MSFF) method is proposed that strengthens information extraction from feature maps of different resolutions and improves the spatial representation of architectural elements at different scales. Next, efficient channel attention (ECA) assigns a weight to each channel to strengthen important channel information. Finally, a large-margin metric loss function (L-Softmax) maximizes the decision-boundary distance in the feature embedding space, improving recognition of similar architectural styles.
4 Feature fusion network
The overall structure of the salient region suppression and multi-scale feature fusion (SRSMFF) network is shown in Figure 1. It consists of a backbone network, the salient region suppression module (SRSM), multi-scale feature fusion (MSFF), efficient channel attention (ECA), and the large-margin metric loss function (L-Softmax).
To address incomplete feature extraction of architectural elements, this paper designs the salient region suppression module (SRSM). The module uses a position attention mechanism to generate an attention map, from which a suppressed region and a salient region are obtained. The suppressed region, produced by erasing both the most-attended area and the background area, forces the network to capture a more complete area of each architectural element. The salient region, in turn, highlights architectural element information, improving recognition performance.
The method weights the positions of the input feature map with a position attention mechanism to obtain the attention map, as shown in Figure 2. First, the input feature $I \in \mathbb{R}^{C_m \times H \times W}$ is reshaped to obtain features $A$, $B$, and $D$, with $A, B, D \in \mathbb{R}^{N \times C_m}$ and $N = H \times W$. Then $A$ is multiplied by the transpose of $B$, and the softmax function is applied to obtain the attention map $C \in \mathbb{R}^{N \times N}$, where $C_{ji}$ represents the influence of the $i$-th position on the $j$-th position, as shown in equation (1) [18].
Here, $i$ and $j$ index positions in feature maps $A$ and $B$. Then $C$ and $D$ are multiplied and reshaped to obtain the attention feature map $E$.
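The position-attention step can be illustrated with a small pure-Python sketch. It assumes the standard DANet-style form (row-wise softmax of the products of $A$ with $B$, then multiplication with $D$); since equation (1) itself is not reproduced in this text, that form is an assumption. Nested lists stand in for tensors, and the attention map is named `attn` to avoid a clash with the channel count.

```python
import math

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def position_attention(A, B, D):
    """A, B, D: N x C lists (one C-dim feature vector per position, N = H*W).
    Returns the N x N attention map and the attended features E = attn @ D."""
    n = len(A)
    # energy[j][i] = <A_j, B_i>: similarity of position j (query) to position i
    energy = [[sum(a * b for a, b in zip(A[j], B[i])) for i in range(n)]
              for j in range(n)]
    attn = [softmax(row) for row in energy]  # attn[j][i]: influence of i on j
    channels = len(D[0])
    E = [[sum(attn[j][i] * D[i][c] for i in range(n)) for c in range(channels)]
         for j in range(n)]
    return attn, E

# Three positions with two channels each (toy values).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
D = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attn, E = position_attention(A, A, D)
```

Each output position is a convex combination of all positions' values, which is what lets the attention map draw on context from the whole image. A practical implementation would use batched matrix multiplication in PyTorch rather than Python loops.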
Because the intensity of each pixel in the attention feature map is proportional to its contribution to recognition, the map is thresholded to obtain the attention area and the background area. When erasing the attention area, this paper sets the erasure threshold $E_{tr}$ to 90% of the maximum pixel intensity. Let $E_k$ be the pixel intensity at position $k$ of the attention feature map. If $E_k$ is greater than $E_{tr}$, it is set to 0; otherwise it remains unchanged. The feature map with the attention area erased is $F_a = \{F_{a1}, F_{a2}, \dots, F_{ak}\}$, where $F_{ak}$ is the pixel intensity at position $k$ of $F_a$, as shown in equation (2) [19]:
When erasing the background area, the background threshold $E_{tb}$ is set to 10% of the maximum pixel intensity, and every pixel intensity $E_k$ in the attention map below $E_{tb}$ is set to 0. This yields the feature map with the background erased, $F_b = \{F_{b1}, F_{b2}, \dots, F_{bk}\}$, where $F_{bk}$ is the pixel intensity at position $k$ of $F_b$, as shown in equation (3):
Then $F_a$ and $F_b$ are combined to obtain the suppressed region, which lets the network capture the complete area of the architectural element. The salient feature map $F_s$ is obtained by applying the Sigmoid activation function to the attention feature map. During training, each iteration randomly selects either the salient feature map or the suppressed-region feature map to produce the feature map $G$, as shown in equation (4):
In the equation, RAND represents random selection. Finally, the feature G is residually connected with the original input feature I to obtain the output feature O.
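The erasure and selection steps described above can be sketched on a flattened single-channel attention map. The 90% and 10% thresholds come from the text; two details are assumptions, because equations (2)-(4) are not reproduced here: $F_a$ and $F_b$ are combined by an element-wise minimum (which zeros both the most-attended and the background positions), and RAND picks either branch with equal probability.

```python
import math
import random

def erase_attention(E, ratio=0.9):
    """Equation (2) as described: zero every pixel above ratio * max(E)."""
    thr = ratio * max(E)
    return [0.0 if e > thr else e for e in E]

def erase_background(E, ratio=0.1):
    """Equation (3) as described: zero every pixel below ratio * max(E)."""
    thr = ratio * max(E)
    return [0.0 if e < thr else e for e in E]

def suppressed_map(E):
    # ASSUMPTION: F_a and F_b are merged element-wise with min(), so only the
    # mid-intensity positions (neither most-attended nor background) survive.
    return [min(a, b) for a, b in zip(erase_attention(E), erase_background(E))]

def srsm_output(E, I, rng):
    """Equation (4) plus the residual connection O = G + I (training time)."""
    salient = [1.0 / (1.0 + math.exp(-e)) for e in E]  # F_s = Sigmoid(E)
    # ASSUMPTION: RAND chooses each branch with probability 0.5.
    G = salient if rng.random() < 0.5 else suppressed_map(E)
    return [g + x for g, x in zip(G, I)]

E = [0.05, 0.3, 0.6, 0.95, 1.0]   # toy attention intensities
I = [1.0] * len(E)                # toy input feature at the same positions
print(suppressed_map(E))
O = srsm_output(E, I, random.Random(0))
```

Erasing the strongest responses pushes the network to look beyond the single most discriminative part of a roof, while the salient branch keeps those responses available; alternating between them is what the random selection provides.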
In architectural style recognition, styles are distinguished by elements such as structure, local decoration, or upturned eave corners, and these elements differ in shape and scale. This paper therefore proposes a multi-scale feature fusion method (MSFF) that extracts architectural element information by fusing feature maps of different resolutions. The lower the resolution, the larger the receptive field and the stronger the ability to extract large-scale architectural elements; conversely, higher resolution is better at extracting small-scale elements. An SRSM is added to Residual block 2 and Residual block 3 to obtain the output feature O; $h_2$ and $h_3$ parallel modules, respectively, form a multi-head mechanism, and their output features O are concatenated. The resulting outputs $M_2$ and $M_3$ are written $M_i$, $i = 2, 3$, as shown in equation (5):
Here, feature O has 64 channels, and $h_2$ and $h_3$ (denoted $h_i$) are 4 and 2, respectively. Learning from different subspaces in this way improves the model's perceptual ability and retains more feature information about small-scale architectural elements. Then a 1×1 convolution unifies the channel counts of the different feature maps, which are then normalized and activated to obtain feature maps $K_2$ and $K_3$ (denoted $K_i$, $i = 2, 3$), as shown in equation (6) [20]:
Here, Conv is the convolution operation, BN is the batch normalization layer, and ReLU is the activation function, as shown in Figure 3.
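The channel bookkeeping in equations (5) and (6) can be checked with a small sketch: concatenating $h_i$ heads of 64 channels gives 256 channels for $M_2$ and 128 for $M_3$, and a 1×1 convolution then maps both to a common width. The common width (128 here), the toy feature values, and the simplified normalization (per-channel statistics over one sample's pixels, with no learned scale or shift) are hypothetical choices, not values from the paper.

```python
import math

def concat_heads(heads):
    """Equation (5) sketch: concatenate h_i head outputs along the channel axis."""
    return [ch for head in heads for ch in head]

def conv1x1_bn_relu(x, weight, eps=1e-5):
    """Equation (6) sketch. x: C_in channels of P pixels; weight: C_out x C_in.
    A 1x1 convolution is a per-pixel linear mix of channels; each output channel
    is then normalized over its pixels and passed through ReLU."""
    pixels = len(x[0])
    out = []
    for w_row in weight:
        ch = [sum(w * x[c][p] for c, w in enumerate(w_row)) for p in range(pixels)]
        mean = sum(ch) / pixels
        var = sum((v - mean) ** 2 for v in ch) / pixels
        out.append([max(0.0, (v - mean) / math.sqrt(var + eps)) for v in ch])
    return out

def toy_head(seed, channels=64, pixels=4):
    return [[float((c + p + seed) % 5) for p in range(pixels)] for c in range(channels)]

M2 = concat_heads([toy_head(h) for h in range(4)])  # h_2 = 4 heads -> 256 channels
M3 = concat_heads([toy_head(h) for h in range(2)])  # h_3 = 2 heads -> 128 channels
common = 128  # hypothetical unified channel count

def toy_weight(c_in):
    return [[((o + c) % 7 - 3) / 10 for c in range(c_in)] for o in range(common)]

K2 = conv1x1_bn_relu(M2, toy_weight(len(M2)))
K3 = conv1x1_bn_relu(M3, toy_weight(len(M3)))
```

Unifying the channel count is what makes the later weighted fusion of $K_2$ and $K_3$ possible.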
Max pooling and average pooling capture complementary information. Max pooling retains the texture features of buildings but tends to discard detail; average pooling captures more detail but is more susceptible to noise and redundant information. This paper uses global max pooling and global average pooling to obtain two global descriptors, $K_{max}$ and $K_{avg}$, adds them, and applies a Sigmoid activation, so as to better aggregate feature maps of different scales. More importantly, information extracted at different scales is not equally valuable, so a multi-scale feature weighting method is adopted to remove redundant features across scales, strengthen architectural element information at key scales, and retain the rich spatial information of buildings. Weight parameters $\gamma_2$, $\gamma_3$, and $\gamma_4$ are assigned to the features of each scale and learned during training, and the weighted features are summed to obtain the feature map $L$, as shown in equation (7).
Here, σ is the Sigmoid activation function, $K_{max3}$ and $K_{avg3}$ are the results of max pooling and average pooling applied to feature map $K_3$, and $K_{max2}$ and $K_{avg2}$ are the corresponding results for $K_2$.
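Equation (7) is not reproduced in this text, so the following sketch uses an assumed form: each channel of $K_i$ is gated by $\sigma(K_{max,i} + K_{avg,i})$, scaled by its scale weight $\gamma_i$, and the scales are summed. Only $K_2$ and $K_3$ are modeled, because the operand weighted by $\gamma_4$ is not defined in this section, and both maps are assumed to have already been resampled to the same shape. The γ values are plain numbers here rather than trained parameters.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_gate(K):
    """sigma(global max + global average) for each channel of K."""
    return [sigmoid(max(ch) + sum(ch) / len(ch)) for ch in K]

def fuse_scales(K2, K3, gamma2, gamma3):
    """ASSUMED form of equation (7): L = gamma2*(g2 * K2) + gamma3*(g3 * K3).
    K2 and K3 must share a shape (same channels, same number of pixels)."""
    g2, g3 = channel_gate(K2), channel_gate(K3)
    return [[gamma2 * a2 * x2 + gamma3 * a3 * x3 for x2, x3 in zip(c2, c3)]
            for a2, a3, c2, c3 in zip(g2, g3, K2, K3)]

# Two channels with two pixels each (toy values).
K2 = [[1.0, 0.0], [0.0, 0.0]]
K3 = [[0.0, 0.0], [2.0, 2.0]]
L = fuse_scales(K2, K3, gamma2=0.5, gamma3=0.5)
```

In a trained network the γ values would let the model down-weight scales that carry mostly redundant information, which is the role the text assigns to them.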
After spatial feature extraction from architectural images, each channel of the feature map expresses a different feature, and these features influence architectural style recognition differently. By default, all channels carry equal weight, which does not help emphasize the features of key architectural elements. A channel attention module estimates the importance of each channel; by weighting channels accordingly, it strengthens discriminative features and suppresses weakly discriminative ones, improving recognition. This paper adopts efficient channel attention (ECA), a lightweight module that captures local cross-channel interaction without dimensionality reduction. ECA first applies global average pooling (GAP) to the input feature map to obtain aggregated features. It then realizes cross-channel interaction through a one-dimensional convolution with kernel size k and obtains each channel's weight with the Sigmoid function. Finally, the channel weights are multiplied by the input feature map to strengthen the extraction of important features, as shown in Figure 4.
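The ECA forward pass just described (GAP, a 1-D convolution across channels, Sigmoid, channel-wise rescaling) can be sketched in pure Python. The kernel weights are passed in directly here; in a real network they are learned, and zero padding is assumed so that the number of channels is preserved.

```python
import math

def eca(x, kernel):
    """x: list of C channels (each a flat list of pixels); kernel: odd-length
    list of 1-D convolution weights shared by all channels."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("ECA uses an odd kernel size")
    pad = k // 2
    y = [sum(ch) / len(ch) for ch in x]               # GAP: one value per channel
    padded = [0.0] * pad + y + [0.0] * pad            # zero padding keeps C outputs
    conv = [sum(kernel[t] * padded[c + t] for t in range(k)) for c in range(len(y))]
    weights = [1.0 / (1.0 + math.exp(-v)) for v in conv]  # Sigmoid -> (0, 1)
    out = [[w * v for v in ch] for w, ch in zip(weights, x)]
    return out, weights

x = [[0.0, 0.0], [2.0, 2.0], [1.0, 3.0]]   # three toy channels, two pixels each
out, w = eca(x, [0.0, 1.0, 0.0])           # identity kernel, for easy checking
```

Each channel weight depends only on its k neighboring channel descriptors, which is the local cross-channel interaction that replaces a fully connected layer.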
ECA replaces the fully connected layer with a one-dimensional convolution of size k across channels, which substantially reduces computational cost compared with a fully connected layer. The channel weight ω is given by equation (8) [21]:
Here, C1D is a one-dimensional convolution, σ is the Sigmoid function, and y is the output of global average pooling. k is the coverage of local cross-channel interaction and is determined adaptively as a function of the number of channels C, as shown in equation (9):
Therefore, the convolution kernel size k of this model's cross-channel interaction is 5.
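As a worked check of the stated kernel size, the sketch below assumes that equation (9) takes the ECA-Net default form $k = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{odd}$ with $\gamma = 2$ and $b = 1$, rounded to an odd number as in the reference ECA-Net code. The text states neither C nor these constants, so both are assumptions.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive ECA kernel size: truncate |(log2(C) + b) / gamma|, then make it odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

Under these assumptions, any channel count from 128 to 512 gives k = 5, consistent with the value stated above, while 64 channels would give k = 3.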
Most architectural images are taken from the front of buildings, and frontal images of different architectural styles look similar, which makes samples near the decision boundary in feature space hard to distinguish. To solve this problem, this paper uses the large-margin metric loss function (L-Softmax), which guides network learning toward intra-class compactness and inter-class separability by maximizing the margin between architectural style categories. Based on the cosine relation $\cos\theta = W^{T}x / (\lVert W\rVert \, \lVert x\rVert)$, L-Softmax rewrites the linear transformation $W^{T}x + b$ as $\lVert W\rVert \, \lVert x\rVert \cos\theta + b$. With the i-th input feature denoted $x_i$ and its label $y_i$, the loss is given by equation (10) [22]:
Here, N is the number of training samples, $x_i$ is the input feature of the i-th sample, $y_i$ is the style label of the i-th sample, $W_j$ is the weight vector of category j, $b_j$ is the bias of category j, and f denotes the linear classifier layer. Equation (11) is calculated as follows [23]:
In equation (12), k ∈ [0, m − 1] is an integer and m is the angular margin parameter. Applied to architectural style recognition, the L-Softmax loss introduces margins between classes and enlarges the distance to the decision boundary. This makes the learning objective harder for each sample, sharpens the distinction between classes, and helps avoid overfitting, improving recognition performance.
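A per-sample sketch of the large-margin loss follows, using the margin function of the original L-Softmax formulation, $\psi(\theta) = (-1)^k \cos(m\theta) - 2k$ for $\theta \in [k\pi/m, (k+1)\pi/m]$. Whether equations (11) and (12) here match it exactly cannot be confirmed from this text. Biases are omitted, as in the original L-Softmax paper, and equation (10) would average this loss over the N training samples.

```python
import math

def psi(theta, m):
    """L-Softmax margin function on [0, pi]; reduces to cos(theta) when m = 1."""
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def lsoftmax_loss(W, x, label, m):
    """Cross-entropy in which the target logit ||W_y|| ||x|| cos(theta_y)
    is replaced by ||W_y|| ||x|| psi(theta_y). W: one weight vector per class."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    logits = []
    for j, w in enumerate(W):
        dot = sum(a * b for a, b in zip(w, x))
        if j == label:
            cos_t = max(-1.0, min(1.0, dot / (norm(w) * norm(x))))
            dot = norm(w) * norm(x) * psi(math.acos(cos_t), m)
        logits.append(dot)
    peak = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

W = [[1.0, 0.0], [0.0, 1.0]]   # two toy style categories
x = [1.0, 0.5]                 # toy feature of a sample labelled 0
base = lsoftmax_loss(W, x, 0, m=1)    # ordinary softmax loss
margin = lsoftmax_loss(W, x, 0, m=4)  # larger angular margin -> larger loss
```

Because $\psi(\theta) \le \cos\theta$ on $[0, \pi]$, a larger m lowers the target logit and raises the loss for the same feature, so the network must pull same-style features into a tighter angular region to compensate.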
Fig. 1 Salient region suppression and multi-scale feature fusion network structure (own creation). Note: The attention map is generated by the position attention mechanism, and the suppressed region and salient region are then obtained from it.
Fig. 2 Salient region suppression module (own creation). Note: The position of the input feature map is weighted by the position attention mechanism, and the attention map is obtained.
Fig. 3 Multi-scale feature fusion network (own creation). Note: The salient region suppression module (SRSM) is added to Residual block 2 and Residual block 3 to obtain the output feature O; multi-head mechanisms are built from h2 and h3 parallel modules, respectively, and the features O are then concatenated.
Fig. 4 Channel attention module (own creation). Note: Cross-channel information interaction is realized by a one-dimensional convolution with kernel size k, and the channel weights are multiplied by the input feature map to enhance the extraction of important features.
5 Experimental design
This study uses the spatial layout planning of primary school campuses as a case study. Using residential data for educational buildings is reasonable because residential and educational buildings share many similarities in spatial layout, functional requirements, and user behavior. Analyzing key elements in residential data, such as functional zoning, circulation organization, comfort design, flexibility requirements, and user behavior patterns, can provide useful references for educational building design and thereby improve its efficiency and quality.
Applying residential data to educational buildings is rational and feasible for several reasons. First, residential and educational buildings share a common spatial organization logic, such as the principles of separating active and quiet zones and of circulation design, so each can borrow from the other (for example, the bedroom/living-room zoning of a dwelling corresponds to the layout of classroom and activity areas). Second, deep learning techniques such as the Pix2Pix framework have been shown to convert between building types through feature transfer, so residential data can be parameterized and adjusted to generate spatial prototypes that meet educational functions. Third, residential data accounts for 63% of all building data; reusing it can reduce the baseline cost of educational building design by over 60% and shorten the scheme generation cycle by 45%. Finally, through standardized verification (such as ensuring that classroom corridors are at least 2.4 m wide) and functional adaptation (such as adding space for laboratory equipment), residential-derived schemes can effectively meet the special needs of educational buildings. This data reuse strategy is particularly suitable for high-density cities that must respond quickly to shortages of school places.
The experimental study consists of three stages (Fig. 5).
(1) Training the Pix2Pix model. The screened samples are manually labeled and divided into a training set and a test set, both consisting of paired images. In the training set, the input images show the site and surrounding road conditions of a primary school campus, and the output images show the choice of main entrance and the functional layout of the campus interior. This trains the Pix2Pix model to lay out internal functions for different sites and road conditions.
(2) Generating fake samples with the trained generator G. The paired images in the test set are fed to G, which automatically generates matching internal layouts for the primary schools according to the rules learned in training. The generated results are placed side by side with the original paired images so that the differences between generated and real results can be compared visually.
(3) Qualitative and quantitative analysis of the generated fake samples and the real samples. From the perspective of conventional architectural design research, the generated results are compared with the real results to evaluate the quality of the machine-generated layouts.
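Stage (1) trains Pix2Pix with a U-Net generator and a PatchGAN discriminator. The sketch below shows the generator objective of the original pix2pix formulation: an adversarial term asking every PatchGAN patch to be judged real, plus λ times the L1 distance to the ground-truth layout. λ = 100 is that formulation's default; whether this study changed it is not stated. Images are flattened pixel lists and the discriminator output is given as probabilities, which is a simplification.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def generator_loss(patch_probs_fake, fake, real, lam=100.0):
    """patch_probs_fake: PatchGAN grid of D's 'real' probabilities for the
    generated image; fake/real: flattened pixel values of the two layouts."""
    patches = [p for row in patch_probs_fake for p in row]
    adv = sum(bce(p, 1.0) for p in patches) / len(patches)  # fool every patch
    l1 = sum(abs(f - r) for f, r in zip(fake, real)) / len(real)
    return adv + lam * l1

# An undecided discriminator (0.5 everywhere) and a perfect reconstruction.
print(generator_loss([[0.5, 0.5]], [1.0, 2.0], [1.0, 2.0]))
```

The L1 term keeps generated layouts close to the labeled campus plans, while the patch-level adversarial term pushes local regions (walls, room boundaries) to look realistic.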
Experimental software environment: Python 3.7 on Ubuntu 18.04, with PyTorch 1.1.0, Visdom 0.1.8.9 (used to visualize the training process), and Dominate 2.4.0 (used to publish the visual results of training to a web page). The Pix2Pix code consists of the following parts: train.py (training script), test.py (test script), the model folder (model architecture files), the options folder (training and test parameters), and the util folder (shared utility functions). Hardware: an Intel Core i7-8700K CPU and an NVIDIA GeForce GTX 2080 GPU (6 GB video memory). The Pix2Pix generator uses a U-Net architecture and the discriminator uses a PatchGAN architecture. The input is an image annotated with the site and surrounding roads, and the output is the internal functional layout that the model fits to those site and road conditions. All images were resized to 256 × 256 pixels beforehand, and each corresponding input and output pair was merged into a single image.
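Merging each input and output into one image is the usual paired-data format for pix2pix: the loader cuts the merged image back into its two halves. The sketch assumes a side-by-side layout (input on the left, target on the right, so a 512 × 256 merged image), which the text implies but does not state; images are nested lists of pixel values for illustration.

```python
def split_aligned_pair(ab):
    """ab: merged image as a list of rows, input A on the left half and
    target B on the right half. Returns (A, B), each half the width."""
    width = len(ab[0])
    if width % 2:
        raise ValueError("merged image width must be even")
    half = width // 2
    a = [row[:half] for row in ab]
    b = [row[half:] for row in ab]
    return a, b

# Toy 256-row merged image: zeros stand for the site/road input, ones for the layout.
merged = [[0] * 256 + [1] * 256 for _ in range(256)]
a, b = split_aligned_pair(merged)
```

Keeping each pair in one file guarantees that input and target stay aligned through any shared cropping or flipping applied during training.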
The proposed system can not only design spatial layouts during the architectural design stage but also intelligently identify and optimize existing spaces. The spatial recognition system (Fig. 6) is divided into a front end and a back end: the back end handles remote sensing image recognition, and the front end handles data input and result display. The front end uploads pre-processed remote sensing image files to the server. The server first preprocesses the uploaded data: it checks the file extension to decide whether format conversion is needed (the model's final input format is JPG) and then checks the image size, automatically cropping images that are too large. After back-end preprocessing, the data are passed to the improved U-Net model, which classifies the rural building data in China. Once classification and component quantity prediction are complete, the results are sent to the front end for visual display.
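The back-end preprocessing decisions described above (extension check, conversion to JPG, cropping of oversized images) can be sketched as a small planning function. The size limit that triggers cropping and the tile size are hypothetical, since the text gives neither; the function only plans the work, and actual decoding and conversion would be done with an imaging library.

```python
import os

TARGET_EXTENSIONS = (".jpg", ".jpeg")  # the model's input format per the text
MAX_SIDE = 1024                        # hypothetical limit that triggers cropping

def plan_preprocessing(filename, width, height, tile=512):
    """Return (needs_conversion, crop_boxes) for an uploaded image.
    Boxes are (left, top, right, bottom); tile is a hypothetical crop size."""
    ext = os.path.splitext(filename)[1].lower()
    needs_conversion = ext not in TARGET_EXTENSIONS
    if max(width, height) <= MAX_SIDE:
        return needs_conversion, [(0, 0, width, height)]
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return needs_conversion, boxes

convert, boxes = plan_preprocessing("scene.tif", 2000, 1500)
```

Here a 2000 × 1500 TIFF is flagged for conversion and split into 4 × 3 tiles, with the right and bottom tiles clipped to the image border.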
Spatial layout design is an extremely important step in architectural scheme design (Fig. 7) and an important basis for early discussion of, and decisions on, the direction in which the whole scheme will be developed. It synthesizes building scale, functional layout, and traffic circulation. Building on earlier investigation and data collection, layout design underpins the rest of the design work and is a comprehensive, complex process. Moreover, different designers often approach it from different angles and therefore arrive at different results.
All building floor plan data used to train the proposed model come from house images stored in the back-end image library of the V-life interior design software.
The mechanism by which roof style affects spatial layout elements (internal spatial structure, lighting design, ventilation paths) is as follows:
(1) Influence mechanism of internal spatial structure
Space height and form: Different roof styles determine the height and form of a building's interior space. For example, a sloping roof (such as a gable or hipped roof) creates a higher ceiling inside the building, increasing the vertical extent and spaciousness of the space; a flat roof may make the interior feel relatively low and flat.
Space partitioning and utilization: Roof style also affects how interior space is partitioned and used. A sloping roof may create attic or storage space, increasing space utilization; a flat roof may make it easier to add terraces or rooftop gardens, extending the building's functions.
(2) Influence mechanism of lighting design
Natural lighting: Roof style directly affects a building's natural lighting. A sloping roof can guide light into the interior through its inclination, especially low-angle winter sunlight, providing more abundant daylight; a flat roof may require additional measures, such as skylights or clerestory windows, to compensate for insufficient light.
Light and shadow effects: Different roof styles will also produce different light and shadow effects inside the building. The shaded areas and light changes on sloping roofs can increase the sense of hierarchy and interest in the space, while flat roofs may make the light distribution more uniform but lack variation.
(3) Mechanism of influence on ventilation path
Air flow: Roof style has a significant impact on a building's ventilation paths. A sloping roof can promote natural air flow along its inclined surfaces, especially in hot summers, when hot air can be exhausted through the roof by the stack effect; a flat roof may require additional ventilation equipment or structural measures to ventilate effectively.
Thermal comfort: Reasonable ventilation paths can help improve the thermal comfort of buildings. Sloping roofs can effectively reduce indoor temperature and reduce the use of cooling equipment such as air conditioning by promoting air flow, while flat roofs may increase indoor temperature and energy consumption due to poor ventilation.
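The thermal pressure (stack) effect invoked in (3) can be quantified with the standard stack-pressure relation. This is a textbook approximation rather than part of the proposed model; it assumes ideal-gas air at the same pressure inside and outside (so that ρ_in = ρ_out T_out / T_in), and the numbers in the worked example are illustrative assumptions, not measurements from this study.

$$
\Delta p = g\,h\,(\rho_{\mathrm{out}} - \rho_{\mathrm{in}}) \approx \rho_{\mathrm{out}}\, g\, h\, \frac{T_{\mathrm{in}} - T_{\mathrm{out}}}{T_{\mathrm{in}}}
$$

Here h is the vertical distance between a low inlet and a high outlet (for example, a vent near the ridge of a sloping roof), and temperatures are absolute (K). With h = 5 m, ρ_out = 1.2 kg/m³, T_in = 305 K and T_out = 298 K, Δp ≈ 1.2 × 9.81 × 5 × 7 / 305 ≈ 1.35 Pa. Because Δp grows linearly with h, a taller roof volume with a high outlet strengthens the driving pressure; the effect only drives upward exhaust when the indoor or roof-space air is warmer than the outdoor air.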
The roof style has a profound impact on the functional use and living experience of buildings by influencing spatial layout elements such as internal spatial structure, lighting design, and ventilation paths. Therefore, in the spatial layout of architectural planning, the selection and design of roof styles should be fully considered to achieve optimal space utilization and living comfort.
For problems focused on layout optimization, roof recognition serves as a preprocessing step, and its output is used as input to the layout generation model, following the logic below:
Preprocessing stage: First, the salient region suppression and multi-scale feature fusion (SRSMFF) method is used to recognize building roof styles. This step extracts building features with deep learning, identifies roof styles accurately, and provides precise building contours and style information for subsequent layout optimization.
Data integration: Integrate the output results of roof recognition (including building contours, style features, etc.) into the input format required for layout generation models. This includes converting identified building elements into layout design parameters such as size, position, orientation, etc.
Layout generation and optimization: Using the output of the preprocessing stage as input to the layout generation model, optimize the spatial layout design using this model. The layout generation model will consider the constraints of building elements, space utilization, functional requirements, and other factors to automatically generate optimized layout solutions.
Through this process, roof recognition and layout optimization are closely integrated, achieving intelligent processing from building feature extraction to spatial layout optimization, and improving the efficiency and accuracy of building planning and spatial layout.
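The data integration step, converting a recognized building contour into size, position and orientation parameters, can be sketched as below. The `LayoutParams` fields and the `contour_to_layout` helper are hypothetical, since the text does not define the layout model's input format; the contour is assumed to be a closed polygon in image or site coordinates, and the orientation estimate is a simple second-moment approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class LayoutParams:
    """Hypothetical layout-model input derived from one recognized roof."""
    roof_style: str          # style label from the recognition step
    area: float              # footprint area (input units squared)
    centroid: tuple          # (x, y) position of the footprint
    orientation_deg: float   # principal-axis angle in degrees, in [0, 180)
    length: float            # extent along the principal axis
    width: float             # extent perpendicular to the principal axis


def contour_to_layout(contour, roof_style):
    """Convert a closed polygon [(x, y), ...] into size/position/orientation."""
    n = len(contour)
    if n < 3:
        raise ValueError("a contour needs at least three vertices")
    # Shoelace formula: twice the signed area, plus area-centroid sums.
    a2 = cx = cy = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0:
        raise ValueError("degenerate (zero-area) contour")
    cx /= 3.0 * a2   # same as dividing by 6 * signed area
    cy /= 3.0 * a2
    # Principal axis from second moments of the vertices: reliable for
    # elongated rectangular footprints, ambiguous for squares, and biased
    # when vertices are unevenly spaced.
    sxx = syy = sxy = 0.0
    for x, y in contour:
        dx, dy = x - cx, y - cy
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)   # principal axis
    vx, vy = -uy, ux                             # perpendicular axis
    us = [(x - cx) * ux + (y - cy) * uy for x, y in contour]
    vs = [(x - cx) * vx + (y - cy) * vy for x, y in contour]
    return LayoutParams(
        roof_style=roof_style,
        area=abs(a2) / 2.0,
        centroid=(cx, cy),
        orientation_deg=math.degrees(theta) % 180.0,
        length=max(us) - min(us),
        width=max(vs) - min(vs),
    )
```

For example, `contour_to_layout([(0, 0), (10, 0), (10, 4), (0, 4)], "gable")` describes a 10 × 4 footprint centred at (5, 2) and aligned with the x-axis; the same record could then be passed to whatever layout generator is used.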
To further validate the generalization ability, interpretability features, and real-world usability of the model proposed in this paper, experiments were designed for analysis. The selected dataset is as follows:
Styles Dataset: images of various architectural styles (modern, classical, Chinese, European, etc.), with at least 500 images per style. Geo Dataset: architectural images from different regions (urban, rural, mountainous, seaside, etc.), with at least 300 images per region. Time Dataset: architectural images spanning different eras (ancient, modern, contemporary), with at least 200 images per era. All datasets are standardized, including image resizing and normalization. The proposed SRSMFF model and the baseline models (the mainstream deep learning models ResNet and EfficientNet, used as generalization baselines) are trained on the Styles, Geo and Time datasets, respectively, and 5-fold cross-validation is used to evaluate generalization on each dataset. The trained models are then applied to datasets from different but related domains to test their transfer learning ability.
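The 5-fold cross-validation protocol can be sketched as follows. This is a plain (unstratified) k-fold split with hypothetical helper names; because each dataset has per-class minimum counts, a stratified split that preserves class proportions in every fold may be preferable, and the text does not say which variant was used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Average the score returned by train_and_score(train_idx, val_idx) over k folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores), scores
```

Each sample lands in exactly one validation fold, so the mean of the five fold scores uses every image once for validation; `train_and_score` stands in for training SRSMFF or a baseline and returning, for example, its validation accuracy.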
Ten architects and spatial planners are invited to test the model by using SRSMFF for architectural spatial layout design, and their feedback is collected to evaluate the model's usability, accuracy and practicality in real design work and to refine the model. In addition, three representative architectural projects are selected, and the SRSMFF model is applied to their spatial layout design and optimization. The model output is compared with the designers' manual designs to evaluate the model's performance in practical projects, including design efficiency and optimization effect.
Fig. 5 Optimization model design ideas (own creation). Note: A pix2pix model is trained, and the trained generator g is used to generate synthetic (fake) samples, which are then compared with real samples qualitatively and quantitatively.
Fig. 6 Building space recognition function module diagram (own creation). Note: The module is divided into a front end and a back end: the back end processes remote sensing image recognition, while the front end handles data input and result display.
Fig. 7 Scheme design flow chart (own creation). Note: The system includes modules for building scale, functional layout, traffic flow lines and others.
6 Test results
Training runs for 200 epochs, with the training, validation and test sets split 8:1:1, a batch size of 4, and an initial learning rate of 0.01 that is gradually reduced during training. The EarlyStopping strategy of the Keras library is applied, which acts as a form of regularization: by monitoring validation performance, it prevents excessive training from causing overfitting and lowering test-set accuracy. Specifically, validation accuracy is recorded after each epoch, and training stops automatically when it has not improved for 10 epochs.
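The training configuration above can be sketched in plain Python. The 8:1:1 split and the patience rule mirror the text; in Keras itself the equivalent callback would be `keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10)`. The learning-rate schedule is not specified in the text, so it is omitted, and the helper names here are hypothetical.

```python
import random

# Values stated in the text.
CONFIG = {"epochs": 200, "batch_size": 4, "initial_lr": 0.01, "patience": 10}


def split_8_1_1(items, seed=0):
    """Shuffle and split items into train/validation/test sets in an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


class EarlyStopping:
    """Stop once validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_accuracy):
        """Record one epoch's validation accuracy; return True to stop training."""
        if val_accuracy > self.best + self.min_delta:
            self.best, self.best_epoch, self.wait = val_accuracy, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def run(val_accuracies, config=CONFIG):
    """Simulate the epoch loop; return (stop_epoch or None, stopper)."""
    stopper = EarlyStopping(patience=config["patience"])
    for epoch, acc in enumerate(val_accuracies[: config["epochs"]]):
        if stopper.step(epoch, acc):
            return epoch, stopper
    return None, stopper
```

Here `val_accuracies` stands in for the accuracy measured on the validation set after each real training epoch; if accuracy peaks at epoch 2 and then plateaus, training stops at epoch 12.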
Figures 8a, 8b and 8c are the loss curves when using the original salient region suppression (SRS) algorithm, the multi-scale feature fusion (MFF) and the SRSMFF algorithm to train the building floor plan data set in this paper, respectively.
The changes of AP values of different algorithm elements are shown in Figure 9.
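Figure 9 reports per-element AP values. The paper does not state its AP protocol, so the sketch below assumes the common all-point interpolated AP used in PASCAL VOC (2010 onward), computed for one element class (wall, ordinary door, etc.) after detections have already been matched to ground truth at some IoU threshold; the function and argument names are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth object.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Walk detections from most to least confident, tracking precision/recall.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Pad the curve and make precision non-increasing (the precision envelope).
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the envelope: rectangles wherever recall increases.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

With three detections scored 0.9, 0.8 and 0.7, of which the first and third are correct, and two ground-truth objects, the envelope gives AP = 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.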
The comparison is further strengthened by adding the U-Net, U-Net+ and U-Net++ networks. The quantitative evaluation results are shown in Table 1.
The experiments above identify the internal layout of building spaces. Next, the optimization effect on building planning space is evaluated on the WHU building dataset. Because the quality of layout optimization is inherently subjective, this paper evaluates it through a survey. The proposed model is compared with Shi et al. [5], Zhao et al. [9], Pizarro et al. [10] and Gu et al. [13], several building space layout designers are asked to rate the results, and a total of 6 groups of experiments are conducted, as shown in Table 2.
The generalization ability test results are shown in Table 3.
The interpretability comparison experiment results are shown in Table 4.
The usability test results in the real world are shown in Table 5.
Fig. 8 Algorithm training loss (own creation). (a) SRS training loss. (b) MFF training loss. (c) SRSMFF training loss. Note: The training loss of the SRS algorithm is 0.40 and that of the MFF algorithm is 0.169, indicating that adding feature fusion and an attention mechanism improves the accuracy of the algorithm. The training loss of SRSMFF is 0.088, and it converges faster than the MFF algorithm, which shows that the DIoU loss function helps the predicted box approach the ground-truth box more quickly during training.
Fig. 9 Changes of AP values of different algorithm elements (own creation). Note: SRSMFF greatly improves the AP of every target class compared with SRS: the AP of walls increases by 8.18%, ordinary doors by 4.68%, ordinary windows by 5.38%, and door corner points by 8.27%. The other structural elements improve relatively little because they are few in the dataset and have distinctive features, so the original algorithm already recognizes them with high accuracy.
Table 1 Quantitative evaluation results.
Table 2 Optimization evaluation of building spatial layout.
Table 3 Generalization ability test results.
Table 4 Interpretability comparison experimental results.
Table 5 Real world usability test results.
7 Analysis and discussion
As the training loss curves in Figure 8 show, the training loss is 0.40 for the SRS algorithm and 0.169 for the MFF algorithm, but MFF needs more training rounds to converge. This indicates that adding feature fusion and an attention mechanism improves accuracy, while the additional parameters and computation increase training time. SRSMFF reaches a training loss of 0.088 and converges faster than MFF, which shows that the DIoU loss function helps the predicted box approach the ground-truth box more quickly during training.
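The text credits the DIoU loss for faster convergence but does not give its formula. The sketch below follows the standard Distance-IoU definition, L = 1 − IoU + ρ²(b, b_gt) / c², where ρ is the distance between the predicted and ground-truth box centres and c is the diagonal of the smallest box enclosing both; the corner-coordinate box format `(x1, y1, x2, y2)` is an assumption.

```python
def diou_loss(pred, target):
    """DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2
    return 1.0 - iou + (rho2 / c2 if c2 > 0 else 0.0)
```

Unlike a plain IoU loss, which stays at 1 for any pair of non-overlapping boxes, the centre-distance term still decreases as the predicted box moves toward the target: `diou_loss((0, 0, 1, 1), (2, 0, 3, 1))` is 1.4, while the same box against `(5, 0, 6, 1)` scores higher. This distance signal is the usual explanation for DIoU's faster convergence.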
The experimental data show that both the SRS and SRSMFF algorithms achieve their best detection results for bay windows, double doors and sliding doors, because the spatial location and shape of these three elements are very distinctive: double doors almost only appear as entry doors to living and dining rooms; bay windows usually appear on balconies and in some bedrooms, always as rectangles protruding beyond the apartment outline; and sliding doors are distinctive and large. Detection of ordinary doors and ordinary windows is moderate. Apartments contain many of both, and because ordinary doors often sit at the junctions between rooms, background interference makes them easy to miss. Walls and door corner points are the hardest to identify: walls are the most numerous elements in a floor plan and vary most in scale, and door corner points are easily confused with door elements, which leads to missed detections.
The improved SRSMFF model performs detection on fused feature maps and adds an attention mechanism to strengthen key-feature extraction. The resulting feature layer carries richer information, which benefits the recognition of small targets such as short walls, door corners and windows in complex and multi-target areas. Figure 9 likewise shows that SRSMFF greatly improves the AP of every target class compared with SRS: the AP of walls increases by 8.18%, ordinary doors by 4.68%, ordinary windows by 5.38%, and door corner points by 8.27%. The other structural elements improve relatively little because they are few in the dataset and have distinctive features, so the original algorithm already recognizes them with high accuracy.
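Fig. 4 describes the channel attention module as a one-dimensional convolution with kernel size K across channels, whose output weights rescale the input feature map; this matches efficient channel attention (ECA). The pure-Python sketch below assumes the standard ECA steps of global average pooling before the convolution and a sigmoid after it, which the caption does not spell out, and `eca_kernel_size` follows the adaptive rule from the ECA-Net paper (gamma = 2, b = 1) as an assumption about how K might be chosen.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from the ECA-Net paper (an assumption here)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, kernel):
    """Efficient channel attention on a C x H x W feature map (nested lists).

    feature_map: list of C channels, each an H x W list of lists.
    kernel: k weights (k odd) of the 1-D convolution across channels.
    Returns (rescaled feature map, per-channel weights).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    c = len(feature_map)
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1-D convolution over neighbouring channels (zero padding, no bias).
    pad = k // 2
    mixed = []
    for i in range(c):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - pad
            if 0 <= src < c:
                acc += w * pooled[src]
        mixed.append(acc)
    # 3) Sigmoid turns each mixed descriptor into a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Multiply every position of each channel by that channel's weight.
    out = [[[v * w for v in row] for row in ch] for ch, w in zip(feature_map, weights)]
    return out, weights
```

In a trained network the kernel weights are learned; because the convolution only mixes k neighbouring channels, the module adds just k parameters, which is why it is considered lightweight.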
Based on the scores in Table 1, SRSMFF is more likely to predict and extract correct building feature information. In general, for buildings with complex boundary contours, SRSMFF produces segmentations that fit the building shape better than those of the other tested methods. By exploiting edge preservation, the method can fully perceive the context of each building and accurately infer its complete shape.
The evaluation of architectural space layout optimization in Table 2 shows that the proposed model scores above 90 points, while none of the other models reaches 87 points, a clear advantage over comparable recent models. Because this paper processes data with a fusion model, it mitigates the weaknesses of any single model while combining the strengths of several. The proposed model can therefore effectively reduce system redundancy and training loss in architectural space planning and layout, improving the working efficiency of the system.
The generalization results in Table 3 show that the adaptability of deep learning models differs significantly across datasets and application scenarios. The SRSMFF model achieves average accuracies of 92.3%, 89.4% and 87.6% on the Styles, Geo and Time datasets, respectively, significantly outperforming the ResNet and EfficientNet baselines. This advantage stems mainly from the model's ability to capture multi-scale features of architectural styles and to suppress interference from irrelevant regions through attention mechanisms. However, performance declines in cross-regional and cross-era tests, indicating that the model remains sensitive to shifts in data distribution. Notably, the 88.7% accuracy in the transfer learning test confirms good knowledge-transfer ability, thanks to the hierarchical feature extraction architecture. Compared with the baselines, SRSMFF maintains a high F1 score, indicating a good balance between precision and recall. These results show that carefully designed attention mechanisms and multi-scale feature fusion strategies can significantly improve the generalization of deep learning models in architectural planning.
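Table 3 reports average accuracy and F1 scores. The text does not say how F1 is aggregated over classes, so the sketch below assumes macro averaging computed from a confusion matrix (rows = true class, columns = predicted class); the function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 (their harmonic mean) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(map(sum, confusion))
    return sum(confusion[k][k] for k in range(len(confusion))) / total


def macro_f1(confusion):
    """Unweighted mean of per-class F1 scores."""
    n = len(confusion)
    f1s = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true other
        fn = sum(confusion[k]) - tp                         # true k, predicted other
        f1s.append(precision_recall_f1(tp, fp, fn)[2])
    return sum(f1s) / n
```

For the two-class matrix `[[8, 2], [1, 9]]`, accuracy is 0.85 while macro F1 is about 0.850; the two diverge more when classes are imbalanced, which is why reporting F1 alongside accuracy is informative.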
Table 4 shows that although ProtoTree has an advantage in conceptual consistency (91%), SRSMFF achieves higher SHAP coverage (0.78) in spatial layout tasks through multi-scale feature fusion, and its inference speed (15.3 fps) is nearly twice that of ProtoTree (8.2 fps). These results confirm that multi-scale feature fusion and attention mechanisms not only improve model performance but also produce interpretable evidence consistent with designers' professional understanding, providing reliable decision support.
The real-world usability results (Table 5) indicate that the SRSMFF model offers strong practicality and user value in actual building scenarios. In typical tasks such as residential area layout, commercial complex adjustment and historic district renovation, the task completion rate (92%), regulatory compliance (96%) and retention rate of traditional elements (91%) all meet or exceed the expected standards, markedly improving design efficiency (58% less operation time) and decision quality (user satisfaction 4.2/5). However, the system still faces technical bottlenecks: insufficient 3D rendering performance (22 fps), limited understanding of abstract semantics (32% of solutions require manual intervention), and poor mobile adaptation (touch error rate of 18.7%). User feedback confirms that the model can effectively shorten the design cycle and improve energy performance (reducing air-conditioning load by 19.8%), but GPU memory usage (8.2 GB) and the details of the parking-space algorithm need further optimization to achieve a smoother human-machine collaborative workflow.
The salient region suppression and multi-scale feature fusion (SRSMFF) model performs well in architectural style recognition and spatial layout optimization, but its high computational requirements significantly limit practical deployment. The model achieves high recognition accuracy by combining a salient region suppression module (SRSM), a multi-scale feature fusion method (MSFF), an efficient channel attention mechanism (ECA) and a large-margin softmax loss (L-Softmax), but stacking these components also substantially increases computational complexity. Training requires high-performance GPU cluster support, and inference places strict demands on memory bandwidth and compute units, making it difficult to run the model on mobile terminals or low-end devices. This limitation shows up in three ways: first, the processing power and memory of mobile devices cannot meet the model's real-time inference requirements; second, the power budget of edge computing scenarios restricts deployment; third, in applications that need immediate feedback, such as rapid scheme adjustments on construction sites, hardware limitations cause response delays. Future research can address these limitations in three directions: first, model compression techniques (such as knowledge distillation and parameter quantization) to reduce computational complexity; second, lightweight network architectures to replace the existing complex modules; and third, distributed computing with cloud collaboration to allocate resources dynamically. These directions will help balance model performance against deployment feasibility and move intelligent planning technology from the laboratory into engineering practice.
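Parameter quantization, one of the compression directions listed above, can be illustrated with symmetric per-tensor int8 post-training quantization. This is a generic sketch, not a method evaluated in this paper, and the helper names are hypothetical: each float weight is mapped to an integer in [-127, 127] with a single scale factor, cutting storage from 4 bytes (float32) to 1 byte per weight at the cost of a rounding error of at most half the scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

Production toolchains add calibration on representative data, per-channel scales and integer kernels on top of this basic mapping; whether the resulting accuracy loss is acceptable for SRSMFF would need to be measured.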
The limitations of the current model remind us that while pursuing algorithm accuracy, we must simultaneously consider the hardware constraints of practical application scenarios in order to truly realize the practical value of artificial intelligence technology in the field of architectural planning.
8 Conclusion
To address the shortcomings of current intelligent models for architectural space planning and layout design, this paper proposes a building roof style recognition method and design model based on salient region suppression and multi-scale feature fusion (SRSMFF). The proposed system can not only design the spatial layout during the architectural design stage but also intelligently identify and optimize existing spaces. For the spatial identification system, the experimental analysis shows that SRSMFF is more likely to predict and extract correct building feature information, and for buildings with complex boundary contours its segmentations fit the building shape better than those of the other tested methods. Moreover, by exploiting edge preservation, the method can fully perceive the context of each building and accurately infer its complete shape. The proposed model can therefore effectively reduce system redundancy and training loss when planning and laying out building space, improving system efficiency and allowing more optimization operations within a limited time.
Although the salient region suppression and multi-scale feature fusion (SRSMFF) model proposed in this article shows significant advantages in the intelligent optimization of architectural planning spatial layout, its effectiveness has only been validated in the context of primary education, which limits the model's applicability. In future work, the model needs to be applied to a wider range of building types to further validate its effectiveness.
The algorithm currently has a large number of parameters and high computational resource requirements, which make it difficult to run on low-end and mobile devices. Future work can therefore explore methods such as model pruning and optimization to reduce the algorithm's complexity while maintaining accuracy, improving its effectiveness in practical applications.
Funding
This research was supported by the 2023 Henan Social Science Planning Decision-making Consultation Project Research on the Ideas and Countermeasures for Building a Beautiful Henan (Project Approval Number 2023JC005).
Conflicts of interest
The author has no relevant financial or non-financial interests to disclose.
Data availability declaration
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
References
- M. Fu, R. Liu, BIM-based automated determination of exit sign direction for intelligent building sign systems, Autom. Constr. 120, 103353–103364 (2020)
- M. Xu, Z. Mei, S. Luo, Y. Tan, Optimization algorithms for construction site layout planning: a systematic literature review, Eng. Constr. Archit. Manage. 27, 1913–1938 (2020)
- O. Franek, Č. Jarský, On implementation of plants in the indoor environment in intelligent buildings, Environ. Res. Eng. Manage. 77, 66–73 (2021)
- P. Zhao, W. Liao, H. Xue, X. Lu, Intelligent design method for beam and slab of shear wall structure based on deep learning, J. Build. Eng. 57, 104838–104850 (2022a)
- Y. Shi, Z. Yan, C. Li, C. Li, Energy consumption and building layouts of public hospital buildings: a survey of 30 buildings in the cold region of China, Sustain. Cities Soc. 74, 103247–103260 (2021)
- Y. Zhang, B.K. Teoh, M. Wu, J. Chen, L. Zhang, Data-driven estimation of building energy consumption and GHG emissions using explainable artificial intelligence, Energy 262, 125468–125478 (2023)
- B. Becerik-Gerber, G. Lucas, A. Aryal, M. Awada, M. Bergés, S.L. Billington, O. Boric-Lubecke, A. Ghahramani, A. Heydarian, F. Jazizadeh, R. Liu, R. Zhu, F. Marks, S. Roll, M. Seyedrezaei, J.E. Taylor, C. Höelscher, A. Khan, J. Langevin, M.L. Mauriello, J. Zhao, Ten questions concerning human-building interaction research for improving the quality of life, Build. Environ. 226, 109681–109692 (2022)
- M.L.C. Pena, A. Carballal, N. Rodríguez-Fernández, I. Santos, J. Romero, Artificial intelligence applied to conceptual design. A review of its use in architecture, Autom. Constr. 124, 103550–103563 (2021)
- Y. Zhao, C. Cao, Z. Liu, A framework for prefabricated component hoisting management systems based on digital twin technology, Buildings 12, 276–290 (2022b)
- P.N. Pizarro, L.M. Massone, F.R. Rojas, R.O. Ruiz, Use of convolutional networks in the conceptual structural design of shear wall buildings layout, Eng. Struct. 239, 112311–112322 (2021)
- G. Oliveri, F. Zardi, P. Rocca, M. Salucci, A. Massa, Building a smart EM environment-AI-enhanced aperiodic micro-scale design of passive EM skins, IEEE Trans. Antennas Propag. 70, 8757–8770 (2022)
- O. Alharasees, U. Kale, J. Rohacs, D. Rohacs, M.A. Eva, A. Boros, Green building energy: Patents analysis and analytical hierarchy process evaluation, Heliyon 10, 32–45 (2024)
- J. Gu, J. Wang, X. Guo, G. Liu, S. Qin, Z. Bi, A metaverse-based teaching building evacuation training system with deep reinforcement learning, IEEE Trans. Syst. Man Cybern.: Syst. 53, 2209–2219 (2023)
- G.C. Trichopoulos, P. Theofanopoulos, B. Kashyap, A. Shekhawat, A. Modi, T. Osman, S. Kumar, A. Sengar, A. Chang, A. Alkhateeb, Design and evaluation of reconfigurable intelligent surfaces in real-world environment, IEEE Open J. Commun. Soc. 3, 462–474 (2022)
- K. Saini, S. Kalra, S.K. Sood, Disaster emergency response framework for smart buildings, Future Gener. Comput. Syst. 131, 106–120 (2022)
- B. Yan, F. Hao, X. Meng, When artificial intelligence meets building energy efficiency, a review focusing on zero energy building, Artif. Intell. Rev. 54, 2193–2220 (2021)
- P. Su, W. Lu, J. Chen, S. Hong, Floor plan graph learning for generative design of residential buildings: a discrete denoising diffusion model, Build. Res. Inf. 52, 627–643 (2024)
- W. Yaïci, K. Krishnamurthy, E. Entchev, M. Longo, Recent advances in Internet of Things (IoT) infrastructures for building energy systems: A review, Sensors 21, 2152–2164 (2021)
- M. Pritoni, D. Paine, G. Fierro, C. Mosiman, M. Poplawski, A. Saha, J. Bender, J. Granderson, Energies 14, 2024–2035 (2021)
- S. Wang, K. Xu, Z. Ling, Deep learning-based chip power prediction and optimization: an intelligent EDA approach, Int. J. Innov. Res. Comput. Sci. Technol. 12, 77–87 (2024)
- X. Zhou, K. Sun, J. Wang, J. Zhao, C. Feng, Y. Yang, W. Zhou, Computer vision enabled building digital twin using building information model, IEEE Trans. Ind. Inf. 19, 2684–2692 (2022)
- Y. Shen, M. Xu, Y. Lin, C. Cui, X. Shi, Y. Liu, Safety risk management of prefabricated building construction based on ontology technology in the BIM environment, Buildings 12, 765–780 (2022)
- M. Płoszaj-Mazurek, E. Ryńska, M. Grochulska-Salak, Methods to optimize carbon footprint of buildings in regenerative architectural design with the use of machine learning, convolutional neural network, and parametric design, Energies 13, 5289–5300 (2020)
Cite this article as: X. Liao: The application of deep learning algorithm in intelligent optimization of architectural planning spatial layout. Sust. Build. 8, 8 (2025). https://doi.org/10.1051/sbuild/2025004
All Figures
Fig. 1 Salient region suppression and multi-scale feature fusion network structure (own creation). Note: An attention map is generated with a positional attention mechanism and is then used to obtain the suppression region and the salient region.
Fig. 2 Salient region suppression module (own creation). Note: Each position of the input feature map is weighted by the positional attention mechanism to obtain the attention map.
Fig. 3 Multi-scale feature fusion network (own creation). Note: The salient region suppression module (SRSM) is added to residual blocks 2 and 3 to obtain the output feature o; a multi-head mechanism is constructed from the parallel modules H2 and H3, and the resulting features o are concatenated.
Fig. 4 Channel attention module (own creation). Note: Cross-channel information interaction is realized by a one-dimensional convolution with kernel size K, and the resulting channel weights are multiplied by the input feature map to enhance the extraction of important features.