Image Content Forgery Detection Model Combining PSO and SVM in Electronic Data Forensics



Introduction
The replication and forgery of image content, whether by copy-move, region duplication, or other manipulations, has become a serious social problem due to the rapid development of information technology [1]. To combat this illegal behavior effectively, electronic forensics for acquiring and parsing evidence is essential. Electronic data forensics (EDF) refers to the entire process of acquiring, preserving, validating, verifying, interpreting, analyzing, archiving, and presenting evidence related to computer intrusion, sabotage, fraud, attacks, and other criminal acts in a manner that adheres to legal norms [2]. Determining the legal validity of electronic evidence requires following the principles of authenticity, completeness, and legality [3], which is achieved through computer hardware and software technology. On today's internet, vast amounts of information and images fill the network. Images are one of the most intuitive forms of communication in daily life and play a crucial role [4]. However, with the widespread popularity of image editing software, copy-move and region-duplication forgeries of image content also occur frequently. Such conduct deceives others by tampering with, counterfeiting, or fabricating images. It seriously violates intellectual property rights and may lead to crimes such as fraud, causing significant harm to both society and individuals [5]. Image forgery technology also raises ethical concerns about privacy and abuse, since it can violate personal privacy and damage a person's image without consent. For instance, reconstructing or tampering with public photos may put individuals in unfavorable or awkward situations without their consent, violating their privacy rights. Moreover, the misuse of image forgery technology may spread confusion or misleading information. On social media and in news, manipulated images can be used to disseminate false information, deceiving the public and eroding social trust.
The Gaussian mixture distribution similarity measure algorithm is an effective method for image content forgery detection (ICFD). It encodes the image's low-level features and uses them as inputs for training a support vector machine (SVM) classifier to identify region-duplication forgery [6]. SVM is a supervised learning model used for classification and regression analysis, and it can play an important role in detecting copy-move forgery of image content [7]. As information technology continues to evolve, new image encryption methods keep emerging, which further increases the difficulty of recognizing and detecting forged image content. One such method is secret-sharing encryption, which uses polarization-assisted secret-sharing phase encoding to hide forged secret information in orthogonal polarization channels, making decryption more difficult [8]. This method encodes each pixel for sub-pixel sharing and combines a dual-encryption polarization key to reconstruct the target image, which makes region-duplication forgery harder to detect [9].
Against this background, this study aims to improve the accuracy of ICFD and effectively combat image content forgery. To this end, an SVM is used to learn the features of real and fake images and to identify the differences between them during training. In addition, an improved particle swarm optimization (PSO) algorithm is used to optimize the SVM parameters, further improving the accuracy of ICFD in EDF. The contributions of this research are twofold. First, it applies the Gaussian mixture model (GMM) to ICFD and proposes an image forgery detection algorithm that combines SVM with GMM-based local feature aggregation descriptor encoding to improve the accuracy of color feature extraction and classification. Second, it applies the PSO algorithm to optimize the SVM parameters, addressing the problem of parameter selection in SVM-based image content forgery detection. The remainder of the paper is organized as follows. Section 2 reviews existing domestic and international research on SVM and ICFD technology. Section 3 develops an image content forgery detection model based on SVM (ICFD-SVM), optimized with the improved PSO, for EDF. Its first subsection presents ICFD-SVM based on Gaussian mixture distribution local feature aggregation descriptor encoding, and its second subsection implements ICFD-SVM with the improved PSO optimization. Section 4 validates the improved PSO-optimized ICFD-SVM model for EDF.

Related works
ICFD technology is a crucial approach to guaranteeing the authenticity and integrity of images. It has attracted considerable attention from experts and scholars, and extensive research has yielded fruitful findings. To address facial manipulation in digital media forensics, Chen S et al. proposed a face forgery detection method based on local relation learning. The method uses a multi-scale patch similarity module to measure the similarity between local region features. The findings showed that the approach is robust and interpretable and consistently outperforms state-of-the-art methods on commonly used benchmarks [10]. To address the threat of forged fingerprints to biometric security systems, Baskar M et al. proposed a forged-fingerprint detection method based on region-centered detail propagation measurement. The method uses a multistage Gabor filter to remove noise points and then converts the enhanced image into a number of integral images. The results indicated that the method effectively improved the accuracy of forged fingerprint detection [11]. To accurately detect deepfake images and videos, the research team of Arunkumar P M used deep learning techniques and introduced a fuzzy Fisher face model with capsule biplots to detect different types of fake images or videos. The method achieved 89.32% accuracy on the evaluated dataset [12].
SVM plays an important role in ICFD techniques and in classification tasks more broadly. To separate negative and positive classes in movie review texts, Styawati S's research team combined SVM with the Firefly algorithm to construct an SVM-based sentiment classification model built on an optimized combination of 9 parameters. The model achieved up to 89% accuracy in sentiment classification [13]. Muthukrishnan et al. proposed a machine learning method for modeling and simulating heat exchangers, which lets engineers analyze product performance virtually before manufacturing. The results showed that this method is feasible [14]. Aldino A et al. used an SVM algorithm to classify specific data on a platform according to specific criteria. They divided the data into two label categories and then rated and tested the labeled data. The SVM achieved a classification accuracy of 97% [15]. The related work is summarized in Table 1.

Improved ICFD-SVM model design for PSO optimization for electronic data forensics
This section performs ICFD using a GMM-based similarity measure and combines SVM with GMM local feature aggregation descriptor encoding for more effective color feature extraction and classification. To raise the model's accuracy and performance, the SVM parameters are also tuned with the improved PSO algorithm.

ICFD-SVM based on Gaussian mixture distribution local feature aggregation description coding
Expectation maximization (EM) is a technique used in clustering that is trained with the Gaussian distribution (GD) as its parametric model [16]. The GD, also known as the normal distribution, is one of the most prevalent distributions observed in nature [17]. As shown in Figure 1, the GMM-based similarity-measure ICFD algorithm works in several stages.
First, it extracts pixel values from natural and fake images to construct their respective feature matrices and fits a GMM to each feature matrix. The GMM fitted to the natural-image features is denoted T1, and the GMM fitted to the fake-image features is denoted T2. During fitting, the algorithm first processes the prior parameters to optimize model performance.
Next, the algorithm uses the EM algorithm to refit each pixel value of the input test image. The parameters obtained from T1 and T2 are processed and refitted to obtain new GMMs, G1 and G2. This stage improves the models' ability to fit the pixel data of both real and fake images, so that forged and real images can be better distinguished.
Finally, the algorithm computes the similarity between the new GMMs. It decides whether the test image is forged by evaluating whether the image is more similar to the forged or the natural images.
The mathematical expression of the GMM is shown in equation (1):
$$P(x) = \sum_{i=1}^{K} a_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i) \tag{1}$$
The probability density function of the GMM, written in terms of its component densities and parameter sets, is shown in equation (2):
$$P(x \mid \theta) = \sum_{i=1}^{K} a_i \, M_i(x \mid \theta_i), \qquad \sum_{i=1}^{K} a_i = 1 \tag{2}$$
In equation (2), P denotes the probability density function of the GMM and θ is the set of parameters of the GMM.

 
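The parameter set θ collects the weight, mean, and covariance of every component, as shown in equation (3):
$$\theta = \{(a_1, \mu_1, \Sigma_1), (a_2, \mu_2, \Sigma_2), \ldots, (a_K, \mu_K, \Sigma_K)\} \tag{3}$$
In equation (3), Σ_i is the covariance matrix of the i-th GD, μ_i is its mean, and a_i is its weight. The log-likelihood computed by the EM algorithm over the N feature samples x_1, …, x_N is shown in equation (4):
$$\ln L(\theta) = \sum_{n=1}^{N} \ln \sum_{i=1}^{K} a_i \, M_i(x_n \mid \theta_i) \tag{4}$$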
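The EM fitting described by equations (2), (4), and (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes scalar (one-dimensional) features, a quantile-based initialisation of the means, and a log-likelihood tolerance as the convergence condition. All function names are hypothetical, and the similarity measure of equation (6) is not reproduced.

```python
import math


def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian component density M_i(x | theta_i)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Equation (2): P(x | theta) = sum_i a_i * M_i(x | theta_i)."""
    return sum(a * gaussian_pdf(x, m, v) for a, m, v in zip(weights, means, variances))


def log_likelihood(data, weights, means, variances):
    """Equation (4): sum over samples of ln P(x_n | theta)."""
    return sum(math.log(mixture_pdf(x, weights, means, variances)) for x in data)


def fit_gmm_1d(data, k, max_iter=200, tol=1e-6):
    """Fit a 1-D GMM with EM, stopping when the log-likelihood gain falls below tol."""
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]  # spread initial means over quantiles
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    ll = prev_ll = log_likelihood(data, weights, means, variances)
    for _ in range(max_iter):
        # E-step, equation (5): probability that component i generated each sample
        resp = []
        for x in data:
            joint = [a * gaussian_pdf(x, m, v) for a, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var_i = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var_i, 1e-6)  # floor keeps a component from collapsing
        ll = log_likelihood(data, weights, means, variances)
        if ll - prev_ll < tol:  # convergence condition on equation (4)
            break
        prev_ll = ll
    return weights, means, variances, ll


data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances, ll = fit_gmm_1d(data, k=2)
```

In the method described above, one such fit would be run on natural-image features (T1) and another on fake-image features (T2). Multivariate features would replace the scalar variance with a covariance matrix Σ_i.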
To determine the convergence condition, the log-likelihood value is computed with equation (4), and iteration stops once it no longer increases appreciably. This step involves estimating the probability that each Gaussian component generated the digital image feature data. Equation (5) gives the probability that the i-th Gaussian component generated sample x_n:
$$p(i \mid x_n) = \frac{a_i \, M_i(x_n \mid \theta_i)}{\sum_{j=1}^{K} a_j \, M_j(x_n \mid \theta_j)} \tag{5}$$
In equation (5), p denotes the probability of generating a Gaussian component. The GMM similarity is given by equation (6).
In equation (6), S denotes the similarity of the GMMs, and T and G denote the first fitted GMM and the newly fitted GMM, respectively. Extracting features directly from the dataset may lead to excessively high feature dimensionality and trigger the curse of dimensionality [18]. To solve this problem, features can be extracted with the GMM and aggregated with local feature aggregation descriptor encoding. The mathematical expression for local feature aggregation is shown in equation (7).
$$V_j = \sum_{x:\ \mathrm{NN}(x) = c_j} (x - c_j), \qquad \mathrm{NN}(x) = \arg\min_{c_j} \lVert x - c_j \rVert \tag{7}$$
In equation (7), V denotes the local feature aggregation and c_j denotes the j-th center [19]. In local feature aggregation descriptor encoding, the local features of each image or video frame are aggregated into a single vector, which makes the representation of that frame more concise and efficient [20]. However, when images of multiple classes are mixed together, color features may not be represented accurately, which affects the subsequent detection results. To solve this problem, this study combines SVM with GMM local feature aggregation descriptor encoding for color feature extraction and classification. Principal component analysis (PCA) is a statistical method used for dimensionality reduction in images, and the study uses PCA to compute vector statistics of the color features in images. The encoding then proceeds in three steps. First, K-means clustering learns the codebook for the encoding, which yields the color-based local feature aggregation descriptor of the image. Next, each local descriptor of the image is assigned to the nearest center in the codebook to obtain a quantization index. Finally, once every descriptor has been assigned, the difference vectors between the descriptors and their centers are accumulated, and clustering features are extracted from the normalized vector. SVM is a supervised learning model for classification and regression analysis; in ICFD it can be trained as a classifier that distinguishes real images from fake ones [21].
Figure 2 shows the ICFD-SVM process based on GMM local feature aggregation descriptor encoding, which consists of two primary steps: feature extraction and feature classification. The process begins with identifying the forged image, during which the SVM model is trained to recognize the features that distinguish one image from another. Next, the color features are encoded with local feature aggregation descriptor encoding. This aggregates the local color features of the image into a global color description vector, so that each image is represented by a unique color description vector. Finally, these encoded features are used as inputs for training the SVM model. By classifying the input features during training, the model learns these attributes and progressively gains the ability to distinguish authentic from forged images.
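The codebook learning, nearest-centre assignment, and residual aggregation of equation (7) can be sketched in plain Python as below. This is a hypothetical illustration under stated assumptions. The descriptors are short color vectors (for example RGB triples), the codebook is learned with plain Lloyd's K-means initialised from evenly spaced samples, and the aggregated vector is L2-normalised. The PCA step and the SVM classifier are omitted, and all names are illustrative.

```python
import math


def nearest_center(x, centers):
    """Index j minimising ||x - c_j|| (squared distance gives the same argmin)."""
    return min(range(len(centers)), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; centres start at evenly spaced input points."""
    step = max(len(points) // k, 1)
    centers = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the previous centre if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def vlad_encode(descriptors, centers):
    """Accumulate residuals x - c_j per nearest centre, concatenate, then L2-normalise."""
    dim = len(centers[0])
    residuals = [[0.0] * dim for _ in centers]
    for x in descriptors:
        j = nearest_center(x, centers)
        for d in range(dim):
            residuals[j][d] += x[d] - centers[j][d]
    flat = [value for block in residuals for value in block]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


training_colors = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
                   (1.0, 1.0, 1.0), (0.9, 1.0, 1.0), (1.0, 0.9, 1.0)]
codebook = kmeans(training_colors, k=2)
encoding = vlad_encode([(0.2, 0.0, 0.0), (1.0, 1.0, 0.8)], codebook)
```

The encoded vector has K × d entries (here 2 × 3). Vectors of this kind would then be the inputs to the SVM classifier.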

ICFD-SVM based on improved PSO optimization
In classification problems, SVMs can be categorized into linearly separable SVMs, linearly inseparable SVMs, and nonlinear SVMs [22]. The linearly separable SVM is the most commonly used type. It correctly separates samples of different classes by finding an optimal hyperplane. This hyperplane is determined by the samples of each class closest to it, which define two margin boundaries parallel to the hyperplane. The hyperplane selection process of SVM is shown in Figure 3. The solid and hollow points represent the two classes of samples, the dashed line represents the separating hyperplane, and the solid lines represent the two margin boundaries passing through the samples closest to it. During optimization, the SVM must find a hyperplane that minimizes the classification error over all samples. If a hyperplane exists that correctly classifies all samples, the problem is linearly separable; otherwise, it is linearly inseparable. The SVM regression function is shown in equation (8):
$$f(x) = \omega \cdot x + b \tag{8}$$
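In equation (8), ω denotes the weight coefficients and b is the bias term. The regression function is obtained by minimizing the objective in equation (9):
$$\min_{\omega, b} \ \frac{1}{2}\lVert \omega \rVert^{2} + c \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) \tag{9}$$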
In equation (9), c denotes the penalty coefficient and L denotes the ε-insensitive loss function, whose mathematical expression is shown in equation (10).
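$$L\big(y, f(x)\big) = \begin{cases} \lvert y - f(x) \rvert - \varepsilon, & \lvert y - f(x) \rvert \ge \varepsilon \\ 0, & \lvert y - f(x) \rvert < \varepsilon \end{cases} \tag{10}$$
In equation (10), ε denotes the insensitive error. The ε-insensitive loss function is subject to the constraints in equation (11):
$$\begin{cases} y_i - \omega \cdot x_i - b \le \varepsilon + \xi_i \\ \omega \cdot x_i + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i \ge 0, \ \xi_i^{*} \ge 0 \end{cases} \tag{11}$$
In equation (11), ξ_i and ξ_i* denote the slack variables outside and inside the hyperplane, respectively. The linear SVM regression function is shown in equation (12):
$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) (x_i \cdot x) + b \tag{12}$$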
In equation (12), α_i* and α_i denote the Lagrange multipliers. Nonlinear SVMs can be applied to linearly inseparable datasets. A kernel function maps the data from the original space to a higher-dimensional space, where the originally inseparable data become linearly separable. In this nonlinear case, the inner product x_i · x in equation (12) is replaced by the kernel value K(x_i, x). The choice of kernel function affects the performance of the SVM [23]. The study uses the Gaussian kernel function shown in equation (13):
$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2g^{2}}\right) \tag{13}$$
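As a sketch of how equations (10), (12), and (13) fit together, the helpers below evaluate three quantities. They compute the Gaussian kernel with width g, the ε-insensitive loss, and the kernelised regression function, in which the inner product of equation (12) is replaced by K. The dual coefficients α_i and α_i* and the bias b are assumed to come from an SVM solver, which is not shown. The function names and the two-support-vector example are hypothetical. Note that LIBSVM parameterises the same kernel as exp(−γ‖x_i − x_j‖²), so γ corresponds to 1/(2g²) in this notation.

```python
import math


def gaussian_kernel(x_i, x_j, g):
    """Equation (13): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 g^2)), with g the kernel width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * g * g))


def eps_insensitive_loss(y, fx, eps):
    """Equation (10): no penalty inside the eps-tube, |y - f(x)| - eps outside it."""
    return max(0.0, abs(y - fx) - eps)


def svr_predict(x, support_vectors, alpha, alpha_star, b, g):
    """Equation (12) with the inner product replaced by K: sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum((a - a_s) * gaussian_kernel(sv, x, g)
               for sv, a, a_s in zip(support_vectors, alpha, alpha_star)) + b


# Hypothetical dual solution with two support vectors, for illustration only.
prediction = svr_predict((0.5,), [(0.0,), (1.0,)], alpha=[0.7, 0.0], alpha_star=[0.2, 0.3], b=0.1, g=1.0)
loss = eps_insensitive_loss(y=0.5, fx=prediction, eps=0.1)
```

A larger width g makes the kernel smoother, so distant samples still influence the prediction. Together with the penalty coefficient c, g is one of the parameters the improved PSO is used to select.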
In equation (13), K denotes the Gaussian kernel function and g is the kernel width. The parameters selected during SVM training have a significant effect on the model's accuracy and performance, so the improved PSO algorithm is used to optimize the SVM and determine the ideal parameter set. The pseudocode of the improved PSO algorithm is shown in Figure 4. As Figure 4 shows, the improved PSO algorithm passes the parameters encoded by each particle to the SVM and calculates each particle's fitness through training and cross-validation. In this way, the improved PSO algorithm selects the SVM parameters, including the penalty parameter. The improved inertia weight of the PSO algorithm is shown in equation (14).
In equation (15), r_3 denotes a random number, and D_h(z) denotes the position of the worst-adapted particle at the z-th iteration. The improved PSO algorithm overcomes the tendency of the traditional PSO algorithm to converge prematurely and fall into local optima, while retaining a simple structure, easy implementation, and fast convergence. Figure 5 shows the optimization process of SVM image content forgery detection based on the improved PSO algorithm.
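Equations (14) and (15) are not reproduced in this text, so the following is only a hypothetical sketch of the kind of search the improved PSO performs. It makes three assumptions:

- the inertia weight decreases non-linearly from w_max to w_min with an exponent a (0 < a < 1);
- a small extra velocity term pushes particles away from the current worst particle, standing in for the random perturbation involving r_3 and D_h(z);
- velocities are clamped.

The particles search log2(c) and log2(g). The function `toy_cv_error` is a hypothetical stand-in for the cross-validation error of an SVM trained with c = 2^p[0] and g = 2^p[1].

```python
import random


def inertia_weight(z, z_max, w_min=0.4, w_max=0.9, a=0.5):
    """Hypothetical schedule: w_max at z = 0, w_min at z = z_max, falling fastest early for 0 < a < 1."""
    return w_max - (w_max - w_min) * (z / z_max) ** a


def improved_pso(fitness, bounds, n_particles=20, z_max=50, c1=1.5, c2=1.5, c3=0.1, seed=0):
    """Minimise `fitness` over a box with PSO plus a hypothetical worst-particle repulsion term."""
    rng = random.Random(seed)
    dim = len(bounds)
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]  # velocity clamp per dimension
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    best = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[best][:], fit[best]
    for z in range(z_max):
        w = inertia_weight(z, z_max)
        worst = pos[max(range(n_particles), key=lambda i: fit[i])][:]  # worst particle this iteration
        for i in range(n_particles):
            for d in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d])
                     + c3 * r3 * (pos[i][d] - worst[d]))  # assumed perturbation away from the worst particle
                vel[i][d] = max(-v_max[d], min(v_max[d], v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            fit[i] = fitness(pos[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


def toy_cv_error(p):
    """Hypothetical stand-in for SVM cross-validation error, minimised at log2 c = 3, log2 g = -2."""
    return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2


best_params, best_err = improved_pso(toy_cv_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
c_opt, g_opt = 2 ** best_params[0], 2 ** best_params[1]
```

Searching in log2 space is a common choice for c and g because useful values span several orders of magnitude. In a real pipeline, `toy_cv_error` would be replaced by training and cross-validating the ICFD-SVM.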

ICFD-SVM model validation for improved PSO optimization for electronic data forensics
This section configures the experimental environment and then experimentally verifies the performance of the ICFD-SVM model optimized by the improved PSO algorithm.

Experimental environment configuration
The experimental data are drawn from the ImageNet and COCO datasets. The images are processed with region-duplication forgeries, and the processed dataset is divided into an experimental training set and an experimental test set [24]. The dataset contains a total of 1400 sample images, with resolutions ranging from 320 × 240 to 800 × 600 and an average resolution of 384 × 256. Both ImageNet and COCO are publicly available datasets. The ImageNet dataset is characterized by its large scale, rich diversity, and high-quality annotated images.

Improved PSO Optimized ICFD-SVM Model Performance Validation
To verify the performance of the improved PSO-optimized ICFD-SVM model, the SVM model is first trained. The batch size is set to 32, and the initial learning rate is set to 0. Figure 8 compares the improved PSO-optimized ICFD-SVM model with the model before improved-PSO optimization on the dataset, in terms of detection accuracy and efficiency. In Figure 8(a), with 1400 samples, the detection accuracies of the models before and after improved PSO optimization are 89.36% and 94.89%, respectively, an improvement of 5.53 percentage points. In Figure 8(b), with 1400 samples, the detection runtimes of the models before and after optimization are 22.64 ms and 22.06 ms, respectively. The runtime of the improved PSO-optimized model is therefore 2.56% lower. Overall, the improved PSO optimization improves both the detection accuracy and the detection efficiency of the ICFD-SVM model.
Specificity and sensitivity describe how accurately a test reports the presence or absence of a condition. Samples that meet the condition are considered "positive", and those that do not are considered "negative". Sensitivity, also known as the true positive rate, is the proportion of actual positives that the test reports as positive, TP/(TP + FN). Specificity, also known as the true negative rate, is the proportion of actual negatives that the test reports as negative, TN/(TN + FP). Treating forged images as positive samples, these two metrics evaluate how well an algorithm determines whether an image has undergone region-duplication forgery. LTW denotes domain-general face forgery detection based on weighted learning. The specificity and sensitivity of different ICFD models are compared in Table 3. On the training set, the model achieved a maximum specificity of 96.54%, an improvement of 15.9%, 16.90%, 23.38%, and 23.90% over the LTW, AMTEN, FTNet, and LFA models, respectively. On the test set, the model achieved a maximum sensitivity of 95.14%, an improvement of 3.10%, 5.53%, 4.46%, and 3.45% over the LTW, AMTEN, FTNet, and LFA models, respectively.
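The metrics and arithmetic used above can be checked with a few lines of Python. The confusion-matrix counts in the example are hypothetical, while the accuracy and runtime figures are the ones reported for Figure 8. The sketch makes one distinction explicit: the accuracy gain is an absolute difference in percentage points, whereas the runtime change is a relative reduction.

```python
def sensitivity(tp, fn):
    """True positive rate: share of forged (positive) images detected as forged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of authentic (negative) images detected as authentic."""
    return tn / (tn + fp)


def point_gain(before, after):
    """Absolute change between two percentages, in percentage points."""
    return round(after - before, 2)


def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return round(100.0 * (before - after) / before, 2)


# Hypothetical confusion-matrix counts, for illustration only.
print(sensitivity(tp=90, fn=10), specificity(tn=80, fp=20))
# Figures reported for Figure 8 at 1400 samples.
print(point_gain(89.36, 94.89), relative_reduction(22.64, 22.06))
```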
Taken together, the improved PSO-optimized ICFD-SVM model outperforms the other detection models. For a more comprehensive study, the model is benchmarked against other advanced methods, namely LTW, AMTEN, FTNet, and LFA, on public datasets. Figure 11 compares the detection accuracy of the improved PSO-optimized ICFD-SVM model with that of these four models on the validation set, giving a more intuitive view of detection performance. In Figure 11, dots represent outliers and crosses represent the mean. The confidence interval of the improved PSO-SVM model is 96.12 ± 2.13, while those of the LTW, AMTEN, FTNet, and LFA models are 78.10 ± 2.59, 77.83 ± 1.36, 71.51 ± 0.98, and 76.07 ± 11.09, respectively. The improved PSO-SVM model reaches the highest detection accuracy, 98.15%. The highest detection accuracies of the LTW, AMTEN, FTNet, and LFA models are 80.69%, 79.19%, 72.49%, and 87.16%, respectively. The improved PSO-SVM model therefore raises the highest detection accuracy by 17.46, 18.96, 25.66, and 10.99 percentage points, respectively. These data demonstrate the superiority of the improved PSO-SVM model in ICFD tasks. In summary, the improved PSO algorithm counters the tendency of traditional PSO to converge prematurely and fall into local extrema, which demonstrates the potential of the PSO-SVM model for detecting image content forgery.
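The percentage-point gaps quoted above follow directly from the reported maxima. The short sketch below recomputes them. The dictionary simply restates the highest validation-set accuracies given in the text, and the helper name is illustrative.

```python
# Highest validation-set detection accuracies (%) as reported in the text.
best_accuracy = {"Improved PSO-SVM": 98.15, "LTW": 80.69, "AMTEN": 79.19, "FTNet": 72.49, "LFA": 87.16}


def gaps_in_points(scores, reference):
    """Percentage-point gap between the reference model and every other model."""
    ref = scores[reference]
    return {name: round(ref - s, 2) for name, s in scores.items() if name != reference}


gaps = gaps_in_points(best_accuracy, "Improved PSO-SVM")
print(gaps)
```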

Discussion
The development of information technology has made images a crucial medium for information dissemination. However, the rise of artificial intelligence has also led to an increase in image forgery, with significant impact. To identify forged images accurately, this study proposed PSO-SVM for image forgery recognition.
The results indicated that the optimized model achieves a detection accuracy of 94.89% when the sample size reaches 1400, which is 5.53 percentage points higher than the model before optimization. This accuracy also compares favorably with similar models in other literature, although those results were obtained on different tasks and datasets. For instance, the fuzzy Fisher face model of Arunkumar P M's research team achieved an accuracy of 89.32% on its dataset [12]. Styawati S's research team achieved a highest accuracy of 89% in sentiment classification using an SVM-based model [13]. The accuracy of the model in this study is higher than those reported for these methods. Additionally, by optimizing the algorithm and model structure, the research reduced the model's computational cost and improved detection speed. For a sample size of 1400, the detection runtime of the optimized model decreased by 2.56% compared with the model before optimization. This improvement helps the model meet real-time and efficiency requirements in practical applications. The study also optimized the parameters of the SVM model with the improved PSO algorithm, which raised detection accuracy. To verify the model's generalization ability and universality, the study was conducted on different datasets. To ensure stability and reliability, representative public datasets were selected, and sufficient preprocessing and feature extraction were performed. This allowed the model to better learn the inherent patterns and features of the data and improved detection performance. In summary, the proposed PSO-SVM model improves detection accuracy after PSO optimization and also shows advantages in method design, dataset usage, and computational efficiency. These advantages make it a valuable contribution to related fields with broad application prospects.

Conclusion
ICFD plays a critical role in the EDF industry.

Figure 1: A similarity measurement image forgery detection algorithm based on GMM


In equation (1), a_i denotes the weight of the i-th Gaussian component. The probability density function of the GMM is shown in equation (2), where a_i is the weight of the i-th component, M_i is the i-th GD of the GMM, and θ_i is the set of parameters of the i-th GD. The set of parameters of the GDs fitted to the image is shown in equation (3).

Figure 2: SVM image content forgery detection process based on GMM local feature aggregation description encoding

Figure 3: The process of selecting hyperplanes in SVM

Figure 4: Pseudocode of the improved PSO algorithm
In equation (14), the inertia weight is bounded by the minimum and maximum inertia weights of the PSO algorithm, a denotes a constant with 0 < a < 1, and z and z_max denote the current and maximum iteration numbers, respectively. The improved position update of the PSO algorithm, which uses a random perturbation operator, is shown in equation (15).

Figure 5: Optimization of SVM image content forgery detection process based on improved PSO algorithm
Because of its diverse image content, experiments on this dataset can verify the generalization ability of the model. The COCO dataset, on the other hand, provides images with rich object detection, segmentation, and caption annotations, which make it valuable for tasks such as image recognition and segmentation. Experiments on it can therefore help verify the model's universality. The image forgery types in the dataset include homologous and heterologous splicing forgeries. To increase the difficulty of detecting forged images and the diversity of the dataset, the study also applied forgery operations such as multi-region tampering and geometric transformations to some of the images, with randomly generated forged regions and contents. To assess the model's performance more precisely, the 1400 samples in the collected and processed dataset are split into training and test sets at a ratio of 6:4. Additionally, the study assembled 500 samples from the images manipulated by geometric transformation and multi-region tampering into a validation set. Example images from the training and test sets are shown in Figure 6.
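The 6:4 division of the 1400 samples can be sketched as below. The text does not state whether the split is random or stratified. This hypothetical helper therefore assumes a seeded random shuffle and integer arithmetic for the cut point, which yields 840 training and 560 test samples.

```python
import random


def split_dataset(items, train_parts=6, test_parts=4, seed=42):
    """Shuffle reproducibly, then cut at train_parts / (train_parts + test_parts) of the data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


train_ids, test_ids = split_dataset(range(1400))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible across runs. A stratified variant would shuffle natural and forged images separately before cutting.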
Figure 6: Example images of data used in the test and training sets in the experiment: (a) natural image; (b) natural image; (c) region replication forgery; (d) copy-move forgery

Figure 7: The training accuracy and training loss of SVM models during the training process

Figure 8: Comparison of detection accuracy and efficiency of models before and after PSO optimization

Figure 10: ROC curve of image content forgery detection model optimized by PSO and SVM

Figure 11: Comparison of detection accuracy of different models
To validate the performance of the improved PSO-optimized ICFD-SVM model in practical applications, experiments are carried out on the validation set to evaluate how well the model localizes forged regions under various forgery techniques. The forgery-region localization results are shown in Table 4. The model's localization accuracy exceeds 92% for all forgery methods. For image scale transformation forgery, the model's localization precision reaches its highest value of 94.06%. For image splicing forgery, its recall reaches its highest value of 92.68%. For the forgery method that adds noise interference, its F1 value reaches its highest value of 96.34%. Overall, in practical validation, the model localizes regions forged by different methods with high performance.
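Precision, recall, and F1 for forgery localization can be computed as sketched below. The text does not say whether these metrics are measured per pixel or per region. This hypothetical helper assumes pixel-level comparison of binary masks (1 = forged pixel). Because F1 is the harmonic mean of precision and recall, for any single forgery method it lies between those two values.

```python
def localization_scores(pred_mask, true_mask):
    """Pixel-level precision, recall and F1 for binary forgery masks (1 = forged pixel)."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1  # predicted forged, actually authentic
            elif t:
                fn += 1  # forged pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = localization_scores(predicted, ground_truth)
```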

Table 1: Summary of related work

Table 2: Specific experimental environment configuration

Table 3: Comparison of specificity and sensitivity of different image content forgery detection models

Table 4: Image forgery area localization detection results
To enhance detection accuracy, this study incorporates SVM into GMM local feature aggregation descriptor encoding and optimizes the SVM parameters with the improved PSO algorithm. With 1400 samples, the model optimized by the improved PSO reached a detection accuracy of 94.89%, 5.53 percentage points higher than before optimization, while its running time decreased slightly, by 2.56%. On the training and test sets, the model achieved detection accuracies of 97.35% and 96.71%, respectively. The model attained a maximum specificity of 96.54% on the training set, exceeding the LTW, AMTEN, FTNet, and LFA models by 15.9%, 16.90%, 23.38%, and 23.90%, respectively. On the test set, it attained a maximum sensitivity of 95.14%, exceeding the same models by 3.10%, 5.53%, 4.46%, and 3.45%, respectively. In addition, the model achieved the highest localization precision of 94.06% for image scale transformation forgery, the highest recall of 92.68% for image splicing forgery, and the highest F1 value of 96.34% for forgery with added noise interference. In summary, the results suggest that combining PSO with the ICFD-SVM model in EDF enhances the detection accuracy of forged images. However, the study was validated experimentally against only six image forgery methods, and more comprehensive experiments are needed.