Regional Network Education Information Collection Platform for Smart Classrooms based on Big Data Technology

Big data technology plays an important role in making education more intelligent, enhancing the learning experience through novel assessment strategies and predictive teaching. This work aims to improve users' learning efficiency and the effective learning quality of the smart classroom concept using big data technology. A regional network education information collection platform based on big data technology has been developed, which collects student learning data for subsequent analysis and processing. The software architecture of the platform is divided into a basic layer, a platform layer and an access layer, and its physical structure mainly consists of web servers and Hadoop clusters. The platform shows excellent acquisition performance: for every data item field length tested, the amount of data acquired exceeds the expected target. When the data item field length is less than 20, the acquisition amount is 145% of the expected result; between 20 and 40, it is 137%; between 40 and 50, it is 116%; and between 50 and 60, it is 103%. In a comparison of regular teaching with the smart classroom concept, 82% of the surveyed students judged the smart classroom-based approach to be better, and 87% judged the study material provided through the big data platform to be better than that of regular teaching. Research on education big data can provide better services for educational activities, drive the reform of the teaching mode and optimize teaching methods, enabling the smart classroom concept.


Introduction
With the increasing popularity of the concept of big data, big data has gradually evolved into a social culture: a culture in which everyone produces data, manages data, shares data and loves data [1]. This invisible culture affects all walks of life. Even the education field, long regarded as "conservative", has begun to actively "embrace" big data under the impact of this trend [2]. In this context, new big data technologies for data acquisition, storage, analysis and visualization appear every year. With their support, massive data generated in real time can be processed, stored, computed, and further mined and analyzed. At the storage level, well-known technologies such as HDFS and HBase store data on a distributed file system [3]. At the mining and analysis level, there are techniques such as pattern recognition, machine learning, deep learning and visual analysis. As big data technology develops, it will become part of modern social infrastructure in the near future. The significance of big data lies not only in the huge amount of data but also in the great value it contains.
Big data technology is an effective tool for handling massive data and mining its value. Applying it as an innovative method to the various challenges of the country's current modernization will greatly promote the development of many industries. Industries closely related to people's livelihood, such as education, transportation and medical care, have benefited the most.
The digital revolution has produced innovative technologies such as smart classrooms, smart devices and pervasive computing, which shape the mode and accessibility of teaching and learning [4][5][6]. Together these factors are transforming the education system into a smart learning environment that includes learning management platforms such as e-learning, online courses and intelligent tutoring systems [7,8]. Real-time practices enable the automatic gathering, processing, storage and analysis of data, satisfying the attributes of big data [9,10]. The different attributes of big data are described in Figure 1.
Big data involves combining enormous volumes of data and analyzing them with complex algorithms [11][12][13]. Big data analytics applies advanced techniques to vast datasets to discover relevant patterns and information [14][15][16]. These analytics help increase efficiency, provide better insight, and improve awareness of education services as well as institutional requirements [17]. Methods used in big data analytics for education include data mining, web dashboards and data analytics. These technologies enable data collection and interpretation under the evidence-based learning concept for automated instructional assessment [18], and they give better insight into how big data technology can increase the productivity and profitability of the smart education system.
This work contributes a regional network education information collection platform based on big data technology that collects student learning data for subsequent analysis and processing. The platform collects students' learning behavior, learning content, learning preferences and other data, and a suitable data mining approach is then applied to support the smart classroom concept. The platform's software architecture is divided into three major layers: the basic layer, the platform layer and the access layer. Its physical architecture comprises web servers and Hadoop clusters. This big data acquisition platform for smart classrooms provides excellent acquisition performance: the amount of data acquired for each data item field length exceeds the expected target. Applying big data to educational activities in this way can reform the teaching mode by optimizing teaching methods for the smart classroom concept.
The rest of this article is organized as follows. Section 2 reviews the current state of the art in big data analytics for smart classrooms. Section 3 discusses the design and implementation of the platform. Section 4 presents the results and analysis of the platform function tests, and Section 5 concludes the study.

Literature review
In the Internet age, the data generated by education-related activities are called education big data: a data set collected according to the educational needs of all education activities. Such data can be used for the development of education and carry huge potential value. Because of the particularity of the education field, education big data also have their own characteristics. In online education, data are diversified and there are multiple associations between them, so collecting and analyzing educational data is complex. In October 2012, the U.S. Department of Education published "Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics", pointing out that the mining and analysis of educational big data play an important role in promoting innovation in colleges and universities and reform of teachers' teaching. The education big data emphasized there focus on comprehensive, full-scale, in-depth mining, scientific analysis and effective use of diversified education data, rather than on sheer volume alone. In August 2015, the State Council issued the "Outline of Action to Promote the Development of Big Data", which pointed out that big data resources play a fundamental role and hold a strategic position in the development of the country and the government. In launching the "Public Service Big Data Project", it emphasized that the construction of education big data is urgent. The importance of big data in education has thus been raised to the level of national strategy, and the government should attach great importance to research on the application value of big data in education. In September 2015, China established its first "China Education Big Data Research Institute".
In 2016 and 2018, China released two blue books on basic education big data. Their content gradually shifted from education big data in general to China's education sector, and from research on its macro guiding role to research on specific issues in the current state of education in China. Education big data have thus opened a new mode for the government to play a decision-making role in the education field. In recent years, researchers in China and abroad have conducted many studies on education big data. To solve the problems of low data recall, poor data mining accuracy and weak resistance to interference from redundant data, Zhang proposed an adaptive big data mining recommendation algorithm based on the Hadoop platform. A big data similarity model is constructed by statistical regression analysis, an autocorrelation matching detection method extracts the relevant features of the big data, and the Bäcklund transformation decomposes their time-frequency features; on this basis, an adaptive big data mining recommendation model on the Hadoop platform is designed to optimize mining performance. The accuracy of data mining increases with the number of iterations, and the method resists redundant-data interference well [19]. Xia et al. proposed a parallel adaptive Canopy-K-means method. Based on a statistical method, the algorithm adaptively determines the distance threshold parameter T2. Within a cloud computing framework, the parallelism of the MapReduce computing model is used to build a parallel Canopy-K-means algorithm optimized by adaptive parameter estimation, which removes the need to choose the Canopy parameters from manual experience. After presenting the theory and derivation of the algorithm, the authors built a cloud computing experimental platform on the Spark framework and carried out a comparative experiment using the Stanford Large Network Dataset Collection (SNAP) and a self-built Dimension Networks data set [20]. To address the high cost of traditional data computing and storage and the difficulty of writing parallel programs, Hu et al. summarized the core technologies of Hadoop and used Hadoop distributed processing and virtualization technology to design and build a cloud computing storage platform. Experiments show that the platform achieves higher performance and higher resource utilization than traditional single-machine computing and physical-machine cluster computing [21].
Songsangyos & Nilsook stated that the characteristics of big data can be used to gain insight into reality by analyzing abundant data [22]. Knowledge management can also be used for successful and rapid information generation. Hariri et al. [23] presented a big data approach that requires high-velocity, reliable data to confirm the accuracy of information related to the results predicted by data analysis. Altaye & Nixon used predictive analytics with data mining techniques to analyze student behavior and provide interactive learning management [24]; such behavior can be detected from students' facial expressions, their participation in activities, and their knowledge and understanding. Kumar and Singh describe the Hadoop platform as having been developed to record, organize and analyze data effectively by extracting it from big data [25].
With the continuous improvement of living standards, the demand for education and other public services is expanding, and the need to solve the imbalance and insufficiency of educational resources across regions is more urgent [26]. The deep integration of the Internet and education offers new ideas and means of meeting these needs and solving these problems. Research on online education big data can provide better services for educational activities, drive the reform of the teaching mode and optimize teaching methods [27,28]. Therefore, this article develops a regional network education information collection platform based on big data technology. The platform collects students' everyday learning behavior, learning content, learning preferences and other data, so that data mining techniques can subsequently analyze and process the data to produce personalized recommendations and optimize information push services, thereby improving users' learning efficiency and effective learning quality.

Platform design
This section discusses the software and physical architecture design of the platform and describes how the functions of the big data technology-based education information collection platform are realized.

Platform software architecture design
The platform software architecture design describes the logical structure and design method of the entire system and reflects the logical abstraction of the network education big data collection platform [29]. In a typical software architecture design process, the architecture diagram explains the layers of the software structure as well as the relationships between them. According to the services provided, this online education big data collection platform is divided into a basic layer, a platform layer and an access layer. The overall software architecture is shown in Figure 2.
(1) Basic layer: This layer is built on servers, storage, networks and virtualization systems. By integrating infrastructure resources and improving resource utilization, it provides extensible, safe and stable basic support for the educational big data analysis system [30]. In this paper, the Hadoop platform and its surrounding open-source components are used to build the basic layer services and store the educational data set.
(2) Platform layer: Built on the data acquisition and analysis platform, this layer collects, cleans, converts, aggregates, regulates and monitors data according to the data standard specification and establishes a buffer library. A metadata database standardizes the data integration, management and service processes and is used to establish a central library. Through data marts, a data warehouse and similar forms, the layer provides external data retrieval, data analysis, data reports, access interfaces, data comparison and other services, supplying data to the network education big data application system and to data analysis applications.
(3) Access layer: This layer provides access channels for all users of the platform. Students or teachers can access the application platform through a browser or through platforms such as WeChat.
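The platform layer's flow (collect, clean, convert, hold in a buffer library, then promote standardized data to a central library) can be illustrated with a minimal sketch. It assumes a hypothetical record schema (student_id, course, duration_s, ts), a hypothetical STANDARD_SPEC standing in for the data standard specification, and SQLite tables named buffer and central as stand-ins for the buffer library and central library; the paper does not describe its actual storage engine or validation rules.

```python
import sqlite3
from datetime import datetime

# Hypothetical "data standard specification": required fields and their target types.
STANDARD_SPEC = {"student_id": str, "course": str, "duration_s": int, "ts": str}


def clean(record):
    """Strip whitespace from string values; return None if any required field is missing or empty."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if any(cleaned.get(field) in (None, "") for field in STANDARD_SPEC):
        return None
    return cleaned


def convert(record):
    """Coerce fields to the types in STANDARD_SPEC and normalize the timestamp to ISO 8601.

    Raises ValueError when a value cannot be converted.
    """
    out = {field: cast(record[field]) for field, cast in STANDARD_SPEC.items()}
    out["ts"] = datetime.strptime(out["ts"], "%Y-%m-%d %H:%M:%S").isoformat()
    return out


def run_pipeline(conn, raw_records):
    """Clean and convert raw records into the buffer table, then promote valid rows to the central table.

    Returns the number of rejected records.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS buffer "
                 "(student_id TEXT, course TEXT, duration_s INTEGER, ts TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS central "
                 "(student_id TEXT, course TEXT, duration_s INTEGER, ts TEXT, "
                 "UNIQUE(student_id, course, ts))")
    rejected = 0
    for raw in raw_records:
        rec = clean(raw)
        if rec is None:
            rejected += 1
            continue
        try:
            rec = convert(rec)
        except ValueError:
            rejected += 1
            continue
        conn.execute("INSERT INTO buffer VALUES (:student_id, :course, :duration_s, :ts)", rec)
    # Assumed monitoring rule: only non-negative durations reach the central library; duplicates are ignored.
    conn.execute("INSERT OR IGNORE INTO central SELECT * FROM buffer WHERE duration_s >= 0")
    conn.execute("DELETE FROM buffer")  # the buffer is emptied once its rows have been promoted
    conn.commit()
    return rejected
```

Keeping the buffer and central tables separate mirrors the layered description: raw but type-checked data can be inspected in the buffer before any rule decides what becomes part of the central library.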

Platform physical architecture design
To realize the platform's expected functions smoothly for its user community, the physical architecture must be designed accordingly [31][32][33]. The Web server provides online information browsing services, and the Hadoop cluster provides data storage services.

(1) Web server
Students and teachers enter the system mainly through the Web client. In the target system, the Web browser is the main client application, and the core functions of the system are realized on the Web server. The Web browser exchanges data with the server over the TCP/IP protocol suite. Students and teachers log in to the user interface through a Web browser to manage personal information and to query, evaluate and collect knowledge. The Web server presents an operable interface that accepts user input; the system administrator also uses this interface to manage and configure the system. All user input is transmitted to the Web server via HTTP, and the server processes it according to the established business logic, stores the results, and finally returns the processing results to the user in an easy-to-understand form.
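The request/response cycle described above (user input sent over HTTP, processed by business logic on the Web server, results returned to the user) can be sketched as a minimal WSGI application using only the Python standard library. The /query route, the form field keyword, and the in-memory KNOWLEDGE dictionary are hypothetical stand-ins for the knowledge inquiry function, not the platform's actual interface.

```python
import json
from io import BytesIO
from urllib.parse import parse_qs

# Hypothetical in-memory knowledge base standing in for the platform's stored data.
KNOWLEDGE = {
    "hadoop": "Hadoop is a distributed system infrastructure from the Apache Software Foundation.",
    "hdfs": "HDFS is Hadoop's distributed file system.",
}


def query_knowledge(keyword):
    """Business logic: case-insensitive lookup of a keyword; returns None when nothing matches."""
    return KNOWLEDGE.get(keyword.strip().lower())


def app(environ, start_response):
    """WSGI application: read form input from a POST to /query, run the business logic, return JSON."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/query":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    keyword = form.get("keyword", [""])[0]
    answer = query_knowledge(keyword)
    status = "200 OK" if answer else "404 Not Found"
    body = json.dumps({"keyword": keyword, "answer": answer}).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def simulate_request(method, path, body=b""):
    """Call the app directly with a minimal WSGI environ (no network) and decode the JSON reply."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": BytesIO(body)}
    payload = b"".join(app(environ, start_response))
    return captured["status"], json.loads(payload)
```

In a deployment the app would be served by a WSGI server (for local experiments, `wsgiref.simple_server.make_server` from the standard library can serve it); `simulate_request` exists only so the business logic can be checked without opening a socket.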

(2) Hadoop cluster
The Hadoop cluster provides real-time distributed storage for online education big data and is well suited to log analysis, so the user behavior data generated as students and teachers learn can be analyzed further [34][35][36]. Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It enables users to develop distributed programs without knowing the underlying details of distribution, making full use of the power of the cluster for high-speed computing and storage.
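The log analysis role of the Hadoop cluster can be illustrated with the MapReduce pattern: a map step emits key/count pairs from each log line, the framework sorts and groups them by key (the shuffle), and a reduce step aggregates each group. The sketch below simulates those three steps in one Python process; the tab-separated log format and the user_id|action key are assumptions, not the platform's actual schema.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit ("user_id|action", 1) for every well-formed log line.

    Assumed (hypothetical) log format: "<timestamp>\t<user_id>\t<action>".
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines instead of failing the whole job
        _, user_id, action = parts
        yield f"{user_id}|{action}", 1


def reducer(pairs):
    """Reduce step: sum the counts for each key. Input must be sorted by key, as after a Hadoop shuffle."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process and return {key: count}."""
    return dict(reducer(sorted(mapper(lines))))
```

With Hadoop Streaming, the mapper and reducer would instead run as separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, with the framework performing the sort between them.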

Platform function realization
In the field of online education big data, one major data type is student data. Student data cover a wide range, including students' basic information, academic information, daily study habits and other information. An online education big data analysis platform must first gather this information in order to understand students' learning dynamics in different respects and promote their sound and healthy development [37]. Suppose the platform needs to collect information from a campus website; the page crawling process during collection is shown in Figure 3.
First, the page crawler sends a request to the configured URL to obtain the requested page. After receiving the response, it parses the entire page structure and captures the corresponding content according to preset filtering rules. The program then stores the captured data persistently in the database. The data collection function for the entire page is implemented mainly by Iuntanitem, LtSpider and Dbpipe: Iuntanitem is an entity class, LtSpider is a business logic processing class, and Dbpipe is a data manipulation class. This separation gives the design good scalability and maintainability. The class diagram is shown in Figure 4.
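The split between an entity class and a data manipulation class can be sketched as follows. ForumPost and SqlitePipeline are hypothetical names playing the roles of Iuntanitem and Dbpipe, SQLite is an assumed database, and the field names zuozhe, biaoti, leirong and huifushu follow the class description in this section; this is an illustration of the design, not the authors' code.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class ForumPost:
    """Entity class (the role of Iuntanitem): one captured forum post."""
    zuozhe: str    # author
    biaoti: str    # title
    leirong: str   # content
    huifushu: int  # number of replies


class SqlitePipeline:
    """Data manipulation class (the role of Dbpipe), backed by SQLite as an assumed database."""

    def __init__(self, path=":memory:"):
        # Initialization: open the database connection and make sure the table exists.
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS posts "
                          "(zuozhe TEXT, biaoti TEXT, leirong TEXT, huifushu INTEGER)")

    def process_item(self, item):
        # Persistence: store one captured item and return it so later stages could reuse it.
        self.conn.execute("INSERT INTO posts VALUES (?, ?, ?, ?)", astuple(item))
        self.conn.commit()
        return item

    def close_spider(self):
        # Shutdown: close the database connection when crawling ends.
        self.conn.close()
```

Because the crawler only hands ForumPost objects to process_item, the storage backend can be replaced without touching the crawling logic, which is the scalability and maintainability benefit the class split is meant to provide.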
The Iuntanitem class defines the data items captured from the page: the author, title, content and number of replies of each post. In the Iuntanitem class, the four corresponding data fields, zuozhe, biaoti, leirong and huifushu, are defined with the Field() method. The collected data are encapsulated in this class and then stored persistently. The LtSpider class performs page crawling. Its name attribute sets the identifier of the page crawler, which must be unique; its start_urls attribute stores the list of URLs that the crawler uses as the starting targets for page crawling; pagelink instantiates a LinkExtractor object; and rule defines a rule parser. Dbpipe defines three methods: __init__() for initialization operations such as opening a database connection, process_item() for data persistence, and close_spider() for closing the connection.
Page crawling is completed jointly by the crawler class LtSpider, the link extractor instance pagelink, the rule parser rule, the item class and the pipeline class Dbpipe. First, LtSpider requests the page at the initial URL and obtains its content. The link extractor pagelink then extracts the links in the page according to the configured extraction rules. The page content associated with each link is analyzed one by one: the rule parser rule calls the callback function parse_item() and passes in the page content. The callback function parses the page content according to the specified parsing rules and extracts the data for the preset fields. The parsed data are submitted to the item class as a list for encapsulation. Finally, the encapsulated item is passed as a parameter to the pipeline class Dbpipe for persistent storage.
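The class roles above (an item with Field() definitions, a spider with name and start_urls, a LinkExtractor, a rule with a parse_item callback, and a pipeline with process_item and close_spider) closely resemble the Scrapy framework, although the paper does not name it. Rather than guess at the authors' code, the sketch below re-creates the same control flow with the standard library only. ForumSpider, LinkExtractorSketch, the allow pattern and the HTML class names in FIELD_PATTERNS are all hypothetical, and regular expressions are used only because the assumed markup is simple; a real crawler would use a proper HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical extraction rules: which links to follow and where each field sits in the markup.
LINK_ALLOW = re.compile(r"/thread/\d+")
HREF = re.compile(r'href="([^"]+)"')
FIELD_PATTERNS = {
    "zuozhe": re.compile(r'<span class="author">(.*?)</span>', re.S),
    "biaoti": re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
    "leirong": re.compile(r'<div class="content">(.*?)</div>', re.S),
    "huifushu": re.compile(r'<span class="replies">(\d+)</span>'),
}


class LinkExtractorSketch:
    """Stand-in for a link extractor: absolute, de-duplicated links that match an allow pattern."""

    def __init__(self, allow):
        self.allow = allow

    def extract_links(self, base_url, html):
        seen, links = set(), []
        for href in HREF.findall(html):
            url = urljoin(base_url, href)
            if self.allow.search(url) and url not in seen:
                seen.add(url)
                links.append(url)
        return links


class ForumSpider:
    """Plays the role of LtSpider: start from start_urls, follow matching links, parse each page."""

    name = "forum"                                # unique crawler identifier
    start_urls = ["http://campus.example/forum"]  # hypothetical starting page

    def __init__(self, fetch, pipeline):
        self.fetch = fetch        # callable url -> html, injected so the sketch runs offline
        self.pipeline = pipeline  # any object with process_item(item)
        self.pagelink = LinkExtractorSketch(LINK_ALLOW)

    def parse_item(self, html):
        """Rule callback: extract the preset fields from one post page into a dict item."""
        item = {}
        for field, pattern in FIELD_PATTERNS.items():
            match = pattern.search(html)
            item[field] = match.group(1).strip() if match else None
        if item["huifushu"] is not None:
            item["huifushu"] = int(item["huifushu"])
        return item

    def crawl(self):
        """Request each start page, extract links, parse every linked page, and hand items to the pipeline."""
        for start in self.start_urls:
            for link in self.pagelink.extract_links(start, self.fetch(start)):
                self.pipeline.process_item(self.parse_item(self.fetch(link)))
```

Injecting fetch keeps the crawl logic testable offline; in practice it could wrap `urllib.request.urlopen` and decode the response body.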

Results and analysis of platform function tests
According to the software engineering life cycle, after requirements analysis, overall architecture design, detailed design and coding, a software system must be tested before it is officially launched. At this stage, the actual operating environment is simulated so that system loopholes can be discovered and failures eliminated before the target system is put into operation. To ensure that the system is sufficiently stable and robust once online, open-source testing tools are used to simulate an overloaded operating environment and to test the system's performance under it, including whether any fatal performance defect exists. Because online education data contain a large amount of information, data extraction must also be efficient. The online education big data platform is an experimental platform open to all online learners with a large user base, and the amount of data collected and submitted every day is very large. Verifying the platform's data collection performance is therefore a reference standard for evaluating whether it can operate stably under large data volumes. The test data set consists of 29462 records drawn from posts, blogs, news forums and similar sources. Data acquisition performance test cases and results are shown in Table 1. Table 1 shows that, for every data item field length, the amount of data acquired by the platform exceeds the expected target. When the data item field length is less than 20, the acquisition amount is 145% of the expected result; between 20 and 40, it is 137%; between 40 and 50, it is 116%; and between 50 and 60, it is 103%. The collection performance of this education big data acquisition platform is therefore excellent.
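The figures above compare the acquisition amount with the expected amount for each field-length band. The sketch below expresses that comparison; the half-open band boundaries (a length of exactly 20 counted in the 20 to 40 band, and so on) are an assumption, because the text does not say which band a boundary value belongs to, and the function names are hypothetical.

```python
# Field-length bands from the acquisition test; half-open [low, high) boundaries are an assumption.
BANDS = [(0, 20, "<20"), (20, 40, "20-40"), (40, 50, "40-50"), (50, 60, "50-60")]


def band_of(field_length):
    """Return the band label for a data item field length, or None if it falls outside 0 to 60."""
    for low, high, label in BANDS:
        if low <= field_length < high:
            return label
    return None


def percent_of_expected(acquired, expected):
    """Acquisition amount as a percentage of the expected (target) amount."""
    if expected <= 0:
        raise ValueError("expected amount must be positive")
    return 100.0 * acquired / expected


def meets_target(acquired, expected):
    """A test case meets the target when the acquisition amount is at least the expected amount."""
    return percent_of_expected(acquired, expected) >= 100.0
```

For example, a band whose acquisition amount is 1.45 times its expected amount yields 145.0, the value reported for field lengths below 20.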
To assess the usefulness of the big data-based approach for enabling smart classrooms, students completed a questionnaire, rating each item from 1 (never) to 5 (always). The questionnaire items were: (1) Was I totally engaged while participating in the classroom activities? (2) A meaningful connection is made between the instructor and the students in the class activity. The descriptive statistics of the questionnaire are depicted in Figure 5, which shows the students' ratings of their level of engagement. This comparison reveals that classroom engagement is 57% with the smart big data-based platform and that the meaningful connection between students and instructor is 52%. The students filled in the questionnaire themselves and were also asked to compare the big data smart classroom-based approach with the normal teaching method, using the statement "Smart classroom is much better". The comparison in Figure 6 shows the usefulness of the smart classroom-based approach relative to the regular teaching method: a majority of students (82%) agreed that the smart classroom-based approach is better than the regular teaching method, and 87% said that the study material provided through this big data-based methodology is more creative and better than that of regular teaching.
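Descriptive statistics of the kind shown in Figure 5 can be computed from raw ratings on the 1 to 5 scale with a short sketch. The paper does not say how its percentages were derived, so counting ratings of 4 or 5 as agreement (the agree_from parameter) is an assumption, as is the function name describe_likert.

```python
from collections import Counter
from statistics import mean


def describe_likert(responses, agree_from=4):
    """Descriptive statistics for ratings on a 1 (never) to 5 (always) scale.

    Returns the number of responses, the mean rating, the percentage at each scale point,
    and the percentage of ratings >= agree_from (an assumed definition of agreement).
    """
    if not responses or any(r not in range(1, 6) for r in responses):
        raise ValueError("responses must be a non-empty list of integers from 1 to 5")
    n = len(responses)
    counts = Counter(responses)
    distribution = {point: round(100.0 * counts[point] / n, 1) for point in range(1, 6)}
    agree = round(100.0 * sum(1 for r in responses if r >= agree_from) / n, 1)
    return {"n": n, "mean": round(mean(responses), 2),
            "distribution_pct": distribution, "agree_pct": agree}
```

Reporting the full distribution alongside the single agreement percentage lets readers see whether a figure such as 57% reflects mostly 4s or mostly 5s.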
A survey of students' learning was conducted in the experimental class that used the big data-based teaching platform for smart classrooms. The outcomes of the survey are presented in Table 2.
The survey in Table 2 is based purely on classroom observations and on the experimental teachers' records of students' attitudes and behavior throughout classroom interaction.

Conclusions
Based on the knowledge and application of big data technology, this paper designs and implements an educational big data collection platform that meets the actual needs of social users. The main work is as follows:
(1) The software structure design and physical structure design of the platform are introduced. The software structure is divided into a basic layer, a platform layer and an access layer, and the physical structure mainly consists of web servers and Hadoop clusters.
(2) The implementation of data collection and storage on the platform is introduced.
(3) The data collection efficiency of the platform was tested. The test found that the platform has excellent collection performance: for every data item field length, the amount of data collected exceeds the expected target. When the data item field length is less than 20, the collection volume is 145% of the expected result; between 20 and 40, it is 137%; between 40 and 50, it is 116%; and between 50 and 60, it is 103%.
A comparison of the regular teaching method with the smart classroom-based concept showed that 82% of the surveyed students considered the smart classroom-based approach better, and 87% considered the study material provided through the big data platform better than that of regular teaching.

Table 2: Outcomes of the survey on students' learning
Aspect | Observation
Student's learning | Most students want to prepare for further studies through the interest built in the smart classroom
Characteristics of smart classroom | High interest in learning and better outcomes; students interested in different subjects focus on different points
Student's learning difficulties | The operation is relatively easy, but memorizing is a challenge
Interest in experimental content | Students take an interest in experiments and equipment
Figure 6: Percentage (%) of students; comparison on the basis of approach and comparison on the basis of material.