Data Mining and Analysis Techniques for Market Research Companies

2024-07-18 09:24:47 Source: Shangpu Consulting

1. Concepts and processes of data mining and analysis techniques

Data mining and analysis technology refers to the use of advanced statistics, machine learning, artificial intelligence and other techniques to extract valuable information and knowledge from large volumes of market data. These techniques can help market research companies discover market patterns and trends, predict market changes and needs, optimize market strategies and their effects, and develop innovative products and services.

The general process of data mining and analysis techniques consists of the following five steps:

Data pre-processing: Data pre-processing refers to the cleaning, transformation, integration, reduction and discretization of raw market data to improve the quality and applicability of the data. Cleaning removes noise, missing values, abnormal values and duplicate values to reduce errors and inconsistencies. Transformation converts the data into a unified format, unit of measurement and range so that it is comparable and easier to analyze. Integration combines data from different sources and platforms into a unified data warehouse or data lake to improve completeness and availability. Reduction applies dimensionality reduction, sampling and feature selection to lower the complexity and redundancy of the data. Discretization groups, classifies and clusters the data to make it more structured and interpretable.

Data exploration: Data exploration refers to descriptive statistical analysis and graphical presentation of the pre-processed market data to understand its basic characteristics and distribution. Summarizing the data yields basic information such as data type, count, dimension, mean, variance, maximum, minimum, median, mode, frequency and relative frequency. Visualizing the data reveals its basic shape, such as distribution, central tendency, dispersion, skewness, kurtosis and correlation.

Data modeling: Data modeling refers to applying inferential statistics and machine learning to the explored market data to build models and algorithms. Inferential statistics generalizes from and reasons about the data to reveal deeper meaning and relationships, using hypothesis testing, parameter estimation, confidence intervals, significance levels, correlation coefficients, regression coefficients and so on. Machine learning provides prediction and classification capabilities through supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning, neural networks, decision trees, support vector machines, cluster analysis, association rules, text mining, sentiment analysis and so on.

Data evaluation: Data evaluation refers to assessing and verifying the validity and accuracy of the models built on the market data, to test the performance and applicability of the models and algorithms. Evaluating and comparing models identifies their strengths, weaknesses and possible improvements, using error analysis, accuracy, recall, precision, F1 score, ROC curve, AUC, confusion matrix, cross-validation, outlier detection, feature selection and so on.

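Data visualization: Data visualization refers to presenting the evaluated market data and model results graphically and interactively. Taking a platform's evaluated user behavior data as an example, it includes the following operations: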

Graphical display: present the data graphically and dynamically to show the results and effects of the models and algorithms, using charts, graphs, images, maps, dashboards, storyboards and so on. For example, plot the ROC curve and AUC of the platform's user purchase behavior prediction model, plot the confusion matrix and cross-validation results of the user purchase behavior classification model, and visualize the cluster analysis and association rules for user behavior types and the text mining and sentiment analysis of user review content;

Interactive display: present the data interactively and in real time, with functions such as filtering, sorting, zooming in and out, switching views and refreshing. For example, use tools such as Tableau, Power BI or D3.js to build an interactive, real-time user behavior analysis dashboard that shows real-time changes and trends in user registration, login, browsing, purchase and review information, together with the various analysis and visualization results for the user behavior data.

Education industry: data pre-processing, data exploration, data modeling, data evaluation and data visualization

A market research company was commissioned by a domestic online education platform to mine and analyze data on the platform's learning behavior and learning outcomes, aiming to understand learner characteristics, learner satisfaction, learner loyalty, and the main factors that affect learners' behavior and outcomes. The company applied data pre-processing, data exploration, data modeling, data evaluation and data visualization in the following steps:

Data pre-processing: The market research company obtained one year of data from the online education platform's database, including learner registration, login, viewing, completion and evaluation information, with a total of about 500,000 records and about 15 fields. The company pre-processed the data with the following operations:

Clean the data: remove noise, missing values, abnormal values and duplicate values. For example, delete invalid learner IDs, email addresses and mobile phone numbers; fill in missing learner gender, age and region; eliminate abnormal login counts, viewing times and completion rates; and remove duplicate learner registration and completion records;

Convert the data: convert the data into a unified format, unit of measurement and range. For example, convert learner registration, login, viewing and completion times into a unified date format; convert learner gender, region and evaluation into unified category codes; convert login counts, viewing time and completion rate into unified numerical units; and convert learner viewing and completion information into a binary learner-course matrix;

Integrate the data: combine data from different tables and files into a unified data frame. For example, join learner registration, login, viewing, completion and evaluation information on learner ID to form a complete learner behavior data set;

Reduce the data: apply dimensionality reduction, sampling and feature selection. For example, use principal component analysis (PCA) to reduce the dimensionality of the data while retaining its main variables and information; use stratified sampling to sample the data while preserving its representativeness and generalizability; and use information gain (IG) to select features, retaining the key features and influencing factors;

Discretize the data: group, classify and cluster the data. For example, use equal-frequency or equal-width binning to group learner age, login count and completion rate into intervals and grades; use decision tree or naive Bayes methods to classify learner gender, region and evaluation into categories and labels; and use k-means or DBSCAN to cluster the learner behavior data into groups and types.
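The equal-width and equal-frequency grouping mentioned in the discretization step can be illustrated with a minimal Python sketch. The function names and the sample ages are hypothetical and are not taken from the project described here; a real pipeline would more likely rely on a data analysis library.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [18, 19, 20, 21, 22, 35, 50, 58]  # hypothetical learner ages
width_bins = equal_width_bins(ages, 4)
freq_bins = equal_frequency_bins(ages, 4)
```

Equal-width bins follow the value range, so skewed data can leave some intervals nearly empty, while equal-frequency bins balance the group sizes at the cost of uneven interval widths. This rank-based version may also place tied values in different bins.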

Data exploration: The market research company conducted data exploration on pre-processed learner behavior data, including the following operations:

Statistical analysis: summarize the data to obtain basic information such as data type, count, dimension, mean, variance, maximum, minimum, median, mode, frequency and relative frequency. For example, calculate the platform's total number of learners, registration rate, activity rate, retention rate, conversion rate, review rate, average viewing time, average completion rate and average number of courses completed per learner;

Visual analysis: visualize the data to reveal its basic shape, such as distribution, central tendency, dispersion, skewness, kurtosis and correlation. For example, plot the distributions of learner age, region, gender, login frequency, viewing time, completion rate, number of courses completed, evaluations and behavior types.
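As a rough illustration of the summary statistics listed in this step, the following sketch uses only the Python standard library. The `describe` helper and the viewing times are hypothetical. Population (not sample) moments are assumed for the variance, skewness and kurtosis.

```python
from statistics import mean, median, mode, pvariance


def describe(values):
    """Return basic descriptive statistics based on population moments."""
    m = mean(values)
    var = pvariance(values, m)
    sd = var ** 0.5
    n = len(values)
    # Skewness and excess kurtosis from the standardized third and fourth moments.
    skew = sum(((v - m) / sd) ** 3 for v in values) / n if sd else 0.0
    kurt = sum(((v - m) / sd) ** 4 for v in values) / n - 3 if sd else 0.0
    return {
        "count": n,
        "mean": m,
        "median": median(values),
        "mode": mode(values),
        "variance": var,
        "min": min(values),
        "max": max(values),
        "skewness": skew,
        "excess_kurtosis": kurt,
    }


viewing_minutes = [10, 20, 20, 30, 70]  # hypothetical viewing times per learner
summary = describe(viewing_minutes)
```

A positive skewness, as in this sample, indicates a long right tail: a few learners watch far longer than the rest, which is why the mean sits above the median.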

Data modeling: The market research company performed data modeling on the explored learner behavior data, including the following operations:

Hypothesis testing: generalize from and reason about the data to reveal deeper meaning and relationships, using hypothesis testing, parameter estimation, confidence intervals, significance levels, correlation coefficients, regression coefficients and so on. For example, test whether learner gender, region and evaluation have a significant effect on learning behavior and learning outcomes; estimate the mean and standard deviation of learner satisfaction, loyalty and learning outcomes; and calculate the correlation and regression coefficients for login count, viewing time and completion rate;

Machine learning: apply machine learning to the data to obtain prediction and classification capabilities, using supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning, neural networks, decision trees, support vector machines, cluster analysis, association rules, text mining, sentiment analysis and so on. For example, use a neural network to predict learners' learning outcomes, a decision tree to classify learning outcomes, cluster analysis to segment learners by learning type, association rules to analyze learning habits, and text mining and sentiment analysis to analyze the content of learner reviews.
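The correlation and regression coefficients mentioned in the modeling step could be computed along the following lines. This is a simplified sketch with hypothetical data (viewing hours against completion rate). It covers only the Pearson correlation and a one-variable least-squares line, not the significance tests.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


viewing_hours = [1, 2, 3, 4, 5]               # hypothetical values
completion_rate = [0.2, 0.3, 0.5, 0.6, 0.9]   # hypothetical values
r = pearson_r(viewing_hours, completion_rate)
slope, intercept = ols_line(viewing_hours, completion_rate)
```

The slope here is the regression coefficient: the expected change in completion rate for each additional viewing hour in this toy sample.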

Data evaluation: The market research company conducted a data evaluation of the modeled learner behavior data, including the following operations:

Error analysis: evaluate and compare the models to identify their strengths, weaknesses and possible improvements, using error analysis, accuracy, recall, precision, F1 score, ROC curve, AUC, confusion matrix, cross-validation, anomaly detection, feature selection and so on. For example, calculate the accuracy, recall, precision, F1 score, ROC curve and AUC of the platform's learning outcome prediction model; analyze the confusion matrix and cross-validation results of the learning outcome classification model; and detect outliers and review feature selection in the learner behavior data;

Model optimization: adjust and improve the models to raise the performance and applicability of the models and algorithms. For example, use grid search, random search or Bayesian optimization to tune the parameters of the models and algorithms, and use ensemble learning, transfer learning or meta-learning to optimize their structure.
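To make the evaluation metrics above concrete, here is a minimal sketch that derives accuracy, precision, recall and F1 from the four cells of a binary confusion matrix. The labels are hypothetical, with 1 standing for something like "completed the course".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical actual outcomes
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
```

Precision answers "how many predicted positives were right", recall answers "how many actual positives were found", and F1 is their harmonic mean.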

Data visualization: The market research company performed data visualization on the evaluated learner behavior data, including the following operations:

Graphical display: present the data graphically and dynamically to show the results and effects of the models and algorithms, using charts, graphs, images, maps, dashboards, storyboards and so on. For example, plot the ROC curve and AUC of the platform's learning outcome prediction model, plot the confusion matrix and cross-validation results of the learning outcome classification model, and visualize the cluster analysis and association rules for learner types and the text mining and sentiment analysis of learner review content;

Interactive display: present the data interactively and in real time, with functions such as filtering, sorting, zooming in and out, switching views and refreshing. For example, use tools such as Tableau, Power BI or D3.js to build an interactive, real-time learner behavior analysis dashboard that shows real-time changes and trends in learner registration, login, viewing, completion and evaluation information, together with the various analysis and visualization results for the learner behavior data.

Medical industry: data pre-processing, data exploration, data modeling, data evaluation and data visualization

A market research company was commissioned by a domestic medical platform to mine and analyze data on the health behavior and health status of the platform's users, aiming to understand user characteristics, user satisfaction, user loyalty, and the main factors affecting users' health behavior and health status. The company applied data pre-processing, data exploration, data modeling, data evaluation and data visualization in the following steps:

Data pre-processing: The market research company obtained one year of data from the medical platform's database, including user registration, login, measurement, diagnosis and evaluation information, with a total of about 200,000 records and about 10 fields. The company pre-processed the data with the following operations:

Clean the data: remove noise, missing values, abnormal values and duplicate values. For example, delete invalid user IDs, email addresses and mobile phone numbers; fill in missing user gender, age and region; eliminate abnormal login counts, measurement results and diagnosis results; and remove duplicate user registration and diagnosis records;

Convert the data: convert the data into a unified format, unit of measurement and range. For example, convert user registration, login, measurement and diagnosis times into a unified date format; convert user gender, region and evaluation into unified category codes; convert login counts, measurement results and diagnosis results into unified numerical units; and convert user measurement and diagnosis information into a binary user-indicator matrix;

Integrate the data: combine data from different tables and files into a unified data frame. For example, join user registration, login, measurement, diagnosis and evaluation information on user ID to form a complete user health data set;

Reduce the data: apply dimensionality reduction, sampling and feature selection. For example, use principal component analysis (PCA) to reduce the dimensionality of the data while retaining its main variables and information; use stratified sampling to sample the data while preserving its representativeness and generalizability; and use information gain (IG) to select features, retaining the key features and influencing factors;

Discretize the data: group, classify and cluster the data. For example, use equal-frequency or equal-width binning to group user age, login count, measurement results and diagnosis results into intervals and grades; use decision tree or naive Bayes methods to classify user gender, region and evaluation into categories and labels; and use k-means or DBSCAN to cluster the user health data into groups and types.
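The cleaning and integration operations in this pre-processing stage can be sketched as follows. The field names (`user_id`, `age`, `systolic`) and the records are hypothetical, and filling missing ages with the median is only one possible imputation choice, not necessarily the one used in the project.

```python
from statistics import median


def clean_users(rows, id_key="user_id", age_key="age"):
    """Drop rows with a missing or repeated ID, then fill missing ages with the median."""
    seen, cleaned = set(), []
    for row in rows:
        uid = row.get(id_key)
        if uid is None or uid in seen:
            continue  # invalid or duplicate record
        seen.add(uid)
        cleaned.append(dict(row))  # copy so the input list is left untouched
    known = [r[age_key] for r in cleaned if r.get(age_key) is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r.get(age_key) is None:
            r[age_key] = fill
    return cleaned


def join_on_id(left, right, id_key="user_id"):
    """Inner join of two record lists; assumes at most one row per ID in `right`."""
    lookup = {r[id_key]: r for r in right}
    return [{**l, **lookup[l[id_key]]} for l in left if l[id_key] in lookup]


registrations = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},     # missing age
    {"user_id": 2, "age": 51},       # duplicate ID
    {"user_id": None, "age": 40},    # invalid ID
    {"user_id": 3, "age": 46},
]
measurements = [{"user_id": 1, "systolic": 120}, {"user_id": 3, "systolic": 135}]

cleaned = clean_users(registrations)
health_data = join_on_id(cleaned, measurements)
```

This sketch keeps the first record seen for each ID. Whether to keep the first, the last or the most complete duplicate is a decision that depends on how the source tables were produced.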

Data exploration: The market research company conducted data exploration on pre-processed user health data, including the following operations:

Statistical analysis: summarize the data to obtain basic information such as data type, count, dimension, mean, variance, maximum, minimum, median, mode, frequency and relative frequency. For example, calculate the platform's total number of users, registration rate, activity rate, retention rate, conversion rate, follow-up rate, average measurement result, average diagnosis result and average number of diagnostic indicators per user;

Visual analysis: visualize the data to reveal its basic shape, such as distribution, central tendency, dispersion, skewness, kurtosis and correlation. For example, plot the distributions of user age, region, gender, login frequency, measurement results, diagnosis results, number of diagnostic indicators, evaluations and health types.
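Before a distribution is plotted, it is usually tabulated as counts and relative frequencies. A minimal sketch with the standard library, using hypothetical region labels, might look like this:

```python
from collections import Counter


def frequency_table(categories):
    """Map each category to (count, relative frequency), most common first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {value: (count, count / total) for value, count in counts.most_common()}


regions = ["north", "south", "north", "east", "north", "south"]  # hypothetical
table = frequency_table(regions)
```

The resulting table is the data behind a bar chart of the regional distribution: the counts give the bar heights and the relative frequencies give the shares.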

Data modeling: The market research company conducted data modeling on the user health data after exploration, including the following operations:

Hypothesis testing: generalize from and reason about the data to reveal deeper meaning and relationships, using hypothesis testing, parameter estimation, confidence intervals, significance levels, correlation coefficients, regression coefficients and so on. For example, test whether user gender, region and evaluation have a significant effect on users' health behavior and health status; estimate the mean and standard deviation of user satisfaction, loyalty and health status; and calculate the correlation and regression coefficients for login count, measurement results and diagnosis results;

Machine learning: apply machine learning to the data to obtain prediction and classification capabilities, using supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning, neural networks, decision trees, support vector machines, cluster analysis, association rules, text mining, sentiment analysis and so on. For example, use a neural network to predict users' health status, a decision tree to classify health status, cluster analysis to segment users by health type, association rules to analyze users' health habits, and text mining and sentiment analysis to analyze the content of user reviews.
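The association rule analysis of habits mentioned above rests on three standard measures: support, confidence and lift. The sketch below evaluates a single candidate rule on hypothetical habit records. It does not implement a full rule-mining algorithm such as Apriori.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_both = support(transactions, set(antecedent) | set(consequent))
    confidence = s_both / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Hypothetical per-user habit sets.
habits = [
    {"daily_measurement", "exercise_log"},
    {"daily_measurement", "exercise_log", "diet_log"},
    {"daily_measurement"},
    {"diet_log"},
]
rule = rule_metrics(habits, {"daily_measurement"}, {"exercise_log"})
```

A lift above 1 suggests that users with the antecedent habit show the consequent habit more often than users overall. With only a handful of records, as here, such a figure is illustrative only.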

Data evaluation: The market research company conducted a data evaluation of the modeled user health data, including the following operations:

Error analysis: evaluate and compare the models to identify their strengths, weaknesses and possible improvements, using error analysis, accuracy, recall, precision, F1 score, ROC curve, AUC, confusion matrix, cross-validation, anomaly detection, feature selection and so on. For example, calculate the accuracy, recall, precision, F1 score, ROC curve and AUC of the platform's user health status prediction model; analyze the confusion matrix and cross-validation results of the user health status classification model; and detect outliers and review feature selection in the user health data;

Model optimization: adjust and improve the models to raise the performance and applicability of the models and algorithms. For example, use grid search, random search or Bayesian optimization to tune the parameters of the models and algorithms, and use ensemble learning, transfer learning or meta-learning to optimize their structure.
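Grid search combined with cross-validation, as named in the optimization step, can be sketched in a few lines. The example below is a simplified illustration, not the project's method: the model is a tiny one-feature k-nearest-neighbors classifier, the only tuned parameter is k, and the (measurement, label) pairs are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (one feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def k_fold_indices(n, n_folds):
    """Split indices 0..n-1 into n_folds interleaved folds."""
    return [list(range(fold, n, n_folds)) for fold in range(n_folds)]


def cv_accuracy(data, k, n_folds=3):
    """Mean held-out accuracy of k-NN over n_folds cross-validation folds."""
    scores = []
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        correct = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def grid_search(data, k_grid, n_folds=3):
    """Evaluate every candidate k and return the best one with all scores."""
    results = {k: cv_accuracy(data, k, n_folds) for k in k_grid}
    best_k = max(results, key=results.get)
    return best_k, results


# Hypothetical (measurement, label) pairs.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 1),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
best_k, results = grid_search(data, [1, 3, 5])
```

Each candidate is scored only on data it was not fitted on, which is what keeps the chosen parameter from simply memorizing the training set. Random search and Bayesian optimization differ in how candidates are proposed, not in this evaluation loop.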

Data visualization: The market research company performed data visualization on the assessed user health data, including the following operations:

Graphical display: present the data graphically and dynamically to show the results and effects of the models and algorithms, using charts, graphs, images, maps, dashboards, storyboards and so on. For example, plot the ROC curve and AUC of the platform's user health prediction model, plot the confusion matrix and cross-validation results of the user health classification model, and visualize the cluster analysis and association rules for user health types and the text mining and sentiment analysis of user review content;

Interactive display: present the data interactively and in real time, with functions such as filtering, sorting, zooming in and out, switching views and refreshing. For example, use tools such as Tableau, Power BI or D3.js to build an interactive, real-time user health analysis dashboard that shows real-time changes and trends in user registration, login, measurement, diagnosis and evaluation information, together with the various analysis and visualization results for the user health data.

Conclusion

Data mining and analysis technology is one of the core competitive strengths of market research companies. It helps them extract valuable information and knowledge from massive market data and so provide effective market insight and decision support for enterprises. This article has introduced the data mining and analysis techniques commonly used by market research companies, including data pre-processing, data exploration, data modeling, data evaluation and data visualization, together with their principles and applications. It has also illustrated how market research companies use these techniques to provide valuable solutions and suggestions for customers in different industries. The discussion offers a useful reference for the development and application of data mining and analysis technology in market research companies.



