Published: 2025-03-25


International Journal of Cloud Computing and Supply Chain Management

ISSN 3067-0535

AI-Integrated Big Data Processing Pipelines for Immediate Intelligence

Authors

  • Sridhar Kakulavaram, Enterprise Data Architect and Information Technology Project Manager

Keywords

AI Analytics, Stream Processing, Data Pipelines, Cloud Computing

Abstract

Introduction: This study explores incorporating AI-driven stream processing into contemporary data pipelines to address the challenges of real-time analytics. Using frameworks such as Apache Kafka and TensorFlow Extended, the research highlights architectural improvements that strengthen latency management, scalability, and operational efficiency across fields such as asset management, healthcare, and finance. Research Significance: The significance of this study lies in its practical approach to leveraging AI-enhanced data pipelines for real-time applications. By addressing inefficiencies in legacy architectures and exploring scalable, adaptive solutions, this work contributes to the growing landscape of AI infrastructure, ensuring improved decision-making, cost efficiency, and competitive advantage in data-driven industries. SPSS Statistics: SPSS (Statistical Package for the Social Sciences) is a widely used statistical software package for data analysis. It is designed for researchers, analysts, and businesses to analyze data, visualize results, and perform statistical tests efficiently. Input Parameters: Data Source, Data Format, Ingestion Method, Processing Engine, Latency Requirement, AI Model Type. Evaluation Parameters: Eval Accuracy, Eval Throughput, Eval Scalability, Eval Cost Efficiency, Eval Response Time.

1. INTRODUCTION

The rapid growth of digital applications and connected devices has led to a huge increase in data generation, requiring efficient real-time processing systems to extract valuable insights. Traditional batch processing fails to handle fast-moving data, leading to the emergence of AI-driven stream processing. By combining machine learning with real-time data pipelines, organizations can improve decision-making, automatically detect anomalies, and better utilize resources [1]. [2] Various technologies, such as Apache Kafka, Apache Flink, and TensorFlow Extended, have become key frameworks for supporting real-time machine learning workflows. These tools provide important features such as distributed computing, event-driven processing, and model deployment, facilitating smooth AI integration into dynamic systems. However, effective implementation requires thoughtful system design to maintain performance, scalability, and reliability. This study explores how to optimize real-time machine learning pipelines, emphasizing architectural improvements and best practices for managing continuous data streams. [3] Legacy data pipelines often struggle to manage the complexity and velocity of contemporary data flows. These limitations lead to fragmented data silos, slow processing times, and inconsistent data quality, which ultimately affect the accuracy and reliability of AI systems.
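As a concrete illustration of this pattern, the minimal sketch below consumes events from a Kafka topic and scores each one with a stand-in model. It is not the paper's implementation: the local broker address, the JSON-encoded "sensor-events" topic, the placeholder score_event function, and the use of the kafka-python client are all assumptions for illustration.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

def score_event(features):
    # Placeholder for a deployed model (e.g., one exported via TFX);
    # a fixed threshold stands in for a learned anomaly score.
    return 1.0 if features.get("value", 0.0) > 100.0 else 0.0

consumer = KafkaConsumer(
    "sensor-events",                         # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:                     # blocks, handling events as they arrive
    event = message.value
    if score_event(event) > 0.5:
        print(f"Anomaly detected: {event}")
```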

To address these issues, this paper presents an elastic data pipeline architecture designed to effectively support the high-performance demands of AI workloads. [4] This study conducts an in-depth analysis of the key mechanisms and components behind AI-driven analytics to highlight its transformative role in organizational decision-making. Organizations benefit from using AI analytics to increase operational efficiency, foster innovation, and advance strategic growth initiatives. [5] Data pipelines play a key role in integrating data movement, transformation, and storage within today's complex cloud computing ecosystems. In cloud environments, they integrate components such as storage solutions, computing power, and analytics tools to process vast amounts of data. As cloud technologies evolve, organizations have developed pipelines capable of handling complex workloads, enabling timely insights and more informed decision-making. [6] Among other things, such systems have a major impact on the development of appropriate aggregation techniques for evaluating enormous datasets, such as counterfactual datasets generated by AI. Due to the increasing complexity and size of contemporary models, MATLAB-based approaches are becoming less suitable for these tasks; the rise of large language models and other AI advances necessitates a rethinking of traditional concepts, adapting them to the rapidly evolving AI landscape. [7] Processing latency refers to the delays introduced as the different stages of the data pipeline perform tasks such as transformation, aggregation, and analysis. Each step, such as data cleaning, enrichment, and analysis, contributes to the overall latency, which can accumulate significantly. To reduce this latency, workflows and algorithms need to be optimized to ensure fast execution at each stage. Techniques such as parallelism and batch processing are often used to speed up data manipulation and improve overall pipeline performance. The demand for scalable and efficient solutions in data processing is rapidly increasing, particularly in cloud-based systems. Conventional graph processing architectures often find it difficult to handle the demands of real-time decision-making and dynamic big data. This study presents a new AI-driven framework that integrates Graph Neural Networks and Reinforcement Learning into cloud-native graph processing. The proposed system enhances scalability, adaptability, and efficiency, significantly improving throughput and resource utilization in real-time analytics [8]. The paper introduces the AI-Enhanced Cloud Data Pipeline framework, designed to address the challenges of traditional ETL pipelines in modern data processing environments. It optimizes real-time data handling by combining adaptive resource management with deep learning-based stream processing, improving processing speed and adaptability [9]. The introduction emphasizes that data pipelines are essential for facilitating organizational optimization and intelligent decision-making. They facilitate the integration of diverse data sources, which is essential for efficient operations in today's data-driven world, and the introduction also draws attention to the revolutionary effects of technologies such as artificial intelligence, big data analytics, and the Internet of Things.
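To make the latency argument concrete, the short sketch below times hypothetical cleaning, enrichment, and analysis stages and reports how much each contributes to end-to-end latency. The stage functions are illustrative stand-ins, not components of the proposed architecture.

```python
import time

def timed(stage_name, fn, data, timings):
    # Run one pipeline stage and record its latency contribution in milliseconds.
    start = time.perf_counter()
    result = fn(data)
    timings[stage_name] = (time.perf_counter() - start) * 1000.0
    return result

def clean(records):
    return [r for r in records if r is not None]

def enrich(records):
    return [{"value": r, "squared": r * r} for r in records]

def analyze(records):
    return sum(r["squared"] for r in records) / max(len(records), 1)

timings = {}
batch = list(range(10_000))
batch = timed("clean", clean, batch, timings)
batch = timed("enrich", enrich, batch, timings)
timed("analyze", analyze, batch, timings)

print(timings)                                   # per-stage latency (ms)
print("end-to-end:", round(sum(timings.values()), 2), "ms")
```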

These technologies are changing how businesses operate by providing the means to effectively evaluate and use data. The introduction explains how businesses can gain critical insights into consumer behavior, market trends, and operational efficiency by connecting diverse data sources using data pipelines. This integration is vital for making informed decisions that enhance productivity and competitiveness [10]. [11] The authors propose integrating real-time data pipelines into property management systems. This integration aims to optimize workflows and enhance the overall effectiveness of property administration. The focus is particularly on using AI-guided software for automated scheduling, which can significantly reduce manual labor and errors. Benefits of AI Integration: The introduction outlines the advantages of incorporating AI applications, such as automating appointment planning. In addition to saving time, this also increases interactions with tenants, leading to a more satisfactory experience for all parties involved. [12] This study presents a methodology that integrates AI and data warehousing to give healthcare providers better analytical capabilities by addressing current challenges in healthcare data management. The paper demonstrates how AI-enabled data pipelines impact decision-making, patient outcomes, and operational efficiency using real-world case studies. Given the growing volume and complexity of data, adoption of AI-enabled solutions in healthcare is essential to gain rapid, data-driven insights and improve overall healthcare delivery. [13] Big data and artificial intelligence are essential for generating real-time insights that facilitate complex organizational decision-making. By integrating big data sources with AI algorithms, there are vast opportunities to derive actionable insights with previously unheard-of speed and accuracy. This article explores how the supply chain management, finance, and healthcare sectors can more easily handle complex, dynamic work by combining artificial intelligence and big data.

2. MATERIALS & METHODS

Input Parameters:

Data Source: A data source is where information is created or stored, such as databases, IoT devices, APIs, or logs. It defines the starting point of the data flow and greatly affects the relevance, quality, and reliability of the data used in AI systems or analytics processes.

Data Format: A data format specifies how data is structured and represented—structured (CSV, JSON, XML) or unstructured (text, video). It determines ease of parsing, compatibility with tools, and processing requirements. Proper design ensures efficient ingestion, interpretation, and model training in AI pipelines and data processing architectures.

Ingestion Method: The ingestion method refers to how data is collected and transferred to the processing environment. Common methods include batch processing, real-time streaming, and micro-batching. The method chosen affects data freshness, system complexity, and performance, and it directly impacts downstream analytics, storage, and AI model responsiveness.

Processing Engine: A processing engine is a framework or system used to analyze and manipulate data. Examples include Apache Spark, Hadoop, and Flink. It performs transformations, aggregations, or model training. The engine's performance, scalability, and fault tolerance directly affect the efficiency and effectiveness of AI and data pipelines.

Latency Requirement: The latency requirement defines the maximum acceptable time delay between data input and decision output. Applications such as autonomous driving or financial trading demand low latency. Setting this parameter affects the system architecture, processing tools, and network design to ensure timely and responsive decision making in real-time or near-real-time settings.

AI Model Type: The AI model type refers to the specific algorithmic approach chosen to solve a problem, e.g., regression, classification, clustering, or deep learning. The model type is selected based on the nature of the data and the objectives, taking into account training time, interpretability, accuracy, and suitability for production deployment.
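One way to make these six input parameters operational is to capture them in a typed pipeline configuration, as in the sketch below. The field names and allowed values are illustrative assumptions, not a published schema.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class PipelineConfig:
    data_source: str                                   # e.g. "iot-gateway", "orders-db"
    data_format: Literal["csv", "json", "xml", "avro"]
    ingestion_method: Literal["batch", "streaming", "micro-batch"]
    processing_engine: Literal["spark", "flink", "kafka-streams"]
    latency_requirement_ms: int                        # maximum acceptable end-to-end delay
    ai_model_type: Literal["regression", "classification", "clustering", "deep-learning"]

# Hypothetical configuration for a low-latency streaming use case.
config = PipelineConfig(
    data_source="iot-gateway",
    data_format="json",
    ingestion_method="streaming",
    processing_engine="flink",
    latency_requirement_ms=250,
    ai_model_type="classification",
)
print(config)
```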

Evaluation Parameters:

Eval Accuracy: Eval accuracy measures how closely an AI model's predictions match the actual outcomes. It is an important performance metric in supervised learning. High accuracy indicates that the model understands patterns well, but it must be balanced with considerations of robustness, generalizability, and overfitting or bias in the training data.

Eval Throughput: Throughput refers to the amount of data or number of operations a system can handle within a given time. High throughput is essential for efficiently processing large datasets and reflects the system's capacity under load. It is affected by the design of the processing engine, hardware resources, and workload distribution strategies.

Eval Scalability: Scalability measures the ability of a system to maintain or improve performance when handling increasing data volumes or user requests. A scalable system can be scaled horizontally or vertically with minimal degradation. This is critical for ensuring consistent performance growth as data ecosystems and processing requirements evolve.

Eval Cost Efficiency: Cost efficiency evaluates the performance benefits of a system relative to its operational costs. It balances computing power, memory usage, storage requirements, and energy consumption against output. Achieving cost efficiency ensures optimal resource utilization, especially in cloud-based applications or large-scale AI applications with budget constraints.

Eval Response Time: Response time is the interval between a user or system request and the delivery of the output. Short response times are critical in user-facing applications and real-time analytics. Response time depends on data size, system architecture, model complexity, and latency, and it directly impacts user satisfaction and system usability.
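As an illustration of how these evaluation parameters might be computed for a single pipeline run, the sketch below defines simple metric functions. The helper names, sample predictions, and cost figures are hypothetical and are not taken from the study.

```python
def eval_accuracy(y_true, y_pred):
    # Share of predictions that match the ground truth.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def eval_throughput(records_processed, elapsed_seconds):
    return records_processed / elapsed_seconds           # records per second

def eval_response_time(latencies_ms):
    return sum(latencies_ms) / len(latencies_ms)          # mean latency in ms

def eval_cost_efficiency(records_processed, cost_usd):
    return records_processed / cost_usd                   # records per dollar

print(eval_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))           # 0.75
print(eval_throughput(1_000_000, 120))                     # ~8333 records/s
print(eval_response_time([180, 220, 240, 200]))            # 210.0 ms
print(eval_cost_efficiency(1_000_000, 42.0))               # ~23810 records/$
```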

SPSS Method: SAS and SPSS both open a dataset and then run regressions, but SPSS executes three separate regression models in distinct runs, producing three bundled outputs. SAS, however, specifies all three models within one procedure, executing a single regression. Both platforms then apply the Sobel test to compute p-values, mediation percentages, and effect ratios. [2] SPSS offers add-on modules like Complex Samples and Advanced Models; this review focuses on Missing Value Analysis (MVA). Although MVA is popular—especially given the rise of multiple imputation—it lacks support for many leading techniques and is often viewed as a second-best choice. Consequently, its implemented methods exhibit biases and limitations. [3] Descriptive statistics summarize variables via measures such as mean, median, mode, standard deviation, range, and IQR. Researchers visualize distributions using histograms, stem-and-leaf plots, or box plots. Statistical analyses often assume normal distribution, yet failure to empirically test this assumption can undermine validity and reliability, potentially compromising resulting conclusions. [4] Social science enriches critical thinking by exploring human behavior, social facts, and uncertainty—areas often overlooked when emphasis centers solely on engineering.
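For reference, the Sobel test mentioned above can be computed directly from the two path coefficients and their standard errors. The sketch below uses the standard normal-theory formula with made-up coefficient values rather than figures from any of the cited studies.

```python
import math
from statistics import NormalDist

def sobel_test(a, se_a, b, se_b):
    """a: X->M path, b: M->Y path (controlling for X), with their standard errors."""
    indirect = a * b
    se_indirect = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = indirect / se_indirect
    p = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
    return indirect, z, p

indirect, z, p = sobel_test(a=0.45, se_a=0.10, b=0.38, se_b=0.12)
print(f"indirect effect = {indirect:.3f}, z = {z:.2f}, p = {p:.4f}")
```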

Disciplines like religion, art, and music foster a well-rounded education. In management and related fields, social sciences play an essential role, encouraging broader perspectives and questioning of established facts. [5] RCA regression treats observations independently, avoiding within-subject design violations by assuming each predictor–criterion relationship is linear, continuous, and bivariate. It offers forecasting capabilities and serves as a flexible alternative to ANOVA. RCA is applied across domains such as reading, emotion, cognitive control, and numeracy, leveraging standard regression to handle diverse research contexts. [6] Social sciences encompass Business, Management, Humanities, Arts, Political Science, and Education, relying on experimental and scientific measurements. Departments employ both quantitative and qualitative tools to generalize findings—particularly in large educational populations. Researchers must navigate tool selection carefully, as analytic accuracy critically impacts research outcomes. This thesis aims to review common methods and recommend best practices. [7] These notes outline step-by-step SPSS procedures, offering guidance on execution and interpretation, alongside practical tips to overcome challenges. Tailored to graduate students with basic statistical knowledge, each section explains analyses—such as logistic regression—so even novices can understand objectives, execute steps, and interpret results, using the authors’ research experience in social science contexts. [8] SPSS for Windows features a menu-driven interface with various window types—data view, output, and syntax. Users select options from pull-down menus, enter raw data manually, or open existing files. This guide demonstrates data entry and file selection, emphasizing SPSS’s intuitive design for statistical operations through active-window interactions and straightforward navigation. [9] When a single team employs unity, it mirrors the exact solution space of eigenvectors or singular value decomposition. Incorporating multiple metrics introduces numerical complexity, resolved by an iterative algorithm that finds orthogonal vector dimensions.

This process minimizes pairwise differences and ensures gradual spatial adjustments to locate target objects within a multidimensional space. [10] Statistical calculations involve numerous formulas and procedures that are difficult to memorize. SPSS simplifies this by automating processing: users input data and the software handles computations. Particularly beneficial for college-level statistics students, SPSS makes analyses more engaging and accessible—facilitating guided discovery methods that improve learning outcomes and reduce reliance on manual formula recall. [11] Developed by IBM, SPSS excels in basic and advanced statistical analyses across industries like banking, defense, and academia. It supports factor analysis, a multivariate technique from educational psychology that reduces variable complexity and reveals underlying factors. Widely adopted in fields such as psychology, medical science, and economics, factor analysis simplifies structures and enhances interpretability. [12] Mediation analysis examines direct and indirect effects to understand variable relationships. Despite frequent mediation hypotheses, systematic testing of indirect effects is rare. This overview underscores the importance of testing for significant indirect paths, providing SPSS and SAS macros for normal-theory and bootstrap methods to estimate confidence intervals—following Baron and Kenny’s (1986) approach. [13] Siblings’ number, age, gender, and birth order influence socialization, health, and psychological development. Interfamilial conflicts—often arising from age differences—can cause power struggles, rivalry, and jealousy, disrupting children’s growth. Peaceful conflict resolution is crucial for a nurturing home environment, which underpins psychological resilience, particularly affecting women’s endurance and shaping educational and career outcomes. [14] SPSS is widely used in social sciences, government, health inspection, marketing, and data mining. The original SPSS Handbook greatly influenced sociology. Researchers leverage SPSS for native analyses and data management tasks—such as case selection, reformatting, and deriving variables—while metadata dictionaries document datasets. Universities, especially psychology departments, adopt SPSS to teach and facilitate statistical procedures. Despite its empirical relevance, Generalizability Theory (G Theory) is underutilized due to limited software support in popular packages. Classical experiments dominate literature, while Item Response Theory remains common. This article details G Theory analyses and offers straightforward procedures using SPSS, SAS, and MATLAB, aiming to expand adoption by providing accessible implementation guidance [15].

3. RESULTS AND DISCUSSION

TABLE 1. Reliability Statistics

Reliability Statistics
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.860 | .858 | 5

Table 1 presents the overall reliability statistics of the evaluation scale. The Cronbach's Alpha value is 0.860, and 0.858 based on standardized items, indicating high internal consistency among the five items measured. A Cronbach's Alpha above 0.8 is generally considered good, suggesting that the items reliably measure the same underlying construct. With five items included, this high reliability value confirms that the scale is suitable for further statistical analysis, and the responses are consistent across the variables. This supports the overall validity of the evaluation framework used in the study.
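For readers who want to reproduce this kind of reliability check outside SPSS, the sketch below computes Cronbach's alpha from an item-level response matrix. The 10x5 ratings shown are hypothetical stand-in data, not the study's raw responses.

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

responses = np.array([
    [4, 4, 5, 4, 4], [5, 4, 5, 4, 5], [3, 3, 4, 3, 3], [4, 5, 5, 4, 4],
    [5, 5, 5, 4, 5], [4, 3, 4, 4, 4], [3, 4, 3, 3, 3], [5, 4, 5, 5, 5],
    [4, 4, 4, 4, 4], [5, 4, 5, 4, 5],
])
print(round(cronbach_alpha(responses), 3))
```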

TABLE 2. Item-Level Reliability Statistics

  Cronbach's Alpha if Item Deleted
Eval Accuracy 0.799
Eval Throughput 0.826
Eval Scalability 0.826
Eval CostEfficiency 0.89
Eval ResponseTime 0.799

Table 2 displays the reliability statistics using Cronbach’s Alpha if each item is deleted. The overall internal consistency of the dataset is assessed, and values range from 0.799 to 0.89. Items such as Eval Cost Efficiency show the highest alpha if deleted (0.89), indicating that its removal would increase the overall reliability, suggesting it may not align well with the other items. Eval Accuracy and Eval Response Time have the lowest alpha values if deleted (0.799), showing they strongly contribute to the scale's reliability. The data suggests acceptable reliability, with potential improvement if Cost Efficiency is excluded.

TABLE 3. Descriptive Statistics

Descriptive Statistics
Variable | N | Range | Minimum | Maximum | Mean | Std. Error (Mean) | Std. Deviation | Variance | Skewness | Std. Error (Skewness) | Kurtosis | Std. Error (Kurtosis)
Eval Accuracy | 10 | 2 | 3 | 5 | 4.2 | 0.249 | 0.789 | 0.622 | -0.407 | 0.687 | -1.074 | 1.334
Eval Throughput | 10 | 2 | 3 | 5 | 4 | 0.258 | 0.816 | 0.667 | 0 | 0.687 | -1.393 | 1.334
Eval Scalability | 10 | 2 | 3 | 5 | 4.3 | 0.26 | 0.823 | 0.678 | -0.687 | 0.687 | -1.043 | 1.334
Eval Cost Efficiency | 10 | 2 | 3 | 5 | 3.9 | 0.233 | 0.738 | 0.544 | 0.166 | 0.687 | -0.734 | 1.334
Eval Response Time | 10 | 2 | 3 | 5 | 4.2 | 0.249 | 0.789 | 0.622 | -0.407 | 0.687 | -1.074 | 1.334
Valid N (listwise) | 10

Table 3 presents descriptive statistics for five evaluation variables based on 10 responses. The mean scores range from 3.9 (Eval Cost Efficiency) to 4.3 (Eval Scalability), indicating generally positive evaluations. All variables have a range of 2, with minimum scores of 3 and maximum scores of 5. Standard deviations are relatively low (around 0.74–0.82), showing limited variability. Skewness values are close to zero, indicating near-symmetrical distributions, while kurtosis values are negative, suggesting flatter distributions than a normal curve. The data shows consistency with slight preference toward higher ratings, especially for Scalability and Response Time, and relatively low dispersion.
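A lightweight way to reproduce these descriptive measures for one variable is shown below; the score series is hypothetical data on the same 3 to 5 rating scale, not the study's responses.

```python
import pandas as pd

scores = pd.Series([4, 5, 3, 4, 5, 4, 3, 5, 4, 5], name="Eval Accuracy")

summary = {
    "N": int(scores.count()),
    "Range": scores.max() - scores.min(),
    "Mean": scores.mean(),
    "Std. Deviation": scores.std(ddof=1),
    "Variance": scores.var(ddof=1),
    "Skewness": scores.skew(),   # sample-adjusted estimator, as SPSS reports
    "Kurtosis": scores.kurt(),   # excess kurtosis, sample-adjusted
}
print(summary)
```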

TABLE 4. Frequency Statistics

Statistics
  | Eval Accuracy | Eval Throughput | Eval Scalability | Eval Cost Efficiency | Eval Response Time
N (Valid) | 10 | 10 | 10 | 10 | 10
N (Missing) | 0 | 0 | 0 | 0 | 0
Median | 4 | 4 | 4.5 | 4 | 4
Mode | 4a | 4 | 5 | 4 | 4a
Percentile 25 | 3.75 | 3 | 3.75 | 3 | 3.75
Percentile 50 | 4 | 4 | 4.5 | 4 | 4
Percentile 75 | 5 | 5 | 5 | 4.25 | 5

Table 4 presents descriptive frequency statistics for five evaluation variables. All variables have 10 valid responses with no missing data. The median values range from 4 to 4.5, indicating that most responses fall on the higher end of the scale. Eval Scalability has the highest median (4.5), suggesting better perceived performance. The mode for most variables is 4, except for Eval Scalability, which has a mode of 5, showing consensus on higher ratings. Percentile values show that 75% of the responses are 4 or above for all variables, reflecting generally favorable evaluations across all metrics with slight variation in Scalability.

Histogram Plot

FIGURE 1. Eval Accuracy

Figure 1 shows the distribution of evaluation accuracy across 10 samples. The histogram indicates most values fall between 3.5 and 5.5, with a mean accuracy of 4.2. The overlaid normal curve suggests a roughly normal distribution. The standard deviation is 0.709, indicating moderate variation in accuracy scores.

FIGURE 2. Eval Throughput

Figure 2 illustrates the distribution of evaluation throughput for 10 samples. The histogram shows most values lie between 3.5 and 4.5, with a mean throughput of 4.0. The standard deviation is 0.816, indicating moderate variability. The normal curve overlay suggests the data is approximately normally distributed, supporting statistical analysis.

FIGURE 3. Eval Scalability

Figure 3 presents the distribution of evaluation scalability across 10 samples. The histogram indicates a concentration of values between 4.5 and 5.5, with a mean scalability score of 4.3. The standard deviation is 0.823, showing moderate variability. The overlaid normal curve suggests a near-normal distribution, supporting consistency in scalability evaluation.

FIGURE 4. Eval Cost Efficiency

Figure 4 displays the distribution of evaluation cost efficiency for 10 samples. The histogram shows that most values cluster around 3.5 to 4.5, with a mean of 3.9. The standard deviation is 0.739, indicating moderate variability. The bell-shaped normal curve suggests the data follows an approximately normal distribution pattern.

FIGURE 5. Eval Response Time

Figure 5 illustrates the distribution of evaluation response time for 10 samples. The histogram reveals a concentration of scores between 3.5 and 5.5, with a mean of 4.2. A standard deviation of 0.789 suggests moderate variability. The overlaid normal distribution curve indicates the data follows an approximately normal trend in response times.

Correlations

TABLE 5. Correlations

Correlations
Pearson Correlation | Eval Accuracy | Eval Throughput | Eval Scalability | Eval Cost Efficiency | Eval Response Time
Eval Accuracy | 1 | 0.518 | .753* | 0.42 | .821**
Eval Throughput | 0.518 | 1 | 0.496 | 0.553 | .690*
Eval Scalability | .753* | 0.496 | 1 | 0.238 | .753*
Eval Cost Efficiency | 0.42 | 0.553 | 0.238 | 1 | 0.229
Eval Response Time | .821** | .690* | .753* | 0.229 | 1
* significant at the 0.05 level; ** significant at the 0.01 level.

Table 5 presents the Pearson correlation coefficients among five evaluation variables. Eval Response Time shows strong positive correlations with Eval Accuracy (r = .821, p < 0.01), Eval Throughput (r = .690, p < 0.05), and Eval Scalability (r = .753, p < 0.05), indicating significant relationships. Eval Accuracy is also significantly correlated with Eval Scalability (r = .753, p < 0.05). However, Eval Cost Efficiency shows weak correlations with all other variables and no significant associations, suggesting it behaves independently. Response Time, Accuracy, and Scalability are strongly interrelated, while Cost Efficiency contributes less to the shared variance among variables.
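The same pairwise coefficients and two-tailed p-values can be obtained programmatically, as in the sketch below; the three short data columns are hypothetical and only illustrate the procedure.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    "Accuracy":     [4, 5, 3, 4, 5, 4, 3, 5, 4, 5],
    "Throughput":   [4, 4, 3, 4, 5, 4, 3, 5, 4, 4],
    "ResponseTime": [4, 5, 3, 4, 5, 4, 3, 5, 4, 4],
})

for left, right in combinations(df.columns, 2):
    r, p = pearsonr(df[left], df[right])
    flag = "**" if p < 0.01 else "*" if p < 0.05 else ""
    print(f"{left} vs {right}: r = {r:.3f}{flag} (p = {p:.3f})")
```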

Regression

TABLE 6. Model Summary

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df2 | Sig. F Change |
Eval Accuracy | .902a | 0.813 | 0.664 | 0.457 | 0.813 | 5.452 | 5 | 0.046 | 2.159
Eval Throughput | .849a | 0.721 | 0.498 | 0.579 | 0.721 | 3.228 | 5 | 0.115 | 2.225
Eval Scalability | .790a | 0.624 | 0.323 | 0.677 | 0.624 | 2.076 | 5 | 0.222 | 2.702
Eval Cost Efficiency | .761a | 0.578 | 0.241 | 0.643 | 0.578 | 1.715 | 5 | 0.282 | 2.367
Eval Response Time | .928a | 0.861 | 0.751 | 0.394 | 0.861 | 7.77 | 5 | 0.023 | 2.005

Table 6 summarizes the regression model results for five evaluation variables. Eval Response Time shows the strongest model fit with an R of 0.928 and R² of 0.861, indicating that 86.1% of the variance is explained by the model, with a significant p-value of 0.023. Eval Accuracy also demonstrates a strong fit (R = 0.902, R² = 0.813, p = 0.046). Eval Throughput, Scalability, and Cost Efficiency show moderate to weak fits with lower R² values (0.721, 0.624, 0.578 respectively) and non-significant p-values (> 0.05). Thus, Accuracy and Response Time are significantly predicted, while others are less reliable in this model.
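To indicate how such model summaries are produced, the sketch below fits an ordinary least squares regression of one evaluation variable on the others and reports R-squared, adjusted R-squared, and the F statistic. The data and the exact predictor set are assumptions, since the paper does not list the raw model inputs.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "ResponseTime": [4, 5, 3, 4, 5, 4, 3, 5, 4, 5],
    "Accuracy":     [4, 5, 3, 4, 5, 4, 3, 5, 4, 4],
    "Throughput":   [4, 4, 3, 4, 5, 4, 3, 5, 4, 4],
    "Scalability":  [5, 5, 4, 4, 5, 4, 3, 5, 4, 5],
})

X = sm.add_constant(df[["Accuracy", "Throughput", "Scalability"]])
model = sm.OLS(df["ResponseTime"], X).fit()

print("R^2:", round(model.rsquared, 3))
print("Adjusted R^2:", round(model.rsquared_adj, 3))
print("F:", round(model.fvalue, 3), "p:", round(model.f_pvalue, 3))
```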

TABLE 7. ANOVA

ANOVA
Model Sum of Squares df Mean Square F Sig.
Eval Accuracy 4.556 4 1.139 5.452 .046b
Eval Throughput 4.325 4 1.081 3.228 .115b
Eval Scalability 3.807 4 0.952 2.076 .222b
Eval Cost Efficiency 2.834 4 0.709 1.715 .282b
Eval Response Time 4.824 4 1.206 7.77 .023b

Table 7 presents the ANOVA results for five evaluation metrics. Eval Accuracy and Eval Response Time show statistically significant results with p-values of 0.046 and 0.023 respectively, indicating that differences among group means are significant at the 5% level. This suggests that these variables vary meaningfully across the tested conditions. In contrast, Eval Throughput (p = 0.115), Eval Scalability (p = 0.222), and Eval Cost Efficiency (p = 0.282) have higher p-values, indicating no statistically significant differences. Therefore, only Eval Accuracy and Response Time contribute significantly to the model, while the others show no notable variation across groups.

Factor Analysis

TABLE 8. Communalities

Communalities
  Initial Extraction
Eval Accuracy 1 0.8
Eval Throughput 1 0.64
Eval Scalability 1 0.697
Eval CostEfficiency 1 0.295
Eval ResponseTime 1 0.82
Extraction Method: Principal Component Analysis.

Table 8 shows the communalities for each variable using Principal Component Analysis (PCA). The initial communalities are all 1, indicating that each variable initially contributes fully to the analysis. The extracted communalities show how much variance in each variable is explained by the retained component. Variables like Eval Accuracy (0.800), Eval Response Time (0.820), and Eval Scalability (0.697) have high extraction values, meaning they are well represented by the principal component. Eval Throughput is moderately represented (0.640), while Eval Cost Efficiency has the lowest value (0.295), indicating it is least explained by the extracted factor in the PCA model.

TABLE 9. Total Variance Explained

Total Variance Explained
Component | Initial Eigenvalues: Total | % of Variance | Cumulative % | Extraction Sums of Squared Loadings: Total | % of Variance | Cumulative %
1 | 3.253 | 65.055 | 65.055 | 3.253 | 65.055 | 65.055
2 | .956 | 19.129 | 84.184 | | |
3 | .462 | 9.238 | 93.422 | | |
4 | .256 | 5.126 | 98.548 | | |
5 | .073 | 1.452 | 100.000 | | |

Table 9 presents the "Total Variance Explained" through Principal Component Analysis (PCA). With an eigenvalue of 3.253, the first component accounts for 65.055% of the variance, meaning it captures the majority of the dataset's information. Only this component is retained based on the extraction criteria, as its eigenvalue exceeds 1. The second to fifth components have much lower eigenvalues and contribute marginally to the variance, with cumulative values reaching 100%. However, since only the first component meets the criteria for retention, it alone is considered significant in explaining the underlying data structure.
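For completeness, the sketch below shows how eigenvalues, percentages of variance, and one-component communalities of the kind reported in Tables 8 and 9 can be derived from a standardized rating matrix. The matrix itself is hypothetical stand-in data, so the printed values will not match the study's figures.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

ratings = np.array([
    [4, 4, 5, 4, 4], [5, 4, 5, 4, 5], [3, 3, 4, 3, 3], [4, 5, 5, 4, 4],
    [5, 5, 5, 4, 5], [4, 3, 4, 4, 4], [3, 4, 3, 3, 3], [5, 4, 5, 5, 5],
    [4, 4, 4, 4, 4], [5, 4, 5, 4, 5],
])

Z = StandardScaler().fit_transform(ratings)      # standardize each item
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_            # one eigenvalue per component
print("Eigenvalues:", np.round(eigenvalues, 3))
print("% of variance:", np.round(100 * pca.explained_variance_ratio_, 2))

# Loadings of the first (retained) component; with a single extracted
# component, each communality equals the squared loading.
loadings = pca.components_[0] * np.sqrt(eigenvalues[0])
print("Communalities:", np.round(loadings**2, 3))
```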

4. CONCLUSION

The rapid expansion of digital technologies, connected devices, and complex applications has reshaped the data landscape, requiring agile and intelligent data processing solutions. Traditional batch-based pipelines are increasingly inadequate to handle the exponential speed, volume, and diversity of contemporary data flows. This paper provides a comprehensive study of AI-driven stream processing architectures that integrate technologies such as Apache Flink, Apache Kafka, and TensorFlow Extended to meet these growing needs.

One of the important contributions of this research is the proposed framework for elastic, cloud-native data pipelines that not only addresses the limitations of legacy systems but also provides scalable, fault-tolerant, and efficient solutions. These architectures use state-of-the-art machine learning methods, such as reinforcement learning and graph neural networks, to intelligently manage resources, reduce latency, and improve performance. The integration of AI into these pipelines enables dynamic learning from live data, supporting real-time anomaly detection, predictive analytics, and automated decision-making. Evaluation criteria such as response time, cost efficiency, scalability, accuracy, and throughput were used to assess the performance of AI-enhanced pipelines. The results suggest that with appropriate architectural design and optimization techniques such as parallel processing and adaptive resource management, performance can be significantly improved without compromising cost or scalability.

In addition, the study emphasizes the strategic importance of AI-integrated pipelines in sectors such as healthcare, asset management, and supply chain systems. These industries benefit greatly from real-time insights, enabling proactive responses and fostering innovation. The case studies in this research demonstrate tangible improvements in operational efficiency and user experience, underscoring the real-world value of these systems.

Finally, this study supports the shift from rigid, one-size-fits-all data infrastructures to adaptive, intelligent systems capable of evolving with technological advances. As AI continues to mature and data environments grow increasingly complex, the findings and frameworks presented here provide a valuable foundation for future innovations in data pipeline engineering. Organizations that adopt such intelligent systems will be better positioned to leverage their data assets for strategic growth, resilience, and sustained competitive advantage.

REFERENCE

  1. Abbas, Tahir, and Addison Eldred. "AI-Powered Stream Processing: Bridging Real-Time Data Pipelines with Advanced Machine Learning Techniques." ResearchGate Journal of AI & Cloud Analytics (2025).
  2. Nasir, Waseem, and Halim Jack. "Real-Time Machine Learning Pipelines: Optimizing Stream Processing for Scalable AI Applications." ResearchGate AI & Data Science Journal (2025).
  3. Agarwal, Giriraj. "Robust Data Pipelines for AI Workloads: Architectures, Challenges, and Future Directions." International Journal of Advanced Research in Science, Communication and Technology 5, no. 2 (2024): 622-632.
  4. Althati, Chandrashekar, Jesu Narkarunai Arasu Malaiyappan, and Lavanya Shanmugam. "AI-Driven Analytics: Transforming Data Platforms for Real-Time Decision Making." Journal of Artificial Intelligence General Science (JAIGS) 3, no. 1 (2024): 392-402.
  5. Sresth, Vishal, Sudarshan Prasad Nagavalli, and Sundar Tiwari. "Optimizing Data Pipelines in Advanced Cloud Computing: Innovative Approaches to Large-Scale Data Processing, Analytics, and Real-Time Optimization." International Journal of Research and Analytical Reviews 10 (2023): 478-496.
  6. Kumar, Yulia, Jose Marchena, Ardalan H. Awlla, J. Jenny Li, and Hemn Barzan Abdalla. "The AI-Powered Evolution of Big Data." Applied Sciences 14, no. 22 (2024): 10176.
  7. Iqbal, Usman. "AI Techniques for Enhancing Latency Reduction in Distributed Data Pipeline Systems." Aitoz Multidisciplinary Review 2, no. 1 (2023): 216-223.
  8. Malikireddy, S. K. R. (2022). AI-Powered Cloud-Native Graph Processing for Real-Time Decision-Making in Big Data Pipelines. Indian Scientific Journal of Research in Engineering and Management, 06(07), 1–6. https://doi.org/10.55041/ijsrem15412.
  9. Dachepalli, V. "Optimized Cloud Security ECC-Enhanced Homomorphic Paillier Re-Encryption." International Journal of Interpreting Enigma Engineers (IJIEE) 1, no. 2 (2024): 1-7. https://doi.org/10.62674/ijiee.2024.v1i02.001.
  10. Kolluri, S. S. (2024). Automating Data Pipelines with AI for Scalable, Real-Time Process Optimization in the Cloud. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 10(6), 2070–2079. https://doi.org/10.32628/cseit242612405.
  11. Enabling Intelligent Decision Making and Optimization in Enterprises through Data Pipelines. (2024). Journal of Social Science and Humanities. https://doi.org/10.53469/wjimt.2024.07(02).13
  12. Kommanaboina, K. Y. (2022). Real-Time Data Pipelines for Enhancing Property Management Systems: Integrating AI Bots for Automated Scheduling. Design of Single Chip Microcomputer Control System for Stepping Motor, 1–5. https://doi.org/10.47363/jaicc/2022(1)e134.
  13. Seethala, S. C. (2020). AI-Enabled Data Pipelines: Modernizing Data Warehouses in Healthcare for Real-Time Analytics. International Research Journal of Innovations in Engineering and Technology, 04(12), 43–45. https://doi.org/10.47001/irjiet/2020.412007.
  14. Panyaram, S. (2024). Integrating Artificial Intelligence with Big Data for Real-Time Insights and Decision-Making in Complex Systems. 1(2), 85–95. https://doi.org/10.69888/ftsin.2024.000211.
  15. Choudhury, Musfiq Mannan. "A study of the significant factors affecting trust in electronic commerce." PhD diss., Durham University, 2008.
  16. Dudley, William N., Jose G. Benuzillo, and Mineh S. Carrico. "SPSS and SAS programming for the testing of mediation models." Nursing research 53, no. 1 (2004): 59-62.
  17. Von Hippel, Paul T. "Biases in SPSS 12.0 missing value analysis." The American Statistician 58, no. 2 (2004): 160-164.
  18. Park, Hun Myoung. "Univariate analysis and normality test using SAS, Stata, and SPSS." (2015).
  19. Singh, Oinam Bhopen. "SPSS: A Research Method for Social Sciences and Management." IUN Journal of (2015).
  20. Pfister, Roland, Katharina Schwarz, Robyn Carson, and Markus Janczyk. "Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel." Tutorials in Quantitative Methods for Psychology 9, no. 2 (2013): 72-78.
  21. Ong, Mohd Hanafi Azman, and Fadilah Puteh. "Quantitative data analysis: Choosing between SPSS, PLS, and AMOS in social science research." International Interdisciplinary Journal of Scientific Research 3, no. 1 (2017): 14-25.
  22. Babbie, Earl, Fred Halley, and Jeanne Zaino. Adventures in social research: data analysis using SPSS 14.0 and 15.0 for Windows. Pine Forge Press, 2007.
  23. Bell, Anthony A. Social Desirability and Other Predictors of Statistics Anxiety at the Graduate Level. Capella University, 2022.
  24. Giguère, Gyslain. "Collecting and analyzing data in multidimensional scaling experiments: A guide for psychologists using SPSS." Tutorials in Quantitative Methods for Psychology 2, no. 1 (2006): 27-38.
  25. Sulistyo, L., and N. K. Dwidayati. "Active learning with SPSS assisted guided discovery learning method to improve student’s statistical learning achievement." In Journal of Physics: Conference Series, vol. 1808, no. 1, p. 012042. IOP Publishing, 2021.
  26. Chen, Hongming, and Xiaocan Xiao. "The Application of SPSS Factor Analysis in the Evaluation of Corporate Social Responsibility." J. Softw. 7, no. 6 (2012): 1258-1264.
  27. Preacher, Kristopher J., and Andrew F. Hayes. "SPSS and SAS procedures for estimating indirect effects in simple mediation models." Behavior research methods, instruments, & computers 36 (2004): 717-731.
  28. Kafle, Sarad Chandra. "Correlation and regression analysis using SPSS." Management, Technology & Social Sciences 126 (2019).
  29. Jatnika, Ratna. "The effect of SPSS course to student’s attitudes toward statistics and achievement in statistics." International Journal of Information and Education Technology 5, no. 11 (2015): 818.


How to Cite

Kakulavaram, S. (2025). AI-Integrated Big Data Processing Pipelines for Immediate Intelligence. International Journal of Cloud Computing and Supply Chain Management, 1(2), 1-8. https://doi.org/10.55124/ijccscm.v1i2.243