MSR 2024

PAPER SUMMARY

Paper presentation at the MSR 2024 conference - Lisbon, Portugal

In this study, we conduct a comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92 non-ML projects). Our investigation comprises both quantitative and qualitative dimensions, aiming to uncover differences in CI adoption between ML and non-ML projects.

In particular, we address the following research questions:

RQ1: To what extent do CI practices adoption differ between ML and non-ML projects?

Motivation. Adopting CI goes beyond merely setting up a CI service; it requires the effective use of recommended CI practices. The intricate nature of ML, which involves complex algorithms and data, might present unique challenges in adopting these practices effectively. For example, it might be harder to keep build durations short in ML projects because they require more complex tests. Given the significant knowledge gap regarding the specific application of CI practices within ML projects, we start our investigation by comparing CI metrics from ML and non-ML projects.

RQ2: What are the evolution trends of build duration and test coverage within ML and non-ML projects?

Motivation. The results of RQ1 show that ML projects tend to require longer build durations than non-ML projects and, in the case of medium-sized projects, have lower test coverage. As such, in RQ2, we analyze how the build duration and test coverage of ML and non-ML projects evolve over time. This analysis allows us to better understand the differences observed in RQ1. For example, do ML projects show an increasing trend in build duration? Or have their builds been longer since the beginning of the projects?

RQ3: What do ML and non-ML developers discuss about CI in their projects?

Motivation. Considering the notable differences observed between ML and non-ML projects concerning CI in RQ1 and RQ2, we delve deeper in RQ3 to better understand the discussions about CI in both ML and non-ML projects. Our goal is to investigate whether ML and non-ML projects differ in their discussions concerning the use of CI. For instance, ML projects may discuss more build issues, as we observed that they typically require a longer build duration (see RQ1).

Results

Here we present the results for each RQ that we address.

RQ1. To what extent do CI practices adoption differ between ML and non-ML projects?

RESULT SUMMARY

ML projects tend to require a longer build duration. In addition, medium-sized ML projects tend to have a lower test coverage.

RQ1 RESULTS' DETAILS

The figures below show boxplots of the distributions and the statistical comparisons of the CI metrics "build duration", "test coverage", "time to fix broken builds", and "commit activity" between ML and non-ML projects.

Figure 1. Build Duration per project category and size.

Figure 2. Build Duration of medium-sized projects per programming language type.

Figure 3. Test Coverage per project category and size.

Figure 4. Commit Activity per project category and size.

Figure 5. Time to Fix Broken Builds per project category and size.
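
As a rough illustration of how a CI metric can be compared between two project categories, the sketch below computes the median per group and Cliff's delta, a non-parametric effect size. This summary does not name the statistical procedure used in the study, so the choice of Cliff's delta is an assumption made for illustration, and the build durations are hypothetical values, not data from the paper.

```python
from statistics import median


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical build durations in minutes (NOT data from the study).
ml_durations = [12.0, 18.5, 25.0, 9.0, 30.0]
non_ml_durations = [4.0, 7.5, 10.0, 6.0, 15.0]

delta = cliffs_delta(ml_durations, non_ml_durations)
print(f"median ML: {median(ml_durations)} min")
print(f"median non-ML: {median(non_ml_durations)} min")
print(f"Cliff's delta: {delta:.2f}")
```

A delta close to 1 means the first group's values are almost always larger than the second group's, a delta close to -1 means the opposite, and a delta near 0 means the two distributions largely overlap.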

RQ2. What are the evolution trends of build duration and test coverage within ML and non-ML projects?

RESULT SUMMARY

Small and medium-sized ML projects manifest increasing build duration trends at higher rates (75% and 61.4%, respectively) than non-ML projects (35.7% and 44.7%, respectively). Furthermore, both ML and non-ML projects tend to maintain their test coverage over time, even though 46% of the medium-sized ML projects exhibit a median coverage rate below 75%.

RQ2 RESULTS' DETAILS

The figures below show the time series clusters and their trends for the build duration and test coverage metrics on ML and non-ML projects.

Figure 6. Build Duration Clustering Trends’ Patterns in ML Projects.

Figure 7. Build Duration Clustering Trends’ Patterns in non-ML Projects.

Figure 8. Clustering trends of "Build Duration" time series per project category and size.

Figure 9. Test Coverage Clustering Trends’ Patterns in ML Projects.

Figure 10. Test Coverage Clustering Trends’ Patterns in non-ML Projects.
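
To make the trend labels (increasing, maintaining) concrete, the sketch below labels a single metric time series by the slope of a least-squares line fitted to it. This is a simplified stand-in and not the time series clustering used in the study; the function names, the `tolerance` threshold, and the sample series are all hypothetical.

```python
def linear_slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_trend(values, tolerance=0.05):
    """Label a series by its slope relative to the series mean.

    `tolerance` is a hypothetical threshold: a change of less than
    5% of the mean per step counts as "maintaining".
    """
    mean_y = sum(values) / len(values)
    if mean_y == 0:
        return "maintaining"
    relative = linear_slope(values) / mean_y
    if relative > tolerance:
        return "increasing"
    if relative < -tolerance:
        return "decreasing"
    return "maintaining"


# Hypothetical series (NOT data from the study).
build_minutes = [5, 6, 8, 9, 12]
coverage_pct = [80, 81, 79, 80, 80]
print(classify_trend(build_minutes))
print(classify_trend(coverage_pct))
```

Dividing the slope by the series mean makes the threshold comparable across metrics with different scales, such as minutes of build time and percentage points of coverage.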

RQ3. What do ML and non-ML developers discuss about CI in their projects?

RESULT SUMMARY

Both ML and non-ML projects share common discussions on CI Build Execution and Status, CI Infrastructure, CI Pipeline Configuration, and CI Testing and Code Quality. However, ML projects exhibit a more extensive range of CI-related themes (74 themes) compared to non-ML projects (24 themes). Notably, a significant difference arises in the prevalence of the "relatedness of failures" theme in ML project discussions, indicating a potentially higher incidence of false positives in their CI systems.

RQ3 RESULTS' DETAILS

Figures 11 and 12 present the CI discussions (i.e., the themes) that emerged from our document analysis.

Figure 11. CI-related themes discussed in ML projects.

Figure 12. CI-related themes discussed in non-ML projects.

Table 1 presents the description of the general themes related to CI discussions that we found in the context of ML and non-ML projects.

theme	description	context
CI Build Execution and Status	Discussions about managing and monitoring the execution and status of CI builds.	non-ML/ML
CI Infrastructure	Discussions about the underlying infrastructure supporting CI processes.	non-ML/ML
CI Pipeline Configuration	Discussions about configuration, structure, and optimization of the CI pipeline.	non-ML/ML
CI Testing and Code Quality	Discussions about aspects related to testing practices and code quality within the context of CI.	non-ML/ML
CI on Software Development Process	Discussions about the usage and impact of CI within the software development process.	ML

Table 1. General themes related to CI discussions in ML and non-ML projects.

Table 2 presents the description of codes related to CI discussions that we found in the context of ML and non-ML projects.

code	description	context
CI status tracking	Tracking the status of CI builds.	non-ML/ML
CI debugging	Identifying and resolving issues or bugs in the CI process.	non-ML/ML
relatedness of failures	Identifying that the build failure is not related to the proposed code change.	non-ML/ML
CI triggering	Methods and events that trigger the initiation of CI builds.	non-ML/ML
CI configuration	Discussing the configurations of the parameters and settings for CI processes.	non-ML/ML
fix broken CI	Addressing issues that cause the CI breakage.	non-ML/ML
CI local reproduction	Recreating or investigating CI failures on a local development environment.	non-ML/ML
test addition	Discussing the addition of new tests to the CI test suite.	non-ML/ML
dependency management	Managing dependencies and ensuring they are correctly handled in the CI pipeline.	non-ML/ML
CI as decision-making	Discussing the usage of CI results as a basis for decision-making in the development process.	ML
CI build duration	Discussing the time taken for CI builds to be completed.	non-ML/ML
CI rebase	Handling code rebasing within the context of CI.	ML
linting	Discussing code style and quality through linting tools in the CI process.	non-ML/ML
test coverage	Discussing test coverage in the CI test suite.	non-ML/ML
CI flakiness	Dealing with flaky or inconsistent behavior in CI builds.	non-ML/ML
CI clarification	Providing clarifications and additional information related to CI processes.	ML
CI regression	Managing regressions in the CI pipeline.	ML
test setup	Discussing the configuration of the environment for running tests in the CI process.	non-ML/ML
CI channel	Discussions related to installations and configuration of CI systems.	ML
test refactoring	Restructuring or improving tests in the CI test suite.	ML
test requirements	Managing the requirements for tests in the CI pipeline.	ML
(mis)trust in CI	Mistrust in the results produced by CI builds.	ML
CI explanation	Providing explanations for CI-related processes and decisions.	ML
CI infrastructure	Discussing the infrastructure supporting CI.	ML
CI as facilitator	Captures conversations centered on streamlining CI processes by integrating external tools or services effectively.	ML
CI as quality-gate	Treating CI as a quality gate to ensure high standards in code.	ML
CI backporting	Backporting changes or fixes identified in the CI process to previous versions.	ML
CI caching	Managing caching strategies to improve CI build performance.	non-ML/ML
CI load	Managing the load on CI infrastructure and resources.	ML
CI merge conflicts	Discussing merge conflicts during the CI process.	ML
CI static analysis	Using static analysis tools to assess code quality in the CI pipeline.	ML
skipping CI	Discussing conditions under which CI builds can be skipped or ignored.	non-ML/ML
test instructions	Providing instructions for running tests in the CI environment.	ML
test update	Discussing test updates in the CI test suite.	non-ML/ML
backwards compatibility	Managing backward compatibility of code changes in the CI process.	ML
CI build frequency	Discussing the frequency of continuous integration builds.	ML
CI correctness	Assessing the correctness of CI build results.	ML
CI double-check	Discussing additional checks in the CI build results.	ML
CI permissions	Discussing permissions for managing CI workflows.	ML
CI policy	Establishing policies governing the CI process.	ML
CI security	Discussing security concerns in the context of CI.	ML
CI testability	Discussing how easily testable code changes are in the CI environment.	ML
integration delay	Discussing the influence of CI on the time reviewers take to integrate code changes in the mainstream.	ML
multiple jobs	Managing multiple jobs within the CI pipeline.	ML
pipeline setting	Configuring settings for the CI pipeline.	ML
test fixture	Setting up fixtures for tests in the CI testing suite.	ML
test guidance	Providing guidance for writing tests in CI.	ML
test readability	Discussing the readability and clarity of test cases in the CI testing suite.	ML
build system prototype	Discussing the development of prototypes for the build system in CI.	ML
CI build complexity	Discussing the complexity in the CI build process.	ML
CI build size	Discussing the size of the CI build process.	ML
CI credits	Discussing the usage of CI credits to execute the CI workflow within the CI service.	ML
CI effectiveness	Discussing and evaluating the overall effectiveness of the CI pipeline.	ML
CI stability	Discussing stability and reliability in the CI process.	ML
CI steps	Defining and organizing the steps in the CI pipeline.	ML
ease of installation	Discussing the ease of installation and setup of the CI environment.	ML
lack of tests	Discussing the lack of tests in the CI pipeline.	ML
learning CI	Discussing how the CI workflow steps work.	ML
memory management	Discussing memory usage in the context of CI builds.	non-ML/ML
model training jobs	Managing jobs related to training machine learning models in the CI pipeline.	ML
multi-pipelines	Discussing the coordination of multiple pipelines within the CI system.	ML
repository tagging	Discussing the management of version tagging in the repository within CI.	ML
review delay frustration with green CI	Dealing with frustration related to delays in code reviews despite a green CI.	ML
scheduled build	Discussing the scheduling of builds in the CI process.	ML
test discrepancies	Discussing discrepancies and inconsistencies in test results within the CI test suite.	ML
test freeze	Managing and dealing with test freezes in the CI test suite.	ML
test interaction	Handling interactions and dependencies between tests in the CI process.	ML
test issues	Discussing issues related to tests in the CI suite.	ML
test parametrization	Discussing the configuration and management of parameters for tests in the CI process.	ML
test removal	Removing and managing tests in the CI suite.	ML
testing purposes	Discussing the purpose and goals of testing within the CI process.	ML
testing scope	Discussing the scope of testing within the CI pipeline.	ML
workflow permissions	Managing permissions and access control within the workflow of the CI process.	ML
CI is stuck	Discussing issues where the CI process is stuck.	non-ML
test structure	Organizing and structuring the test suite for better clarity and maintenance.	non-ML
CI service churn	Discussing the churn of the CI services being used.	non-ML
test guidelines	Discussing guidelines for writing tests in the CI pipeline.	non-ML
test size	Discussing the test size within the CI test suite.	non-ML

Table 2. Description of codes related to CI discussions in ML and non-ML projects.

Category	fullName	defaultBranch	language
ML	Acellera/moleculekit	main	Python
ML	amark/gun	master	JavaScript
ML	BLKSerene/Wordless	main	Python
ML	chakki-works/doccano	master	Python
ML	coin-or/Gravity	master	C++
ML	criteo/tf-yarn	master	Python
ML	CyberReboot/poseidon	main	Python
ML	DandyDev/slack-machine	main	Python
ML	dbiir/UER-py	master	Python
ML	deepchem/deepchem	master	Python
ML	deepfakes/faceswap	master	Python
ML	diffgram/diffgram	master	Python
ML	Drakkar-Software/OctoBot	master	Python
ML	EmergentOrder/onnx-scala	main	Scala
ML	FluxML/NNlib.jl	master	Julia
ML	henrysky/astroNN	master	Python
ML	LaurentMazare/tch-rs	main	Rust
ML	microsoft/onnxruntime	main	C++
ML	microsoft/pai	master	JavaScript
ML	mlflow/mlflow	master	Python
ML	msdslab/automated-systematic-review	master	Python
ML	OpenKore/openkore	master	Perl
ML	OpenNMT/OpenNMT-py	master	Python
ML	polyaxon/polyaxon-chart	master	Smarty
ML	QuantumBFS/Yao.jl	master	Julia
ML	SeldonIO/seldon-core	master	HTML
ML	SMTorg/smt	master	Jupyter Notebook
ML	tesseract-ocr/tesseract	main	C++
ML	Texera/texera	master	Java
ML	uber/ludwig	master	Python
ML	ultralytics/yolov3	master	Python
ML	VowpalWabbit/vowpal_wabbit	master	C++
ML	zhenghaoz/gorse	master	Go
ML	alan-turing-institute/sktime	main	Python
ML	albu/albumentations	master	Python
ML	allenai/allennlp	main	Python
ML	analysiscenter/batchflow	master	Python
ML	apache/incubator-mxnet	master	C++
ML	arraiyopensource/kornia	master	Python
ML	BehaviorTree/BehaviorTree.CPP	master	C++
ML	biolab/orange3	master	Python
ML	comic/grand-challenge.org	main	Python
ML	davisking/dlib	master	C++
ML	dmlc/tvm	main	Python
ML	dmlc/xgboost	master	C++
ML	espnet/espnet	master	Python
ML	FluxML/Flux.jl	master	Julia
ML	FluxML/Metalhead.jl	master	Julia
ML	geomstats/geomstats	master	Jupyter Notebook
ML	gojek/feast	master	Python
ML	h2oai/h2o-3	master	Jupyter Notebook
ML	h2oai/sparkling-water	master	Scala
ML	huggingface/pytorch-pretrained-BERT	main	Python
ML	intel-analytics/BigDL	main	Jupyter Notebook
ML	iterative/dvc	main	Python
ML	JohnSnowLabs/spark-nlp	master	Scala
ML	jtablesaw/tablesaw	master	Java
ML	kendryte/nncase	master	C#
ML	materialsvirtuallab/megnet	master	Jupyter Notebook
ML	microsoft/dowhy	main	Python
ML	microsoft/LightGBM	master	C++
ML	mlpack/mlpack	master	C++
ML	mne-tools/mne-cpp	main	C++
ML	nilearn/nilearn	main	Python
ML	nipy/dipy	master	Python
ML	onnx/onnx	main	Python
ML	opencv/dldt	master	C++
ML	pfnet/optuna	master	Python
ML	pytorch/ignite	master	Python
ML	pytorch/tnt	master	Python
ML	pytorch/vision	main	Python
ML	rflamary/POT	master	Python
ML	RubixML/RubixML	master	PHP
ML	scikit-learn/scikit-learn	main	Python
ML	shimat/opencvsharp	master	C#
ML	skorch-dev/skorch	master	Jupyter Notebook
ML	smistad/FAST	master	C++
ML	sorgerlab/indra	master	Python
ML	Tencent/ncnn	master	C++
ML	tensorflow/addons	master	Python
ML	tensorly/tensorly	main	Python
ML	textlint/textlint	master	TypeScript
ML	TuringLang/Turing.jl	master	Julia
ML	williamFalcon/pytorch-lightning	master	Python
ML	yasserfarouk/negmas	master	Jupyter Notebook
ML	zalandoresearch/flair	master	Python
ML	zhongkaifu/Seq2SeqSharp	master	C#
No ML	adaltas/node-csv	master	CoffeeScript
No ML	cmusphinx/pocketsphinx	master	C
No ML	facebookresearch/fairseq	main	Python
No ML	fireeye/flare-floss	master	Python
No ML	jacomyal/sigma.js	main	TypeScript
No ML	jwilder/docker-gen	main	Go
No ML	macbre/analyze-css	master	JavaScript
No ML	microsoft/TypeScript	main	TypeScript
No ML	microsoft/monaco-editor	main	JavaScript
No ML	mishoo/UglifyJS	master	JavaScript
No ML	muukii/Pixel	main	Swift
No ML	ofek/hatch	master	Python
No ML	p-org/P	master	C#
No ML	postcss/postcss-loader	master	JavaScript
No ML	prest/prest	main	Go
No ML	pymc-devs/pymc	main	Python
No ML	searchkit/searchkit	main	TypeScript
No ML	srvrco/getssl	master	Shell
No ML	websocket-client/websocket-client	master	Python
No ML	yahoo/react-stickynode	master	JavaScript
No ML	vuejs/vue	main	TypeScript
No ML	twbs/bootstrap	main	JavaScript
No ML	ohmyzsh/ohmyzsh	master	Shell
No ML	flutter/flutter	master	Dart
No ML	ytdl-org/youtube-dl	master	Python
ML	huggingface/transformers	main	Python
No ML	denoland/deno	main	Rust
No ML	ant-design/ant-design	master	TypeScript
No ML	puppeteer/puppeteer	main	TypeScript
ML	AUTOMATIC1111/stable-diffusion-webui	master	Python
No ML	django/django	main	Python
No ML	gin-gonic/gin	master	Go
No ML	tailwindlabs/tailwindcss	master	HTML
No ML	sveltejs/svelte	master	TypeScript
No ML	gohugoio/hugo	master	Go
No ML	moby/moby	master	Go
No ML	netdata/netdata	master	C
No ML	pallets/flask	main	Python
No ML	expressjs/express	master	JavaScript
No ML	chartjs/Chart.js	master	JavaScript
No ML	coder/code-server	main	TypeScript
No ML	reduxjs/redux	master	TypeScript
No ML	tiangolo/fastapi	master	Python
No ML	vitejs/vite	main	TypeScript
No ML	anuraghazra/github-readme-stats	master	JavaScript
No ML	h5bp/html5-boilerplate	main	JavaScript
No ML	strapi/strapi	main	JavaScript
No ML	junegunn/fzf	master	Go
No ML	syncthing/syncthing	main	Go
ML	apache/superset	master	TypeScript
No ML	microsoft/playwright	main	TypeScript
No ML	hoppscotch/hoppscotch	main	TypeScript
No ML	remix-run/react-router	main	TypeScript
No ML	prometheus/prometheus	main	Go
No ML	yt-dlp/yt-dlp	master	Python
No ML	obsproject/obs-studio	master	C
No ML	google/guava	master	Java
No ML	caddyserver/caddy	master	Go
No ML	scrapy/scrapy	master	Python
No ML	jekyll/jekyll	master	Ruby
No ML	git/git	master	C
No ML	prettier/prettier	main	JavaScript
No ML	facebook/docusaurus	main	TypeScript
No ML	serverless/serverless	main	JavaScript
No ML	square/okhttp	master	Kotlin
No ML	etcd-io/etcd	main	Go
No ML	TryGhost/Ghost	main	JavaScript
No ML	traefik/traefik	master	Go
No ML	mozilla/pdf.js	master	JavaScript
No ML	gogs/gogs	main	Go
No ML	hwchase17/langchain	master	Python
No ML	rustdesk/rustdesk	master	Rust
No ML	jestjs/jest	main	TypeScript
No ML	square/retrofit	master	Java
No ML	oven-sh/bun	main	Zig
No ML	sharkdp/bat	master	Rust
No ML	sherlock-project/sherlock	master	Python
No ML	styled-components/styled-components	main	TypeScript
ML	ultralytics/yolov5	master	Python
No ML	rclone/rclone	master	Go
ML	pandas-dev/pandas	main	Python
No ML	agalwood/Motrix	master	JavaScript
No ML	BurntSushi/ripgrep	master	Rust
No ML	vuejs/core	main	TypeScript
No ML	Leaflet/Leaflet	main	JavaScript
No ML	google/zx	main	JavaScript
No ML	hashicorp/terraform	main	Go
No ML	vuetifyjs/vuetify	master	TypeScript
No ML	streamich/react-use	master	TypeScript
No ML	hexojs/hexo	master	JavaScript
No ML	Homebrew/brew	master	Ruby
ML	apache/spark	master	Scala
No ML	videojs/video.js	main	JavaScript
No ML	nolimits4web/swiper	master	JavaScript
No ML	jesseduffield/lazygit	master	Go
No ML	topjohnwu/Magisk	master	C++
No ML	AppFlowy-IO/AppFlowy	main	Rust
No ML	Kong/kong	master	Lua
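
Because the project list above is tab-separated, it can be tallied with a few lines of Python. The sketch below uses a four-row excerpt copied from the list; the variable and function names are illustrative and not part of any reproduction package.

```python
import csv
import io
from collections import Counter

# Four rows copied from the project list above (tab-separated excerpt).
PROJECTS_TSV = (
    "Category\tfullName\tdefaultBranch\tlanguage\n"
    "ML\tmlflow/mlflow\tmaster\tPython\n"
    "ML\tdmlc/xgboost\tmaster\tC++\n"
    "No ML\tdjango/django\tmain\tPython\n"
    "No ML\tgit/git\tmaster\tC\n"
)


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(PROJECTS_TSV), delimiter="\t"))
by_category = count_by(rows, "Category")
by_language = count_by(rows, "language")
print(dict(by_category))
print(dict(by_language))
```

Applied to the full list, the category tally can be checked against the 93 ML and 92 non-ML projects stated in the summary.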

How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
