reflectub - Mirror a user’s GitHub repositories

Age	Commit message	Author
2022-06-04	database: Update copyright year (make-errors-more-traceable-2)	Teddy Wing

2022-06-04	database::Repo: From<&github::Repo>: Use newest update date	Teddy Wing
	It turns out the GitHub `updated_at` field doesn't change when new commits are pushed to the repository, only when the repository config etc. changes. In order to update the mirrors when any update happens in the repository, we need to look at both of those date values to see if they've been updated. Take the most recent of `updated_at` or `pushed_at` and set it to `(database::Repo).updated_at`. This allows us to refresh the repo when either of those dates changes, catching all GitHub repo updates.
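The date selection described above can be sketched roughly as follows. `GithubRepo` and `newest_update` are hypothetical names, not taken from the project, and the sketch assumes both timestamps are UTC strings in the same `YYYY-MM-DDTHH:MM:SSZ` form, so that string order matches chronological order.

```rust
/// Hypothetical stand-in for the two date fields of `github::Repo`.
struct GithubRepo {
    updated_at: String,
    pushed_at: String,
}

/// Return the more recent of `updated_at` and `pushed_at`.
///
/// Assumption: both values are UTC timestamps in the same fixed-width
/// format, so comparing the strings compares the dates.
fn newest_update(repo: &GithubRepo) -> &str {
    std::cmp::max(repo.updated_at.as_str(), repo.pushed_at.as_str())
}
```

With parsed date types the same `max` comparison would apply without relying on string order.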
2022-06-04	Update copyright year	Teddy Wing

2022-06-04	github::Repo: Change `git_url` to `clone_url`	Teddy Wing
	I was getting errors mirroring and updating:

	    failed to connect to github.com: Connection timed out; class=Os (2)

	and

	    remote 'origin' already exists; class=Config (7); code=Exists(-4)

	It turns out that the `git_url` field, which I had been using previously to mirror and clone repositories, stopped working. My guess is that it's because Reflectub is not authorised to clone GitHub "git://" URLs, so the connection timed out. I'm not sure why this stopped being allowed, though. The URL change seems to have happened around March 2022, or at least between December 2021 and April 2022.

	The second error was caused by a previously-created repository existing in the filesystem, but not being in the database as it hadn't been correctly mirrored. For now, I've decided not to fix that problem and am only fixing the URL issue.

	The GitHub API also includes a `clone_url` field, which contains an HTTPS clone URL. Using this URL to mirror fixes the timeout problem.
2022-06-03	git.rs: Add more context to errors	Teddy Wing
	Add full definitions for our new error variant ideas. Use a distinct variant and message in each error case in order to trace errors to the line of code where they occur.
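A minimal sketch of the one-variant-per-failure-site idea, using only the standard library. The log says the project uses 'thiserror' and names an `Error::MirrorAddRemote` variant; the `MirrorInit` variant, the messages, the `io::Error` payload and the `add_remote` helper are assumptions made for illustration.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type. Only `MirrorAddRemote` is named in the log;
/// `MirrorInit` and both messages are illustrative.
#[derive(Debug)]
enum Error {
    MirrorInit(io::Error),
    MirrorAddRemote(io::Error),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A distinct message per variant identifies the failing step.
        match self {
            Error::MirrorInit(_) => write!(f, "cannot initialise mirror repository"),
            Error::MirrorAddRemote(_) => write!(f, "cannot add remote to mirror"),
        }
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        // Keep the underlying error reachable for full reports.
        match self {
            Error::MirrorInit(e) | Error::MirrorAddRemote(e) => Some(e),
        }
    }
}

/// Hypothetical fallible step that maps its failure to its own variant.
fn add_remote(fail: bool) -> Result<(), Error> {
    if fail {
        let cause = io::Error::new(io::ErrorKind::AlreadyExists, "remote 'origin' already exists");
        return Err(Error::MirrorAddRemote(cause));
    }
    Ok(())
}
```

Because each call site wraps its failure in a different variant, the top-level message alone is enough to locate the line that failed.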
2022-06-03	Cargo.toml: Update 'thiserror' to v1.0.31	Teddy Wing

2022-06-03	Update Cargo.lock	Teddy Wing

2022-06-02	git.rs: Ideas for error structure and context	Teddy Wing

2022-06-02	git.rs: Add `Error::MirrorAddRemote` variant	Teddy Wing

2022-06-02	git.rs: Add ideas for new error variants	Teddy Wing
	These variants should make it easier to trace where in the code a particular error occurred, and include a context-descriptive error message.
2021-06-26	Increase version v0.0.1 -> v0.0.2 (v0.0.2)	Teddy Wing
	And add a change log.
2021-06-25	set_agefile_time(): Don't add `agefile=info/web/last-modified` to cgitrc	Teddy Wing
	We don't need to set the `agefile` config value because "info/web/last-modified" is already CGit's default value for the setting.
2021-06-25	README: Reword description about CGit	Teddy Wing
	Reflectub is designed to work specifically with CGit, but could work with other Git web frontends. Make this more explicit in the description.
2021-06-25	main: If the default branch is not "master", set cgitrc defbranch	Teddy Wing
	In order for CGit to know that the repository uses a default branch that isn't "master", we need to set the `defbranch` setting in 'cgitrc'. The mtime is read from either the "master" ref or the ref specified with `defbranch`: https://git.zx2c4.com/cgit/tree/ui-repolist.c?id=5258c297ba6fb604ae1415fbc19a3fe42457e49e#n56
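As a rough illustration of the rule above, the decision could look like this. `defbranch_setting` is a hypothetical name, and the `key=value` line format is assumed from the `agefile=info/web/last-modified` line quoted elsewhere in this log.

```rust
/// Return the repo-local cgitrc line for the default branch, or `None`
/// when the branch is "master" and no setting is needed.
fn defbranch_setting(default_branch: &str) -> Option<String> {
    if default_branch == "master" {
        None
    } else {
        Some(format!("defbranch={}", default_branch))
    }
}
```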
2021-06-25	set_agefile_time: Move repo cgitrc file append to a function	Teddy Wing
	I need to add another line to the repo-local cgitrc file to set the default branch. Move this code to a function so we can reuse it.
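A reusable append helper of the kind described might look like the sketch below. The name `repo_cgitrc_append` and the assumption that the file is called `cgitrc` directly inside the repository directory are illustrative, not confirmed by the log.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one config line to the repo-local `cgitrc`, creating the file
/// if it does not exist yet. The file location is an assumption.
fn repo_cgitrc_append(repo_path: &Path, config: &str) -> io::Result<()> {
    let mut cgitrc = OpenOptions::new()
        .create(true)
        .append(true)
        .open(repo_path.join("cgitrc"))?;

    writeln!(cgitrc, "{}", config)
}
```

Opening in append mode lets several settings be written by separate calls without one overwriting another.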
2021-06-24	Update TODO	Teddy Wing

2021-06-24	main::update(): Change HEAD branch if default branch changed	Teddy Wing
	If the default branch on GitHub changed, change the local mirror's HEAD to match the new default. Need to store the default branch in the database now so we can find out whether it changed.
2021-06-24	git::mirror(): Change HEAD to GitHub default branch	Teddy Wing
	The default branch after mirroring was typically 'master'. On GitHub, the default branch may not necessarily be 'master'. Change the default branch by changing the HEAD to GitHub's default branch so that the mirrored repository better matches GitHub. We'll also need to make a change to the update function in case the default branch changes after mirroring.
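The HEAD change can be illustrated without any Git library: in a bare repository, HEAD is a small file containing a symbolic ref. This sketch rewrites that file directly, which is a swapped-in technique for illustration; the project uses 'git2' and presumably changes HEAD through that library instead. Both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Point a bare repository's HEAD at `default_branch` by rewriting the
/// HEAD file. Illustrative only; a Git library call would normally do this.
fn set_head_branch(repo_path: &Path, default_branch: &str) -> io::Result<()> {
    fs::write(
        repo_path.join("HEAD"),
        format!("ref: refs/heads/{}\n", default_branch),
    )
}

/// On update, HEAD only needs to move when the branch name stored at
/// mirror time differs from the one GitHub reports now.
fn default_branch_changed(stored: &str, current: &str) -> bool {
    stored != current
}
```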
2021-06-24	Update TODO	Teddy Wing

2021-06-24	main: Re-enable GitHub repository fetching	Teddy Wing
	Now that we're done working out empty repository handling for setting the mtime, revert this hard-coded test repository change.
2021-06-24	Merge branch 'add-support-for-empty-repositories'	Teddy Wing

2021-06-23	update_mtime(): Set the mtime to the repo's `pushed_at` time	Teddy Wing
	Use `pushed_at` instead of `updated_at`. This mtime is used to sort repositories on CGit's repository index page. Prevent things like GitHub stars from changing the sort order. The sort should instead be influenced by repository changes.
2021-06-23	update_mtime(): Extract agefile handling to a separate function	Teddy Wing
	The `update_mtime()` function is getting pretty long. Extract this into a new function since it's more of a self-contained unit.
2021-06-23	update_mtime(): Remove `.or_else()` idea	Teddy Wing
	Decided to keep the `match` expressions. Still working out how to clean up the code in this function, though.
2021-06-23	update_mtime(): Write update time to CGit agefile as last recourse	Teddy Wing
	Adjust the match arms to remove a bit of indentation. Add some 'anyhow' context to errors for better error reporting. When "repo/refs/heads/[default-branch]" or "repo/packed-refs" files don't exist, create a "repo/info/web/last-modified" file and set this file as the CGit agefile in the repo's local 'cgitrc' file. It's possible for a repo to not have either of the first two files when the repo is empty and has no commits.
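The fallback order described above can be sketched as a function that picks which file should receive the mtime. `mtime_target` is a hypothetical name, and checking for existence up front is an assumption: the real code may instead attempt the update and match on the error. The sketch leaves out the cgitrc `agefile` line that this commit also writes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Pick the file whose mtime should be set for a mirrored repository:
/// the default branch ref, then `packed-refs`, then the agefile.
fn mtime_target(repo_path: &Path, default_branch: &str) -> io::Result<PathBuf> {
    let default_ref = repo_path.join("refs/heads").join(default_branch);
    if default_ref.exists() {
        return Ok(default_ref);
    }

    let packed_refs = repo_path.join("packed-refs");
    if packed_refs.exists() {
        return Ok(packed_refs);
    }

    // Empty repository with no commits: create the agefile as a last resort.
    let agefile = repo_path.join("info/web/last-modified");
    if let Some(parent) = agefile.parent() {
        fs::create_dir_all(parent)?;
    }
    if !agefile.exists() {
        fs::File::create(&agefile)?;
    }

    Ok(agefile)
}
```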
2021-06-23	update_mtime(): Idea for `.or_else()` chaining	Teddy Wing
	An idea to chain the error handling here instead of using `match` expressions.
2021-06-23	main: Use test repositories instead of getting repos from GitHub	Teddy Wing
	Add a couple of test repositories that we can use to test empty repository handling.
2021-06-23	git::mirror(): Fix setting repository description on Linux	Teddy Wing
	After a bunch of investigation, first with a small 'git2' project, then a 'libgit2-sys' project, then a 'libgit2' C project, I finally discovered why setting the description worked on Mac OS but not on Linux.

	Turning on the `GIT_REPOSITORY_INIT_EXTERNAL_TEMPLATE` repository init flag caused the default description to be used instead of the custom description passed in the init. Turn off the flag to allow us to set the description on Linux.

	Here is the source of the test builds I made:

	git2 test:

	    use git2;

	    fn main() {
	        let path = "/tmp/test-repo";
	        let description = "the description";

	        let repo = git2::Repository::init_opts(
	            path,
	            &git2::RepositoryInitOptions::new()
	                .bare(true)
	                .external_template(false)
	                .description(description),
	        ).unwrap();
	    }

	libgit2-sys test:

	    use libgit2_sys;

	    use std::ffi::CString;
	    use std::ptr;

	    fn main() {
	        let _ = unsafe { libgit2_sys::git_libgit2_init() };

	        let mut repo = ptr::null_mut();
	        let path = CString::new("/tmp/test-repo").unwrap();
	        let description = CString::new("Test").unwrap();

	        let mut opts = libgit2_sys::git_repository_init_options {
	            version: libgit2_sys::GIT_REPOSITORY_INIT_OPTIONS_VERSION,
	            flags: libgit2_sys::GIT_REPOSITORY_INIT_MKDIR as u32
	                | libgit2_sys::GIT_REPOSITORY_INIT_MKPATH as u32
	                | libgit2_sys::GIT_REPOSITORY_INIT_EXTERNAL_TEMPLATE as u32,
	            mode: 0,
	            workdir_path: ptr::null(),
	            description: description.as_ptr(),
	            template_path: ptr::null(),
	            initial_head: ptr::null(),
	            origin_url: ptr::null(),
	        };

	        let error = unsafe {
	            libgit2_sys::git_repository_init_ext(
	                &mut repo,
	                path.as_ptr(),
	                &mut opts,
	            )
	        };

	        dbg!(&error);
	    }

	libgit2 test:

	    #include <stdio.h>

	    #include "git2.h"

	    int main() {
	        int error;
	        const git_error *lg2err;

	        error = git_libgit2_init();
	        if (error <= 0) {
	            printf("git_libgit2_init error: %d\n", error);
	        }

	        git_repository *repo = NULL;
	        git_repository_init_options opts = GIT_REPOSITORY_INIT_OPTIONS_INIT;

	        /* Customize options */
	        opts.flags |= GIT_REPOSITORY_INIT_MKPATH; /* mkdir as needed to create repo */
	        opts.flags |= GIT_REPOSITORY_INIT_MKDIR;
	        /* opts.flags |= GIT_REPOSITORY_INIT_EXTERNAL_TEMPLATE; */
	        opts.description = "Custom test description";

	        error = git_repository_init_ext(&repo, "/tmp/test-repo", &opts);
	        printf("git_repository_init_ext error: %d\n", error);

	        lg2err = git_error_last();
	        if (lg2err != NULL) {
	            printf("%s\n", lg2err->message);
	        }
	    }
2021-06-20	Update TODO	Teddy Wing

2021-06-20	README: Add real information (v0.0.1)	Teddy Wing
	The previous README was just a quick usage note for others when I needed to ask a question on IRC.
2021-06-20	Add manual page	Teddy Wing

2021-06-15	run(): Only clone repo name if we need it for error context	Teddy Wing
	This avoids cloning names of repos that are processed successfully.
2021-06-13	run(): Remove debug print of the current thread	Teddy Wing
	We know this runs on multiple threads now, so this debug line can be removed.
2021-06-13	Update TODO	Teddy Wing

2021-06-13	run(): Remove limit to two repositories used for testing	Teddy Wing
	I artificially limited the number of repositories processed to two for testing so that I wouldn't download and mirror all of my repositories while testing the program. Now that things seem to be working, remove this artificial limit.
2021-06-13	main: Add context to GitHub fetch error	Teddy Wing

2021-06-13	run(): Don't clone `base_cgitrc` into each thread	Teddy Wing
	Rejigger some types and signatures to allow us to get references to the `base_cgitrc` path instead of copying it for each repository.
2021-06-13	MultiError: Remove impl `Iterator` tests	Teddy Wing
	Remove the `Iterator` test implementations that didn't work out.
2021-06-13	main(): Print "error: " in front of each error line	Teddy Wing
	Prefix each error line with the text "error: " to make it clear that's what it is, and that it's separate from errors printed on other lines.

	Worked out how to set up an `Iterator` for `MultiError` based on a comment by 'chris-morgan' (https://old.reddit.com/user/chris-morgan) on Reddit /r/rust:

	> 1. Implement your own iterator type which wraps existing iterator
	>    types (std::slice::Iter, and std::vec::IntoIter if you want a
	>    consuming iterator).
	>    Advantages: most flexible, ensures API stability if you
	>    need to change internal details.
	>    Disadvantages: a lot more effort, if you want to do it properly
	>    (which involves implementing about ten traits on your iterator
	>    wrapper type); and if slices or their iterators add something new,
	>    you don’t get it unless you implement a wrapper yourself.
	>
	> 2. Have your iter() functions and IntoIterator implementations use the
	>    standard iterator types directly.
	>    Advantages: easier, gets you all the other trait implementations on
	>    std::slice::Iter for free (AsRef, Clone, FusedIterator,
	>    ExactSizeIterator, Debug, Send, DoubleEndedIterator, TrustedLen,
	>    Sync).
	>    Disadvantages: if you need to restructure things so that this is no
	>    longer an option (e.g. store things in a different type of vector
	>    and thus need to map it before presenting it to the user) it’s a
	>    breaking change.
	>
	> 3. Implement Deref<Target = [(K, V)]> and just treat your Bucket<K, V>
	>    as a &[(K, V)]. (Read-only; implement DerefMut if you want to allow
	>    mutations of values.)
	>    Advantages: easy, and lets you simply treat the whole thing as a
	>    slice (this is what Vec<T> does).
	>    Disadvantages: there really aren’t any, if it matches your purpose.
	>    (If not, it’s useless.)

	(https://old.reddit.com/r/rust/comments/7a0slp/questionimplementing_iterator_for_a_struct_with_a/)
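For illustration, here is a sketch of option 2 from the quoted comment: `IntoIterator` implemented on a reference to the wrapper, reusing `std::slice::Iter`. Which option the project adopted is not stated in the log. The `errors` field, the `Box<dyn Error>` element type (the log says the real type holds 'anyhow' errors) and the `error_lines` helper are assumptions.

```rust
use std::error::Error;
use std::slice;

/// Hypothetical stand-in for `MultiError`, holding boxed errors.
#[derive(Debug)]
struct MultiError {
    errors: Vec<Box<dyn Error>>,
}

/// Iterate over the wrapped errors by reference, reusing the standard
/// slice iterator instead of writing a custom iterator type.
impl<'a> IntoIterator for &'a MultiError {
    type Item = &'a Box<dyn Error>;
    type IntoIter = slice::Iter<'a, Box<dyn Error>>;

    fn into_iter(self) -> Self::IntoIter {
        self.errors.iter()
    }
}

/// Format one line per error, each prefixed with "error: ".
fn error_lines(errors: &MultiError) -> Vec<String> {
    errors
        .into_iter()
        .map(|e| format!("error: {}", e))
        .collect()
}
```

A caller could then print each returned line to standard error, or write `for e in &multi_error { ... }` directly.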
2021-06-13	main: Remove unused `std::sync` imports	Teddy Wing

2021-06-13	MultiError: Add struct documentation	Teddy Wing

2021-06-13	Move `MultiError` to its own file	Teddy Wing

2021-06-13	MultiError: Remove old `errors` field	Teddy Wing
	Decided to use 'anyhow' errors instead of a generic boxed error.
2021-06-13	run(): Prefix repository errors with the name of the repository	Teddy Wing
	So you know what the error referred to.
2021-06-13	run(): Add a note to include the repo name in repo errors	Teddy Wing

2021-06-13	run(): Adjust whitespace	Teddy Wing
	Make all chained methods indented.
2021-06-13	run(): Return multiple errors	Teddy Wing
	Return all errors from repo processing. This allows us to provide information on all errors that happened while processing, but continue processing all the repos even if there's an error in one of them. A new `MultiError` type wraps a list of errors to do this.
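The collect-and-continue behaviour described above might be sketched as follows. This `MultiError` holds plain strings for simplicity, and `process_repo` is a made-up stand-in that fails for names starting with "broken"; neither reflects the project's actual types. The repository-name prefix mirrors the later commit that adds it.

```rust
/// Simplified, hypothetical `MultiError` wrapping a list of messages.
#[derive(Debug, PartialEq)]
struct MultiError {
    errors: Vec<String>,
}

/// Made-up processing step: fails for names starting with "broken".
fn process_repo(name: &str) -> Result<(), String> {
    if name.starts_with("broken") {
        Err("mirror failed".to_string())
    } else {
        Ok(())
    }
}

/// Process every repository, keep going after a failure, and return
/// all collected errors at the end, each prefixed with the repo name.
fn run(repos: &[&str]) -> Result<(), MultiError> {
    let errors: Vec<String> = repos
        .iter()
        .filter_map(|repo| {
            process_repo(repo)
                .err()
                .map(|e| format!("{}: {}", repo, e))
        })
        .collect();

    if errors.is_empty() {
        Ok(())
    } else {
        Err(MultiError { errors })
    }
}
```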
2021-06-13	main: Remove unused `r2d2_sqlite::SqliteConnectionManager` import	Teddy Wing
	Not sure when or why I added this.
2021-06-12	main: Remove commented multithreading test code	Teddy Wing
	Remove my old tests now that we have a multi-threading setup that actually works.
2021-06-12	Process repositories on multiple threads	Teddy Wing
	Use 'rayon' to parallelise the repository processing. Each repository is processed in a thread in the default 'rayon' pool.

	In order to get thread-safe access to the database, I followed some advice from a Stack Overflow answer by VasiliNovikov (https://stackoverflow.com/users/1091436/vasilinovikov):

	https://stackoverflow.com/questions/62560396/how-to-use-sqlite-via-rusqlite-from-multiple-threads/62560397#62560397

	VasiliNovikov recommended creating a database connection pool using 'r2d2_sqlite'. This way we don't have to share a database connection between threads, but each thread can have its own connection.

	This also means we can remove mutable requirements in a bunch of places involving our `database::Db` type since we're no longer managing the database connections directly.