path: root/src/main.rs
Age  Commit message  Author
2022-06-04  Update copyright year  Teddy Wing
2022-06-04  github::Repo: Change `git_url` to `clone_url`  Teddy Wing
I was getting errors mirroring and updating:

    failed to connect to github.com: Connection timed out; class=Os (2)

and

    remote 'origin' already exists; class=Config (7); code=Exists(-4)

It turns out that the `git_url` field, which I had been using previously to mirror and clone repositories, stopped working. My guess is that it's because Reflectub is not authorised to clone GitHub "git://" URLs, so the connection timed out. I'm not sure why this stopped being allowed, though. The URL change seems to have happened around March 2022, or at least between December 2021 and April 2022.

The second error was caused by a previously-created repository existing in the filesystem, but not being in the database as it hadn't been correctly mirrored. For now, I've decided not to fix that problem and am only fixing the URL issue.

The GitHub API also includes a `clone_url` field, which contains an HTTPS clone URL. Using this URL to mirror fixes the timeout problem.
2021-06-25  set_agefile_time(): Don't add `agefile=info/web/last-modified` to cgitrc  Teddy Wing
We don't need to set the `agefile` config value because "info/web/last-modified" is already CGit's default value for the setting.
2021-06-25  main: If the default branch is not "master", set cgitrc defbranch  Teddy Wing
In order for CGit to know that the repository uses a default branch that isn't "master", we need to set the `defbranch` setting in 'cgitrc'. The mtime is read from either the "master" ref or the ref specified with `defbranch`: https://git.zx2c4.com/cgit/tree/ui-repolist.c?id=5258c297ba6fb604ae1415fbc19a3fe42457e49e#n56
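The decision described above can be sketched as a small helper. This is only an illustration: the function name `defbranch_line` is hypothetical, and it assumes, as the commit message says, that CGit looks at "master" unless `defbranch` is set in the repo-local 'cgitrc'.

```rust
/// Returns the line to add to a repo-local cgitrc for `default_branch`,
/// or `None` when the branch is "master" and no setting is needed.
/// (Hypothetical helper, not taken from the project's source.)
fn defbranch_line(default_branch: &str) -> Option<String> {
    if default_branch == "master" {
        None
    } else {
        // Repo-local cgitrc files use plain `key=value` lines.
        Some(format!("defbranch={}", default_branch))
    }
}
```

For a repository whose default branch is "main", this would yield the line `defbranch=main`.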
2021-06-25  set_agefile_time: Move repo cgitrc file append to a function  Teddy Wing
I need to add another line to the repo-local cgitrc file to set the default branch. Move this code to a function so we can reuse it.
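A reusable append function along these lines might look like the following sketch. The name `repo_cgitrc_append` and the assumption that the file is called "cgitrc" at the top of the repository directory are illustrative, not confirmed by the source.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one `key=value` line to the repo-local cgitrc file,
/// creating the file if it does not exist yet.
/// (Hypothetical helper; the file location is an assumption.)
fn repo_cgitrc_append(repo_path: &Path, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(repo_path.join("cgitrc"))?;

    writeln!(file, "{}", line)
}
```

Because the file is opened in append mode, calling the function once per setting keeps earlier lines intact.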
2021-06-24  main::update(): Change HEAD branch if default branch changed  Teddy Wing
If the default branch on GitHub changed, change the local mirror's HEAD to match the new default. Need to store the default branch in the database now so we can find out whether it changed.
2021-06-24  git::mirror(): Change HEAD to GitHub default branch  Teddy Wing
The default branch after mirroring was typically 'master'. On GitHub, the default branch may not necessarily be 'master'. Change the default branch by changing the HEAD to GitHub's default branch so that the mirrored repository better matches GitHub. We'll also need to make a change to the update function in case the default branch changes after mirroring.
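To make the HEAD change concrete, here is a minimal sketch that writes the symbolic ref directly. The project most likely does this through a Git library rather than by hand, so treat the function names and the direct file write as assumptions. It also assumes a bare mirror, where `HEAD` sits at the top of the repository directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Contents of a symbolic-ref HEAD file pointing at `branch`.
fn head_contents(branch: &str) -> String {
    format!("ref: refs/heads/{}\n", branch)
}

/// Points the bare repository's HEAD at `branch`.
/// (Illustrative only: writes the file directly instead of
/// going through a Git library.)
fn set_head(bare_repo: &Path, branch: &str) -> io::Result<()> {
    fs::write(bare_repo.join("HEAD"), head_contents(branch))
}
```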
2021-06-24  main: Re-enable GitHub repository fetching  Teddy Wing
Now that we're done working out empty repository handling for setting the mtime, revert this hard-coded test repository change.
2021-06-23  update_mtime(): Set the mtime to the repo's `pushed_at` time  Teddy Wing
Use `pushed_at` instead of `updated_at`. This mtime is used to sort repositories on CGit's repository index page. Prevent things like GitHub stars from changing the sort order. The sort should instead be influenced by repository changes.
2021-06-23  update_mtime(): Extract agefile handling to a separate function  Teddy Wing
The `update_mtime()` function is getting pretty long. Extract this into a new function since it's more of a self-contained unit.
2021-06-23  update_mtime(): Remove `.or_else()` idea  Teddy Wing
Decided to keep the `match` expressions. Still working out how to clean up the code in this function, though.
2021-06-23  update_mtime(): Write update time to CGit agefile as last recourse  Teddy Wing
Adjust the match arms to remove a bit of indentation. Add some 'anyhow' context to errors for better error reporting. When "repo/refs/heads/[default-branch]" or "repo/packed-refs" files don't exist, create a "repo/info/web/last-modified" file and set this file as the CGit agefile in the repo's local 'cgitrc' file. It's possible for a repo to not have either of the first two files when the repo is empty and has no commits.
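The fallback order described above can be sketched as a function that decides which file should carry the mtime. The type and function names are hypothetical, and checking the branch ref before "packed-refs" is an assumption about ordering.

```rust
use std::path::{Path, PathBuf};

/// Which file should receive the repository's mtime.
#[derive(Debug, PartialEq)]
enum MtimeTarget {
    /// A ref file that already exists in the repository.
    Existing(PathBuf),
    /// Neither ref file exists (e.g. an empty repo with no commits):
    /// this agefile has to be created instead.
    Agefile(PathBuf),
}

/// Picks the mtime target for a repository (hypothetical helper).
fn mtime_target(repo_path: &Path, default_branch: &str) -> MtimeTarget {
    let branch_ref = repo_path.join("refs/heads").join(default_branch);
    let packed_refs = repo_path.join("packed-refs");

    if branch_ref.exists() {
        MtimeTarget::Existing(branch_ref)
    } else if packed_refs.exists() {
        MtimeTarget::Existing(packed_refs)
    } else {
        MtimeTarget::Agefile(repo_path.join("info/web/last-modified"))
    }
}
```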
2021-06-23  update_mtime(): Idea for `.or_else()` chaining  Teddy Wing
An idea to chain the error handling here instead of using `match` expressions.
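For comparison, this sketch shows the same fallback written once with nested `match` expressions and once with `.or_else()` chaining. The functions are hypothetical and only read a modification time from a primary path with one fallback; they are not the project's `update_mtime()`.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Fallback written with nested `match` expressions.
fn modified_match(primary: &Path, fallback: &Path) -> io::Result<SystemTime> {
    match fs::metadata(primary) {
        Ok(meta) => meta.modified(),
        Err(_) => match fs::metadata(fallback) {
            Ok(meta) => meta.modified(),
            Err(e) => Err(e),
        },
    }
}

/// The same logic expressed as a combinator chain.
fn modified_chained(primary: &Path, fallback: &Path) -> io::Result<SystemTime> {
    fs::metadata(primary)
        // Only runs when the primary lookup failed.
        .or_else(|_| fs::metadata(fallback))
        .and_then(|meta| meta.modified())
}
```

The chained form is shorter, but the `match` form leaves room for arm-specific error context, which may be one reason to keep it.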
2021-06-23  main: Use test repositories instead of getting repos from GitHub  Teddy Wing
Add a couple of test repositories that we can use to test empty repository handling.
2021-06-15  run(): Only clone repo name if we need it for error context  Teddy Wing
This avoids cloning names of repos that are processed successfully.
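The idea can be illustrated with a closure that only runs on the error path. The project uses 'anyhow', whose `with_context()` takes a closure in the same way. This sketch sticks to the standard library, and the `Repo` type, `process_repo()`, and the `clones` counter are all hypothetical, added only to make the laziness observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the project's repository type.
struct Repo {
    name: String,
    size: u64,
}

/// Hypothetical processing step: fails for empty repositories.
fn process_repo(repo: &Repo) -> Result<(), String> {
    if repo.size == 0 {
        Err("repository is empty".to_owned())
    } else {
        Ok(())
    }
}

/// Adds the repo name to the error, cloning the name only on failure.
/// `clones` counts how often the clone actually happens.
fn run_one(repo: &Repo, clones: &Cell<u32>) -> Result<(), String> {
    process_repo(repo).map_err(|e| {
        // This closure is skipped entirely when processing succeeds.
        clones.set(clones.get() + 1);
        let name = repo.name.clone();
        format!("{}: {}", name, e)
    })
}
```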
2021-06-13  run(): Remove debug print of the current thread  Teddy Wing
We know this runs on multiple threads now, so this debug line can be removed.
2021-06-13  run(): Remove limit to two repositories used for testing  Teddy Wing
I artificially limited the number of repositories processed to two for testing so that I wouldn't download and mirror all of my repositories while testing the program. Now that things seem to be working, remove this artificial limit.
2021-06-13  main: Add context to GitHub fetch error  Teddy Wing
2021-06-13  run(): Don't clone `base_cgitrc` into each thread  Teddy Wing
Rejigger some types and signatures to allow us to get references to the `base_cgitrc` path instead of copying it for each repository.
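As a rough illustration of sharing a path by reference across worker threads, the sketch below uses `std::thread::scope` as a stand-in for the 'rayon' pool the project uses. The function name and the formatted output are hypothetical.

```rust
use std::path::Path;
use std::thread;

/// Spawns one worker per repository. Every worker borrows the same
/// `base_cgitrc` path instead of receiving its own copy.
/// (Scoped threads stand in for the project's 'rayon' pool.)
fn cgitrc_sources(base_cgitrc: &Path, repo_names: &[&str]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = repo_names
            .iter()
            .map(|name| {
                // `move` copies the references, not the underlying data.
                scope.spawn(move || format!("{} <- {}", name, base_cgitrc.display()))
            })
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads are guaranteed to finish before `thread::scope` returns, which is what allows them to borrow `base_cgitrc` without an `Arc` or a clone.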
2021-06-13  MultiError: Remove impl `Iterator` tests  Teddy Wing
Remove the `Iterator` test implementations that didn't work out.
2021-06-13  main(): Print "error: " in front of each error line  Teddy Wing
Prefix each error line with the text "error: " to make it clear that's what it is, and that it's separate from errors printed on other lines.

Worked out how to set up an `Iterator` for `MultiError` based on a comment by 'chris-morgan' (https://old.reddit.com/user/chris-morgan) on Reddit /r/rust:

> 1. Implement your own iterator type which wraps existing iterator
>    types (std::slice::Iter, and std::vec::IntoIter if you want a
>    consuming iterator).
>    Advantages: most flexible, ensures API stability if you
>    need to change internal details.
>    Disadvantages: a lot more effort, if you want to do it properly
>    (which involves implementing about ten traits on your iterator
>    wrapper type); and if slices or their iterators add something new,
>    you don’t get it unless you implement a wrapper yourself.
>
> 2. Have your iter() functions and IntoIterator implementations use the
>    standard iterator types directly.
>    Advantages: easier, gets you all the other trait implementations on
>    std::slice::Iter for free—AsRef, Clone, FusedIterator,
>    ExactSizeIterator, Debug, Send, DoubleEndedIterator, TrustedLen,
>    Sync).
>    Disadvantages: if you need to restructure things so that this is no
>    longer an option (e.g. store things in a different type of vector
>    and thus need to map it before presenting it to the user) it’s a
>    breaking change.
>
> 3. Implement Deref<Target = [(K, V)]> and just treat your Bucket<K, V>
>    as a &[(K, V)]. (Read-only; implement DerefMut if you want to allow
>    mutations of values.)
>    Advantages: easy, and lets you simply treat the whole thing as a
>    slice (this is what Vec<T> does).
>    Disadvantages: there really aren’t any, if it matches your purpose.
>    (If not, it’s useless.)

(https://old.reddit.com/r/rust/comments/7a0slp/questionimplementing_iterator_for_a_struct_with_a/)
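The commit message does not say which of the three quoted options was chosen. The sketch below follows option 2, reusing the standard iterator types. It assumes a `MultiError` that wraps a `Vec<String>`; the real type holds 'anyhow' errors, and `error_lines()` is a hypothetical helper.

```rust
/// Simplified stand-in: the real type wraps 'anyhow' errors.
#[derive(Debug)]
struct MultiError {
    errors: Vec<String>,
}

/// Consuming iteration, reusing the standard `Vec` iterator.
impl IntoIterator for MultiError {
    type Item = String;
    type IntoIter = std::vec::IntoIter<String>;

    fn into_iter(self) -> Self::IntoIter {
        self.errors.into_iter()
    }
}

/// Borrowing iteration, reusing the standard slice iterator.
impl<'a> IntoIterator for &'a MultiError {
    type Item = &'a String;
    type IntoIter = std::slice::Iter<'a, String>;

    fn into_iter(self) -> Self::IntoIter {
        self.errors.iter()
    }
}

/// Builds one "error: "-prefixed line per wrapped error.
fn error_lines(errors: &MultiError) -> Vec<String> {
    errors
        .into_iter()
        .map(|e| format!("error: {}", e))
        .collect()
}
```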
2021-06-13  main: Remove unused `std::sync` imports  Teddy Wing
2021-06-13  Move `MultiError` to its own file  Teddy Wing
2021-06-13  MultiError: Remove old `errors` field  Teddy Wing
Decided to use 'anyhow' errors instead of a generic boxed error.
2021-06-13  run(): Prefix repository errors with the name of the repository  Teddy Wing
So you know what the error referred to.
2021-06-13  run(): Add a note to include the repo name in repo errors  Teddy Wing
2021-06-13  run(): Adjust whitespace  Teddy Wing
Make all chained methods indented.
2021-06-13  run(): Return multiple errors  Teddy Wing
Return all errors from repo processing. This allows us to provide information on all errors that happened while processing, but continue processing all the repos even if there's an error in one of them. A new `MultiError` type wraps a list of errors to do this.
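A minimal sketch of the collect-and-continue pattern follows. It assumes string errors in place of the project's actual error type, and `process_repo()` is a hypothetical stand-in that fails for names starting with "broken".

```rust
use std::fmt;

/// Wraps every error gathered while processing repositories.
/// (Simplified: errors are plain strings here.)
#[derive(Debug)]
struct MultiError {
    errors: Vec<String>,
}

impl fmt::Display for MultiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One wrapped error per line.
        write!(f, "{}", self.errors.join("\n"))
    }
}

impl std::error::Error for MultiError {}

/// Hypothetical stand-in for the real per-repository work.
fn process_repo(name: &str) -> Result<(), String> {
    if name.starts_with("broken") {
        Err(format!("{}: mirroring failed", name))
    } else {
        Ok(())
    }
}

/// Processes every repository, even after a failure, and returns
/// all of the errors together.
fn run(repo_names: &[&str]) -> Result<(), MultiError> {
    let errors: Vec<String> = repo_names
        .iter()
        .filter_map(|name| process_repo(name).err())
        .collect();

    if errors.is_empty() {
        Ok(())
    } else {
        Err(MultiError { errors })
    }
}
```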
2021-06-13  main: Remove unused `r2d2_sqlite::SqliteConnectionManager` import  Teddy Wing
Not sure when or why I added this.
2021-06-12  main: Remove commented multithreading test code  Teddy Wing
Remove my old tests now that we have a multi-threading setup that actually works.
2021-06-12  Process repositories on multiple threads  Teddy Wing
Use 'rayon' to parallelise the repository processing. Each repository is processed in a thread in the default 'rayon' pool. In order to get thread-safe access to the database, I followed some advice from a Stack Overflow answer by VasiliNovikov (https://stackoverflow.com/users/1091436/vasilinovikov): https://stackoverflow.com/questions/62560396/how-to-use-sqlite-via-rusqlite-from-multiple-threads/62560397#62560397 VasiliNovikov recommended creating a database connection pool using 'r2d2_sqlite'. This way we don't have to share a database connection between threads, but each thread can have its own connection. This also means we can remove mutable requirements in a bunch of places involving our `database::Db` type since we're no longer managing the database connections directly.
2021-06-12  Switch from 'reqwest' to 'ureq'; Remove async  Teddy Wing
Remove all async from the project by switching from 'reqwest' to 'ureq'. This should make the code simpler, and hopefully enable us to try out multithreading.
2021-06-12  run(): Add context to database errors  Teddy Wing
To allow us to work out where the error is coming from.
2021-06-12  main: try! error from `process_repo`  Teddy Wing
2021-06-12  main: Remove async database calls  Teddy Wing
Remove all the async database calls and Tokio spawning. Still haven't worked out the error code 21 database error from earlier, but this will hopefully allow us to use normal threads directly.
2021-06-11  Replace 'sqlx' with 'rusqlite'  Teddy Wing
Trying to get rid of async. This compiles, but fails with the following runtime error:

    Error code 21: Library used incorrectly

Need to investigate further.
2021-06-11  Try moving things around for multi-threading  Teddy Wing
Still isn't multi-threaded. Not sure what I'm doing wrong.
2021-06-07  Add license (GNU GPLv3+)  Teddy Wing
2021-06-07  main: Limit to 5 repos for thread debugging  Teddy Wing
2021-06-07  main: Not multi-threaded  Teddy Wing
Looks like the work doesn't happen on multiple threads. All of the tasks printed the same thread ID. Need to do some more work to get this working properly, it seems.
2021-06-07  main: Collect errors from spawned tasks  Teddy Wing
Collect all errors into a list. I think I'm going to return them as a list from this function. The runtime appears a lot slower with this change. Need to figure out what that's about.
2021-06-07  Switch `futures::executor` to Tokio runtime  Teddy Wing
Use the Tokio runtime we created to run the blocking async tasks. Trying to set this up so I can get results back from the spawned tasks, but I'm currently having trouble working out how to extract them from the async task and return them from `run()`. I suppose I could just print out the errors directly in that `while let` loop, but ideally I'd like to return all errors from `run()` rather than printing in `run()`.
2021-06-06  Split database mutex lock and create calls onto multiple lines  Teddy Wing
To separate the actions more.
2021-06-06  main: Add a comment about the repo size flag parse error handling  Teddy Wing
2021-06-06  main::run(): Get repositories from GitHub API call  Teddy Wing
Remove the hard-coded test repositories I was using and replace them with real ones retrieved from the GitHub API. Enable I/O and timers on the Tokio runtime in order to enable the async GitHub API request.
2021-06-06  main: Remove `unwrap` when parsing `--skip-larger-than`  Teddy Wing
Don't panic here so we can use our own error message template.
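A sketch of parsing the flag value without `unwrap` might look like this. The function name, the `u64` type, and the wording of the message are assumptions, not the project's actual error template.

```rust
/// Parses the value given to `--skip-larger-than`.
/// Returns a readable error instead of panicking on bad input.
/// (Hypothetical helper; the message text is illustrative.)
fn parse_max_size(arg: &str) -> Result<u64, String> {
    arg.parse::<u64>()
        .map_err(|e| format!("unable to parse max size '{}': {}", arg, e))
}
```

The caller can then print the returned message in the program's own error format rather than showing a panic backtrace.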
2021-06-06  main(): Remove `unwrap`  Teddy Wing
Print the error instead of unwrapping.
2021-06-06  main: Add function documentation  Teddy Wing
2021-06-06  Provide an option to skip repos larger than a given size  Teddy Wing
Allows a maximum repo size to be given as a command line argument. Repos larger than this will not be mirrored. This gives us a way to save server space by avoiding gigantic repositories.
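The skip decision itself can be sketched as below. The `Repo` struct and `should_skip()` are hypothetical, and the sketch assumes the repository size and the command line limit are expressed in the same unit.

```rust
/// Hypothetical stand-in for the project's repository type.
struct Repo {
    name: String,
    size: u64,
}

/// Returns true when a limit was given and the repo exceeds it.
/// With no limit (`None`), nothing is skipped.
fn should_skip(repo: &Repo, max_size: Option<u64>) -> bool {
    max_size.map_or(false, |max| repo.size > max)
}
```

A repository exactly at the limit is still mirrored in this sketch; only strictly larger ones are skipped.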
2021-06-06  Remove old in-progress threading code  Teddy Wing
Remove this now that we have something that I think works.