How to use GitHub V3 API to get commit count for a repo?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/27931139/
Asked by SteveCoffman
I am trying to count commits for many large GitHub repos using the API, so I would like to avoid fetching the entire list of commits (for example, api.github.com/repos/jasonrudolph/keyboard/commits) and counting them.
If I had the hash of the first (initial) commit, I could use this technique to compare the first commit to the latest, and it happily reports the total_commits in between (so I'd need to add one). Unfortunately, I cannot see how to elegantly get the first commit using the API.
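The compare idea above can be sketched as follows. This is only an illustration: it assumes the REST compare endpoint (`/repos/{owner}/{repo}/compare/{base}...{head}`) and its `total_commits` field, and the helper names and the SHA `abc123` are hypothetical placeholders, not values from the question.

```python
def compare_url(owner, repo, first_sha, head):
    """Build the compare URL between the initial commit and a head ref."""
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/compare/{first_sha}...{head}"
    )


def commit_count_from_compare(compare_json):
    """total_commits excludes the base commit itself, so add one for it."""
    return compare_json["total_commits"] + 1


# "abc123" stands in for the (unknown) SHA of the initial commit.
print(compare_url("jasonrudolph", "keyboard", "abc123", "master"))
```

The hard part, as noted, is still obtaining the initial commit's SHA in the first place.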
The base repo URL does give me the created_at (for example, api.github.com/repos/jasonrudolph/keyboard), so I could get a reduced commit set by limiting the commits to those until the creation date (for example, api.github.com/repos/jasonrudolph/keyboard/commits?until=2013-03-30T16:01:43Z) and using the earliest one (always listed last?) or maybe the one with an empty parent (I'm not sure whether forked projects have initial parent commits).
Any better way to get the first commit hash for a repo?
Better yet, this whole thing seems convoluted for a simple statistic, and I wonder if I'm missing something. Any better ideas for using the API to get the repo commit count?
Edit: This somewhat similar question is trying to filter by certain files ("and within them to specific files."), so it has a different answer.
Accepted answer by Bertrand Martel
You can consider using the GraphQL API v4 to fetch the commit count for multiple repositories at the same time using aliases. The following query fetches the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}
fragment RepoFragment on Repository {
  name
  refs(first: 100, refPrefix: "refs/heads/") {
    edges {
      node {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
  }
}
RepoFragment is a fragment, which avoids duplicating the query fields for each of those repos.
If you only need the commit count on the default branch, it's more straightforward:
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}
fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        id
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
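For many repositories, the aliased query can be generated rather than written by hand. The sketch below is one possible way to do that in Python using only the standard library; the helper names (`build_query`, `extract_counts`, `run_query`) are made up for illustration, and it assumes the GraphQL endpoint `https://api.github.com/graphql` with a personal access token sent as a bearer token. Generic aliases (`repo0`, `repo1`, ...) are used because repository names such as `my-repo` are not valid GraphQL alias names.

```python
import json
import urllib.request

FRAGMENT = """
fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
"""


def build_query(repos):
    """Build one aliased query for a list of (owner, name) pairs."""
    parts = []
    for i, (owner, name) in enumerate(repos):
        # json.dumps gives a safely quoted and escaped string literal.
        parts.append(
            f"  repo{i}: repository(owner: {json.dumps(owner)}, "
            f"name: {json.dumps(name)}) {{\n    ...RepoFragment\n  }}"
        )
    return "{\n" + "\n".join(parts) + "\n}\n" + FRAGMENT


def extract_counts(data):
    """Map repo name -> commit count from the `data` part of the response."""
    counts = {}
    for repo in data.values():
        # Skip repos that could not be resolved or have no default branch.
        if not repo or not repo.get("defaultBranchRef"):
            continue
        counts[repo["name"]] = repo["defaultBranchRef"]["target"]["history"]["totalCount"]
    return counts


def run_query(query, token):
    """POST the query to the GraphQL endpoint (assumed URL, needs a token)."""
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]
```

With a valid token, `extract_counts(run_query(build_query([("google", "gson")]), token))` should return a dictionary keyed by repository name.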
Answer by Ivan Zuzak
If you're looking for the total number of commits in the default branch, you might consider a different approach.
Use the Repo Contributors API to fetch a list of all contributors:
https://developer.github.com/v3/repos/#list-contributors
Each item in the list will contain a contributions field, which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.
The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.
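As a rough sketch of the summing step, assuming the contributor pages have already been fetched and decoded from JSON (the function name and the sample data below are hypothetical):

```python
def total_commits(contributor_pages):
    """Sum `contributions` over every page of the contributors listing."""
    return sum(
        contributor["contributions"]
        for page in contributor_pages
        for contributor in page
    )


# Hypothetical sample data shaped like two pages of the API response.
sample_pages = [
    [{"login": "alice", "contributions": 120}, {"login": "bob", "contributions": 30}],
    [{"login": "carol", "contributions": 5}],
]
print(total_commits(sample_pages))  # prints 155
```

Note that the contributors listing is itself paginated, so every page has to be fetched before summing, and whether anonymous contributors are included may affect the total.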
Answer by snowe
Simple solution: look at the page number. GitHub paginates for you, so you can calculate the number of commits by getting the last page number from the Link header, subtracting one (the last page has to be counted separately), multiplying by the page size, then fetching the last page of results and adding the size of that array. It's a maximum of two API calls!
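The arithmetic described above can be sketched in Python. This is a minimal illustration, not the answer's own code: the helper names are invented, and the Link header value is a made-up example in the format GitHub uses for pagination.

```python
import re


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or None if absent."""
    match = re.search(r'<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="last"', link_header)
    return int(match.group(1)) if match else None


def commit_count(last_page, page_size, items_on_last_page):
    """All pages before the last are full; the last one is counted as-is."""
    return (last_page - 1) * page_size + items_on_last_page


# Made-up Link header for a repo listed with per_page=100.
header = (
    '<https://api.github.com/repositories/123/commits?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repositories/123/commits?per_page=100&page=7>; rel="last"'
)
pages = last_page_number(header)
print(pages)                         # prints 7
print(commit_count(pages, 100, 42))  # prints 642
```

If the response has no rel="last" link, everything fit on the first page and its length is the count.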
Here is my implementation for getting the total number of commits for an entire organization using the octokit gem in Ruby:
@github = Octokit::Client.new access_token: key, auto_traversal: true, per_page: 100
Octokit.auto_paginate = true
repos = @github.org_repos('my_company', per_page: 100)
# * take the pagination number
# * get the last page
# * see how many items are on it
# * multiply the number of pages - 1 by the page size
# * and add the two together. Boom. Commit count in 2 api calls
def calc_total_commits(repos)
  total_sum_commits = 0
  repos.each do |e|
    repo = Octokit::Repository.from_url(e.url)
    number_of_commits_in_first_page = @github.commits(repo).size
    repo_sum = 0
    if number_of_commits_in_first_page >= 100
      links = @github.last_response.rels
      unless links.empty?
        last_page_url = links[:last].href
        /.*page=(?<page_num>\d+)/ =~ last_page_url
        repo_sum += (page_num.to_i - 1) * 100 # we add the last page manually
        repo_sum += links[:last].get.data.size
      end
    else
      repo_sum += number_of_commits_in_first_page
    end
    puts "Commits for #{e.name} : #{repo_sum}"
    total_sum_commits += repo_sum
  end
  puts "TOTAL COMMITS #{total_sum_commits}"
end
And yes, I know the code is dirty; it was thrown together in a few minutes.
Answer by buckley
Using the GraphQL API v4 is probably the way to handle this if you're starting a new project, but if you're still using the REST API v3 you can get around the pagination issue by limiting the request to just 1 result per page. With that limit set, the page number returned in the last link will be equal to the total number of commits.
For example, using Python 3 and the requests library:
import os
import urllib.parse

import requests


def commit_count(project, sha='master', token=None):
    """
    Return the number of commits to a project
    """
    token = token or os.environ.get('GITHUB_API_TOKEN')
    url = f'https://api.github.com/repos/{project}/commits'
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': f'token {token}',
    }
    params = {
        'sha': sha,
        'per_page': 1,
    }
    resp = requests.request('GET', url, params=params, headers=headers)
    if (resp.status_code // 100) != 2:
        raise Exception(f'invalid github response: {resp.content}')
    # check the resp count, just in case there are 0 commits
    commit_count = len(resp.json())
    last_page = resp.links.get('last')
    # if there are no more pages, the count must be 0 or 1
    if last_page:
        # extract the query string from the last page url
        qs = urllib.parse.urlparse(last_page['url']).query
        # extract the page number from the query string
        commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
    return commit_count
Answer by fnkr
I just made a little script to do this. It may not work with large repositories since it does not handle GitHub's rate limits. It also requires the Python requests package.
#!/usr/bin/env python3
import requests

GITHUB_API_BRANCHES = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/branches'
GITHUB_API_COMMITS = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/commits?sha=%(sha)s&page=%(page)i'


def github_commit_counter(namespace, repository, access_token=''):
    commit_store = list()
    branches = requests.get(GITHUB_API_BRANCHES % {
        'token': access_token,
        'namespace': namespace,
        'repository': repository,
    }).json()
    print('Branch'.ljust(47), 'Commits')
    print('-' * 55)
    for branch in branches:
        page = 1
        branch_commits = 0
        # walk the pages of this branch until an empty page comes back
        while True:
            commits = requests.get(GITHUB_API_COMMITS % {
                'token': access_token,
                'namespace': namespace,
                'repository': repository,
                'sha': branch['name'],
                'page': page
            }).json()
            page_commits = len(commits)
            for commit in commits:
                commit_store.append(commit['sha'])
            branch_commits += page_commits
            if page_commits == 0:
                break
            page += 1
        print(branch['name'].ljust(45), str(branch_commits).rjust(9))
    # de-duplicate commits that are shared between branches
    commit_store = set(commit_store)
    print('-' * 55)
    print('Total'.ljust(42), str(len(commit_store)).rjust(12))


# for private repositories, get your own token from
# https://github.com/settings/tokens
# github_commit_counter('github', 'gitignore', access_token='fnkr:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
github_commit_counter('github', 'gitignore')
Answer by Arcsector
I used Python to create a generator that yields the contributors, then summed up the total commit count and checked it against a limit. The check returns True if the repo has fewer commits than the limit, and False if it has the same number or more. The only thing you have to fill in is the requests session that uses your credentials. Here's what I wrote for you:
from requests import session


def login():
    sess = session()
    # login here and return session with valid creds
    return sess


def generateList(link):
    # you need to login before you do anything
    sess = login()
    # because of the way that requests works, you must start out by creating an object to
    # imitate the response object. This will help you to cleanly while-loop through
    # github's pagination
    class response_immitator:
        links = {'next': {'url': link}}
    response = response_immitator()
    while 'next' in response.links:
        response = sess.get(response.links['next']['url'])
        for repo in response.json():
            yield repo


def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
    # login first
    sess = login()
    if max_commit_count is not None:
        totalcommits = 0
        # construct url to paginate
        url = baseurl + "repos/" + user_name + '/' + repo_name + "/stats/contributors"
        for stats in generateList(url):
            totalcommits += stats['total']
        if totalcommits >= max_commit_count:
            return False
        else:
            return True


def main():
    # what user do you want to check for commits
    user_name = "arcsector"
    # what repo do you want to check for commits
    repo_name = "EyeWitness"
    # github's base api url
    baseurl = "https://api.github.com/"
    # call function
    check_commit_count(baseurl, user_name, repo_name, 30)


if __name__ == "__main__":
    main()