Written: 2021-05-23 05:37 +0000
Updated: 2024-08-06 00:52 +0000
Biblatex to Bibtex for Sphinx
Background
Transitioning from biblatex to bibtex with biber for sphinx
I recently started a set of notes using jupyter-book. However, in the process I ran into a horrific bibliography-related SNAFU. sphinx, in its infinite wisdom, only accepts a rather odd subset of bibtex.
I have been happily exporting my giant bibliography from Zotero (with Better BibTeX) as biblatex, and sphinx started choking dreadfully on it. This post describes attempts to reconcile the biblatex sources without manual intervention.
Input
The input file was an innocuous-looking, tiny, neat biblatex file:
@online{grasedyckParameterdependentSmootherMultigrid2020,
  title = {A Parameter-Dependent Smoother for the Multigrid Method},
  author = {Grasedyck, Lars and Klever, Maren and Löbbert, Christian and Werthmann, Tim A.},
  date = {2020-08-03},
  url = {http://arxiv.org/abs/2008.00927},
  urldate = {2021-05-22},
  archiveprefix = {arXiv},
  eprint = {2008.00927},
  eprinttype = {arxiv},
  keywords = {65N55; 15A69,Mathematics - Numerical Analysis},
  primaryclass = {cs, math}
}

@article{grasedyckDistributedHierarchicalSVD2018,
  title = {Distributed Hierarchical {{SVD}} in the {{Hierarchical Tucker}} Format},
  author = {Grasedyck, Lars and Löbbert, Christian},
  date = {2018},
  journaltitle = {Numerical Linear Algebra with Applications},
  volume = {25},
  pages = {e2174},
  issn = {1099-1506},
  doi = {10.1002/nla.2174},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2174},
  urldate = {2021-05-22},
  annotation = {\_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/nla.2174},
  keywords = {Hierarchical Tucker,HT,multigrid,parallel algorithms,SVD,tensor arithmetic},
  langid = {english},
  number = {6}
}

@article{grasedyckHierarchicalSingularValue2010,
  title = {Hierarchical {{Singular Value Decomposition}} of {{Tensors}}},
  author = {Grasedyck, Lars},
  date = {2010-01-01},
  journaltitle = {SIAM Journal on Matrix Analysis and Applications},
  shortjournal = {SIAM J. Matrix Anal. Appl.},
  volume = {31},
  pages = {2029--2054},
  publisher = {{Society for Industrial and Applied Mathematics}},
  issn = {0895-4798},
  doi = {10.1137/090764189},
  url = {https://epubs.siam.org/doi/abs/10.1137/090764189},
  urldate = {2021-05-22},
  number = {4}
}
Manual cleaning
The first attempt was with bibutils.
biblatex2xml references.bib | xml2bib | tee refs.bib
The converted file marks online entries as Electronic, which sphinx does not support; it demands misc instead.
biblatex2xml references.bib | xml2bib | sed -e "s/Electronic{/misc{/g" | tee refs.bib
This brought us to:
@misc{grasedyckParameterdependentSmootherMultigrid2020,
author="Grasedyck, Lars
and Klever, Maren
and L{\"o}bbert, Christian
and Werthmann, Tim A.",
title="A Parameter-Dependent Smoother for the Multigrid Method",
year="2020",
month="Aug",
day="03",
archivePrefix="arXiv",
eprint="2008.00927",
url="http://arxiv.org/abs/2008.00927"
}

@Article{grasedyckDistributedHierarchicalSVD2018,
author="Grasedyck, Lars
and L{\"o}bbert, Christian",
title="Distributed Hierarchical SVD in the Hierarchical Tucker Format",
journal="Numerical Linear Algebra with Applications",
year="2018",
volume="25",
number="6",
pages="e2174",
issn="1099-1506",
doi="10.1002/nla.2174",
url="https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2174",
url="https://doi.org/10.1002/nla.2174"
}

@Article{grasedyckHierarchicalSingularValue2010,
author="Grasedyck, Lars",
title="Hierarchical Singular Value Decomposition of Tensors",
journal="SIAM Journal on Matrix Analysis and Applications",
year="2010",
month="Jan",
day="01",
volume="31",
number="4",
pages="2029--2054",
issn="0895-4798",
doi="10.1137/090764189",
url="https://epubs.siam.org/doi/abs/10.1137/090764189",
url="https://doi.org/10.1137/090764189"
}
Unfortunately, this also generated multiple url lines per entry, which of course choke sphinx. Additionally, the workflow is rather fickle: for example, if the input file does not start with a blank line, the first reference is silently skipped.
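For anyone who wants to stay on the bibutils route, the repeated url lines could be filtered out with a small script. The following is only a rough sketch, not part of the original workflow: the function name drop_repeated_fields is made up for illustration, and it assumes the layout shown above, where every field starts its own line as name="value" and continuation lines contain no name= prefix.

```python
import re

# A field starts a line as: name="..." (the layout xml2bib produced above).
FIELD_START = re.compile(r"^\s*([A-Za-z]+)\s*=")


def drop_repeated_fields(bibtex_text: str) -> str:
    """Keep only the first occurrence of each field within an entry.

    Hypothetical helper: line-based, so it only suits output where each
    field begins on its own line.
    """
    kept = []
    seen = set()        # field names already seen in the current entry
    skipping = False    # True while inside a repeated field's lines
    for line in bibtex_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            # New entry: forget the fields of the previous one.
            seen = set()
            skipping = False
        elif stripped == "}":
            # The closing brace of an entry is always kept.
            skipping = False
        else:
            match = FIELD_START.match(line)
            if match:
                name = match.group(1).lower()
                skipping = name in seen
                seen.add(name)
            # Lines without "name=" are continuations and inherit `skipping`.
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Dropping the last field of an entry leaves a trailing comma on the previous one, which BibTeX tolerates. The first url is the one that survives, which in the listing above is the publisher link rather than the doi.org one.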
Biber
The key concept here is that biber can take a configuration file to process the bib file, and in tool mode (--tool) it can also write the reformatted file back out. So our solution boils down to writing a configuration file and then using it.
First we get the location of the default configuration (and also check the version).
biber --tool-config
biber --version
Then we inspect it and extract relevant entries to a configuration file of our own, with some modifications.
- We need to reverse each entry as described here
- However, in practice we can get away with a minimal mapping and some command line options.
- We would also like to be mindful of the mapping from online to misc
<?xml version="1.0" encoding="utf-8"?>
<config>
  <sourcemap>
    <maps datatype="bibtex">
      <!-- Easy type conversions -->
      <map>
        <map_step map_type_source="report" map_type_target="techreport"/>
        <map_step map_type_source="online" map_type_target="misc"/>
      </map>
    </maps>
  </sourcemap>
</config>
At this stage, we are partially done, but the resulting file is rather ugly, and most importantly dates have not yet been transformed. This SE answer provided a hint, as did this gist. Effectively, we need to overwrite part of the map.
<!-- Date to year, month -->
<map>
  <map_step map_field_source="date"
            map_field_target="year" />
</map>
<map>
  <map_step map_field_source="year"
            map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
            map_final="1" />
  <map_step map_field_source="year"
            map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
            map_replace="$1" />
  <map_step map_field_set="month" map_origfieldval="1" />
  <map_step map_field_source="month"
            map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
            map_replace="$2" />
</map>
<map>
  <map_step map_field_source="year"
            map_match="(\d{4}|\d{2})-(\d{1,2})" map_final="1" />
  <map_step map_field_source="year"
            map_match="(\d{4}|\d{2})-(\d{1,2})" map_replace="$1" />
  <map_step map_field_set="month" map_origfieldval="1" />
  <map_step map_field_source="month"
            map_match="(\d{4}|\d{2})-(\d{1,2})" map_replace="$2" />
</map>
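To get a feel for what the two regular expressions in these maps do to a date value, here is a small Python approximation. It is an illustration of the intended effect only and not how biber evaluates a sourcemap; split_date is a made-up name, and it assumes the field holds nothing but a plain ISO-style date.

```python
import re

# The same patterns as in the map_match attributes above.
FULL_DATE = re.compile(r"(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})")
YEAR_MONTH = re.compile(r"(\d{4}|\d{2})-(\d{1,2})")


def split_date(date: str) -> dict:
    """Split a biblatex date into legacy year/month fields.

    Hypothetical helper: tries year-month-day first, then year-month,
    and otherwise treats the whole value as the year.
    """
    for pattern in (FULL_DATE, YEAR_MONTH):
        match = pattern.fullmatch(date)
        if match:
            # Group 1 is the year, group 2 the month; the day is dropped.
            return {"year": match.group(1), "month": match.group(2)}
    return {"year": date}
```

For the three sample entries this yields year 2020 with month 08, year 2018 alone, and year 2010 with month 01. Note that the day never makes it into the result, in the sketch as in the maps.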
Now we can also add some pretty printing (before the sourcemap):
<output_fieldcase>lower</output_fieldcase>
<output_resolve>1</output_resolve>
<output_safechars>1</output_safechars>
<output_format>bibtex</output_format>
Putting it all together:
<?xml version="1.0" encoding="utf-8"?>
<!-- Got the date from https://gist.githubusercontent.com/mkouhia/f00fea7fc8d4effd9dfd/raw/500e9dbc6aa43a47e39c45ba230738ff4544709f/biblatex-to-bibtex.conf -->
<config>
  <output_fieldcase>lower</output_fieldcase>
  <output_resolve>1</output_resolve>
  <output_safechars>1</output_safechars>
  <output_format>bibtex</output_format>
  <sourcemap>
    <maps datatype="bibtex">
      <!-- Easy type conversions -->
      <map>
        <map_step map_type_source="report" map_type_target="techreport"/>
        <map_step map_type_source="online" map_type_target="misc"/>
      </map>
      <!-- Date to year, month -->
      <map>
        <map_step map_field_source="date"
                  map_field_target="year" />
      </map>
      <map>
        <map_step map_field_source="year"
                  map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
                  map_final="1" />
        <map_step map_field_source="year"
                  map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
                  map_replace="$1" />
        <map_step map_field_set="month" map_origfieldval="1" />
        <map_step map_field_source="month"
                  map_match="(\d{4}|\d{2})-(\d{1,2})-(\d{1,2})"
                  map_replace="$2" />
      </map>
      <map>
        <map_step map_field_source="year"
                  map_match="(\d{4}|\d{2})-(\d{1,2})" map_final="1" />
        <map_step map_field_source="year"
                  map_match="(\d{4}|\d{2})-(\d{1,2})" map_replace="$1" />
        <map_step map_field_set="month" map_origfieldval="1" />
        <map_step map_field_source="month"
                  map_match="(\d{4}|\d{2})-(\d{1,2})" map_replace="$2" />
      </map>
    </maps>
  </sourcemap>
</config>
However, we have not yet mapped journaltitle and location. These are easier to set on the command line, so we shall do so. Also note that for the date logic to work, we need the --output-legacy-date option as well.
# Runs
biber --tool --configfile=biberConf.xml references.bib --output-file refsTmp.bib --output-legacy-date --output-field-replace=location:address,journaltitle:journal
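If the conversion should run as part of a documentation build, the same invocation can be wrapped in a few lines of Python. This is a sketch under the assumption that biber is installed and on the PATH; the function names are invented here, and the arguments are exactly those of the command above.

```python
import shutil
import subprocess


def build_biber_command(config: str, source: str, output: str) -> list:
    """Assemble the biber tool-mode call used in this post as an argv list."""
    return [
        "biber",
        "--tool",
        f"--configfile={config}",
        source,
        "--output-file",
        output,
        "--output-legacy-date",
        "--output-field-replace=location:address,journaltitle:journal",
    ]


def convert(config: str = "biberConf.xml",
            source: str = "references.bib",
            output: str = "refsTmp.bib") -> None:
    """Run biber, failing loudly if it is missing or exits with an error."""
    if shutil.which("biber") is None:
        raise FileNotFoundError("biber was not found on PATH")
    subprocess.run(build_biber_command(config, source, output), check=True)
```

Passing the arguments as a list avoids any shell quoting issues with the colon and comma in the field replacement option.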
This gives a sphinx-compatible file:
@misc{grasedyckParameterdependentSmootherMultigrid2020,
  author = {Grasedyck, Lars and Klever, Maren and L\"{o}bbert, Christian and Werthmann, Tim A.},
  url = {http://arxiv.org/abs/2008.00927},
  eprint = {2008.00927},
  eprinttype = {arxiv},
  keywords = {65N55; 15A69,Mathematics - Numerical Analysis},
  month = {08},
  title = {A Parameter-Dependent Smoother for the Multigrid Method},
  urldate = {2021-05-22},
  year = {2020},
}

@article{grasedyckDistributedHierarchicalSVD2018,
  author = {Grasedyck, Lars and L\"{o}bbert, Christian},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2174},
  annotation = {\_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/nla.2174},
  doi = {10.1002/nla.2174},
  issn = {1099-1506},
  journal = {Numerical Linear Algebra with Applications},
  keywords = {Hierarchical Tucker,HT,multigrid,parallel algorithms,SVD,tensor arithmetic},
  langid = {english},
  number = {6},
  pages = {e2174},
  title = {Distributed Hierarchical {{SVD}} in the {{Hierarchical Tucker}} Format},
  urldate = {2021-05-22},
  volume = {25},
  year = {2018},
}

@article{grasedyckHierarchicalSingularValue2010,
  author = {Grasedyck, Lars},
  publisher = {Society for Industrial and Applied Mathematics},
  url = {https://epubs.siam.org/doi/abs/10.1137/090764189},
  doi = {10.1137/090764189},
  issn = {0895-4798},
  journal = {SIAM Journal on Matrix Analysis and Applications},
  month = {01},
  number = {4},
  pages = {2029--2054},
  shortjournal = {SIAM J. Matrix Anal. Appl.},
  title = {Hierarchical {{Singular Value Decomposition}} of {{Tensors}}},
  urldate = {2021-05-22},
  volume = {31},
  year = {2010},
}
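A quick sanity check on the converted file is to look for the names this post set out to translate. The sketch below is a plain text scan, not a real BibTeX parser, and the helper name is made up; it only knows about the entry types and fields handled above (online, report, date, journaltitle, location), so a clean result does not prove that sphinx will accept everything.

```python
import re

# Names this post converts; anything left over means a mapping did not apply.
BIBLATEX_ONLY_TYPES = {"online", "report"}
BIBLATEX_ONLY_FIELDS = {"date", "journaltitle", "location"}

# "@type{" at the start of a line, and "name =" at the start of a line.
ENTRY_TYPE = re.compile(r"^@(\w+)[ \t]*\{", re.MULTILINE)
FIELD_NAME = re.compile(r"^[ \t]*(\w+)[ \t]*=", re.MULTILINE)


def leftover_biblatex_names(bibtex_text: str) -> list:
    """Return the sorted biblatex-only types and fields still present."""
    types = {name.lower() for name in ENTRY_TYPE.findall(bibtex_text)}
    fields = {name.lower() for name in FIELD_NAME.findall(bibtex_text)}
    leftovers = (types & BIBLATEX_ONLY_TYPES) | (fields & BIBLATEX_ONLY_FIELDS)
    return sorted(leftovers)
```

Because whole field names are compared, urldate is not mistaken for date. An empty list for refsTmp.bib means the type, date, and field replacements all took effect.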
Conclusions
This was a rather annoying excursion into the unpleasant underbelly of bibtex and biblatex. The biber solution is actually rather elegant, but most definitely not immediately obvious.