Line breaks bug
This task is part of the FellowshipHacks/Projects/Blog project.
The blog posts migrated from EZ to WP contain some weird line breaks in some parts of the text. We must fix this bug.
People working on this task.
Volunteers are always welcome! Have a look at FellowshipHacks to see how you can help.
Last updated on: 090128
We've done some analysis to track down the cause of the bug, and thought about possible solutions. We must evaluate the proposed solutions and implement one of them.
- Test the tidy solution during the automated mass-migration test
- Analyse the bug and draft possible solutions
Tested filtering RSS files with tidy: results are OK for some selected blog entries. The command line is: tidy -xml -utf8 -wrap 0 input.rss > output.rss
At first I noticed that the RSS file produced by EZ has extra CR characters. I tried removing them (sed -e 's/\r//g' ez.rss > clean.rss) and reimporting the RSS file, but I still get the weird line breaks in the output. So I guess the problem is not the extra CRs but the actual line breaks in the RSS file.
- The "weird" line breaks don't appear on the EZ web pages, but only in the RSS file, and in the database object that stores the post text.
- So this is the likely cause of the problem: EZ doesn't honour the line breaks contained in the post source (relying only on HTML tags to display the output), while WP does (so "real" line breaks in the RSS file are converted into line breaks on the web page).
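The difference between the extra CRs and the "real" line breaks can be illustrated with a short Python sketch. The sample string is made up for illustration and is not taken from an actual post. It only shows that stripping CR characters, as the sed command does, leaves the LF line breaks in place.

```python
# Hypothetical sample of post text as it might appear in the RSS file,
# with CRLF line endings in the middle of a sentence.
sample = "First part of a sentence\r\nthat continues on the next line."

# Equivalent of: sed -e 's/\r//g'
without_cr = sample.replace("\r", "")

# The CR is gone, but the LF (the "real" line break) is still there,
# so an importer that honours line breaks will still see one.
print(repr(without_cr))
```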
- Filter the intermediate RSS file through some tool (xmllint, tidy...) to remove line breaks in the <description> element
- Fix the EZ RSS generator (/design/fsfe/templates/rss_pagelayout.tpl) not to put line breaks in the <description> XML element
- Fix the WP RSS importer to ignore line breaks
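As a rough illustration of the first option (filtering the intermediate RSS file), here is a minimal Python sketch that uses only the standard library. The function name collapse_description_breaks is hypothetical, and the sketch assumes that each <description> element holds the post body as text (escaped HTML or CDATA) and that the element is not namespaced. It is not the tidy-based filter tested above, just one possible alternative.

```python
import re
import xml.etree.ElementTree as ET


def collapse_description_breaks(rss_text):
    """Return the RSS document with line breaks inside <description> collapsed.

    Each run of line breaks (plus surrounding whitespace) in the text of a
    <description> element is replaced by a single space.
    """
    root = ET.fromstring(rss_text)
    for desc in root.iter("description"):
        if desc.text:
            desc.text = re.sub(r"\s*[\r\n]+\s*", " ", desc.text).strip()
    # Serialising re-escapes the markup, so CDATA sections come back
    # as escaped text with the same content.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<rss><channel><item>"
        "<description>Hello\nworld</description>"
        "</item></channel></rss>"
    )
    print(collapse_description_breaks(sample))
```

Note that this sketch also flattens line breaks inside <pre> blocks, where they are meaningful, so pre-formatted text would need to be handled separately.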
Pre-formatted text is also broken sometimes. Please compare
Cause: WP ignores <br /> and <br> elements inside <pre></pre> blocks. There's more: WP ignores <br /> tags, so I think we have to live with that and fix the post source (replacing <br> with real line breaks).
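Fixing the post source could be sketched as follows in Python. The function name br_to_newline_in_pre is hypothetical, and the regular expressions assume simple, non-nested <pre> blocks. A real fix would have to be run against the stored post text, which this sketch does not touch.

```python
import re

# Matches a whole <pre>...</pre> block: opening tag, body, closing tag.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.IGNORECASE | re.DOTALL)

# Matches <br>, <br/> and <br />, plus one directly following line break,
# so that "<br>\n" does not turn into two line breaks.
BR_TAG = re.compile(r"<br\s*/?>(?:\r?\n)?", re.IGNORECASE)


def br_to_newline_in_pre(html):
    """Replace <br> tags with real line breaks, but only inside <pre> blocks."""
    def fix_block(match):
        opening, body, closing = match.groups()
        return opening + BR_TAG.sub("\n", body) + closing

    return PRE_BLOCK.sub(fix_block, html)


if __name__ == "__main__":
    source = "<p>a<br />b</p><pre>line 1<br>line 2<br />line 3</pre>"
    print(br_to_newline_in_pre(source))
```

<br> tags outside <pre> blocks are left untouched, since only the pre-formatted text is affected by this problem.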