Commit 6ea6aab

Additional language to handle legacy Windows behavior
This now covers both new behavior (UTF8NoBOM) and legacy behavior (Legacy)
1 parent 0c7ecf1 commit 6ea6aab

1 file changed (+123 additions, -26 deletions)

1-Draft/DefaultFileEncoding.md

@@ -16,30 +16,90 @@ Current PowerShell behavior is that a BOM is created by default when a file is c
 This is a problem for Linux systems where the default encoding is UTF8 but a BOM is not written when a file is created.
 Creating files on Linux with a BOM makes it difficult to interact with the native tools, as the following example illustrates.
 
-```PowerShell
+```powershell
 PS> "ĝoo" > file.txt
 PS> get-content file.txt
 ĝoo
 PS> exit
-james@jimtru-ops2:~$ cat file.txt
+james@jimtru-ops2:~$ /bin/cat file.txt
 ▒▒oo
-
 ```
+
 This is due to the BOM being written into the file:
+
 ```powershell
 PS /home/james> format-hex file.txt
 
 Path: /home/james/file.txt
            00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
 00000000   FF FE 1D 01 6F 00 6F 00 0A 00                     .þ..o.o...
+           ^^ ^^
 ```
-The native tools on Linux try to render the BOM as actual content, which harms the output.
+The native tools on Linux try to render the BOM as actual content, which results in mistranslated characters.
 If the BOM could be written when the platform expects it, interaction with native tools will be less problematic.
 
 ## Specification
 
 A new global variable `$PSDefaultFileEncoding` shall be available which allows the user to define the encoding for their system.
-The allowed values for this variable shall be defined by
+The allowed values for this variable shall be defined by the `Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding` enum, with the following additions:
+
+* UTF8NoBOM
+* Legacy
+
+The following is the complete list of `FileSystemCmdletProviderEncoding` members:
+* Ascii
+* BigEndianUnicode
+* BigEndianUTF32
+* Byte
+* Default
+* Legacy
+* Oem
+* String
+* Unicode
+* Unknown
+* UTF32
+* UTF7
+* UTF8
+* UTF8NoBOM
+
+When `$PSDefaultFileEncoding` is set to `UTF8NoBOM`, the file will be created with UTF8 encoding but a BOM will not be added.
+
+When `$PSDefaultFileEncoding` is set to `Legacy`, the behavior will change based on the platform:
+
+**Windows**
+```
+CmdletName       Encoding
+----------       --------
+Add-Content      ASCII
+Export-Clixml    UTF16
+Export-CSV       ASCII
+Out-File         UTF16
+Set-Content      ASCII
+Export-PSSession UTF8 (with BOM)
+Redirection      UTF16
+```
+
+**Non-Windows**
+```
+CmdletName       Encoding
+----------       --------
+Add-Content      UTF8 (no BOM)
+Export-Clixml    UTF8 (no BOM)
+Export-CSV       UTF8 (no BOM)
+Out-File         UTF8 (no BOM)
+Set-Content      UTF8 (no BOM)
+Export-PSSession UTF8 (no BOM)
+Redirection      UTF8 (no BOM)
+```
+The default on Windows systems shall remain unchanged (the value of `$PSDefaultFileEncoding` shall be set to `Legacy`); non-Windows platforms shall set `$PSDefaultFileEncoding` to `UTF8NoBOM`.
+If `$PSDefaultFileEncoding` is not set, `UTF8NoBOM` shall be the default on non-Windows systems, and the current behavior (`Legacy`) shall be the default on Windows.
+
+### Exclusions
+
+Cmdlets which do not create a file are excluded from this change, so the `*-WebRequest` and `*-RestMethod` cmdlets shall not be changed.
+Remoting protocol cmdlets shall also be unaffected by this change.
+
+### Optional
 
 We should take this opportunity to rationalize our use of the `Encoding` parameter, and change the cmdlets which use Encoding as `string` or `System.Text.Encoding` type to use `Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding`.
 The following cmdlets use various types for the parameter `Encoding`
@@ -61,31 +121,68 @@ Send-MailMessage System.Text.Encoding
 Set-Content Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding
 ```
 
-`Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding` shall be extended to include:
+Rationalizing the parameter type will make these cmdlets easier to maintain over time.
 
-* UTF8NoBOM
+### Examples
+---
+Creating a file without a BOM on a Linux system (the default):
+```powershell
+PS> "ĝoo" > file.txt
+PS> get-content file.txt
+ĝoo
+PS> exit
+james@jimtru-ops2:~$ cat file.txt
+ĝoo
+```
 
-which result in the membernames of this enum being:
-* Ascii
-* BigEndianUnicode
-* BigEndianUTF32
-* Byte
-* Default
-* Oem
-* String
-* Unicode
-* Unknown
-* UTF32
-* UTF7
-* UTF8
-* UTF8NoBOM
+Additional details:
+```powershell
+PS /home/james> "©opyright" > c.txt
+PS /home/james> format-hex c.txt
+Path: /home/james/c.txt
+           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
+00000000   C2 A9 6F 70 79 72 69 67 68 74 0A                  ©opyright.
+
+PS /home/james> /bin/cat c.txt
+©opyright
+PS /home/james> get-content c.txt
+©opyright
+PS /home/james> bash
+james@jimtru-ops2:~$ date >> c.txt
+james@jimtru-ops2:~$ cat c.txt
+©opyright
+Thu Feb 16 15:02:58 PST 2017
+james@jimtru-ops2:~$ exit
+exit
+PS /home/james> get-content -Encoding utf8 c.txt
+©opyright
+Thu Feb 16 15:02:58 PST 2017
+```
+
+Creating a file with a BOM on a Linux system explicitly writes the BOM into the file, which makes the file problematic on Linux:
+```powershell
+PS> $PSDefaultFileEncoding = "UTF8"
+PS> "ĝoo" > file.txt
+PS> get-content file.txt
+ĝoo
+PS> exit
+james@jimtru-ops2:~$ cat file.txt
+▒▒oo
+```
 
-The default on Windows systems shall remain unchanged, non-Windows platforms shall be defaulted to `UTF8NoBOM` via the `$PSDefaultFileEncoding` variable.
-If the `$PSDefaultFileEncoding` is not set, `UTF8NoBOM` shall be the default for non-Windows systems, and the current behavior () on Windows.
+This mimics our current behavior and is due to the BOM being written into the file.
+This file _would_ be suitable for use on a Windows system.
 
-### Examples
+Creating a file without a BOM on Windows:
+```powershell
+PS> "ĝoo" | out-file -encoding UTF8NoBOM file.txt
+```
 
 ### Commentary
 
-UTF8NoBOM is, of course, not an encoding but neither are a number of the other values for `FileSystemCmdletProviderEncoding`.
-However, it _is_ descriptive of what we are doing.
+`UTF8NoBOM` and `Legacy` are, of course, not actual encodings, but neither are a number of the other values for `FileSystemCmdletProviderEncoding`.
+However, they are somewhat descriptive of the resulting behavior.
+
+### Alternate Approaches
+The setting need not be a PowerShell variable; it could be an environment variable or part of the configuration proposed by [PowerShell-StartupConfig](https://github.com/PowerShell/PowerShell-RFC/blob/master/1-Draft/RFC0015-PowerShell-StartupConfig.md).
+However, a variable is the simplest approach, and these alternatives can be adopted at a later time.

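The `FF FE` prefix in the first `format-hex` listing is the byte order mark for UTF-16 little-endian, the same encoding the Windows table lists for redirection. A minimal sketch, assuming only the .NET `System.Text` encoding classes that PowerShell already loads, reproduces that listing next to the two UTF-8 variants discussed in the RFC; `ConvertTo-HexString` is a hypothetical helper defined just for this illustration, not an existing cmdlet.

```powershell
# Sketch: reproduce the bytes format-hex shows for "ĝoo" followed by a newline.
# Uses only .NET encoding classes; it does not depend on $PSDefaultFileEncoding.
# [char]0x011D is "ĝ"; building it from the code point keeps the script independent of its own file encoding.
$text = "$([char]0x011D)oo`n"

# Hypothetical helper: render bytes the way format-hex prints them (two hex digits, space separated).
function ConvertTo-HexString([byte[]]$Bytes) {
    ($Bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
}

# UTF-16LE with BOM (UnicodeEncoding arguments: bigEndian = $false, byteOrderMark = $true).
$utf16 = New-Object -TypeName System.Text.UnicodeEncoding -ArgumentList $false, $true
$utf16Hex = ConvertTo-HexString ($utf16.GetPreamble() + $utf16.GetBytes($text))

# UTF-8 with BOM, as written when the setting is "UTF8".
$utf8Bom = New-Object -TypeName System.Text.UTF8Encoding -ArgumentList $true
$utf8BomHex = ConvertTo-HexString ($utf8Bom.GetPreamble() + $utf8Bom.GetBytes($text))

# UTF-8 without BOM (the proposed UTF8NoBOM behavior); GetPreamble() returns an empty array here.
$utf8NoBom = New-Object -TypeName System.Text.UTF8Encoding -ArgumentList $false
$utf8NoBomHex = ConvertTo-HexString ($utf8NoBom.GetPreamble() + $utf8NoBom.GetBytes($text))

"UTF-16LE + BOM : $utf16Hex"      # UTF-16LE + BOM : FF FE 1D 01 6F 00 6F 00 0A 00
"UTF-8 + BOM    : $utf8BomHex"    # UTF-8 + BOM    : EF BB BF C4 9D 6F 6F 0A
"UTF-8, no BOM  : $utf8NoBomHex"  # UTF-8, no BOM  : C4 9D 6F 6F 0A
```

Only the first line corresponds to bytes captured in this commit; the other two follow from the UTF-8 encoding of U+011D (`C4 9D`) and the three-byte UTF-8 BOM (`EF BB BF`).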

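The Windows example proposes `out-file -encoding UTF8NoBOM`, which depends on the new enum member. Until such a value exists, the same bytes can be produced by calling .NET directly. This is a sketch that assumes only `System.IO.File` and `System.Text.UTF8Encoding` (present in both .NET Framework and .NET Core); it is not the proposed implementation of `UTF8NoBOM`.

```powershell
# Sketch: produce the UTF8NoBOM result without any new PowerShell setting.
# Absolute temp path: .NET resolves relative paths against the process directory, not PowerShell's location.
$path = [System.IO.Path]::GetTempFileName()
$utf8NoBom = New-Object -TypeName System.Text.UTF8Encoding -ArgumentList $false   # $false: do not emit a BOM

# Write "ĝoo" followed by LF; [char]0x011D keeps the script independent of its own file encoding.
[System.IO.File]::WriteAllText($path, "$([char]0x011D)oo`n", $utf8NoBom)

# Read the raw bytes back and check for the UTF-8 BOM (EF BB BF).
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
$hex = ($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '

Remove-Item -LiteralPath $path   # clean up the temporary file

"BOM present: $hasBom"   # BOM present: False
"Bytes: $hex"            # Bytes: C4 9D 6F 6F 0A
```

A temporary file keeps the sketch from overwriting anything, and the resulting bytes match the BOM-less `c.txt` style of output shown for Linux rather than the `FF FE` output of the legacy redirection.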