@@ -16,30 +16,90 @@ Current PowerShell behavior is that a BOM is created by default when a file is c
 This is a problem for Linux systems where the default encoding is UTF8 but a BOM is not written when a file is created.
 Creating files on Linux with a BOM makes it difficult to interact with the native tools, as the following example illustrates.
 
-```PowerShell
+```powershell
 PS> "ĝoo" > file.txt
 PS> get-content file.txt
 ĝoo
 PS> exit
-james@jimtru-ops2:~$ cat file.txt
+james@jimtru-ops2:~$ /bin/cat file.txt
 ▒▒oo
-
 ```
+
 This is due to the BOM being written into the file:
+
 ```powershell
 PS /home/james> format-hex file.txt
 
            Path: /home/james/file.txt
            00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
 00000000   FF FE 1D 01 6F 00 6F 00 0A 00                    .þ..o.o...
+           ^^ ^^
 ```
-The native tools on Linux try to render the BOM as actual content, which harms the output.
+The native tools on Linux try to render the BOM as actual content, which results in mistranslated characters.
 If the BOM were written only when the platform expects it, interaction with native tools would be less problematic.
 
 ## Specification
 
 A new global variable `$PSDefaultFileEncoding` shall be available which allows the user to define the encoding for their system.
-The allowed values for this variable shall be defined by
+The allowed values for this variable shall be defined by the `Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding` enum, with the following additions:
+
+* UTF8NoBOM
+* Legacy
+
+The following is the complete list of `FileSystemCmdletProviderEncoding` members:
+* Ascii
+* BigEndianUnicode
+* BigEndianUTF32
+* Byte
+* Default
+* Legacy
+* Oem
+* String
+* Unicode
+* Unknown
+* UTF32
+* UTF7
+* UTF8
+* UTF8NoBOM
+
+When `$PSDefaultFileEncoding` is set to `UTF8NoBOM`, the file will be created with UTF8 encoding but a BOM will not be added.
+
+When `$PSDefaultFileEncoding` is set to `Legacy`, the behavior will change based on the platform:
+
+**Windows**
+```
+CmdletName       Encoding
+----------       --------
+Add-Content      ASCII
+Export-Clixml    UTF16
+Export-CSV       ASCII
+Out-File         UTF16
+Set-Content      ASCII
+Export-PSSession UTF8 (with BOM)
+Redirection      UTF16
+```
+
+**Non-Windows**
+```
+CmdletName       Encoding
+----------       --------
+Add-Content      UTF8 (no BOM)
+Export-Clixml    UTF8 (no BOM)
+Export-CSV       UTF8 (no BOM)
+Out-File         UTF8 (no BOM)
+Set-Content      UTF8 (no BOM)
+Export-PSSession UTF8 (no BOM)
+Redirection      UTF8 (no BOM)
+```
+The default on Windows systems shall remain unchanged (the value for `$PSDefaultFileEncoding` shall be set to `Legacy`); non-Windows platforms shall set `$PSDefaultFileEncoding` to `UTF8NoBOM`.
+If `$PSDefaultFileEncoding` is not set, the default shall be `UTF8NoBOM` on non-Windows systems and `Legacy` (the current behavior) on Windows.
+
+### Exclusions
+
+Cmdlets which do not create a file are excluded from this change, so the `*-WebRequest` and `*-RestMethod` cmdlets shall not be changed.
+Remoting protocol cmdlets shall also be unaffected by this change.
+
+### Optional
 
 We should take this opportunity to rationalize our use of the `Encoding` parameter, and change the cmdlets which use Encoding as `string` or `System.Text.Encoding` type to use `Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding`.
 The following cmdlets use various types for the parameter `Encoding`
Creating a file with a BOM on a Linux System, this will specifically put the BOM in the file and will render the file problematic on Linux:
+```powershell
+PS> $PSDefaultFileEncoding = "UTF8"
+PS> "ĝoo" > file.txt
+PS> get-content file.txt
+ĝoo
+PS> exit
+james@jimtru-ops2:~$ cat file.txt
+▒▒oo
+```
 
-The default on Windows systems shall remain unchanged, non-Windows platforms shall be defaulted to `UTF8NoBOM` via the `$PSDefaultFileEncoding` variable.
-If the `$PSDefaultFileEncoding` is not set, `UTF8NoBOM` shall be the default for non-Windows systems, and the current behavior () on Windows.
+This mimics our current behavior and is due to the BOM being written into the file.
+This file _would_ be suitable for use on a Windows system.
 
-### Examples
+Creating a file without a BOM on Windows:
+```powershell
+PS> "ĝoo" | out-file -encoding UTF8NoBOM file.txt
+```
 
 ### Commentary
 
-UTF8NoBOM is, of course, not an encoding but neither are a number of the other values for `FileSystemCmdletProviderEncoding`.
-However, it _is_ descriptive of what we are doing.
+`UTF8NoBOM` and `Legacy` are, of course, not actual encodings, but neither are a number of the other values for `FileSystemCmdletProviderEncoding`.
+However, they are somewhat descriptive of our behavior.
+
+### Alternate Approaches
+The setting need not be a PowerShell variable; it could be an environment variable or part of the configuration proposed by [PowerShell-StartupConfig](https://github.com/PowerShell/PowerShell-RFC/blob/master/1-Draft/RFC0015-PowerShell-StartupConfig.md).
+However, this is the simplest approach, and these alternatives can be pursued at a later time.
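
The difference between the `UTF8` and `UTF8NoBOM` values described in the specification can be seen at the byte level. The following is a minimal sketch, not part of the RFC: it does not use the proposed `$PSDefaultFileEncoding` variable, but approximates the two behaviors with the existing .NET `System.Text.UTF8Encoding` class. The file names and variable names are hypothetical and chosen only for illustration.

```powershell
# Sketch only: approximates UTF8 (with BOM) vs. UTF8NoBOM using existing .NET APIs.
# File and variable names are hypothetical; $PSDefaultFileEncoding is not used here.
$text = "$([char]0x011D)oo`n"   # "ĝoo" followed by a newline

$withBom = Join-Path ([System.IO.Path]::GetTempPath()) 'with-bom.txt'
$noBom   = Join-Path ([System.IO.Path]::GetTempPath()) 'no-bom.txt'

# The constructor argument controls whether the EF BB BF preamble is emitted.
[System.IO.File]::WriteAllText($withBom, $text, [System.Text.UTF8Encoding]::new($true))
[System.IO.File]::WriteAllText($noBom,   $text, [System.Text.UTF8Encoding]::new($false))

# Render each file's bytes as space-separated hex pairs.
$withBomHex = ([System.IO.File]::ReadAllBytes($withBom) | ForEach-Object { $_.ToString('X2') }) -join ' '
$noBomHex   = ([System.IO.File]::ReadAllBytes($noBom)   | ForEach-Object { $_.ToString('X2') }) -join ' '

$withBomHex   # EF BB BF C4 9D 6F 6F 0A
$noBomHex     # C4 9D 6F 6F 0A
```

The first file starts with the three-byte UTF-8 BOM that native Linux tools would render as content; the second starts directly with the encoded `ĝ` (`C4 9D`), which is the behavior the `UTF8NoBOM` value is meant to provide.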